Tuesday, September 11, 2012

Paper Reading #6: Profanity Use in Online Communities

Introduction

In their paper, "Profanity Use in Online Communities," Sara Owsley Sood, Judd Antin, and Elizabeth F. Churchill analyzed current list-based profanity detection systems and their limitations. They then discussed how different communities have different interpretations of what is profane. That is, there are different contexts in which differing degrees of profanity are appropriate. The authors presented this paper at CHI 2012 in Austin, Texas, in May 2012. Sood is an assistant professor at Pomona College and is interested in the impact of emotion on online communication; Antin and Churchill are research scientists at Yahoo! Research.

Summary

In order to evaluate some of the current methods of profanity detection, they used a data set of user comments over a three-month period from the social news website Yahoo! Buzz. They employed Amazon Mechanical Turk (MTurk) to evaluate a subset of the comments by answering several short questions to sort the comments. In order to train the Turkers to evaluate on similar standards, the authors used a 'gold data' model which gave examples and explanations of why certain comments were profane or insulting. The Turkers' evaluations became the basis on which they analyzed their results.
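Turning several workers' answers into a single label per comment can be done in a few ways; a minimal sketch of one common approach, simple majority vote, is below. The paper does not specify its exact aggregation scheme, so the function name and label set here are assumptions for illustration.

```python
# Aggregate crowdworker labels for one comment by majority vote.
# The label vocabulary ("profane" / "not profane") is hypothetical.
from collections import Counter

def majority_label(labels):
    # Count each label and return the most frequent one.
    return Counter(labels).most_common(1)[0][0]

# Hypothetical responses from three Turkers for a single comment:
votes = ["profane", "profane", "not profane"]
print(majority_label(votes))  # prints "profane"
```

A real pipeline would also handle ties and weight workers by their accuracy on the gold-data examples, but majority vote conveys the basic idea.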

Figure 1: Different uses of symbols, like @ and #, in non-profane ways


Related Work Not Referenced in the Paper

Sood, et al., noted that research on profanity and other inappropriate content has been done before. After looking into the literature myself, I agree with their observation. This paper is not novel in the sense that such work has been done and evaluated before.

1. Maignan, Isabelle, and Bryan A. Lukas. "The Nature and Social Uses of the Internet: A Qualitative Investigation." Journal of Consumer Affairs. 31.2 (1997): 346-71. Web. 11 Sep. 2012.

Maignan and Lukas's work on how people use the internet can provide Sood, et al., with reasons why users use profanity in certain domains or contexts more than in others. Furthermore, Sood, et al., could probably look at their work and how marketing activities further affect internet usage and, in turn, profanity usage.

2. Greenfield, Paul, Peter Rickwood, and Huu Cuong Tran. "Effectiveness of Internet Filtering Software Products." CSIRO Mathematical and Information Sciences. (2001): n. page. Web. 11 Sep. 2012.

Greenfield, et al., looked at specific products that filtered internet content. Sood, et al., can use this work to give a concrete perspective on the commercial success of different products that tried to filter out profanity and other inappropriate material on websites.

3. Johnson, Thomas J., Barbara K. Kaye, et al. "Every Blog Has Its Day: Politically-interested Internet Users' Perceptions of Blog Credibility." Journal of Computer-Mediated Communication. 13.1 (2007): 100-22. Web. 11 Sep. 2012.

Johnson, et al., provide another way through which users can comment: blogs. Sood, et al., can use their observations and interpretations from their own data to further their research into political-based blogs and see how users use profanity and insults in their comments compared to those on other blogs.

4. Finn, Jerry, and Mary Banach. "Victimization Online: The Down Side of Seeking Human Services for Women on the Internet." CyberPsychology & Behavior. 3.2 (2000): 243-54. Web. 11 Sep. 2012.

Women seeking health and human services on the internet should not have to endure cyberstalking, identity theft, and attacks. Sood, et al., should also work towards filtering such content to protect these women and many other users from abusive attacks.

5. Semitsu, Junichi P. "Burning Cyberbooks in Public Libraries: Internet Filtering Software vs. The First Amendment." Stanford Law Review. 52.2 (2000): 509-45. Web. 11 Sep. 2012.

Semitsu offers a very different perspective on the effects of internet filtering. He discusses how it affects people's right to free speech. Sood, et al., did not consider the ethical use of the filtering of profane comments, but it is something that needs to be examined. In certain contexts, filtering is more appropriate, but it is not appropriate in all contexts.

6. Balkin, J.M., Beth S. Noveck, and Kermit Roosevelt. "Filtering the Internet: A Best Practices Model." (1999): n. page. Web. 11 Sep. 2012. 

Balkin, et al., provide guidelines for filtering content on websites. They also consider philosophical and legal ramifications of filtering content. Sood, et al., need to keep these things in mind when discussing filtering content for different communities.

7. Zimmer, Eric A., and Christopher D. Hunter. "Risk and the Internet: Perception and Reality." Communication and cyberspace. (2002): n. page. Web. 11 Sep. 2012.

Zimmer and Hunter provide data suggesting that there is not as much objectionable content on the internet as many perceive. Sood, et al., should interpret this as a reason to examine other websites, since the amount of profane commentary in different categories can vary considerably from the data they have from a single site.

8. Preston, Cheryl B. "Zoning the Internet: A New Approach to Protecting Children Online." Brigham Young University Law Review. (2007): 1417-67. Web. 11 Sep. 2012.

Preston gives another way to protect children from inappropriate content on the internet. She also gives another perspective on why filters do not work, which reinforces Sood, et al.'s work.

9. Shin, J. "Morality and Internet Behavior: A study of the Internet Troll and its relation with morality on the Internet." Proceedings of Society for Information Technology & Teacher Education International Conference 2008. (2008): 2834-40. Web. 11 Sep. 2012.

Shin looks at a phenomenon that has become common on the internet: the troll. Sood, et al., should consider how trolls affect web content and profanity.

10. Sood, Sara Owsley, Judd Antin, and Elizabeth F. Churchill. "Using Crowdsourcing to Improve Profanity Detection." AAAI Technical Report SS-12-06. n. page. Web. 11 Sep. 2012.

Sood, et al., drew on their own earlier work on how crowdsourcing can be used to detect profanity more effectively than list-based methods. This is important because their work in this paper is based on the findings from that paper.

Evaluation

The authors used objective measures, both qualitative and quantitative, to evaluate their work, and their methods were systematic. In this work, Sood, et al., wanted to answer three main questions: whether current profanity detection systems are effective, how profanity differs between communities, and how profanity is received by different communities.

First, to see if current profanity detection systems were effective, the authors used a shared profanity list from the website phorum.org. They built a system that simply flags a comment as profane if it contains any of the words on that list. Then they used a second list of words from noswearing.com, which hosts a list of community-contributed profane terms. This means that this list evolves over time and is more comprehensive than the phorum.org list. They also used a stemmer that checked whether the words in a comment shared a stem with any word on a profanity list. They compared the Turkers' evaluations to the results of the systems that they built. They found that the best performance came from the list from noswearing.com combined with a stemming system, which detected 40.2% of the profanity cases at 52.8% precision. This is not impressive at all. The authors believe that this lack of precision and accuracy was caused by intentional or accidental misspellings (for example, shiiit) and the overlapping use of certain symbols, like the @ symbol, in different contexts.
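A list-plus-stemming detector like the ones evaluated can be sketched in a few lines. This is not the authors' implementation: the word list is a harmless stand-in for the phorum.org/noswearing.com lists, and the crude suffix-stripping stemmer is an assumption (the paper does not specify which stemmer was used). It also reproduces the weakness the authors identified: misspellings slip past list matching.

```python
# Sketch of a list-based profanity detector with stemming.
# Word list and stemmer are illustrative assumptions, not the paper's.

def simple_stem(word):
    # Crude suffix stripping; a real system might use a Porter stemmer.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def is_profane(comment, profanity_list):
    # Flag the comment if any token, or its stem, appears on the list.
    stems = {simple_stem(w) for w in profanity_list}
    for token in comment.lower().split():
        token = token.strip(".,!?\"'")
        if token in profanity_list or simple_stem(token) in stems:
            return True
    return False

# Hypothetical mild word list; real lists are far larger.
word_list = {"darn", "heck"}
print(is_profane("Well, darn it all!", word_list))  # True
print(is_profane("That was daaarn bad", word_list))  # False: the misspelling evades the list
```

The second call shows exactly the failure mode the paper highlights: elongated or obfuscated spellings defeat exact and stemmed matching alike.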

Second, the authors evaluated how profanity differs between communities and domains. They examined how frequently profanity was used by looking at the data collected from the Turkers and the domains from which the comments came. They found that political stories contained more profanity, more insults, and more directed insults than any other domain. Each domain had a different emphasis on each of these three categories, implying that with differing communities and domains there are different degrees of profanity, insults, and directed insults. They also looked at how profanity was used. Whenever a comment contains profanity, it is more likely to also contain an insult or a directed insult. The converse also held: if a comment contains an insult, it is more likely to contain profanity. When a comment is not an insult, profanity tends to appear in negative rants on the topic being discussed.
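The domain comparison above boils down to computing rates over labeled comments: the share of profane comments per domain, and how often profanity and insults co-occur. A minimal sketch, using made-up labels (the records and values below are illustrative, not the paper's data):

```python
# Per-domain profanity rate and profanity/insult co-occurrence.
# The labeled records here are fabricated for illustration only.
labeled = [
    {"domain": "politics", "profane": True,  "insult": True},
    {"domain": "politics", "profane": True,  "insult": False},
    {"domain": "sports",   "profane": False, "insult": False},
    {"domain": "sports",   "profane": True,  "insult": True},
]

def profanity_rate(comments, domain):
    # Fraction of comments in the given domain flagged as profane.
    subset = [c for c in comments if c["domain"] == domain]
    return sum(c["profane"] for c in subset) / len(subset)

def insult_given_profanity(comments):
    # Among profane comments, the fraction that also contain an insult.
    profane = [c for c in comments if c["profane"]]
    return sum(c["insult"] for c in profane) / len(profane)

print(profanity_rate(labeled, "politics"))  # 1.0
print(insult_given_profanity(labeled))      # 0.666...
```

Comparing `profanity_rate` across domains is what surfaces the finding that political stories carry more profanity, and `insult_given_profanity` captures the co-occurrence relationship the authors describe.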

Figure 2: Distribution of comments containing profanity within different topical domains


Finally, to look at how profanity is received, they examined the 'rate up' and 'rate down' votes for each comment, using these votes as a proxy for how much attention a comment received. They found that profane comments were more likely to get both 'rate up' and 'rate down' votes. The authors take this to mean that profane comments are more popular or more widely read than non-profane comments.

Discussion

While I think this work is interesting, I do not think it was that novel. By the authors' own admission, a tremendous amount of work has been done before. However, no prior work offers a precise and accurate solution to filtering out profane comments. I think the evaluation was appropriate on a small scale; the analysis included only about 6,500 comments, and they could have explored comments on other social news media, too. However, for the data that they did use, I think they used mostly valid methods. There were a lot of assumptions made, but it is difficult to define a standard for, say, popularity. I think continued work on this is important because of how accessible the internet and social media are. We need to be conscious of our own and others' actions in certain communities and contexts, and we must act appropriately.
