A student project at California’s Humboldt State University maps all references to “hate” words in geolocatable tweets posted between June 2012 and April 2013. It’s an interesting study, but the results should be used with care. Three aspects of the data collection and processing make this approach problematic, but the study deals directly with only one.
First, was the tweet really negative? Phrases like “…queer theory says…” and “…I’m just an old cripple…” show how ‘hate’ words can appear in tweets that are not negative. The study deals with this in a straightforward manner: the students read every tweet and applied a definitional rubric.
Second, is there any kind of processing bias? If you use raw numbers, big cities will dominate the map: Portland will generate more tweets, and more hate tweets, than Tillamook. To avoid this, the study expressed hate tweets as a percentage of all tweets from a given area. This throws them into another basin of attraction for errors: a small town with few tweeters will stand out if it holds even one prolific hater. For example, The Dalles is a little one-Starbucks town in northern Oregon (population 13,000, or about two cruise ships). Portland is a major metropolis (750,000 people in the county). On the map below, The Dalles stands out like a beacon in the NW, while Portland doesn’t even warrant shading.
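To make the arithmetic concrete, here is a minimal sketch of the two ways of ranking areas. The tweet counts are invented purely for illustration and are not taken from the study; the `hate_rate` helper is likewise hypothetical and only stands in for whatever normalization the students actually used.

```python
# Hypothetical tweet counts, invented for illustration only.
# They are NOT figures from the Humboldt State study.
areas = {
    "Portland": {"total": 200_000, "hate": 400},
    "The Dalles": {"total": 500, "hate": 25},
}


def hate_rate(counts):
    """Share of an area's tweets that were flagged as hateful."""
    return counts["hate"] / counts["total"]


# Ranking by raw count favors the big city.
by_raw = max(areas, key=lambda name: areas[name]["hate"])

# Ranking by percentage favors the small town with a low tweet total.
by_rate = max(areas, key=lambda name: hate_rate(areas[name]))

for name, counts in areas.items():
    print(f"{name}: {counts['hate']} flagged, rate {hate_rate(counts):.2%}")
print("Highest raw count:", by_raw)
print("Highest rate:", by_rate)
```

With these made-up numbers, Portland has sixteen times as many flagged tweets, yet The Dalles has the far higher rate. If a single account wrote most of those 25 tweets, the town's shading reflects one person rather than the community.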
Still, this is an imaginative use of data available from social media, and despite its flaws it’s a worthwhile project.