Google Autocomplete results for swing state voters
We’re almost at the end of a really tight election…I’ve been obsessively monitoring FiveThirtyEight. Lately I’ve been interested in using search data to identify what issues people are thinking about, and possibly to predict winners. People are more likely to be honest in their search box than they are on the phone with a pollster. I don’t have access to a search data firehose, so Autocomplete data was the most readily available proxy. Autocompletions are the search suggestions Google serves you when it’s trying to guess what you’re looking for. Over the past three weeks, Justin and I crowdsourced a data set of Google Autocomplete results for 140 questions, gathered from people across the country. The root questions covered the full spectrum of who/what/when/where/why/how. Some were the kinds of questions a user would ask while researching issues (“What does Obama think about…”), while others were more open-ended (“Why is Romney…”).
Below is an interactive visualization of representative* results from the eight swing states: Nevada, Colorado, Virginia, Florida, New Hampshire, Ohio, Iowa, and Wisconsin. It contains 8182 unique autocomplete responses. Click on a bubble, and the table below will update. Bigger bubbles indicate more results for a particular topic, and color shows whether the suggestions for a topic skewed more Romney (red) or more Obama (blue). The chart below the bubbles lists the specific autocomplete suggestions and the states in which they appeared; the larger the dot under a state name, the more times that keyword was suggested there. (If you’re using IE, you’ll need version 9 or later. If the right side looks a bit cut off, it’s because I just updated my blog layout…fixing this is top of the agenda.)
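The post doesn’t spell out how bubble size and color were computed. One plausible sketch, assuming each suggestion was hand-labeled with a topic and tagged with which candidate its root question named (the `Suggestion` record, its fields, and the `topic_summary`/`dot_sizes` helpers are all hypothetical), is to count suggestions per topic for size and take the normalized Obama-minus-Romney difference for color:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    # Hypothetical record: one autocomplete suggestion from one script run.
    state: str      # two-letter state code from geocoding the run's IP
    candidate: str  # which candidate the root question named: "obama" or "romney"
    topic: str      # hand-assigned topic label (assumed), e.g. "jobs"
    text: str       # the suggestion text itself


def topic_summary(suggestions):
    """Return {topic: (bubble_size, skew)}.

    bubble_size is the number of suggestions in the topic; skew runs from
    -1.0 (all from Romney questions, red) to +1.0 (all from Obama questions, blue).
    """
    counts = Counter()
    for s in suggestions:
        if s.candidate in ("obama", "romney"):
            counts[s.topic, s.candidate] += 1
    summary = {}
    for topic in {t for t, _ in counts}:
        obama = counts[topic, "obama"]
        romney = counts[topic, "romney"]
        total = obama + romney  # always >= 1, since the topic appears in counts
        summary[topic] = (total, (obama - romney) / total)
    return summary


def dot_sizes(suggestions):
    """Count how often each suggestion text appeared in each state (dot size in the chart)."""
    return Counter((s.text, s.state) for s in suggestions)


# Toy rows for illustration only; they are not taken from the real data set.
rows = [
    Suggestion("OH", "romney", "jobs", "romney job creation plan"),
    Suggestion("VA", "romney", "jobs", "romney job creation plan"),
    Suggestion("FL", "obama", "jobs", "obama jobs plan"),
    Suggestion("CO", "romney", "healthcare", "romneycare"),
]
summary = topic_summary(rows)  # "jobs" -> size 3, skew -1/3 (leans red)
```

A normalized difference keeps the color scale comparable between big and small bubbles; a raw count difference would make every large topic look strongly partisan.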
You’ll notice that a lot of the bigger bubbles focus on personal information about the candidates. Among specific issues, healthcare (in its various forms: Obamacare, Medicare, etc.) dominates. Interestingly, Colorado, Ohio, and Virginia were the only states with “Romneycare” as a specific suggestion. (CO, OH, and VA seem to have a lot of overlap.) Job creation was the most common specific economic issue, and most of those suggestions focused on Romney’s job creation plan: there were roughly 5x more results for Romney than for Obama. Abortion was the most-queried social issue.
A majority of the factoids, rumors, and controversies were about Obama: birtherism, “hating America”, being a socialist/Marxist/communist, and not properly respecting the flag all figured prominently. On the creepy/weird front, there were a few results asking why the President isn’t dead yet, and, from Ohio, “Obama is Hitler”; there was no equivalent suggestion for Romney. On the flip side, the Romney factoids were primarily about his taxes and his Mexican heritage. There was also a lot of concern for his dog, and many questions about his lack of military service. He is not a unicorn.
It’s interesting to compare the queries with the stories in the media: although women’s issues are dominating the national conversation, the only result to mention them was “How does Mitt Romney feel about women?”
Anyway, this is a first pass over some of the results, so dig in, and leave a comment if something strikes you. :) If you’re interested in the full data set, it’s available (anonymized) in CSV format here or as a Google Fusion Table here. If you make something with it, please let me know.
We are still trying to get a more representative set covering all 50 states, so if you live in AL, DE, HI, ID, LA, MS, NE, ND, OK, RI, SC, SD, WV, or WY, I would be so happy if you would take a second and go run the script…ASAP, so we can use it in a broader post about the election before it happens. :) And just to state the obvious, this isn’t scientific; unfortunately, I have no insight into specific search query volume. If you want to see some great quantitative analysis, head over to FiveThirtyEight or to the Google Politics & Elections page…they have access to an incredible data set.
Rigorous methodology: I put together 140 who/what/when/where/why-style questions about Obama and Romney. Justin packaged them up into a script, which friendly volunteers ran in their terminals (bypassing browser history to the greatest extent possible). The script passed back the top 10 results for each of the ~140 questions; the only personal data collected was the IP address, which we then geocoded to identify the location from which the script was run. Nothing based on autocomplete data (or geocoding!) is an exact science, but most of the queries run in the same city did return the same results. In total, there were 127 unique responses from 34 states and 8 countries. Results came in over a period of three weeks, so it’s possible that autocomplete results changed during that time.
It’s our first time using D3, so we’d welcome any feedback. This originally started out as a treemap, and we had some difficulty attaching filters to the bubble viz…I think one more level of classification into “big topics” would have made the relationships between topics slightly clearer. If you’re a data geek, I’d also be curious to hear what you would look for in this material, since I’m pretty new to this.
*representative: since this project depended on getting people in other states to run a script for me, I had some difficulty getting data from states outside my network, which meant I had a different number of result sets to work with for each state. I tried to pick the “most typical” result for each state by comparing answers and their rankings wherever possible, and otherwise took the result from the area with the greatest population.
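The “most typical” rule in this footnote can be read as picking a medoid: the run that agrees most with the state’s other runs. Below is a minimal sketch of one way to mechanize it, not necessarily how the selection was actually done here. The `Run` record, its `population` field, and the rank-aware `similarity` score (each shared suggestion earns 1 / (1 + rank difference)) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Run:
    # Hypothetical shape of one volunteer's script run.
    state: str
    city: str
    population: int  # population of the geocoded city (assumed to be looked up separately)
    results: dict[str, list[str]] = field(default_factory=dict)  # question -> top suggestions, best first


def similarity(a, b):
    """Rank-aware overlap: each suggestion both runs returned for the same
    question scores 1 / (1 + difference in rank). Fractions keep it exact."""
    score = Fraction(0)
    for question in a.results.keys() & b.results.keys():
        ranks_b = {s: i for i, s in enumerate(b.results[question])}
        for i, s in enumerate(a.results[question]):
            if s in ranks_b:
                score += Fraction(1, 1 + abs(i - ranks_b[s]))
    return score


def most_typical(runs):
    """Pick the run most similar to the state's other runs (a medoid).

    When the comparison can't separate runs (a single run, or a tie, which is
    always the case with exactly two), fall back to the most populous city.
    """
    totals = [sum((similarity(r, o) for o in runs if o is not r), Fraction(0)) for r in runs]
    best = max(totals)
    tied = [r for r, t in zip(runs, totals) if t == best]
    return max(tied, key=lambda r: r.population)
```

Because the score is symmetric, a state with exactly two runs always ties, so the population fallback decides; that lines up with the footnote’s “otherwise” case. Exact fractions avoid spurious tie-breaks from floating-point summation order.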