Archive for August, 2017

Implicit Versus Explicit Prejudice

August 30, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Any theory of racism has to explain the following puzzle in America: On the one hand, the overwhelming majority of black Americans think they suffer from prejudice, and they have ample evidence of discrimination in police stops, job interviews, and jury decisions. On the other hand, very few white Americans will admit to being racist. The dominant explanation has been that this is due, in large part, to widespread implicit prejudice. According to this theory, white Americans may mean well, but they have a subconscious bias that influences their treatment of black Americans. There is an implicit-association test for such a bias. These tests have consistently shown that it takes most people milliseconds more to associate black faces with positive words such as “good” than with negative words such as “awful.” For white faces, the pattern is reversed. The small extra time it takes is interpreted as evidence of someone’s implicit prejudice, a prejudice the person may not even be aware of.

There is an alternative explanation for the discrimination that African-Americans feel and whites deny: hidden explicit racism. There may be widespread conscious racism of which people are well aware but to which they do not want to confess, especially in a survey. This is what the search data seems to be saying. There is nothing implicit about searching for “n_____ jokes.” It’s hard to imagine that Americans are Googling the word “n_____” with the same frequency as “migraine” and “economist” without explicit racism having a major impact on African-Americans. There was no convincing measure of this bias prior to the Google data. Seth uses this measure to see what it explains.

It explains, as was discussed in a previous post, why Obama’s vote totals in 2008 and 2012 were depressed in many regions. It also correlates with the black-white wage gap, as a team of economists recently reported. In other words, the areas Seth found that make the most racist searches underpay black people. When the polling guru Nate Silver looked for the geographic variable that correlated most strongly with support in the 2016 Republican primary for Trump, he found it in the map of racism Seth had developed. That variable was searches for “n_____.”

Scholars have recently put together a state-by-state measure of implicit prejudice against black people, which enabled Seth to compare the effects of explicit racism, as measured by Google searches, and implicit bias. Using regression analysis, Seth found that, to predict where Obama underperformed, an area’s racist Google searches explained a lot. An area’s performance on implicit-association tests added little.
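To make the regression comparison concrete, here is a minimal sketch in Python. It does not use Seth’s data or reproduce his analysis: the numbers are synthetic, and the names (explicit, implicit, underperformance, r_squared) are invented for illustration. The synthetic outcome is deliberately built to depend on the explicit measure only, so the sketch just shows how one might check how much a second predictor adds once the first is in the model.

```python
import numpy as np


def r_squared(predictors, outcome):
    """Fit ordinary least squares with an intercept and return R^2."""
    X = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    residuals = outcome - X @ coef
    centered = outcome - outcome.mean()
    return 1.0 - (residuals @ residuals) / (centered @ centered)


# Synthetic data only (an assumption for illustration, not real measurements).
rng = np.random.default_rng(0)
n_areas = 200
explicit = rng.normal(size=n_areas)   # stand-in for racist search rate
implicit = rng.normal(size=n_areas)   # stand-in for implicit-test score
# By construction, the outcome depends on the explicit measure plus noise.
underperformance = 2.0 * explicit + rng.normal(size=n_areas)

r2_explicit = r_squared(explicit, underperformance)
r2_implicit = r_squared(implicit, underperformance)
r2_both = r_squared(np.column_stack([explicit, implicit]), underperformance)

print(f"explicit only: {r2_explicit:.3f}")
print(f"implicit only: {r2_implicit:.3f}")
print(f"both:          {r2_both:.3f}")
```

The gap between r2_both and r2_explicit is the extra variation explained by the implicit measure. A small gap is what “added little” means in regression terms.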

Seth has found subconscious prejudice may have a more fundamental impact for other groups. He was able to use Google searches to find evidence of implicit prejudice against another segment of the population: young girls.

So, who would be harboring bias against girls? Their parents. Of all Google searches starting “Is my 2-year-old,” the most common next word is “gifted.” But this question is not asked equally about young boys and young girls. Parents are two and a half times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” Parents’ overriding concern regarding their daughters is anything related to appearance.

The URL above will take you to a number of options for taking and learning about the implicit association test.


The Truth About Your Facebook Friends

August 29, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Social media are another source of big data. Seth writes, “The fact is, many Big Data sources, such as Facebook, are often the opposite of digital truth serum.”

Just as with surveys, in social media there is no incentive to tell the truth. Much more so than in surveys, there is a large incentive to make yourself look good. After all, your online presence is not anonymous. You are courting an audience and telling your friends, family members, colleagues, acquaintances, and strangers who you are.

To see how biased data pulled from social media can be, consider the relative popularity of the “Atlantic,” a highbrow monthly magazine, versus the “National Enquirer,” a gossipy, often sensational magazine. Both publications have similar average circulations, selling a few hundred thousand copies. (The “National Enquirer” is a weekly, so it actually sells more total copies.) There are also a comparable number of Google searches for each magazine.

However, on Facebook, roughly 1.5 million people either like the “Atlantic” or discuss articles from the “Atlantic” on their profiles. Only about 50,000 like the Enquirer or discuss its contents.

Here is “Atlantic” versus “National Enquirer” popularity compared across sources:
Circulation: roughly 1 “Atlantic” for every 1 “National Enquirer”
Google searches: 1 “Atlantic” for every 1 “National Enquirer”
Facebook likes: 27 “Atlantic” for every 1 “National Enquirer”
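As a quick check on the arithmetic, here is a small Python sketch that computes the Facebook ratio from the rounded counts quoted in this post. The variable and function names are made up for illustration. The rounded figures give about 30 to 1 rather than 27 to 1. Presumably the 27 comes from the unrounded counts, but that is an assumption, since only the rounded figures appear here.

```python
def popularity_ratio(first_count, second_count):
    """Return how many of the first item there are for every one of the second."""
    return first_count / second_count


# Rounded Facebook figures quoted in the post ("roughly 1.5 million", "about 50,000").
atlantic_facebook = 1_500_000
enquirer_facebook = 50_000

facebook_ratio = popularity_ratio(atlantic_facebook, enquirer_facebook)

# Circulation is described as roughly 1 to 1, so it serves as the baseline.
circulation_ratio = 1.0
overrepresentation = facebook_ratio / circulation_ratio

print(f"Facebook ratio: about {facebook_ratio:.0f} to 1")
print(f"Overrepresentation relative to circulation: about {overrepresentation:.0f}x")
```

Either way, Facebook overstates the relative popularity of the “Atlantic” by a factor in the high twenties to thirty compared with circulation.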

For assessing magazine popularity, circulation data is ground truth. And Facebook data is overwhelmingly biased against the trashy tabloid, making it the worst data for determining what people really like.

Here are some excerpts from the book:
“Facebook is digital brag-to-my-friends-about-how-good-my-life-is serum. In Facebook world, the average adult seems to be happily married, vacationing in the Caribbean, and perusing the ‘Atlantic.’ In the real world, a lot of people are angry, on supermarket checkout lines, peeking at the ‘National Enquirer,’ ignoring phone calls from their spouse, whom they haven’t slept with in years. In Facebook world, family life seems perfect. In the real world, family life is messy. It can be so messy that a small number of people even regret having children. In Facebook world, it seems every young adult is at a cool party Saturday night. In the real world, most are at home alone, binge-watching shows on Netflix. In Facebook world, a girlfriend posts twenty-six happy pictures from her getaway with her boyfriend. In the real world, immediately after posting this, she Googles ‘my boyfriend won’t have sex with me.’”


In summary:
DIGITAL TRUTH        DIGITAL LIES
Searches             Social media posts
Views                Social media likes
Clicks               Dating profiles

Some Common Ideas Debunked

August 28, 2017

This post is based on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.”

A common notion is that a major cause of racism is economic insecurity and vulnerability. So it is reasonable to expect that when people lose their jobs, racism increases. But neither racist searches nor membership in Stormfront rises when unemployment does.

It is reasonable to think that anxiety is highest in overeducated big cities. A famous stereotype is the urban neurotic. However, Google searches reflecting anxiety, such as “anxiety symptoms” or “anxiety help,” tend to be higher in places with lower levels of education, lower median incomes, and a larger portion of the population living in rural areas. There are higher search rates for anxiety in rural upstate New York than in New York City.

It is reasonable to think that a terrorist attack that kills dozens or hundreds of people would automatically be followed by massive, widespread anxiety. After all, terrorism, by definition, is supposed to instill a sense of terror. Seth looked for Google searches reflecting anxiety. He tested how much these searches rose in a country in the days, weeks, and months following every major European or American terrorist attack since 2004. So, on average, how much did anxiety-related searches rise? They didn’t. At all.

Humor has long been thought of as a way to cope with the frustrations, the pain, and the inevitable disappointments of life. Charlie Chaplin said, “Laughter is the tonic, the relief, the surcease from pain.” Yet searches for jokes are lowest on Mondays, the day when people report they are most unhappy. They are lowest on cloudy and rainy days. And they plummet after a major tragedy, such as when two bombs killed three and injured hundreds during the 2013 Boston Marathon. In fact, people are more likely to look for jokes when things are going well in life than when they aren’t.

Seth argues that the bigness part of big data is overrated. He writes that the smartest Big Data companies are often cutting down their data. Major decisions at Google are based on only a tiny sampling of all their data. Seth continues, “You don’t always need a ton of data to find important insights. You need the right data. A major reason that Google searches are so valuable is not that there are so many of them; it is that people are so honest in them.”

Everybody Lies

August 27, 2017

“Everybody Lies” is the title of a groundbreaking book by Seth Stephens-Davidowitz on how to effectively exploit big data. The subtitle of this book is “Big Data, New Data, and What the Internet Reveals About Who We Really Are.” The title is a tad overblown, as we always need to have doubts about data and data analysis. However, it is fair to say that the internet currently does the best job at revealing who we really are.

The problem with surveys and interviews is that there is a bias to make ourselves look better than we really are. Indeed, we should be aware that we fool ourselves and that we can think we are responding honestly when in truth we are protecting our egos.

Stephens-Davidowitz uses Google Trends as his principal research tool and has found that people reveal more about their true selves in these searches than they do in interviews and surveys. Although the polls erred in predicting that Hillary Clinton would win the presidency, Google searches indicated that Trump would prevail.

Going back to Obama’s first election night, when most of the commentary focused on praise of Obama and acknowledgment of the historic nature of his election, roughly one in every hundred Google searches that included “Obama” also included “kkk” or “n_____.” On election night, searches and sign-ups for Stormfront, a white nationalist site with surprisingly high popularity in the United States, were more than ten times higher than normal. In some states there were more searches for “n_____ president” than “first black president.” So there was a darkness and hatred that was hidden from the traditional sources but was quite apparent in the searches that people made.

These Google searches also revealed that much of what we thought about the location of racism was wrong. Surveys and conventional wisdom placed modern racism predominantly in the South and mostly among Republicans. However, the places with the highest racist search rates included upstate New York, western Pennsylvania, eastern Ohio, industrial Michigan, and rural Illinois, along with West Virginia, southern Louisiana, and Mississippi. The Google search data suggested that the true divide was not South versus North, but East versus West. Moreover, racism was not limited to Republicans. Racist searches were no higher in places with a high percentage of Republicans than in places with a high percentage of Democrats. These Google searches helped draw a new map of racism in the United States. Seth notes that Republicans in the South may be more likely to admit racism, but plenty of Democrats in the North have similar attitudes. This map proved to be quite significant in explaining the political success of Trump.

In 2012 Seth used this map of racism to reevaluate exactly what role Obama’s race played. In parts of the country with a high number of racist searches, Obama did substantially worse in 2008 than John Kerry, the white Democratic presidential candidate, had four years earlier. This relationship was not explained by any other factor about these areas, including educational levels, age, church attendance, or gun ownership. Racist searches did not predict poor performance for any Democratic candidate other than Obama. Moreover, these results implied a large effect. Obama lost roughly 4 percentage points nationwide just from explicit racism. Seth notes that favorable conditions existed for Obama’s elections. The Google Trends data indicated that there were enough racists to help win a primary or tip a general election in a year not so favorable for Democrats.

During the general election there were clues in Google Trends that the electorate might be a favorable one for Trump. Black Americans told polls they would turn out in large numbers to oppose Trump. However, Google searches for information on voting in heavily black areas were way down. On election day, Clinton was hurt by low black turnout. There were more searches for “Trump Clinton” than for “Clinton Trump” in key states in the Midwest that Clinton was expected to win. Previous research has indicated that the first name in search pairs like this is likely the favored candidate.

The final two paragraphs in this post are taken directly from Seth’s book.

“But the major clue, I would argue, that Trump might prove a successful candidate (in the primaries, to begin with) was all that secret racism that my Obama study had uncovered. The Google searches revealed a darkness and hatred among a meaningful number of Americans that pundits, for many years, had missed. Search data revealed that we lived in a very different society from the one academics and journalists, relying on polls, thought that we lived in. It revealed a nasty, scary, and widespread rage that was waiting for a candidate to give voice to it.

“People frequently lie, to themselves and to others. In 2008, Americans told surveys that they no longer cared about race. Eight years later, they elected as president Donald J. Trump, a man who retweeted a false claim that black people were responsible for the majority of murders of white Americans, defended his supporter for roughing up a Black Lives Matter protestor at one of his rallies, and hesitated in repudiating support from a former leader of the Ku Klux Klan (HM feels compelled to note that Trump has not renounced the latest endorsement by the leader of the Ku Klux Klan). The same hidden racism that hurt Barack Obama helped Donald Trump.”


Another Hiatus

August 1, 2017

We’re going on another international cruise. On our last international cruise Trump was running for the Republican nomination. This was extremely embarrassing. Now that he’s President, it’s more than embarrassing. We shall be ashamed to admit we are Americans.

During his absence, HM strongly recommends “NO IS NOT ENOUGH: Resisting Trump’s Shock Politics and Winning the World We Need” by Naomi Klein. It provides an enlightening analysis of how this disaster occurred, and, more importantly, provides ideas on how we can recover from this disaster.