Posts Tagged ‘Seth Stephens-Davidowitz’

These Posts Only Scratched the Surface

September 5, 2017

The preceding posts have only scratched the surface of Seth Stephens-Davidowitz’s groundbreaking book “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” The adjective groundbreaking is appropriate because this book opens up a new and very valuable source of data: internet searches. These searches bypass most of our defenses and provide a more accurate view of the people making them. Seth describes not only how words are used as data, but also how bodies and pictures are used as data.

One section is titled Digital Truth Serum. In addition to Hate and Prejudice and the Internet itself, it covers the truth about customers, child abuse, abortion, and sex. HM expects that this book will become a best seller primarily for its truth about these very sensitive topics. Much of this true content is depressing, and the author asks, “Can We Handle the Truth?”

A section titled Zooming In discusses:
What’s Really Going on in Our Counties, Cities, and Towns?
How We Fill Our Minutes and Hours
Our Doppelgängers
Seth tells stories using data.

A section titled All the World’s a Lab discusses the techniques Google and other companies use to test and evaluate their presentations. It also discusses what Seth terms Nature’s Cruel—but Enlightening Experiments.

The last part of the book is titled BIG DATA: HANDLE WITH CARE.
Here he discusses what Big Data can and cannot do, including The Curse of Dimensionality and The Overemphasis on What Is Measurable. Although the discussion is technical, it should be accessible to most readers.

The penultimate chapter discusses two dangers:
The Danger of Empowered Corporations
The Danger of Empowered Governments


The Truth About the Internet

September 3, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Perhaps the most widely accepted claim about the internet is that it is driving Americans apart and plays a large part in the polarization of the nation. The only problem with this generally agreed-upon view is that it is wrong.

The evidence against this piece of conventional wisdom comes from a 2011 study by two economists, Matt Gentzkow and Jesse Shapiro. They collected data on the browsing behavior of a large sample of Americans. Their dataset included the research participants’ self-reported ideology: whether they were liberal or conservative.

Gentzkow and Shapiro asked themselves the following question: Suppose you randomly sampled two Americans who happen to both be visiting the same news website. What is the probability that one of them will be liberal and the other conservative? In other words, how frequently do liberals and conservatives “meet” on news sites? Suppose liberals and conservatives on the internet never got their online news from the same place; that is, liberals exclusively visited liberal websites, and conservatives exclusively visited conservative ones. If this were the case, the chance that two Americans on a given news site have opposing political views would be 0%. The internet would be perfectly segregated. Liberals and conservatives would never mix.

Suppose, in contrast, that liberals and conservatives did not differ at all in how they got their news; that is, a liberal and a conservative were equally likely to visit any particular news site. If this were the case, the chance that two Americans on a given news website have opposing political views would be about 50%. The internet would then be perfectly desegregated. Liberals and conservatives would perfectly mix.

According to Gentzkow and Shapiro, in the United States the chance that two people visiting the same news site have different political views is about 45%. So the internet is far closer to perfect desegregation than to perfect segregation. Liberals and conservatives are “meeting” each other on the web all the time.
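The thought experiment above reduces to a simple formula: if a fraction p of a site’s visitors are liberal, two visitors drawn independently from that site hold opposing views with probability 2p(1 − p). The sketch below is a simplified model, not Gentzkow and Shapiro’s actual estimator; it assumes independent draws and weights each site by its share of total traffic, and the `Site` type and `prob_opposing` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical news site with counts of liberal and conservative visits."""
    name: str
    liberal_visits: float
    conservative_visits: float


def prob_opposing(sites):
    """Chance that two visits drawn independently from the same site (the site
    picked in proportion to its traffic) come from people with opposing views."""
    total = sum(s.liberal_visits + s.conservative_visits for s in sites)
    prob = 0.0
    for s in sites:
        n = s.liberal_visits + s.conservative_visits
        if n == 0:
            continue  # a site with no visits contributes nothing
        p = s.liberal_visits / n  # liberal share of this site's audience
        # weight by traffic share, times P(one liberal and one conservative)
        prob += (n / total) * 2 * p * (1 - p)
    return prob


# Perfect segregation: liberals and conservatives never share a site.
segregated = [Site("A", 100, 0), Site("B", 0, 100)]
# Perfect desegregation: every site mirrors a 50/50 population.
mixed = [Site("A", 50, 50), Site("B", 50, 50)]

print(prob_opposing(segregated))  # 0.0
print(prob_opposing(mixed))       # 0.5
```

The two extreme cases reproduce the 0% and 50% endpoints described in the text; the reported figure of about 45% sits close to the desegregated end of that range.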

This lack of segregation on the internet can be put further in perspective by comparing it to segregation in other parts of our lives. Here are the probabilities that someone you meet has opposing political views:

On a news website: 45.2%
Coworker: 41.6%
Offline neighbor: 40.3%
Family member: 37%
Friend: 34.7%

Using data from the General Social Survey, Gentzkow and Shapiro found that all of the offline numbers were lower than the chance that two people on the same news website have different politics. In other words, you are more likely to come across someone with opposing views online than offline.

Why isn’t the internet more segregated? Two factors limit political segregation on the internet. The first is that the internet news industry is dominated by a few massive sites. In 2009, four sites, including Yahoo News and AOL News, collected more than half of all news views. Yahoo News is the most popular news site among Americans, with close to 90 million unique monthly visitors. This is 600 times the audience of the white supremacist site Stormfront. Mass media sites like to appeal to a broad, politically diverse audience.

The second reason the internet isn’t all that segregated is that many people with strong political opinions visit sites of the opposite viewpoint. The reason here is similar to the reason for the hostile response to President Obama’s first address after the mass shooting in San Bernardino: people like to defend their views and, perhaps, to convince themselves that the opposition are idiots. Seth notes that someone who visits two extremely liberal sites is more likely than the average internet user to visit a right-leaning site. Likewise, someone who visits two extremely conservative sites is more likely than the average internet user to visit a more liberal site.

The Gentzkow and Shapiro study was based on data from 2004 to 2009, which was relatively early in the history of the internet. Might the internet have grown more compartmentalized since then? Have social media, particularly Facebook, altered their conclusion? If our friends tend to share our political views, the rise of social media should mean a rise of echo chambers, shouldn’t it?

It’s complicated. Although it is true that people’s friends on Facebook are more likely than not to share their political views, a team of data scientists, Eytan Bakshy, Solomon Messing, and Lada Adamic, found that a surprising amount of the information people get on Facebook comes from people with opposing views. How can this be? Don’t our friends tend to share our political views? They do. But there is a crucial reason that Facebook may lead to a more diverse political discussion than offline socializing: on average, people have substantially more friends on Facebook than they do offline. Many of these additional friends are weak ties, and the weak ties facilitated by Facebook are more likely to be people with opposing political views.

So Facebook exposes its users to weak social connections. These are people with whom you might never interact socially offline, but whom you do friend on Facebook. And you do see their links to articles with views you might never otherwise have considered.

In sum, the internet does not actually segregate different ideas; rather, it gives diverse ideas a wider distribution.


Effectively Countering Islamophobia

September 2, 2017

The immediately preceding post on Obama’s Prime-time Address After the Mass Shooting in San Bernardino indicated that President Obama’s appeal to our better nature failed. Worse yet, it was counterproductive: Islamophobia increased rather than decreased. As promised, here is a more effective presentation President Obama made two months after that original address. This time Obama spent little time insisting on the value of tolerance. Instead, he focused overwhelmingly on provoking people’s curiosity and changing their perceptions of Muslim Americans. He told us that many of the slaves from Africa were Muslim; that Thomas Jefferson and John Adams had their own copies of the Koran; that the first mosque on U.S. soil was in North Dakota; and that a Muslim American designed skyscrapers in Chicago. Obama again spoke of Muslim athletes and armed service members, but he also talked of Muslim police officers and firefighters, teachers, and doctors.

So what was wrong with Obama’s original address? He was telling many in his audience that their emotional responses were wrong. Kahneman’s two-system view of cognition can be helpful here. System 1 is named Intuition. System 1 is very fast, employs parallel processing, and appears to be automatic and effortless. Its processes are so fast that they are executed, for the most part, outside conscious awareness. Emotions and feelings are also part of System 1. Islamophobic responses are essentially System 1 responses. Learning in System 1 is associative and slow: for something to become a System 1 process requires much repetition and practice. Activities such as walking, driving, and conversation are primarily System 1 processes. They occur rapidly and with little apparent effort. We would not have survived if we could not do this type of processing rapidly. But this speed of processing is purchased at a cost: the possibility of errors, biases, and illusions. System 2 is named Reasoning. It is controlled processing that is slow, serial, and effortful. It is also flexible. This is what we commonly think of as conscious thought. One of the roles of System 2 is to monitor System 1 for processing errors, but System 2 is slow and System 1 is fast, so errors do slip through.
In addition to engaging System 1 processes, the speech left many in the audience needing to justify their feelings. Consequently, they made Google searches that hardened their views.

However, in his second address he bypassed System 1 processes by providing new information to be processed by System 2, which is what we commonly regard as thinking. The audience’s views were not directly challenged in this nonthreatening presentation. Instead, new information was presented that might be processed further, with a resulting decrease in Islamophobia.

Changing hardened beliefs is very difficult. Directly challenging these beliefs is counterproductive. So the approach needs to employ some sort of end run around these beliefs. That is what Obama did by providing nonthreatening information in his second address.

The Southern Poverty Law Center has developed some effective approaches in which people of different beliefs work together to solve a problem. This approach is difficult and time-consuming, but it has worked in a variety of circumstances. It is not likely to be universally applicable, as it requires people of different beliefs to interact.

© Douglas Griffith, 2017. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Douglas Griffith with appropriate and specific direction to the original content.

The Response to Obama’s Prime-time Address After the Mass Shooting in San Bernardino

September 1, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” On December 2, 2015, in San Bernardino, California, Rizwan Farook and Tashfeen Malik entered a meeting of Farook’s coworkers armed with semiautomatic pistols and semiautomatic rifles and murdered fourteen people. Literally minutes after the media first reported one of the shooters’ Muslim-sounding names, a disturbing number of Californians had decided what they wanted to do with Muslims: kill them.

At the time, the top Google search in California involving the word “Muslims” was “kill Muslims.” Across America, people searched for “kill Muslims” with about the same frequency that they searched for “martini recipe,” “migraine symptoms,” and “Cowboys roster.” In the days following the attack, for every American concerned with “Islamophobia,” another was searching for “kill Muslims.” Before the attack, hate searches were approximately 20% of all searches about Muslims; in the hours that followed it, more than half of all search volume about Muslims became hateful.

These search data show how difficult it can be to calm the rage. Four days after the shooting, then-President Obama gave a prime-time address to the country. He wanted to reassure Americans that the government could both stop terrorism and, perhaps more important, quiet the dangerous Islamophobia.

Obama spoke of the importance of inclusion and tolerance in powerful and moving rhetoric. The Los Angeles Times praised Obama for “[warning] against allowing fear to cloud our judgment.” The New York Times called the speech both “tough” and “calming.” The website Think Progress praised it as “a necessary tool of good governance, geared towards saving the lives of Muslim Americans.” Obama’s speech was judged a major success.

But was it? Google search data did not support such a conclusion. Seth examined the data together with Evan Soltas. In the speech the president said, “It is the responsibility of all Americans—of every faith—to reject discrimination.” But searches calling Muslims “terrorists,” “bad,” “violent,” and “evil” doubled during and shortly after the speech. President Obama also said, “It is our responsibility to reject religious tests on who we admit into this country.” But negative searches about Syrian refugees, a mostly Muslim group then desperately looking for a safe haven, rose 60%, while searches asking how to help Syrian refugees dropped 35%. Obama asked Americans to “not forget that freedom is more powerful than fear.” Still, searches for “kill Muslims” tripled during the speech. Just about every negative search Seth and Soltas could think to test regarding Muslims shot up during and after Obama’s speech, and just about every positive search they could think to test declined.

So instead of calming the angry mob, as people thought he was doing, the internet data told us that Obama actually inflamed it. Seth writes, “Things that we think are working can have the exact opposite effect from the one we expect. Sometimes we need internet data to correct our instinct to pat ourselves on the back.”

So what can be done to quell this particular form of hatred so virulent in America? We’ll try to address this in the next post.

Implicit Versus Explicit Prejudice

August 30, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Any theory of racism has to explain the following puzzle in America: On the one hand, the overwhelming majority of black Americans think they suffer from prejudice—and they have ample evidence of discrimination in police stops, job interviews, and jury decisions. On the other hand, very few white Americans will admit to being racist. The dominant explanation has been that this is due, in large part, to widespread implicit prejudice. According to this theory, white Americans may mean well, but they have a subconscious bias, which influences their treatment of black Americans. There is an implicit-association test for such a bias. These tests have consistently shown that it takes most people milliseconds more to associate black faces with positive words such as “good” than with negative words such as “awful.” For white faces, the pattern is reversed. The small extra time it takes is interpreted as evidence of someone’s implicit prejudice—a prejudice the person may not even be aware of.

There is an alternative explanation for the discrimination that African-Americans feel and whites deny: hidden explicit racism. There might be widespread conscious racism of which people are aware but to which they do not want to confess, especially in a survey. This is what the search data seems to be saying. There is nothing implicit about searching for “n_____ jokes.” It’s hard to imagine that Americans are Googling the word “n_____” with the same frequency as “migraine” and “economist” without explicit racism having a major impact on African-Americans. There was no convincing measure of this bias prior to the Google data. Seth uses this measure to see what it explains.

It explains, as was discussed in a previous post, why Obama’s vote totals in 2008 and 2012 were depressed in many regions. It also correlates with the black-white wage gap, as a team of economists recently reported. In other words, the areas that Seth found make the most racist searches underpay black people. When the polling guru Nate Silver looked for the geographic variable that correlated most strongly with support for Trump in the 2016 Republican primary, he found it in the map of racism Seth had developed. That variable was searches for “n_____.”

Scholars have recently put together a state-by-state measure of implicit prejudice against black people, which enabled Seth to compare the effects of explicit racism, as measured by Google searches, and implicit bias. Using regression analysis, Seth found that, to predict where Obama underperformed, an area’s racist Google searches explained a lot. An area’s performance on implicit-association tests added little.
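The comparison above is an incremental-R² question: once racist-search rates are in a regression, how much does adding implicit-association scores improve the fit? The sketch below is a minimal illustration, not Seth’s model. It assumes ordinary least squares with an intercept and uses the standard closed-form R² for two predictors computed from pairwise correlations. The data are synthetic numbers generated for 50 hypothetical regions, not Seth’s data, and the names `pearson`, `r_squared_two_predictors`, `x1`, and `x2` are illustrative. By construction, x1 drives the outcome and x2 does not, so the example shows only how such a comparison works, not what Seth found.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_squared_two_predictors(y, x1, x2):
    """R^2 of an OLS regression of y on x1 and x2 (with intercept),
    computed from the three pairwise correlations."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


# Synthetic, made-up data for 50 hypothetical regions: the outcome depends
# on x1 (standing in for racist-search rates) plus noise, while x2
# (standing in for implicit-association scores) is unrelated by construction.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(50)]
x2 = [rng.gauss(0, 1) for _ in range(50)]
y = [2 * a + rng.gauss(0, 0.5) for a in x1]

r2_x1_only = pearson(x1, y) ** 2  # simple regression: R^2 equals r^2
r2_both = r_squared_two_predictors(y, x1, x2)
print(f"R^2 with x1 alone:     {r2_x1_only:.3f}")
print(f"R^2 after adding x2:   {r2_both:.3f}")
print(f"Incremental R^2 of x2: {r2_both - r2_x1_only:.3f}")
```

A predictor that “explains a lot” yields a large R² on its own; one that “adds little” barely raises R² once the first predictor is already in the model.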

Seth has found that subconscious prejudice may have a more fundamental impact on other groups. He was able to use Google searches to find evidence of implicit prejudice against another segment of the population: young girls.

So who would be harboring bias against girls? Their parents. Of all Google searches starting “Is my 2-year-old,” the most common next word is “gifted.” But this question is not asked equally about young boys and young girls. Parents are two and a half times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” Parents’ overriding concern regarding their daughters is anything related to appearance.


The Truth About Your Facebook Friends

August 29, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Social media are another source of big data. Seth writes, “The fact is, many Big Data sources, such as Facebook, are often the opposite of digital truth serum.”

Just as with surveys, in social media there is no incentive to tell the truth. Much more so than in surveys, there is a large incentive to make yourself look good. After all, your online presence is not anonymous. You are courting an audience and telling your friends, family members, colleagues, acquaintances, and strangers who you are.

To see how biased data pulled from social media can be, consider the relative popularity of the “Atlantic,” a highbrow monthly magazine, versus the “National Enquirer,” a gossipy, often-sensational magazine. Both publications have similar average circulations, selling a few hundred thousand copies. (The “National Enquirer” is a weekly, so it actually sells more total copies.) There are also a comparable number of Google searches for each magazine.

However, on Facebook, roughly 1.5 million people either like the “Atlantic” or discuss articles from the “Atlantic” on their profiles. Only about 50,000 like the Enquirer or discuss its contents.

Here is how the popularity of the “Atlantic” versus the “National Enquirer” compares across different sources:
Circulation: roughly 1 “Atlantic” for every 1 “National Enquirer”
Google searches: 1 “Atlantic” for every 1 “National Enquirer”
Facebook likes: 27 “Atlantic” for every 1 “National Enquirer”

For assessing magazine popularity, circulation data is ground truth. And Facebook data is overwhelmingly biased against the trashy tabloid, making it the worst data for determining what people really like.

Here are some excerpts from the book:
“Facebook is digital brag-to-my-friends-about-how-good-my-life-is serum. In Facebook world, the average adult seems to be happily married, vacationing in the Caribbean, and perusing the ‘Atlantic.’ In the real world, a lot of people are angry, on supermarket checkout lines, peeking at the ‘National Enquirer,’ ignoring phone calls from their spouse, whom they haven’t slept with in years. In Facebook world, family life seems perfect. In the real world, family life is messy. It can be so messy that a small number of people even regret having children. In Facebook world, it seems every young adult is at a cool party Saturday night. In the real world, most are at home alone, binge-watching shows on Netflix. In Facebook world, a girlfriend posts twenty-six happy pictures from her getaway with her boyfriend. In the real world, immediately after posting this, she Googles ‘my boyfriend won’t have sex with me.’”


In summary:
DIGITAL TRUTH    DIGITAL LIES
Searches         Social media posts
Views            Social media likes
Clicks           Dating profiles

Some Common Ideas Debunked

August 28, 2017

This post is based on the groundbreaking book by Seth Stephens-Davidowitz “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.”

A common notion is that a major cause of racism is economic insecurity and vulnerability. So it is reasonable to expect that when people lose their jobs, racism increases. But neither racist searches nor membership in Stormfront rises when unemployment does.

It is reasonable to think that anxiety is highest in overeducated big cities; a famous stereotype is the urban neurotic. However, Google searches reflecting anxiety, such as “anxiety symptoms” or “anxiety help,” tend to be higher in places with lower levels of education, lower median incomes, and a larger proportion of the population living in rural areas. There are higher search rates for anxiety in rural upstate New York than in New York City.

It is reasonable to think that a terrorist attack that kills dozens or hundreds of people would automatically be followed by massive, widespread anxiety. After all, terrorism, by definition, is supposed to instill a sense of terror. Seth looked for Google searches reflecting anxiety. He tested how much these searches rose in a country in the days, weeks, and months following every major European or American terrorist attack since 2004. So, on average, how much did anxiety-related searches rise? They didn’t. At all.

Humor has long been thought of as a way to cope with the frustrations, the pain, and the inevitable disappointments of life. Charlie Chaplin said, “Laughter is the tonic, the relief, the surcease from pain.” Yet searches for jokes are lowest on Mondays, the day when people report they are most unhappy. They are lowest on cloudy and rainy days. And they plummet after a major tragedy, such as when two bombs killed three people and injured hundreds during the 2013 Boston Marathon. In fact, people are more likely to look for jokes when things are going well in life than when they aren’t.

Seth argues that the bigness part of big data is overrated. He writes that the smartest Big Data companies are often cutting down their data; major decisions at Google are based on only a tiny sampling of all their data. Seth continues, “You don’t always need a ton of data to find important insights. You need the right data. A major reason that Google searches are so valuable is not that there are so many of them; it is that people are so honest in them.”

Every Body Lies

August 27, 2017

“Everybody Lies” is the title of a groundbreaking book by Seth Stephens-Davidowitz on how to effectively exploit big data. The subtitle to this book is “Big Data, New Data, and What the Internet Reveals About Who We Really Are.” The title is a tad overblown, as we always need to have doubts about data and data analysis. However, it is fair to say that the internet currently does the best job of revealing who we really are.

The problem with surveys and interviews is that there is a bias to make ourselves look better than we really are. Indeed, we should be aware that we fool ourselves and that we can think we are responding honestly when in truth we are protecting our egos.

Stephens-Davidowitz uses Google Trends as his principal research tool and has found that people reveal more about their true selves in these searches than they do in interviews and surveys. Although the polls erred in predicting that Hillary Clinton would win the presidency, Google searches indicated that Trump would prevail.

Going back to Obama’s first election night, when most of the commentary focused on praise of Obama and acknowledgment of the historic nature of his election, roughly one in every hundred Google searches that included “Obama” also included “kkk” or “n_____.” On election night, searches and sign-ups for Stormfront, a white nationalist site with surprisingly high popularity in the United States, were more than ten times higher than normal. In some states there were more searches for “n_____ president” than for “first black president.” So there was a darkness and hatred that was hiding from the traditional sources but was quite apparent in the searches that people made.

These Google searches also revealed that much of what we thought about the location of racism was wrong. Surveys and conventional wisdom placed modern racism predominantly in the South and mostly among Republicans. However, the places with the highest racist search rates included upstate New York, western Pennsylvania, eastern Ohio, industrial Michigan, and rural Illinois, along with West Virginia, southern Louisiana, and Mississippi. The Google search data suggested that the true divide was not South versus North, but East versus West. Moreover, racism was not limited to Republicans. Racist searches were no higher in places with a high percentage of Republicans than in places with a high percentage of Democrats. These Google searches helped draw a new map of racism in the United States. Seth notes that Republicans in the South may be more likely to admit racism, but plenty of Democrats in the North have similar attitudes. This map proved to be quite significant in explaining the political success of Trump.

In 2012 Seth used this map of racism to reevaluate exactly what role Obama’s race played in 2008. In parts of the country with a high number of racist searches, Obama did substantially worse than John Kerry, the white presidential candidate, had four years earlier. This relationship was not explained by any other factor about these areas, including educational levels, age, church attendance, or gun ownership. Racist searches did not predict poor performance for any Democratic candidate other than Obama. Moreover, these results implied a large effect: Obama lost roughly 4 percentage points nationwide just from explicit racism. Seth notes that favorable conditions existed for Obama’s elections. The Google Trends data indicated that there were enough racists to help win a primary or tip a general election in a year not so favorable for Democrats.

During the general election there were clues in Google Trends that the electorate might be a favorable one for Trump. Black Americans told pollsters they would turn out in large numbers to oppose Trump. However, Google searches for information on voting in heavily black areas were way down. On election day, Clinton was hurt by low black turnout. There were more searches for “Trump Clinton” than for “Clinton Trump” in key Midwestern states that Clinton was expected to win. Previous research has indicated that the first name in search pairs like this is likely to be the favored candidate.

The final two paragraphs in this post are taken directly from Seth’s book.

“But the major clue, I would argue, that Trump might prove a successful candidate—in the primaries, to begin with—was all that secret racism that my Obama study had uncovered. The Google searches revealed a darkness and hatred among a meaningful number of Americans that pundits, for many years, had missed. Search data revealed that we lived in a very different society from the one academics and journalists, relying on polls, thought that we lived in. It revealed a nasty, scary, and widespread rage that was waiting for a candidate to give voice to it.

“People frequently lie—to themselves and to others. In 2008, Americans told surveys that they no longer cared about race. Eight years later, they elected as president Donald J. Trump, a man who retweeted a false claim that black people were responsible for the majority of murders of white Americans, defended his supporter for roughing up a Black Lives Matter protester at one of his rallies, and hesitated in repudiating support from a former leader of the Ku Klux Klan. [HM feels compelled to note that Trump has not renounced the latest endorsement by the leader of the Ku Klux Klan.] The same hidden racism that hurt Barack Obama helped Donald Trump.”