Posts Tagged ‘big data’

These Posts Only Scratched the Surface

September 5, 2017

The preceding posts have only scratched the surface of the groundbreaking book by Seth Stephens-Davidowitz, “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” The adjective “groundbreaking” is appropriate because this book opens up a new and very valuable source of data: internet searches. These searches bypass most of our defenses and provide a more accurate view of the person making them. Seth describes not only how words are used as data, but also how bodies and pictures are used as data.

One section is titled Digital Truth Serum. In addition to hate and prejudice, and the internet itself, it covers the truth about customers, child abuse, abortion, and sex. HM expects that this book will become a best seller primarily for its truth about these very sensitive topics. Much of this true content is depressing, and the author asks, “Can We Handle the Truth?”

A section titled Zooming In discusses:
What’s Really Going On in Our Counties, Cities, and Towns?
How We Fill Our Minutes and Hours
Our Doppelgängers
Seth tells stories using data.

A section titled All the World’s a Lab discusses the techniques Google and other companies use to test and evaluate their presentations. It also discusses what Seth terms Nature’s Cruel—but Enlightening Experiments.

The last part of the book is titled BIG DATA: HANDLE WITH CARE.
Here he discusses what Big Data can and cannot do, including the Curse of Dimensionality and the Overemphasis on What Is Measurable. Although the discussion is technical, it should be accessible to most readers.

The penultimate chapter discusses two dangers:
The Danger of Empowered Corporations
The Danger of Empowered Governments


The Truth About Your Facebook Friends

August 29, 2017

This post is based largely on the groundbreaking book by Seth Stephens-Davidowitz, “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.” Social media are another source of big data. Seth writes, “The fact is, many Big Data sources, such as Facebook, are often the opposite of digital truth serum.”

Just as with surveys, in social media there is no incentive to tell the truth. Much more so than in surveys, there is a large incentive to make yourself look good. After all, your online presence is not anonymous. You are courting an audience and telling your friends, family members, colleagues, acquaintances, and strangers who you are.

To see how biased data pulled from social media can be, consider the relative popularity of the “Atlantic,” a highbrow monthly magazine, versus the “National Enquirer,” a gossipy, often-sensational magazine. Both publications have similar average circulations, selling a few hundred thousand copies. (The “National Enquirer” is a weekly, so it actually sells more total copies.) There are also a comparable number of Google searches for each magazine.

However, on Facebook, roughly 1.5 million people either like the “Atlantic” or discuss articles from the “Atlantic” on their profiles. Only about 50,000 like the “National Enquirer” or discuss its contents.

Here is how “Atlantic” versus “National Enquirer” popularity compares across different sources:
Circulation        Roughly 1 “Atlantic” for every 1 “National Enquirer”
Google searches    1 “Atlantic” for every 1 “National Enquirer”
Facebook likes     27 “Atlantic” for every 1 “National Enquirer”

For assessing magazine popularity, circulation data is ground truth. And Facebook data is overwhelmingly biased against the trashy tabloid, making it the worst data for determining what people really like.
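The size of the Facebook bias can be checked with a little arithmetic. The sketch below uses only the rounded figures quoted in this post (roughly 1.5 million and about 50,000), so it lands near, not exactly on, the 27-to-1 figure in the comparison above, which presumably rests on unrounded counts. The variable and function names are made up for illustration.

```python
# Rounded Facebook figures quoted in the post; the exact counts behind
# the 27-to-1 comparison are not given here, so this is only approximate.
atlantic_facebook = 1_500_000  # people who like or discuss the "Atlantic"
enquirer_facebook = 50_000     # people who like or discuss the "National Enquirer"


def ratio(a, b):
    """Return how many of `a` there are for every one of `b`."""
    return a / b


facebook_ratio = ratio(atlantic_facebook, enquirer_facebook)

# Circulation and Google searches are both described as roughly 1 to 1,
# so the Facebook ratio is also the approximate size of the distortion.
print(f"Facebook: about {facebook_ratio:.0f} 'Atlantic' for every 1 'National Enquirer'")
```

With the rounded inputs the ratio comes out at about 30 to 1, the same order of magnitude as the book's figure.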

Here are some excerpts from the book:
“Facebook is digital brag-to-my-friends-about-how-good-my-life-is serum. In Facebook world, the average adult seems to be happily married, vacationing in the Caribbean, and perusing the ‘Atlantic.’ In the real world, a lot of the people are angry, on supermarket checkout lines, peeking at the ‘National Enquirer,’ ignoring phone calls from their spouse, whom they haven’t slept with in years. In Facebook world, family life seems perfect. In the real world, family life is messy. It can be so messy that a small number of people even regret having children. In Facebook world, it seems every young adult is at a cool party Saturday night. In the real world, most are at home alone, binge-watching shows on Netflix. In Facebook world, a girlfriend posts twenty-six happy pictures from her getaway with her boyfriend. In the real world, immediately after posting this, she Googles ‘my boyfriend won’t have sex with me.’”


In summary:
DIGITAL TRUTH    DIGITAL LIES
Searches         Social media posts
Views            Social media likes
Clicks           Dating profiles

Some Common Ideas Debunked

August 28, 2017

This post is based on the groundbreaking book by Seth Stephens-Davidowitz, “Everybody Lies: Big Data, New Data, and What the Internet Reveals About Who We Really Are.”

A common notion is that a major cause of racism is economic insecurity and vulnerability. So it is reasonable to expect that when people lose their jobs, racism increases. But neither racist searches nor membership in Stormfront rises when unemployment does.

It is reasonable to think that anxiety is highest in overeducated big cities. A famous stereotype is the urban neurotic. However, Google searches reflecting anxiety, such as “anxiety symptoms” or “anxiety help,” tend to be higher in places with lower levels of education, lower median incomes, and a larger portion of the population living in rural areas. There are higher search rates for anxiety in rural upstate New York than in New York City.

It is reasonable to think that a terrorist attack that kills dozens or hundreds of people would automatically be followed by massive, widespread anxiety. After all, terrorism, by definition, is supposed to instill a sense of terror. Seth looked for Google searches reflecting anxiety. He tested how much these searches rose in a country in the days, weeks, and months following every major European or American terrorist attack since 2004. So, on average, how much did anxiety-related searches rise? They didn’t. At all.

Humor has long been thought of as a way to cope with the frustrations, the pain, and the inevitable disappointments of life. Charlie Chaplin said, “laughter is the tonic, the relief, the surcease from pain.” Yet searches for jokes are lowest on Mondays, the day when people report they are most unhappy. They are lowest on cloudy and rainy days. And they plummet after a major tragedy, such as when two bombs killed three and injured hundreds during the 2013 Boston Marathon. In fact, people are more likely to look for jokes when things are going well in life than when they aren’t.

Seth argues that the bigness part of big data is overrated. He writes that the smartest Big Data companies are often cutting down their data. Major decisions at Google are based on only a tiny sampling of all their data. Seth continues, “You don’t always need a ton of data to find important insights. You need the right data. A major reason that Google searches are so valuable is not that there are so many of them; it is that people are so honest in them.”

Thinking 2.0

March 9, 2016

This post was inspired by an article in the February 26, 2016 edition of the “New Scientist” written by Michael Brooks. The title of the article is “A new kind of logic: How to upgrade the way we think.” There are many healthymemory blog posts about the limitations of our cognitive processes. First of all, our attentional capacity is quite limited and requires selection. Our working memory capacity is around 5 or fewer items. There are healthy memory blog posts on cognitive misers and cognitive spendthrifts. Thought requires cognitive effort that we are often reluctant to spend, making us cognitive misers. And there are limits to the amount of cognitive effort we can expend. Cognitive effort spent unwisely can be costly.

Let me elaborate on the last statement with some personal anecdotes. Ohio State was on the quarter system when I attended, and my initial goal was to begin college right after graduation in the summer quarter and to attend quarters consecutively so that I would graduate within three years. Matters went fairly well until my second quarter, when I earned the only “D” in my life. Although I did get one “A,” it was in a course for which I had already read the textbook in high school. I replaced and continued to attend consecutive quarters, but only part time during the summer. I was in the honors program and managed to graduate in 3.5 years with a Bachelor of Arts with Distinction in Psychology. I tried going directly into graduate studies, but found that I had already expended my remaining cognitive capital. So I entered the Army to give my mind a rest.

When I returned and began graduate school, I was a cognitive spendthrift who wanted to learn as much as I could in my field. However, I found that I could not work long hours. If I did, my brain turned to mush and I was on the verge of drooling. So I found it profitable to stop my cognitive spendthrift days and marshal my cognitive resources. It worked, and I earned my doctorate in psychology from the University of Utah.

Michael Brooks argues that we are stuck in Thinking 1.0. He mentions that our conventional economic models bear no resemblance to the real world. We’ve had unpredicted financial crises because of incorrect rational economic models. This point has been made many times in the healthy memory blog. Behavioral economics should address these shortcomings, but it is still in an early stage of development.

John Ioannidis’s article has convinced statisticians and epidemiologists that more than half of scientific papers reach flawed conclusions, especially in medical science, neuroscience, and psychology.

Currently we do have big data, machine learning, neural nets, and, of course, the Jeopardy champion Watson.  Although these systems provide answers, they do not provide explanations as to how they arrived at the answers.  And there are statistical relations in which it is difficult to determine causality, that is, what causes what.

Michael Brooks argues that Thinking 2.0 is needed. Quantum logic makes the distinction between cause and effect (one thing influencing another) and common cause (two things responding to the same effect). The University of Pittsburgh opened the Center for Causal Discovery in 2014.
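The difference between cause and effect and common cause can be made concrete with a toy simulation. This is only a sketch under stated assumptions, not the quantum logic Brooks describes or any method from the Center for Causal Discovery: a hypothetical hidden factor z drives both x and y, and neither x nor y influences the other. All names and numbers are invented for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing its best straight-line fit on zs."""
    n = len(ys)
    my = sum(ys) / n
    mz = sum(zs) / n
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical common cause z; x and y each respond to z plus their own noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

raw = pearson(x, y)                                    # strong correlation
adjusted = pearson(residuals(x, z), residuals(y, z))   # close to zero

print(f"correlation of x and y:            {raw:.2f}")
print(f"after accounting for common cause: {adjusted:.2f}")
```

Here x and y look strongly related, yet the relation all but disappears once the common cause is accounted for. Statistics alone shows the first number; knowing to compute the second requires a causal story about z.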

Judea Pearl, a computer scientist and philosopher at UCLA (and the father of the tragically slain journalist Daniel Pearl), says, “You simply cannot grasp causal relationships with statistical language.” Judea Pearl has done some outstanding mathematics and has developed software that has made intractable AI programs tractable and has provided for distinguishing cause and effect. Unlike neural nets, machine learning, and Watson, it provides the logic, 2.0 logic I believe, as to the reasoning behind the conclusions or actions.

It is clear that Thinking 2.0 will require computers. But let us hope that humans will understand and be able to develop narratives from their output. If we just get answers from machine oracles, will we still be thinking in 2.0?

© Douglas Griffith and, 2016. Unauthorized use and/or duplication of this material without express and written permission from this blog’s author and/or owner is strictly prohibited. Excerpts and links may be used, provided that full and clear credit is given to Douglas Griffith and with appropriate and specific direction to the original content.