Big Data back in 1995 - 2017-07-25

My eye was caught by this excerpt from an Inc. Magazine article on the rise and fall of "Thinking Machines":

With the country in a recession, businesses needed every competitive advantage they could get, which meant knowing their customers' preferences and buying habits in intimate detail. They had begun to collect all conceivable data and were feeding them into their mainframes, looking for any insight that would help them maximize profits.

This was 1995, but it sounds just like the Big Data pitches of today.

https://www.inc.com/magazine/19950915/2622.html

Flawed Studies of Social Contagion - 2017-07-22

The famous Christakis and Fowler studies (2007, 2008) turn out to have serious methodological flaws. See:

  • Christakis NA, Fowler JH. The spread of obesity in a large social network over 32 years. New England Journal of Medicine. 2007;357:370–379.
  • Christakis NA, Fowler JH. The collective dynamics of smoking in a large social network. New England Journal of Medicine. 2008;358:2249–2258.
  • Lyons R. The spread of evidence-poor medicine via flawed social-network analyses. Statistics, Politics, and Policy. 2011;2(1):Article 2, 1–26. https://arxiv.org/abs/1007.2876
  • Noel H, Nyhan B. The ‘unfriending’ problem: the consequences of homophily in friendship retention for causal estimates of social influence. Social Networks. 2011;33:211–218.
  • Shalizi CR, Thomas AC. Homophily and contagion are generically confounded in observational social network studies. Sociological Methods and Research. 2011;40:211–239. https://arxiv.org/abs/1004.4704

An interesting excerpt from Lyons's paper (https://arxiv.org/pdf/1007.2876.pdf):

A technical note: C&F are comparing coefficients from different models. Therefore, they cannot estimate the difference between these coefficients. They would be able to make an inference on the difference of two coefficients if they had a valid model that contained both coefficients. We don’t know such a model and, for the general reasons discussed in Section 7, we are skeptical that one exists. Putting such skepticism aside, if we wished to construct such a model, we would need access to the data. The social network data for the Framingham Heart Study was assembled by C&F from hand-written data, but C&F have not made this available to others. This also prevents the most basic type of replication [12] and can keep errors hidden [13]. In any case, given what C&F have, they do not have reason to infer that these differences are statistically significant. [emphasis added]
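To make the technical note concrete for myself, here is a minimal sketch of why a joint model matters when testing the difference of two coefficients. It is a toy illustration on simulated data, not a reconstruction of C&F's models or of the Framingham data; the variable names (x1, x2, se_joint, se_naive) and the helper ols_with_cov are my own assumptions. When both coefficients sit in one fitted model, the model supplies their covariance, so the standard error of the difference is sqrt(Var(b1) + Var(b2) - 2 Cov(b1, b2)). Estimates taken from separate models come with no such covariance term, and simply dropping it (the "naive" line below) gives a different answer.

```python
import numpy as np


def ols_with_cov(X, y):
    """Ordinary least squares: return coefficients and their covariance matrix."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)  # unbiased residual variance
    return beta, sigma2 * xtx_inv


# Simulated data (purely hypothetical): two positively correlated predictors.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 * x1 + 0.7 * x2 + rng.normal(size=n)

# One joint model containing both coefficients (plus an intercept).
X = np.column_stack([np.ones(n), x1, x2])
beta, cov = ols_with_cov(X, y)

diff = beta[1] - beta[2]

# Standard error of the difference using the covariance from the joint model.
se_joint = np.sqrt(cov[1, 1] + cov[2, 2] - 2 * cov[1, 2])

# What you get if the covariance term is ignored, i.e. if the two
# estimates are treated as though they were independent.
se_naive = np.sqrt(cov[1, 1] + cov[2, 2])

print(f"difference of coefficients: {diff:.3f}")
print(f"SE using joint covariance:  {se_joint:.3f}")
print(f"SE ignoring covariance:     {se_naive:.3f}")
```

In this toy setup the two estimates are negatively correlated, so ignoring the covariance understates the uncertainty in the difference. That is only one possible direction of error; the general point from the excerpt is that without a single valid model (and the data to fit it) the covariance is simply unknown.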

What I found interesting was the experience Lyons, the main critic, had in getting his 2011 paper published (expanded on in section 6 of the full paper) - http://newsinfo.iu.edu/news-archive/19381.html

Lyons said his critique has not only brought to light problems with well-publicized studies related to human health, but it has also allowed him the opportunity to voice a broader criticism of how statistical modeling is misused, of the role of peer review in academia, and about the missing place for critique in scientific literature.

Both of the leading, prestigious journals that published research by Christakis and Fowler – the New England Journal of Medicine and BMJ (formerly British Medical Journal) – rejected Lyons' critique, the first declining to give a reason and the second saying the work would be better placed in a specialist journal. Rejections then came from three other leading journals on the grounds that they had not published the original research. A statistics review journal rejected Lyons' paper on the basis that the original research of Christakis and Fowler was itself not sufficiently important.

As for the status of statistical modeling, how it is reviewed in journals, and its present state in academia, Lyons cites a 1998 account from Doug Altman, the current senior statistics editor at BMJ, to make the point: "The main reason for the plethora of statistical errors is that the majority of statistical analyses are performed by people with an inadequate understanding of statistical methods. They are then peer reviewed by people who are generally no more knowledgeable. Sadly, much research may benefit researchers rather more than patients, especially when it is carried out primarily as a ridiculous career necessity." Lyons' paper also cites prominent social scientists to make the same points regarding their fields.

If you want some interesting reading on this (I read it a while ago and re-perused it today), Andrew Gelman's blog posts and comments on the "replication crisis" are well worth it, e.g.:

http://andrewgelman.com/2016/09/21/what-has-happened-down-here-is-the-winds-have-changed/