Saturday, January 26, 2013

Perspectives on Inference

            The following paper was submitted by me as a final project for a Hampshire College cognitive science course. It has been edited into a second draft, and has also been edited and formatted significantly from its original version for publication on this site.

            "Numbers never lie" is a phrase etched into our collective minds at a young age. Unfortunately, this could not be further from the truth, especially in regard to statistics. In fact, one professor went as far as to say that statistics is often classified as a subcategory of lying (Moore, 1985). As powerful tools of inference, statistical tests can give us insights from the data we collect about the world around us. However, these insights are generalizations and are open to interpretation; just as a translation can only ever be an interpretation, statistical results are only interpretations of data and variables.
            Before this paper begins, it is important to start with a disclaimer. By no means am I stating that the statistical inferences we have today are the be-all and end-all of data assessment. If anything, they are just another perspective from which to assimilate data and make it easier to work with. For instance, Student's t-test is much more conservative in terms of data analysis than the z-test, and although there are criteria for choosing between them, even these criteria can be vague. One textbook I have states that you should use the t-test when you have a sample size under 30 (Diez, 2012). But this is by no means a standardized rule, and can be up to the researcher's discretion.
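Why the t-test is the conservative choice for small samples can be seen by simulation. The following is a minimal sketch, not from the paper: the sample size, seed, and repetition count are arbitrary assumptions. It draws many small samples under a true null hypothesis, but judges each sample's t statistic against the z-test's two-sided cutoff of about 1.96; the resulting false-positive rate lands well above the nominal 5%, which is exactly the error the heavier-tailed t distribution corrects for.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(0)
n, reps = 5, 20000                    # small samples, many repetitions (arbitrary choices)
z_crit = NormalDist().inv_cdf(0.975)  # two-sided z cutoff, about 1.96

false_positives = 0
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]   # the null hypothesis is true here
    t = mean(sample) / (stdev(sample) / n ** 0.5)     # one-sample t statistic
    if abs(t) > z_crit:                               # judged by the z cutoff anyway
        false_positives += 1

print(false_positives / reps)  # noticeably above the nominal 0.05
```

With only five observations per sample, the observed rejection rate is roughly double the advertised 5%, so a researcher who quietly applies z-test cutoffs to small samples is overstating their significance.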
            As a result of this ambiguity in what statistics can mean, different groups of people perceive their intended use differently: for the layperson, the media, students, and academics, statistical analyses have very different meanings. I would like to explore what these perceptions are. Before that, I would also like to present some ways in which statistics can be altered maliciously to suit the needs of a researcher. These are not the only ways one can be dishonest in academic literature, but they demonstrate how easy such feats can be.
            A very simple way to change statistically insignificant results into statistically significant ones would be to change a two-tailed test to a one-tailed test. Doing so halves the p-value (the probability of getting your result if the null hypothesis is true), possibly changing the outcome of your study. This is dishonest because you need to start off with a good reason for using a one- or two-tailed test. One-tailed tests are for studies where we know that there will be no sort of adverse result, or studies where we are not interested in that result for a valid reason. For example, if we were testing the effects of meditation on blood pressure, we would not expect meditation to increase a subject's blood pressure. As a result, the alpha level (the threshold one's p-value needs to fall under in order to reject the null) sits on only one side of the distribution and would be five percent. However, if we were to test some new way for students to study, we would want to see whether our independent variable increased or decreased subjects' scores. Therefore we would

(A different image was used for the online article; image from The Heritage Foundation.)

split the 0.05 alpha level between the two tails of the distribution, so that each tail gets 0.025. Suppose we ran our "new study method" experiment and ended up with a p-value of 0.033: deciding after the fact to recast the study as a one-tailed experiment, just so the result clears the threshold, would be dishonest.
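The arithmetic of this example can be made explicit. In the sketch below, the 0.033 figure is the paper's own; everything else follows from the definitions. The same tail probability fails the honest two-tailed test but passes a one-tailed test adopted after the fact:

```python
alpha = 0.05
p_tail = 0.033  # one-sided tail area from the "new study method" example

# Two-tailed test: alpha is split, so each tail must beat alpha / 2.
significant_two_tailed = p_tail < alpha / 2   # 0.033 < 0.025 -> False
# One-tailed test: the whole alpha sits in a single tail.
significant_one_tailed = p_tail < alpha       # 0.033 < 0.05  -> True

print(significant_two_tailed, significant_one_tailed)  # False True
```

Nothing about the data changes between the two lines; only the decision rule does, which is why the choice of tails must be justified before the experiment is run.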
            A more obvious way to misrepresent statistics is to literally change the perspective of the visual representation of your data. In the following example, the samples closest to us would be considered larger, because they appear larger.
A verbal example of the same phenomenon would be how you express a figure. Would you like to say that you had a one percent return on sales, a fifteen percent return on investment, a ten-million-dollar profit, or a sixty percent decrease from last year (Huff, 1954)? This topic alone could fill volumes (such as the one just cited); my point is that there are many different ways one can misrepresent and misinterpret statistical information. One could write multiple papers on each of these subjects as well (committing logical fallacies, asking loaded questions, throwing out data, manipulating data, using biased samples, et cetera). I simply wanted to show how one can present data in such a way that it would appear justified to an uninformed reader.
            There are many real-world examples of studies that misrepresent their data or have had their data misrepresented by others, whether deliberately or by simple mistake. Duncan MacDougall's early-twentieth-century attempt to find the weight of the human soul has entered Americans' minds as a facet of pop science. MacDougall reported that the human soul might have an average weight of about twenty-one grams, but the truth is that his methodology was seriously lacking. He had a sample size of only six, two of which were discarded, and with the rest there was trouble determining the exact time of death. The weight of subjects often fluctuated after death, and contemporaries provided a laundry list of physiologically plausible alternative reasons why the subjects would have lost weight (Evans, 1947). Nevertheless, MacDougall continued his research, and his later experiments found no change in weight in dying dogs.
            Still today, MacDougall's research is something others have related to me with conviction as scientific fact. I personally had a high-school history teacher pass this faux factoid along to us. "A scientist once measured the human soul," she began. A few students backed her up; they too had heard about the doctor who was able to weigh the human soul, and because he was a scientist, there must be some validity to the point being made. While this example is mostly harmless, confined to everyday "did you know?" situations, misinterpreted studies can become toxic in the realm of news media.
Take the television show Ancient Aliens as another example many should be familiar with. While there is virtually no manipulation of statistics in the show, there is vast falsification of information and misinterpretation of current archaeological evidence, yielding absurd claims such as that the pyramids were carved with lasers or that ancient peoples could build aircraft (Heiser, 2012). An example more related to neuroscience and statistics would be the studies on average brain size often circulated by ignorant racists to support their belief that certain races of people are more intelligent. Citing such papers (one example: Witelson, 2005), these racists claim that other races have smaller brains, and then insinuate that this somehow means they must be less intelligent. Besides the fact that the results of such studies are almost always either insignificant or too vague to support a solid conclusion, it is still a red herring to say that brain size is related to or correlated with intelligence. However, this is not an inference an average layperson may make, and they could be led to believe that certain bigoted viewpoints have some sort of validity. Here is a different example from May of this year (2012 at the time of submission): CNBC ran a story with the headline "The Inflation of Life: The Cost of Raising a Child Has Soared" (CNBC). The article states that the cost of raising a child has risen 25%. What it does not state is that this increase is largely accounted for by ten years of inflation (Olitsky, 2012).
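Olitsky's point is just compound-growth arithmetic. As a rough sketch, using only the 25% and ten-year figures from the article and no other data, a 25% rise spread over a decade works out to about 2.3% per year, an unremarkable inflation rate:

```python
total_increase = 0.25   # "cost of raising a child has risen 25%"
years = 10              # the span Olitsky attributes to ordinary inflation

# Equivalent constant annual rate r, solving (1 + r) ** years == 1 + total_increase
annual_rate = (1 + total_increase) ** (1 / years) - 1
print(f"{annual_rate:.1%}")  # about 2.3% per year
```

Framed as a cumulative figure the 25% sounds alarming; framed as an annual rate it is roughly what general price inflation would produce on its own.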
Students are also prone to these mistakes. Anyone who spends some time in a psychology class is eventually bound to hear something along the lines of "therefore this study proved X." I am not accusing these students of being malicious, and it would be unfair to hold them to the same standard as the media, specifically news outlets and a television channel that claims to have history as its primary subject matter. This may be partly because statistics is rarely taught outside of higher education, and even there it may not be offered or required outside of certain programs. One may ask why a high-school student or English major would need to take statistics, but when statistics are used every day, a more general understanding of what they mean and how they are used is important.
We can take this a step further and examine who exactly is teaching these statistics courses. Chamont Wang, the author of the book Sense and Nonsense of Statistical Inference, claims that many researchers do not even understand the statistical methods they apply. He cites a paper that estimated that roughly half of the articles published in medical journals at the time used statistical methods incorrectly (Wang, 1993; Glantz, 1980). At this level of academic research, a major problem is conflict of interest. Many professors are pressured to publish significant research in order to keep their jobs. Dubbed "publish or perish," this environment is lethal to the integrity of science, as it puts many academics in situations where they can easily manipulate their data just to keep their jobs or their funding. A simple Google search for "scientific misconduct" will yield thousands of studies that have been deemed invalid or fraudulent in the last decade alone.
In closing, there are an exhausting number of ways to misrepresent data, and what has been presented here is a brief overview that does not even begin to scratch the surface of misinterpreted statistics. I would have liked to expand further on each of the subjects provided, but I do not know whether I would have known when to stop. While this paper has been rather bleak, I would like to note that one of the beautiful things about science is that it is always open to discussion and criticism, and in due time we may hopefully be able to cast out instances of bad research as examples of what is not to be done.

Works Cited

Ancient Aliens Debunked. Dir. Mike Heiser. N.p., 30 Sept. 2012. Web. 5 Dec. 2012.

Diez, David M., Christopher D. Barr, and Mine Çetinkaya-Rundel. OpenIntro Statistics. Lexington, KY: CreateSpace, 2012. Print.

Evans, Bergen. The Natural History of Nonsense. New York: A.A. Knopf, 1947. Print.

Glantz, S. A. "Biostatistics: How to Detect, Correct and Prevent Errors in the Medical Literature." Vol. 61 (1980): 1-7.

Huff, Darrell, and Irving Geis. How to Lie with Statistics. New York: Norton, 1954. Print.

Introduction to SAS.  UCLA: Statistical Consulting Group. (accessed December 5, 2012).

Moore, David S. Statistics: Concepts and Controversies. San Francisco: W.H. Freeman, 1985. Print.

Olitsky, Morris. "Misuse of Statistics a National Problem." Amstat News, 1 Aug. 2012. Web. 10 Dec. 2012.

Pie3d. N.d. Photograph. Charting Control, 2012. Web. 9 Dec. 2012.

"The Inflation of Life - Cost of Raising a Child Has Soared." Yahoo! Finance. CNBC, 7 May 2012. Web. 12 Dec. 2012.

Wang, Chamont. Sense and Nonsense of Statistical Inference: Controversy, Misuse, and Subtlety. New York: Marcel Dekker, 1993. Print.

Witelson, S. F. "Intelligence and Brain Size in 100 Postmortem Brains: Sex, Lateralization and Age Factors." Brain 129.2 (2005): 386-98. Print.
