I am cat-sitting for a friend who is teaching a "how to lie with statistics" course (really: being smart about statistics) and noticed an interesting article sitting on his table of current reading material: "Why Most Published Research Findings Are False" by John P. A. Ioannidis, in the August 2005 issue of the Public Library of Science Medicine online journal. The summary almost speaks for itself:
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
Ioannidis works out the probability that a claimed research finding is actually true (the positive predictive value), first in a model without researcher bias and then in one that adds a bias term. The result: it doesn't look good.
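To make that concrete, here is a small Python sketch of the calculation as I understand it. The two functions follow the paper's PPV expressions (one without bias, one with a bias term u), but the parameter values in the examples are just illustrative numbers of my own.

```python
def ppv(R, alpha=0.05, beta=0.20):
    """Probability that a claimed finding is true, given pre-study odds R
    (true relationships : null relationships), type I error alpha, and
    type II error beta (power = 1 - beta)."""
    return (1 - beta) * R / (R - beta * R + alpha)

def ppv_with_bias(R, u, alpha=0.05, beta=0.20):
    """Same calculation with a bias term u: the fraction of analyses that
    would not otherwise have been 'findings' but get reported as such anyway."""
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)

# A well-powered study with even pre-study odds looks fine...
print(ppv(R=1.0))                    # ~0.94
# ...but an exploratory search over thousands of candidate relationships does not.
print(ppv(R=0.001))                  # ~0.02
# And even modest bias erodes the well-powered case.
print(ppv_with_bias(R=1.0, u=0.2))   # ~0.78
```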
In the body of the article, he lists a number of corollaries that can be deduced from these statistical tests. Some of these are familiar from statistics, but there are a few in here that run counter to commonly held beliefs about how research works (a quick numeric sketch of a couple of them follows the list).
- The smaller the studies conducted in a scientific field, the less likely the research findings are to be true.
- The smaller the effect sizes in a scientific field, the less likely the research findings are to be true.
- The greater the number and the lesser the preselection of tested relationships in a scientific field, the less likely the research findings are to be true.
- The greater the flexibility in designs, definitions, outcomes, and analytical modes in a scientific field, the less likely the research findings are to be true.
- The greater the financial and other interests and prejudices in a scientific field, the less likely the research findings are to be true.
- The hotter a scientific field (with more scientific teams involved), the less likely the research findings are to be true.
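Here is the numeric sketch promised above, again my own illustration rather than the paper's tables. Lower power (smaller studies, smaller effects) drags PPV down directly; the "hot field" case uses, as I read it, the paper's expression for a finding being claimed as soon as any one of n independent teams hits significance.

```python
def ppv(R, alpha=0.05, beta=0.20):
    # Single-study PPV without bias, as in the sketch above.
    return (1 - beta) * R / (R - beta * R + alpha)

def ppv_hot_field(R, n, alpha=0.05, beta=0.20):
    """PPV when n independent teams probe the same relationship and a
    'finding' is claimed as soon as any one of them reaches significance."""
    hit_true = 1 - beta ** n          # at least one team detects a real effect
    hit_null = 1 - (1 - alpha) ** n   # at least one team gets a false positive
    return R * hit_true / (R * hit_true + hit_null)

# Smaller studies and smaller effects mean lower power, and PPV falls with it:
for beta in (0.2, 0.5, 0.8):
    print(f"power={1 - beta:.1f}  PPV={ppv(R=0.1, beta=beta):.2f}")

# The hotter the field, the less a single claimed finding is worth:
for n in (1, 5, 20):
    print(f"teams={n:2d}  PPV={ppv_hot_field(R=0.1, n=n):.2f}")
```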
Many of these problems are associated with the way research is conducted, particularly when it comes to bias and the results of studies that involve multiple research groups. One item that made me prick up my ears was that the definition of bias includes post-study manipulation of the study design or analysis of the data:
Such manipulation could be done, for example, with serendipitous inclusion or exclusion of certain patients or controls, post hoc subgroup analyses, investigation of genetic contrasts that were not originally specified, changes in the disease or control definitions, and various combinations of selective or distorted reporting of the results. Commercially available "data mining" packages actually are proud of their ability to yield statistically significant results through data dredging.
I've always thought of research bias as a mistake of omission, or of looking at things from the wrong angle; the kind of post hoc manipulation described here I would simply have called wrong when reporting study results. What caught me was the reference to data mining software as a potential cause of this kind of problem. I've seen people dive into complex data sets with various data mining packages and pull out potentially interesting correlations. The problem, I think, isn't the data mining itself; it's assuming that an interesting correlation actually holds without going back to the study (or doing a new one) to verify the new hypothesis. This is the kind of software used in "business intelligence" analyses, and that's not to say BI is wrong. It is smart, though, to know that what you discover in such analyses still needs to be verified.
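As a toy illustration (entirely invented data, not from the article): dredge enough pure-noise variables and some of them will come out "statistically significant", which is exactly the kind of correlation that evaporates when you go back and verify it.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_patients, n_candidates = 50, 200

# Pure noise: an outcome and 200 candidate predictors with no real relationship.
outcome = rng.normal(size=n_patients)
candidates = rng.normal(size=(n_patients, n_candidates))

# "Dredge": test every candidate against the outcome, keep whatever is significant.
hits = []
for i in range(n_candidates):
    r, p = pearsonr(candidates[:, i], outcome)
    if p < 0.05:
        hits.append((i, r, p))

# With 200 tests at alpha = 0.05, roughly 10 'discoveries' appear from noise alone;
# none of them would survive a follow-up study.
print(f"{len(hits)} 'significant' correlations found in pure noise")
```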