Posted on August 5, 2011


Although we are going back in time to a 2008 publication, it deals with an important topic, so I think revisiting it is justified. Little seems to have changed since 2008 in that respect, so I guess I can get away with it. By coincidence I ran into a paper called “Why Most Discovered True Associations Are Inflated”, published in Epidemiology in 2008 by Professor Ioannidis (link).

It is an interesting topic.

Let’s assume (just for fun, really) that what is written in the Daily Mail, the Daily Express, The Sun and other quality newspapers is actually true (let me re-iterate: just for fun, as a hypothetical). Isn’t it amazing that, despite all these discoveries, not every single one of us has succumbed to [insert any chronic disease], or, conversely, that [insert any chronic disease] still exists at all, given the miracle cures that have been discovered to date? Surprisingly, although many of these stories have no decent scientific basis at all, quite a number are based (albeit somewhat loosely interpreted) on actual peer-reviewed publications. The paper I stumbled on systematically goes through the hows and the whys. It even comes with quantitative data and simulations (I am a sucker for those)!

Basically, it discusses whether effect sizes reported in epidemiological studies, at the time they are first discovered and published in the scientific literature, accurately reflect the true effect sizes. The answer, you may have guessed, is “no”…on average (this is not an attempt to discredit individual work). As discussed in the paper, effect sizes of newly discovered true associations are inherently inflated on average, which can be attributed to how the scientific discovery process is organized. A new discovery, or success if you will, is based on an association passing a certain threshold of statistical significance (often the well-known P<0.05), and often the discovery is based on a marginally powered study (because these are new discoveries, and hardly anyone nowadays would pay large sums of money to investigate “a gut feeling”). Now this is where the simulations come in (yes!). I will not go through the details, but gladly point you to the paper itself if you are interested. What they show is that if the same case-control study were done a gazillion times (Prof. Ioannidis used an actual number of repeats), we would find, on average, the risk estimate we would expect, but with random variation around it. However, since we are interested in finding effects, more often than not non-significant findings will not be reported. When focusing on only the case-control studies with significant findings, it is neatly shown that the odds ratios (ORs) are by definition biased upwards. The amount of inflation depended on the number of cases and controls, and was about 2-fold for studies of 50 people per group in these specific simulations. Or, in risk estimate terms: an odds ratio (OR) of 2.73 (IQR 2.60–3.16) was reported where the true OR was 1.25.
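You can reproduce this winner’s-curse mechanism yourself. Below is a minimal sketch in Python, with my own toy parameters rather than the paper’s actual simulation setup: I assume 50 cases and 50 controls, a 30% exposure prevalence among controls, and a true OR of 1.25, then compare the average OR across all simulated studies with the average among only the “significant” ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (my own, not the paper's): true OR of 1.25,
# 50 cases and 50 controls, 30% exposure prevalence among controls.
true_or, n, p0, sims = 1.25, 50, 0.30, 100_000
odds0 = p0 / (1 - p0)
p1 = true_or * odds0 / (1 + true_or * odds0)  # exposure prevalence among cases

ors, sig = [], []
for _ in range(sims):
    a = rng.binomial(n, p1)   # exposed cases
    c = rng.binomial(n, p0)   # exposed controls
    b, d = n - a, n - c       # unexposed cases / controls
    if min(a, b, c, d) == 0:  # skip tables the Wald test cannot handle
        continue
    log_or = np.log(a * d / (b * c))
    se = np.sqrt(1/a + 1/b + 1/c + 1/d)       # Wald standard error of log OR
    ors.append(np.exp(log_or))
    sig.append(log_or / se > 1.96)            # "discovered": significant, OR > 1

ors, sig = np.array(ors), np.array(sig)
print(f"mean OR, all simulated studies: {ors.mean():.2f}")
print(f"mean OR, 'discovered' only:     {ors[sig].mean():.2f}")
```

Averaged over all simulated studies the OR sits near the truth, but conditioning on statistical significance at this sample size inflates the reported OR roughly two-fold, in line with the paper’s point.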

Now, this is just due to random effects generated by the mechanism of scientific discovery. Unfortunately, as usual, there are people involved as well. And people, even the ones with proper training, have the amazing ability to screw up most things they get themselves involved in (it’s a gift…). In statistical terms this can be summarized (assuming one did not screw up the study design or data collection in the first place) as “the availability of alternative options in model selection”. In other words: there are many ways to analyze the same data, and depending on the decisions made in handling, analyzing and presenting the data, different risk estimates will result, a phenomenon Prof. Ioannidis names “vibration of effects”. This wouldn’t be a problem if all analyses, or a random selection of them, were presented. Alas, since this is done by people, and in an environment where making discoveries is rewarded, you can see where this will go wrong.

Of course, there may conversely be reasons why someone would not want to “discover” a risk. There are examples enough to be found in environmental and occupational epidemiology (here is a good read on the topic in an occupational context (link)). That requires the same tactics as described above for inflating risk estimates, but applied in reverse, so to speak…

So there is reason to worry here, although an evaluation of how well limitations and caveats are discussed in a paper does provide clues about the likely extent of in- or deflation of the true risk estimates. The trick to solving, or at least minimizing, this problem is available within the scientific process as well, and is called “replication”, preferably replication by others. Theoretically, after enough replication the inflated effects in early studies will be corrected, because other, often better powered, studies will be completed. These can then be combined in a so-called meta-analysis, which yields a (random-effects) common underlying risk estimate. By continuously updating these meta-analyses as new studies become available, given enough time and studies, the true risk estimate will eventually reveal itself. This is nicely illustrated by the fact that the effect sizes of “positive” (as in showing significant results) meta-analyses are inversely correlated with the amount of evidence accumulated; in other words, the more studies included, the lower the risk estimate.

I think this is a nice moment for some self-reflection. Prof. Ioannidis presents two stances in hunting associations. Have a look and see which one you are…

                          Aggressive Discoverer                   Reflective Replicator
What matters is…          Discovery                               Replication
Databases are…            Private goldmines, not to be shared     Public commodity
A good epidemiologist…    Can think of more exploratory analyses  Is robust about design and analysis plan
One should report…        What is interesting                     Everything
Publication mode…         Each association as a separate paper    Everything as a single paper
After reporting…          Push your findings forward              Be critical/cautious