The danger of exposure to too much stuff

Posted on March 2, 2012


So how do we know which exposures are harmful for humans? Obviously, if we, generally by accident, get exposed to extremely high concentrations of some toxic substance and die immediately, or within a very short period after, it is fairly easy to deduce a causal relation. But how about much lower doses, or effects that only show up after a long time (as is the case for most cancers)?

A pretty straightforward method is to take a living being, put it in a closed environment, chuck the substance of interest in that same environment (or administer it to the animal if external exposures are of interest), and see what happens. For direct conclusions using human beings would be best, but for ethical reasons (although issues of time are also important for diseases with a long latency) it is generally considered more appropriate to use animals. I am not talking about the famous randomized controlled clinical trials here, which are generally used for studies on medication that is supposed to be, at the very least, not harmful, but about studies of harmful or toxic substances. It’s a good idea to pick an animal that, given the mechanism or organ of interest, is not too different from humans, so that we can conclude something useful about the toxicological effects on ourselves; for various reasons this translates to monkeys, dogs, mice, and some other animals. And of course, since we use statistics to derive conclusions, we need a bunch of these animals.

So now we have a reasonable idea about the effect of one substance on humans, but what if, like in normal daily life, we get exposed to many different exposures: some of them toxic, some not, some carcinogenic, some not, some not being dangerous to humans, except in combination with another, etcetera. Life is complicated….

A well-known example of this is tobacco smoke, which, according to these guys (link), contains “4,000 chemicals, including 43 known cancer-causing (carcinogenic) compounds and 400 other toxins. These include nicotine, tar, and carbon monoxide, as well as formaldehyde, ammonia, hydrogen cyanide, arsenic, and DDT.” Now of course we could use the same experiment (and this has been done) and just chuck the whole mixture in the closed environment. But then, which chemical causes the effect? How big is the contribution of the other chemicals? Do some chemicals enhance others’ effects, or maybe cancel each other out? Does this mixture have the same effect on humans as on the animal species we used? Moreover, what dose should we use (would straightforward scaling-down work, or would the animals simply OD)? Mmmm, and how long should we wait for effects to show up? What effects, by the way, are we looking for? So yes, even experimental, “controlled” life is complicated….

A natural “experiment” to obtain direct information on exposures and/or specific mixtures would alternatively be to observe what happens in daily life. That way you’d look specifically at humans, have the relevant exposure levels and durations you’d be interested in, and you don’t need to decide on the relevant mixture of substances prior to the “experiment”, since people themselves would make that for you. A good example if you’d be interested in the effects of tobacco smoke would be a present-day Dutch pub (see the previous article on this blog, “Christmas Carols and Tobacco”). Of course this methodology, observational epidemiology, has problems of its own as well, for example that you don’t control the dose (so you have no idea whether you measured all exposures, or whether you’ve missed some important ones). You can’t observe people for 25 years to see who develops, say, cancer, so you’ll have to estimate exposure prior to the period you started observing, and…oh yes…people lie (especially when asked about “bad” behaviour such as smoking and drinking habits)…

Now here is the interesting bit. If we observe a group of people who are exposed to a large number of substances at the same time, how do we get the individual risk estimates for all substances in this mixture?

This may sound like an easy, and maybe irrelevant, question, but for illustration let’s take the example of outdoor air pollution. Air pollution is a mixture of a large number of substances from sources like traffic emissions, industry and many others (link). Those often measured include particulate matter (for those interested PM10, PM2.5, and nowadays also ultrafine particles), nitrogen oxides (NOx), carbon monoxide (CO), carbon dioxide (CO2), sulphur oxides (SOx), volatile organic compounds (VOC), and many others. Since these share sources, generally speaking if you get exposed to one of these substances chances are you’re simultaneously exposed to others. Also, if you flee city life and prefer to relax at the seaside or in a forest, exposure to SOx will be much less and so will exposure to most of the others. Oh wait, but there will be other exposures related to other sources. In statistical terms these exposures are correlated (link). And that’s a problem. So let me show you an example of some simulation work I did (originally for a presentation) to illustrate what could go wrong.

As a basis for this work I have used a study published in 2010 that looked at risk factors for lung cancer in never-smokers (Brenner, Hung, Tsao, et al. BMC Cancer 2010; 10:285 (link)), although the simulated data below is based on the total population in their paper, not just the non-smokers.

Let me first point out that there is nothing special about this study. It is just a study looking at different exposures using a case-control design that I found interesting and just happened to be reading when I needed a real study as a basis for the simulation work. Then, I made a mistake: the study has 445 cases and 948 controls (that’s 1,393 people in total), but I accidentally simulated 1,393 cases and the same number of controls. So essentially, the study is a lot bigger now. Ah yes, for simplicity I also assumed all controls were equal, while in fact some were more equal than others (they used population controls and hospital controls).

Now, they found the following risks for lung cancer in the total population (expressed as Odds Ratios; if you are not familiar with these have a look here (link)) for a number of different exposures: asbestos (OR~1.1 [95% Confidence Interval 0.6-2.0]), solvents, paints and thinners (1.6 [1.2-2.3]), welding equipment (1.7 [1.0-3.0]), pesticides (1.6 [0.8-3.1]), grain elevator dust (1.1 [0.5-2.4]), wood dust (1.5 [1.0-2.4]) and smoke-soot, exhaust (1.7 [1.2-2.5]). These Odds Ratios were all estimated by analyzing them separately, while taking into account other factors that could influence these calculations (adjusting for confounders in correct lingo): pack-years of smoking, age, gender, education and ethnicity.
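For readers who want to see the arithmetic behind an Odds Ratio and its confidence interval, here is a minimal Python sketch. It computes a crude (unadjusted) OR with the standard log-based (Woolf) interval from a 2x2 table. The counts are made up for illustration and are not taken from the paper, and the ORs quoted above were adjusted for confounders, which this sketch does not do.

```python
import math

def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a log-based (Woolf) confidence interval."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                          + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (NOT from the paper), chosen only so that they add up
# to 445 cases and 948 controls.
or_, lo, hi = odds_ratio_ci(60, 385, 90, 858)
print(f"OR = {or_:.2f} [{lo:.2f}-{hi:.2f}]")
```

If the interval excludes 1.0, the exposure would usually be called a statistically significant risk factor at the 5% level.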

So in summary I simulated a case-control study of 1,393 cases and the same number of controls, using the risk estimates described above, and also simulated that the prevalence of each exposure was similar to that reported in the paper. But remember the part about correlated exposures? Correlations (ignoring negative ones for the moment) can range from 0 (no correlation) to 1 (100% correlation), and because it is a simulation, I could essentially choose any value I wanted to see what happens. So I decided to be boring (careful) and assumed only a relatively low correlation of 0.30 between all these exposures. Except of course, for some exceptions…
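One way to generate yes/no exposures that are correlated is sketched below. This is an assumption on my part about how such data could be produced, not a description of the code behind the simulation in this post. Each exposure gets a hidden normal score that shares a common component with all the others, and a person counts as exposed when the score falls below a threshold set by the prevalence. The prevalences of 20% and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_exposures(n, prevalences, latent_corr, seed=1):
    """Simulate n people with binary exposures sharing one latent factor."""
    rng = random.Random(seed)
    # Threshold on the latent normal scale that yields each prevalence.
    thresholds = [NormalDist().inv_cdf(p) for p in prevalences]
    shared_w = math.sqrt(latent_corr)
    own_w = math.sqrt(1 - latent_corr)
    people = []
    for _ in range(n):
        shared = rng.gauss(0, 1)  # component common to all exposures
        row = [int(shared_w * shared + own_w * rng.gauss(0, 1) < t)
               for t in thresholds]
        people.append(row)
    return people

def phi_correlation(xs, ys):
    """Pearson correlation between two 0/1 variables."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return cov / math.sqrt(mean_x * (1 - mean_x) * mean_y * (1 - mean_y))

# Hypothetical prevalences of 20% for three exposures, latent correlation 0.30.
people = simulate_exposures(20_000, [0.2, 0.2, 0.2], 0.30)
first = [p[0] for p in people]
second = [p[1] for p in people]
print("prevalence:", sum(first) / len(first))
print("correlation between the 0/1 exposures:", phi_correlation(first, second))
```

Note that with this approach the correlation between the resulting 0/1 exposures comes out lower than the 0.30 put in on the latent scale, so which of the two a stated correlation refers to matters.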

Let’s assume (make up really, but in a realistic kinda way) that exposure to asbestos and welding equipment, solvents and wood dust, and pesticides and grain elevator dust were instead moderately correlated (0.60 instead of 0.30). Essentially, the assumption here is that many people who are exposed to solvents are also exposed to wood dust, and similarly for the other combinations. Mind you, not all people, just a few more (otherwise the correlation would have been 1). In addition, similar to the main findings of the paper, if you were a smoker, exposure to solvents, welding equipment and smoke-soot would have less effect on your risk than if you were a non-smoker. So, in simulation terms, there was an interaction between smoking and the effect of these exposures with an OR of 0.5 (smoking prevalence was about 24% based on UK Office of National Statistics data). The risk of developing lung cancer when you were a smoker, however, was 15 times higher than when you were a non-smoker (OR~15.7 to be more precise; Simonato et al. Int J Cancer 2001;91:876-87 (link)).

Oh yes, something that wasn’t in the study but that I added for the fun of it: cellphone use (about 60% of my population used one, which was based on data from the INTERPHONE study). Not surprisingly, this was not related to lung cancer (OR~1.0). I did make up, however, that if you were exposed to one of the other occupational exposures AND you used a cellphone, then your risk of developing lung cancer would be 50% higher than when you were only exposed to the chemical exposure. This kind of interaction is not uncommon, and the second factor could also be another chemical or biological factor instead of my cellphone use.
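To keep track of all these ingredients, the rules can be written down as a small function that returns by how much a person’s odds of lung cancer are multiplied. This is a sketch under several assumptions of mine: that the effects multiply on the odds scale, that the smoking interaction (OR 0.5) applies separately to each of the three exposures mentioned, and that the cellphone interaction (OR 1.5) applies once if any occupational exposure is present. The names are hypothetical.

```python
# Main-effect odds ratios as quoted in the text.
MAIN_OR = {
    "asbestos": 1.1, "solvents": 1.6, "welding": 1.7, "pesticides": 1.6,
    "grain_dust": 1.1, "wood_dust": 1.5, "smoke_soot": 1.7,
    "smoking": 15.7, "cellphone": 1.0,
}
OCCUPATIONAL = ("asbestos", "solvents", "welding", "pesticides",
                "grain_dust", "wood_dust", "smoke_soot")
WEAKER_IN_SMOKERS = ("solvents", "welding", "smoke_soot")

def odds_multiplier(exposed):
    """Factor by which the baseline odds are multiplied for a set of exposures."""
    multiplier = 1.0
    for factor in exposed:
        multiplier *= MAIN_OR[factor]
    # Assumption: the smoking interaction applies once per affected exposure.
    if "smoking" in exposed:
        for factor in WEAKER_IN_SMOKERS:
            if factor in exposed:
                multiplier *= 0.5
    # Assumption: the cellphone interaction applies once, however many
    # occupational exposures are present.
    if "cellphone" in exposed and any(f in exposed for f in OCCUPATIONAL):
        multiplier *= 1.5
    return multiplier

print(odds_multiplier({"cellphone"}))
print(odds_multiplier({"solvents", "cellphone"}))
print(odds_multiplier({"smoking", "solvents"}))
```

A cellphone on its own leaves the odds untouched; it only matters in combination with one of the occupational exposures, which is exactly what makes it hard to spot afterwards.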

I know, it’s complicated. But I got a bit carried away with exposure mixtures. In summary, what we should find is similar ORs to  those reported in the original study, a very high increased risk for smokers, and no effect for cellphone use.  All in all, I realize you have to take my word for it, but this is not a very uncommon situation…

So if this was the true situation, this is what the authors would have found by adding each exposure separately to their statistical analyses:

[Figure: point risk estimates (Odds Ratios) and 95% Confidence Intervals for each exposure, analyzed separately]

Now that’s not really what we put in the simulation in the first place!

As you can see, all factors are suddenly statistically significant risk factors for lung cancer, while we should not have found anything for pesticides, grain elevator dust, and probably nothing for welding equipment and wood dust. Moreover, cell phone use is suddenly the most important risk factor for lung cancer in this study!
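The mechanism can be reproduced in a much smaller toy example. The sketch below is not the simulation from this post: it uses only two exposures, a cohort-style population instead of case-control sampling, and made-up values for prevalence (30%), baseline risk (10%) and latent correlation (0.60). Exposure A truly raises the odds (OR 1.7), exposure B truly does nothing (OR 1.0), yet B looks harmful when analyzed on its own because people exposed to B are more often exposed to A as well.

```python
import math
import random
from statistics import NormalDist

def simulate(n=200_000, prevalence=0.3, latent_corr=0.6,
             or_a=1.7, or_b=1.0, baseline_risk=0.1, seed=42):
    """Return counts keyed by (exposed_to_b, diseased)."""
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(prevalence)
    shared_w = math.sqrt(latent_corr)
    own_w = math.sqrt(1 - latent_corr)
    base_odds = baseline_risk / (1 - baseline_risk)
    table = {(b, y): 0 for b in (0, 1) for y in (0, 1)}
    for _ in range(n):
        shared = rng.gauss(0, 1)  # common component makes A and B correlated
        a = shared_w * shared + own_w * rng.gauss(0, 1) < threshold
        b = shared_w * shared + own_w * rng.gauss(0, 1) < threshold
        odds = base_odds * (or_a if a else 1.0) * (or_b if b else 1.0)
        diseased = rng.random() < odds / (1 + odds)
        table[(int(b), int(diseased))] += 1
    return table

def crude_or(table):
    """Odds ratio for exposure B, ignoring exposure A entirely."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])

crude_or_b = crude_or(simulate())
print("crude OR for B, true OR is 1.0:", round(crude_or_b, 2))
```

Setting `latent_corr=0.0` in the same function should bring the crude OR for B back to roughly 1, which shows that the correlation, not B itself, produces the apparent risk.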

In a not-unlikely “real simulated” environment, we suddenly end up with the wrong conclusions!

 Analyzing all exposures at the same time using multiple regression techniques may improve things, but if correlations are sufficiently high this goes wrong as well. For example, in the case of this simulation the authors would have surprisingly been confronted with asbestos exposure being a protective factor for lung cancer. Now there’s a tricky argument…
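A rough way to see why high correlations hurt even when all exposures go into one model is the variance inflation factor. The sketch below uses the textbook formula for the simplest case of two correlated predictors in a linear regression; for logistic regression it is only a rule of thumb, and it is my illustration rather than something calculated in the simulation above.

```python
import math

def variance_inflation(r):
    """Variance inflation factor for two predictors with correlation r."""
    return 1.0 / (1.0 - r ** 2)

for r in (0.3, 0.6, 0.9):
    vif = variance_inflation(r)
    # Standard errors grow with the square root of the variance inflation.
    print(f"r = {r}: variance x{vif:.2f}, standard error x{math.sqrt(vif):.2f}")
```

The more the exposures overlap, the less information there is to tell them apart, so the individual estimates become unstable and implausible results (such as a harmful exposure looking protective) become more likely.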

Of course statisticians have developed various methods that, to some extent, can deal with correlated variables. That’s too much detail for now, so I hope I have at least made you aware of the dangers of just analyzing exposure data in epidemiological studies without thinking it through. I suspect I will, at a later date, work from here and show some of the possible improved methods (not solutions, I hasten to add)…

As a final note, can I remind you (again) that this is a simulation and not real data from the study, so in no way does this say anything about the particular study, its results, or its interpretation…