I understand that pre-releasing papers to journalists raises interest in the research work and allows reporters to hit the ground running. But does it also hurt science? In my case, I think the answer may be “yes”.
Others have discussed how embargoed papers affect science journalism (Ed Yong has a good write-up here), but my question is whether the research process itself might suffer, at least for some types of papers. In my case, I wrote a research paper that discussed methodological issues in a previous study and suggested that the study failed to control for possible confounders in an appropriate way. I also suggested a number of methods and analyses that might help to address these shortcomings. As the press received the embargoed paper, some of them called the original researchers and told them some guy claimed their results were wrong, and what did they have to say about that? Rather than letting the original researchers reflect on my reasoning and suggestions, this forced them to react in step with the news cycle: Within something like 24 hours of the time they first saw my study, they released a statement to the press (available here) where they brushed off my points with reference to some new analyses. In my opinion, the new analyses they referred to were both insufficient to address my points and difficult to assess since they were presented with no details. Some of them, such as the claim that average IQ change was zero within each of three SES levels they constructed, were quite interesting and merit closer review. Other points, such as the claim that there was a relationship even within mid-SES individuals (they didn’t report whether the effects were the same or smaller) have more limited relevance (see below). However, it seemed (at the time) urgent and important to respond before the journalists “went to press,” and I ended up dashing off a hasty reply so that I had a response I could make available to the journalists. This, it seems to me, is not conducive to good scientific dialogue.
Not only should it be possible to breathe and think about things before pressing “send,” but the discussion can easily veer off into the issues of concern to journalists rather than the more important methodological issues that should be of concern.
Before continuing, let me be clear that my point is not to criticize the science journalists: It is natural (and correct) for them to ask the original authors for a response, and several of the reporters I was in touch with pleasantly surprised me with their level of detail, intellectual curiosity and incisive questions. To some extent it may well be that the embargo just exacerbates the issue, and that the main “problem” (or challenge) is the massive media interest and the quick-response demands that this creates. I don’t have any clear conclusions as to how this could be improved, but I want to note how this may have affected the research debate on IQ and cannabis use.
The various claims flying around are reported in a number of news articles, now that the press embargo has lifted (just google: meier rogeberg cannabis). The response from the original research team has been made available online by two of the original researchers, and the lead author of the original study has used it as the basis for an online piece stating that I am just flat-out wrong. There is thus a risk that the researchers have painted themselves into a corner psychologically: By defending their original claim and methodology rather than being open to a proper re-examination of the evidence, they have made it more difficult to do a fair analysis later without losing face if their original effect estimates were exaggerated or turn out to be non-robust.
I find this a bit disappointing, as well as sad. If the original conclusions were correct, they would hold up in the new analyses I proposed, leaving their conclusions all the stronger as a result. If their effect was overestimated (due to confounding) or even negligible or zero after better controls, surely that should be seen as a positive outcome as well: More important than what results we get is, after all, making sure that our results are as correct and credible as we can make them.
To explain why this matters, let me try to get the important methodological issues across in a clear way to those who are interested: Basically, the original paper (which is available here) used a simple variant of a difference-in-differences analysis. The researchers sorted people into groups according to whether or not they had used cannabis and according to the number of times they had been scored as dependent. They then compared IQ-changes between age 13 and 38 across these groups, and found that IQ declined more in the groups with heavier cannabis-exposure. The effect seemed to be driven by adolescent-onset smokers, and it seemed to persist after they quit smoking.
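To make the comparison concrete, here is a minimal sketch of this kind of group-wise comparison of IQ changes. The records, group labels and numbers are made up purely for illustration; they are not Dunedin data and this is not the original authors' code.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (exposure group, IQ at age 13, IQ at age 38).
# All values are invented for illustration only.
records = [
    ("never used", 100, 101),
    ("never used", 104, 104),
    ("used, never dependent", 99, 98),
    ("used, never dependent", 103, 101),
    ("dependent once", 98, 95),
    ("dependent once", 102, 98),
]


def mean_iq_change_by_group(rows):
    """Average (IQ at 38 minus IQ at 13) within each exposure group."""
    changes = defaultdict(list)
    for group, iq_before, iq_after in rows:
        changes[group].append(iq_after - iq_before)
    return {group: mean(values) for group, values in changes.items()}


group_changes = mean_iq_change_by_group(records)

# The "difference in differences": each group's average change
# minus the average change in the never-user reference group.
baseline = group_changes["never used"]
did = {group: change - baseline for group, change in group_changes.items()}

for group, value in did.items():
    print(f"{group}: {value:+.1f} IQ points relative to never users")
```

With these invented numbers the two user groups come out 2 and 4 points below the never users. Whether such a gap says anything about cannabis depends entirely on whether the groups would otherwise have developed alike.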
The data used for this study was stunning: Participants in the Dunedin Study, a group of roughly 1000 individuals born within 12 months of one another in the city of Dunedin in New Zealand, had been followed from birth to age 38. They had been measured regularly and scored on a number of dimensions through interviews, IQ tests, teacher and parent interviews, blood samples etc., and are probably amongst the most intensively researched people on the planet: The study website states that roughly 1100 publications have been based on the sample so far, which is more than one publication per participant on average ;)
Despite this impressive data, there were some things I found wanting in the analysis. My own experience with difference-in-differences methods comes from empirical labor economics, and this experience had led me to expect a number of robustness checks and supporting analyses that this article lacked. This is not surprising: Different disciplines can face similar methodological issues, yet still develop more or less independently of each other. In such situations, however, there will often be good reasons for “cross-pollination” of practices and methods. For instance, experimental economics owes a large debt to psychology, and the use of randomized field trials in development and labor economics owes a large debt to the use of randomized clinical trials in medicine.
The cannabis-and-IQ analysis basically compares average changes in IQ across groups with different cannabis use patterns. Since we haven’t randomized “cannabis use patterns” over the participants, we have an obvious and important selection issue: The traits or circumstances that caused some people to begin smoking pot early, and that caused some of these to become heavily dependent for a long time, can themselves be associated with (or be) variables that also affect the outcome we are interested in. The central assumption, in other words, is that the groups would have had the same IQ-development if their cannabis use had been similar. Since this is the central assumption required for this method to validly identify an effect of cannabis, it is crucial that the researchers provide evidence sufficient to evaluate the appropriateness of this assumption. To be specific, and to show what kind of things I wanted the researchers to provide, you would want to:
- Establish that the units compared were similar prior to the treatment being studied - e.g., provide a table showing how the different cannabis-exposure groups differed prior to treatment on a number of variables.
- Establish a common trend - Since the identifying assumption is that the groups would have had the same development if they had had the same “treatment”, then clearly the development prior to the treatments should be similar. In the Dunedin study, they measured IQ at a number of ages, and average IQ changes in various periods could be shown for each group of cannabis users.
- Control for different sets of possible confounders. To show that the estimates that are of interest are robust, you would want to show estimates for a number of multivariate regressions that control for increasing numbers (and types) of potential confounders. The stability of the estimated effects and their magnitude can then be assessed, and the danger of confounding better evaluated: What happens if you add risk factors that are associated with poor life outcomes (childhood peer rejection, conduct disorders etc.), or if you include measures of education, jail time, unemployment, etc.? If the effect estimate of cannabis on IQ changes a lot, then this suggests that selection issues are important, and that confounders (both known and unknown) must be taken seriously. Adding important confounders will also help estimation of the effect we are interested in: Since they explain variance within each group (as well as some of the variance between the groups), they help reduce standard errors on the estimates of interest.
- Establish sensitivity of results to methodological choices. Just as we want to know how sensitive our results are to the control variables we add, we also want to know how sensitive they are to the specific methodological choices we have made. In this instance, it would be interesting to allow for pre-existing individual level trends: Assume that people have different linear trends to begin with. To what extent are these differing pre-existing trends shifted in similar ways by later use patterns of cannabis? By adding in earlier IQ-measurements for each individual (which are available from the Dunedin study), such “random growth estimators” would be able to account for any (known or unknown) cause that systematically affected individual trajectories in both pre- and post-treatment periods. Another example is the linear trend variable they use for cannabis exposure, which presumably gives a score of 1 to never users, 2 to users who were never dependent, 3 to those scored as dependent once and so on. This is the variable that they check for significance - and it would be
- Provide other diagnostic analyses, for instance by considering the variance of the outcome variable within each treatment group (how much did IQ change differ within each treatment group?). In this way, we could tell whether we seemed to be dealing with a very clear, uniform effect that affects most individuals equally, or whether it was a very heterogeneous effect whose average value was largely driven by high-impact subgroups.
- Discuss alternative mechanisms. What potential mechanisms can be behind this, and what alternative tests can we develop to distinguish between these? Say you identify what seems to be a causal effect of cannabis use and dependency, but its magnitude is strongly reduced (but not eliminated) when you add in various potential confounders, such as educational level. As the authors of the original paper note (when education turns out to affect the effect size), education could be a mediating factor in the causal process whereby cannabis affects IQ. However, this would mean that the permanent, neurotoxic effect they are most concerned with would be smaller, because part of the measured effect would be due to the effect of cannabis on education multiplied by the effect of this education on IQ. The evidence thus suggests that the direct “neurotoxic” effect is only part of what is going on. It also suggests that we might want to look for evidence to assess how strongly cannabis use causally affects education, to better understand the determinants of this process. For instance, even if there was only a temporary effect of cannabis on cognition, ongoing smokers would do more poorly in school or college, which might then influence later job prospects and long term IQ. The effect doesn’t even have to be through IQ: If pot smoking makes you less ambitious (either because of stoner subculture or psychological effects), the effect may still have long term consequences by altering educational choices and performance. Put differently: If the mechanism is via school, then even transitory effects of cannabis become important when they coincide with the period of education.
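The check on confounders in the list above can be sketched in a few lines. This is a toy illustration under loud assumptions: the data are simulated, the single "risk" variable and all coefficients are hypothetical, and cannabis exposure is given no causal effect on IQ change at all. The adjusted estimate uses the standard partialling-out (Frisch-Waugh-Lovell) route, which gives the same coefficient as a regression that includes the control.

```python
import random
from statistics import mean


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(y, z):
    """Part of y left over after regressing it on z (with an intercept)."""
    b = slope(z, y)
    mz, my = mean(z), mean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 5000

# Hypothetical data-generating process (assumed, not from any study):
# a risk factor raises exposure and lowers IQ change; exposure itself does nothing.
risk = [rng.gauss(0, 1) for _ in range(n)]
exposure = [0.8 * r + rng.gauss(0, 1) for r in risk]
iq_change = [-2.0 * r + rng.gauss(0, 1) for r in risk]

# Specification 1: no controls.
unadjusted = slope(exposure, iq_change)

# Specification 2: control for the risk factor by partialling it out of both variables.
adjusted = slope(residualize(exposure, risk), residualize(iq_change, risk))

print(f"exposure coefficient, no controls:       {unadjusted:+.2f}")
print(f"exposure coefficient, risk controlled:   {adjusted:+.2f}")
```

In this toy setup the coefficient on exposure is clearly negative without controls and close to zero once the risk factor is controlled for. An estimate that moves this much between specifications is exactly the warning sign described above.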
It turned out that I could - to some extent. Early onset cannabis use appeared to be correlated with a number of risk factors, and these risk factors were also correlated with poor life outcomes (little or poor education, crime, low income etc.). The risk factors were also correlated with socioeconomic status.
The next question was whether these factors could affect IQ. One recent model of IQ (the Flynn-Dickens model) strongly suggested they would. The model sees IQ as a style or habit of thinking - a mental muscle, if you like - which is influenced by the cognitive demands of your recent environment. School, home environment, jobs and even the smartness of your friends are seen as in a feedback loop with IQ: High initial IQ gives you an interest in (and access to) the environments that in turn support and strengthen IQ. Since the risk factors mentioned above would serve to push you away from such cognitively demanding environments, it seemed plausible that they would affect long term IQ negatively by pushing you into poorer environments than your initial IQ would have suggested.
A couple of further parts to this potential mechanism can be noted (both discussed here): It seems that high-SES kids have a higher heritability of IQ than low-SES kids, which researchers often interpret as due to environmental thresholds: If your environment is sufficiently good, variation in your environment will have small effects on your IQ. If, however, your environment is poorer, similar variation will have larger effects. Put differently: The IQ of low-SES kids is more affected by changes to their environment than that of high-SES kids.
Also, there is a (somewhat counterintuitive, at first glance) result which shows that average IQ heritability increases with age. One interpretation of this is that our genetic disposition causes us to self-select or be sorted into specific environments as we age. The environment we end up with is therefore more determined by our genetic heritage than our childhood environment, where our family and school were, in some sense, “forced environments.”
In my research article, I refer to various empirical studies supporting these mechanisms and their effects. These include past studies that find SES, jail time, and education to be associated with the rate of change in cognitive abilities at different ages. Putting these pieces together, the risk factors that make you more likely to take up pot smoking in adolescence, and that raise your risk of becoming dependent, also shift you into poorer environments than your initial IQ would predict in isolation. Additionally, these shifts are more likely for kids in lower-SES groups (since the risk factors are correlated with SES), and these kids also have an IQ more sensitive to environmental changes. Finally, for the same reason, the forced environment of schooling is likely to raise childhood IQ more for the low SES kids (because it is a larger improvement on their prior environments, and because their IQs are more sensitive to environmental influences). SES, then, is in some sense a summary variable that is related to a number of the relevant factors, in that low SES
- correlates with risk factors that influence, on the one hand, adolescent cannabis use and dependency and, on the other hand, poorer life outcomes,
- signals a heightened sensitivity to environmental factors (the SES-heritability difference in childhood), and
- probably reflects the magnitude of the extra cognitive demands imposed by school relative to home environment.
For these reasons, SES seemed like a good variable to use in a mathematical model to capture these relationships. However, it should be obvious from my description of this mechanism that we should expect the mechanism to work even within a socioeconomic group: Even within this group, those with high levels of risk factors will experience poorer life outcomes, which may reduce their IQs. They will also most likely have higher probabilities of beginning cannabis smoking. At the same time, we would expect a smaller effect within a specific socioeconomic group than we would across the whole population.
However, I simplified this by using SES in three levels and created a mathematical model with these effects, using effect sizes drawn from past research literature where I could find them. Using the methods used in the original study, I tested my simulated data and found the statistical methods identified the same type and magnitude of effects here as they had in the actual study data. This, of course, does not prove or establish that there is no effect of cannabis on IQ. What it does show is that the methods they used were insufficient to rule out other hypotheses, that the original effect estimates may be overestimated, and that we need to look more deeply into the matter, using the kind of robustness checks and specification tests I discussed above.
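The logic of such a simulation can be shown with a toy version. To be clear, this is not the model from my paper: the SES shifts, exposure probabilities and noise level below are hypothetical placeholders, and SES is the only driver of IQ change, so cannabis has no effect by construction.

```python
import random
from collections import defaultdict
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical parameters, chosen only for illustration.
# Average IQ change from age 13 to 38 by SES level; no cannabis term anywhere.
SES_SHIFT = {"low": -3.0, "mid": 0.0, "high": 3.0}
# Probability of each exposure level (0 = never used ... 3 = heaviest) by SES.
EXPOSURE_WEIGHTS = {
    "low": [0.40, 0.30, 0.20, 0.10],
    "mid": [0.60, 0.25, 0.10, 0.05],
    "high": [0.80, 0.15, 0.04, 0.01],
}


def simulate(n):
    """Draw n individuals as (ses, exposure, iq_change) tuples."""
    rows = []
    for _ in range(n):
        ses = rng.choice(["low", "mid", "high"])
        exposure = rng.choices([0, 1, 2, 3], weights=EXPOSURE_WEIGHTS[ses])[0]
        iq_change = SES_SHIFT[ses] + rng.gauss(0.0, 5.0)
        rows.append((ses, exposure, iq_change))
    return rows


def mean_change_by(rows, key):
    """Average IQ change within the groups defined by key(row)."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in sorted(groups.items())}


rows = simulate(30000)
by_exposure = mean_change_by(rows, key=lambda r: r[1])
by_ses_and_exposure = mean_change_by(rows, key=lambda r: (r[0], r[1]))

for level, change in by_exposure.items():
    print(f"exposure level {level}: mean IQ change {change:+.2f}")
```

Comparing average IQ change across exposure levels produces what looks like a dose-response decline, even though exposure does nothing in the simulated world. Within a single SES level the gap disappears here, but only because this toy version makes SES the sole driver; as described above, the actual mechanism should be expected to operate within socioeconomic groups as well.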
In my mind, this should be just the normal process of science - an ongoing dialogue between different researchers. We know that replications of results often fail, and that acting on flawed results can have negative consequences (see here for an interesting popular science account of one such case). A statistical model by medical researcher Ioannidis (at the centre of this entertaining profile) suggests that new results based on exploratory epidemiological studies of observational data will be wrong 80% of the time. The Dunedin study on cannabis and IQ would, it seems, fit into this category. After all, by the time you’ve published more than 1100 papers on a group of individuals, it seems relatively safe to say that you have moved into “exploratory” mode.
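A rough sketch of the kind of calculation behind such a figure: the share of "significant" findings that are true depends on statistical power, the pre-study odds that a tested relationship is real, the significance threshold, and bias. The formula below is the bias-adjusted positive predictive value as I understand it from that model, and the inputs (80% power, 1:10 pre-study odds, bias of 0.30) are assumptions chosen for illustration, not numbers taken from the Dunedin study.

```python
def positive_predictive_value(power, prior_odds, alpha=0.05, bias=0.0):
    """Share of claimed findings that are true, given:

    power      -- probability of detecting a true relationship (1 - beta)
    prior_odds -- pre-study odds that a tested relationship is true (R)
    alpha      -- significance threshold (type I error rate)
    bias       -- share of would-be null results that get reported as findings (u)
    """
    beta = 1.0 - power
    true_findings = power * prior_odds + bias * beta * prior_odds
    all_findings = (
        prior_odds + alpha - beta * prior_odds
        + bias - bias * alpha + bias * beta * prior_odds
    )
    return true_findings / all_findings


# Illustrative inputs (assumptions): well-powered study, 1 in 10 odds, sizeable bias.
ppv = positive_predictive_value(power=0.80, prior_odds=0.1, bias=0.30)
print(f"share of findings that are true:  {ppv:.2f}")
print(f"share of findings that are wrong: {1 - ppv:.2f}")
```

With these inputs roughly one finding in five is true, which is where a figure of about 80% wrong comes from. Setting the bias to zero raises the share of true findings considerably, so the result is sensitive to inputs that are hard to pin down.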
In light of this, critically assessing results and methods and proposing alternative explanations and further tests should be an everyday and expected part of research work. Such work is particularly important in cases like the Dunedin study, where the data involved is both costly and time consuming to construct, and thus very rare. As noted recently by Gary Marcus in a G+ comment (second comment here), flawed results based on such data are likely to persist for a “really long time” if we are to wait for other researchers to replicate the analyses on other data.
And that, finally, brings us back to where we started. I remain hopeful that the original researchers will return to their data and address my methodological points properly: How robust and credible is the effect, and how sensitive is the effect magnitude to different sets of controls and methodological choices? However, I worry that the pre-release to the press, and the quick back-and-forth exchanges and position-taking it seems to have caused, have reduced the likelihood of this taking place.