Friday, March 8, 2013

Cannabis, IQ and socio-economic status in the Dunedin data - an update

I’ll start with a short recap: Researchers published an article in August 2012 arguing that adolescent-onset cannabis smoking harms adolescent brains and causes IQ to decline. I responded with an article (available here) arguing that their methods were insufficient to establish a causal link, and that non-cognitive traits (ambition, self-control, personality, interests etc) would influence risks of adolescent-onset cannabis use while also potentially altering IQs by influencing your education, occupation, choice of peers etc. For various reasons, I argued that this could show up in their data as IQ-trends that differed by socioeconomic status (SES), and suggested a number of analyses that would help clarify whether their effect was biased due to confounding and selection-effects. In a reply this week (gated, I think), the researchers show that there is no systematic IQ-trend difference across three SES groups they’ve constructed. However, as I note in my reply (available here), they still fail to tell us how different the groups of cannabis users (never users, adolescent-onset users with long history of dependence etc) were on other dimensions, and they still fail to control for non-cognitive factors and early childhood experiences in any of the ways I proposed. In fact, none of the data or analyses that my article asked for have been provided, and the researchers conclude with a puzzling claim that randomized clinical trials only show “potential” effects while observational studies are needed to show “whether cannabis actually is impairing cognition in the real world and how much.”

In light of the response, it seems I have done a very poor job of communicating my point. The researchers reduce my entire argument to a temporary effect of schooling on low-SES children, so let me try one last (?) time:


Are you strongly confident that cannabis is the only thing that systematically affects IQ after the age of 13? If so, the original research design may seem OK: Look at IQ trends for those who used a lot of cannabis and compare to IQ trends of those who used little. If nothing else systematically affects IQ, there is no need to know how similar or different these groups are in other ways. It is irrelevant, just as we don’t need to know the color of a falling ball to calculate its speed.


However, if you think other things may or are likely to affect IQ after the age of 13, such as education, genes, early childhood experiences, then we need to know more about the people who used a lot and the people who used only a little cannabis or none at all. In my original article I gave several references supporting the claim that IQ-trends are affected by environment. I also noted that past research has found the heritability of IQ to increase with age. A common interpretation of this is that our genes influence our non-cognitive traits. As long as we are young, we usually have to live with our parents and attend the neighborhood school. As we age, our non-cognitive traits have an increasingly strong effect on where we end up - what environment we are in, what friends we have, what activities we participate in etc. Our genes thus influence our future environment, and the cognitive challenges from our environment influence our IQ.


In light of this, it seems (to me) reasonable to ask for information on other differences between their groups. What we know, we have to glean from other research based on the same data. Some of this was referenced in my original article, but to give two simple examples: One of the researchers once described the cannabis-dependent 21-year olds in their data as having

had a long history of anti-social behavior, going right back to when they were three years old. They were being naughty, beating up other kids in the sandpit, being disruptive, then they went to stealing milk money, then they went to beating up bigger kids in the schoolground, then they converted a car … It goes on and on and on [...] When stuff doesn’t work out right they just resort to violence.
More recently, the Dunedin data were used in research that found women with more sexual partners far more likely to become dependent on alcohol or cannabis. Women reporting 2.5 or more partners per year when 18-20 years old, were almost 10 times more likely to be dependent on alcohol or cannabis at 21. My point is that this indicates that the early-onset cannabis users who go on to dependence do differ systematically from those who start later or never use, and that these differences may be related to underlying “non-cognitive traits” that would also affect their lives, environments and thus IQ independently of their cannabis use. 

The Dunedin group apparently see such traits as irrelevant to their argument. At times, they even underplay the numbers they presented in their own article on the subject: In their response to my article, they write that “Many young cannabis users opted out of education, but that did not account for their IQ drop.” However, their original numbers indicated that education substantially affected the size of the cannabis-use effect: The difference between non-users and adolescent-onset cannabis users with long-term dependence varied markedly across educational levels. This led the authors to write that “among the subset with a high-school education or less, persistent cannabis users experienced greater decline.” As I noted in my article, the magnitude of the “effect” (IQ change of highest-exposure group minus IQ change of no-exposure group) was twice as large for those with high-school or less compared to the same effect for those with more education.
How important you think these selection issues are will of course differ with your prior beliefs about the importance of various IQ-determinants. As long as the Dunedin data remains difficult to access for other researchers, there is little I can do to examine these things myself. I suggested a number of analyses and robustness checks, but the researchers were not interested in pursuing these and reduced my argument to “school temporarily raises low-SES IQ”. 


This misinterpretation of my article’s argument is to some (a large?) extent my own fault: While my article does discuss non-cognitive traits and the rising heritability of IQ, and proposes a number of analyses to cope with the complications these raise, I too often use the shorthand “SES” rather than “non-cognitive traits correlated with SES.” It would have been clearer and better if I had first discussed the importance of non-cognitive traits in general, and then introduced the hypothesis that this would show up as differing IQ-trajectories across SES groups. That would have made my alternative causal model (non-cognitive traits have increasing influence over environments as people age, and the environment you end up in influences your IQ) clearer. My bad. I tried to remedy this by running the new 500-word reply in PNAS by a number of colleagues and friends before publishing, rewriting extensively to try to make my causal model clearer while also a) explaining why I thought (wrongly, it now seems) that this would show up as differing IQ-trends for different SES groups, and b) clarifying my more general methodological points and the extent to which they still remain relevant.
While some of the cause for misinterpretation is likely due to my own communicative skills, there is also a difference in methodological attitudes at work: In empirical labor economics, researchers are very concerned with selection effects, and you need to have credible, “plausibly exogenous” variation in causal variables for your effects to be accepted as causal. In contrast, the Dunedin researchers write that randomized clinical trials only show “potential” effects while observational studies are needed to show “whether cannabis actually is impairing cognition in the real world and how much.”


To me, this sounds very odd. We have several instances of randomized clinical trials contradicting effects identified in a large number of observational studies. Three of the most famous ones are described here (possibly gated): Hormone replacement therapy was thought to reduce female coronary heart disease risk but may actually increase it, beta-carotene seemed to reduce cancer risk in observational studies but actually increased it, and vitamin C had no effect on heart disease risk while observational studies indicated it was protective. Closer to the subject matter at hand, a 2007 meta-review of observational studies in the Lancet indicated a strong causal effect of cannabis use on schizophrenia risk. Some researchers pointed out that since increasing shares of young people had been using cannabis, this implied that the number of UK schizophrenia cases should rise strongly, but this didn’t happen and the importance of the link is now again in doubt.

What all these cases have in common is that there seemed to be convincing evidence from observational studies that there was an effect, but it turned out that the effect was largely due to subtle forms of confounding. The examples certainly do not show that, e.g., beta-carotene has a “potential” negative effect that is “actually” positive in everyday life - though that is what the argument from the Dunedin researchers seems to state. Instead, these cases show that causal inference from observational data is difficult. This is the perspective from which my argument comes. I don’t claim that the correlation observed in the Dunedin data is actually fully accounted for by non-cognitive traits. I argue that they have yet to tell us how groups defined by cannabis use patterns differ on other dimensions, and that they have yet to show us how robust their effect estimates are to “controls for causal back channels unrelated to neurotoxicity, simultaneous inclusion of multiple potential confounders, and changes to their statistical model.”

Tuesday, January 15, 2013

Did pre-release harm the research process? And what is up with this cannabis and IQ research?

Did pre-release of my PNAS paper (here, but gated) on methodological problems with Meier et al’s 2012 paper on cannabis and IQ reduce the chances that it will have its intended effect? In my case, serious methodological issues related to causal inference from non-random observational data became framed as a conflict over conclusions, forcing the original research team to respond rapidly and insufficiently to my concerns, and prompting them to defend their conclusions and original paper in a way that makes a later, more comprehensive reanalysis of their data less likely.

I understand that pre-releasing papers to journalists raises interest in the research work and allows reporters to hit the ground running. But does it also hurt science? In my case, I think the answer may be “yes”.

Others have discussed how embargoed papers affect science journalism (Ed Yong has a good write-up here), but my question is whether the research process itself might suffer - at least for some types of papers: In my case, I wrote a research paper that discussed methodological issues in a previous study and suggested that their study failed to control for possible confounders in an appropriate way. I also suggested a number of methods and analyses that might help to address these shortcomings. As the press received the embargoed paper, some of them called the original researchers and told them some guy claimed their results were wrong, and what did they have to say about that? Rather than reflect on my reasoning and suggestions, this forced the original researchers to react in step with the news cycle: Within something like 24 hrs of the time they first saw my study, they released a statement to the press (available here) where they brushed off my points with reference to some new analyses. In my opinion, the new analyses they referred to were both insufficient to address my points and difficult to assess since they were presented with no details. Some of them, such as the claim that average IQ change was zero within each of three SES levels they constructed, were quite interesting and merit closer review. Other points, such as the claim that there was a relationship even within mid-SES individuals (they didn’t report whether the effects were the same or smaller) have more limited relevance (see below). However, it seemed (at the time) urgent and important to respond before the journalists “went to press,” and I ended up writing a hasty reply so that I had a response I could make available to the journalists. This - it seems to me - is not conducive to good scientific dialogue.
Not only should it be possible to breathe and think about things before pressing “send,” but the discussion can easily veer off into the issues of concern to journalists rather than the more important methodological issues that should be of concern.

Before continuing, let me be clear that my point is not to criticize the science journalists: It is natural (and correct) for them to ask the original authors for a response, and several of the reporters I was in touch with pleasantly surprised me with their level of detail, intellectual curiosity and incisive questions. To some extent it may well be that the embargo just exacerbates the issue, and that the main “problem” (or challenge) is the massive media interest and the quick-response demands that this creates. I don’t have any clear conclusions as to how this could be improved, but I want to note how this may have affected the research debate on IQ and cannabis use.
 
The various claims flying around are reported in a number of news articles, now that the press embargo has lifted (just google: meier rogeberg cannabis). The response from the original research team has been made available on-line by two of the original researchers, and the lead author of the original study has used it as the basis for an online piece stating that I am flat-out just wrong. There is thus a risk that the researchers have painted themselves into a corner psychologically: By defending their original claim and methodology rather than being open to a proper re-examination of the evidence, it has become more difficult for them to do a fair analysis later without losing face if their original effect estimates were exaggerated or turn out to be non-robust.
 
I find this a bit disappointing, as well as sad. If the original conclusions were correct, they would hold up in the new analyses I proposed - leaving their conclusions all the more strong as a result. If their effect was overestimated (due to confounding) or even negligible or zero after better controls, surely that should be seen as a positive outcome as well: More important than what results we get is, after all, making sure that our results are as correct and credible as we can make them.
 
To explain why this matters, let me try to get the important methodological issues across in a clear way to those who are interested: Basically, the original paper (which is available here) used a simple variant of a difference-in-differences analysis. The researchers sorted people into groups according to whether or not they had used cannabis and according to the number of times they had been scored as dependent. They then compared IQ-changes between age 13 and 38 across these groups, and found that IQ declined more in the groups with heavier cannabis-exposure. The effect seemed to be driven by adolescent-onset smokers, and it seemed to persist after they quit smoking.
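
The comparison described above can be sketched in a few lines of Python. This is only an illustration of the general idea (average IQ change per exposure group, relative to the never-users), not the original authors' actual code or model. The group labels, the records and the function names are all hypothetical.

```python
from statistics import mean

# Hypothetical records: (exposure_group, iq_at_13, iq_at_38).
# All numbers are made up for illustration; none come from the Dunedin data.
records = [
    ("never used", 100.0, 101.0),
    ("never used", 104.0, 104.0),
    ("used, never dependent", 99.0, 98.0),
    ("used, never dependent", 103.0, 102.0),
    ("dependent 3+ times", 98.0, 93.0),
    ("dependent 3+ times", 102.0, 95.0),
]


def mean_change_by_group(rows):
    """Return the average IQ change (age 38 minus age 13) for each group."""
    changes = {}
    for group, iq13, iq38 in rows:
        changes.setdefault(group, []).append(iq38 - iq13)
    return {group: mean(vals) for group, vals in changes.items()}


def effect_vs_reference(rows, reference="never used"):
    """Return each group's mean change minus the reference group's mean change."""
    by_group = mean_change_by_group(rows)
    base = by_group[reference]
    return {g: c - base for g, c in by_group.items() if g != reference}


changes = mean_change_by_group(records)
effects = effect_vs_reference(records)
print(changes)
print(effects)
```

With these made-up numbers the heaviest-exposure group ends up 6.5 points below the never-users. Reading that gap as an effect of cannabis requires that the groups would otherwise have followed the same IQ trajectory, which is the assumption discussed in the rest of this post.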
 
The data used for this study was stunning: Participants in the Dunedin Study, a group of roughly 1000 individuals born within 12 months of one another in the city of Dunedin in New Zealand, had been followed from birth to age 38. They had been measured regularly and scored on a number of dimensions through interviews, IQ tests, teacher and parent interviews, blood-samples etc, and are probably amongst the most intensively researched people on the planet: The study website states that roughly 1100 publications have been based on the sample so far, which is more than one publication per participant on average ;)
 
Despite this impressive data, there were some things I found wanting in the analysis. My own experience with difference in differences methods comes from empirical labor economics, and this experience had led me to expect a number of robustness checks and supporting analyses that this article lacked. This is not surprising: Different disciplines can face similar methodological issues, yet still develop more or less independently of each other. In such situations, however, there will often be good reasons for “cross-pollination” of practices and methods. For instance, experimental economics owes a large debt to psychology, and the use of randomized field trials in development and labor economics owes a large debt to the use of randomized clinical trials in medicine.
 
The cannabis-and-IQ analysis basically compares average changes in IQ across groups with different cannabis use patterns. Since we haven’t randomized “cannabis use patterns” over the participants, we have an obvious and important selection issue: The traits or circumstances that caused some people to begin smoking pot early, and that caused some of these to become heavily dependent for a long time, can themselves be associated with (or be) variables that also affect the outcome we are interested in. The central assumption, in other words, is that the groups would have had the same IQ-development if their cannabis use had been similar. Since this is the central assumption required for this method to validly identify an effect of cannabis, it is crucial that the researchers provide evidence sufficient to evaluate the appropriateness of this assumption. To be specific, and to show what kind of things I wanted the researchers to provide, you would want to:
  • Establish that the units compared were similar prior to the treatment being studied - e.g., provide a table showing how the different cannabis-exposure groups differed prior to treatment on a number of variables.
  • Establish a common trend - Since the identifying assumption is that the groups would have had the same development if they had had the same “treatment”, then clearly the development prior to the treatments should be similar. In the Dunedin study, they measured IQ at a number of ages, and average IQ changes in various periods could be shown for each group of cannabis users.
  • Control for different sets of possible confounders. To show that the estimates that are of interest are robust, you would want to show estimates for a number of multivariate regressions that control for increasing numbers (and types) of potential confounders. The stability of the estimated effect and its magnitude can then be assessed, and the danger of confounding better evaluated: What happens if you add risk factors that are associated with poor life outcomes (childhood peer rejection, conduct disorders etc), or if you include measures of education, jailtime, unemployment, etc.? If the effect estimate of cannabis on IQ changes a lot, then this suggests that selection issues are important, and that confounders (both known and unknown) must be taken seriously. Adding important confounders will also help estimation of the effect we are interested in: Since they explain variance within each group (as well as some of the variance between the groups), they help reduce standard errors on the estimates of interest.
  • Establish sensitivity of results to methodological choices. Just as we want to know how sensitive our results are to the control variables we add, we also want to know how sensitive they are to the specific methodological choices we have made. In this instance, it would be interesting to allow for pre-existing individual level trends: Assume that people have different linear trends to begin with. To what extent are these differing pre-existing trends shifted in similar ways by later use patterns of cannabis? By adding in earlier IQ-measurements for each individual (which are available from the Dunedin study), such “random growth estimators” would be able to account for any (known or unknown) cause that systematically affected individual trajectories in both pre- and post-treatment periods. Another example is the linear trend variable they use for cannabis exposure, which presumably gives a score of 1 to never users, 2 to users who were never dependent, 3 to those scored as dependent once and so on. This is the variable that they check for significance - and it would be
  • Provide other diagnostic analyses, for instance by considering the variance of the outcome variable within each treatment group (how much did IQ change differ within each treatment group?). In this way, we could tell whether we seemed to be dealing with a very clear, uniform effect that affects most individuals equally, or whether it was a very heterogeneous effect whose average value was largely driven by high-impact subgroups.
  • Discuss alternative mechanisms. What potential mechanisms can be behind this, and what alternative tests can we develop to distinguish between these? For instance, let us say you identify what seems to be a causal effect of cannabis use and dependency, but its magnitude is strongly reduced (but not eliminated) when you add in various potential confounders, such as educational level. As the authors of the original paper note (when education turns out to affect the effect size), education could be a mediating factor in the causal process whereby cannabis affects IQ. However, this would mean that the permanent, neurotoxic effect they are most concerned with would be smaller, because part of the measured effect would be due to the effect of cannabis on education multiplied by the effect of this education on IQ. The evidence thus suggests that the direct “neurotoxic” effect is only part of what is going on. It also suggests that we might want to look for evidence to assess how strongly cannabis use causally affects education, to better understand the determinants of this process. For instance, even if there was only a temporary effect of cannabis on cognition, ongoing smokers would do more poorly in school or college, which might then influence later job prospects and long term IQ. The effect doesn’t even have to be through IQ: If pot smoking makes you less ambitious (either because of stoner subculture or psychological effects), the effect may still have long term consequences by altering educational choices and performance. Put differently: If the mechanism is via school, then even transitory effects of cannabis become important when they coincide with the period of education.
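
The mediation point in the last bullet can be written as a simple decomposition. This is only a sketch under the assumption that all effects are linear and additive. The symbols are introduced here for illustration and do not appear in the original paper: $\tau$ is the total measured effect of cannabis exposure on IQ change, $\delta$ the direct (“neurotoxic”) effect, $a$ the effect of cannabis exposure on education, and $b$ the effect of education on IQ change.

```latex
\tau \;=\; \underbrace{\delta}_{\text{direct effect}}
      \;+\; \underbrace{a \cdot b}_{\text{effect via education}}
\qquad\Longrightarrow\qquad
\delta \;=\; \tau - a \cdot b
```

If cannabis lowers education ($a < 0$) and education raises IQ ($b > 0$), the product $a \cdot b$ is negative, so the direct effect $\delta$ is smaller in magnitude than the total decline $\tau$.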
When I originally started looking into this last August, I sent an e-mail to the corresponding author asking for a couple of tables with information on “pre-treatment” differences between the exposure groups. I did not receive this. This is quite understandable, given that they were experiencing a media-blitz and most likely had their hands full. I therefore turned to past publications on the Dunedin cohort to see if I could find the relevant information there.

It turned out that I could - to some extent. Early onset cannabis use appeared to be correlated with a number of risk factors, and these risk factors were also correlated with poor life outcomes (low and poor education, crime, income etc.). The risk factors were also correlated with socioeconomic status.

The next question was whether these factors could affect IQ. One recent model of IQ (the Flynn-Dickens model) strongly suggested they would. The model sees IQ as a style or habit of thinking - a mental muscle, if you like - which is influenced by the cognitive demands of your recent environment. School, home environment, jobs and even the smartness of your friends are seen as in a feedback loop with IQ: High initial IQ gives you an interest in (and access to) the environments that in turn support and strengthen IQ. Since the risk factors mentioned above would serve to push you away from such cognitively demanding environments, it seemed plausible that they would affect long term IQ negatively by pushing you into poorer environments than your initial IQ would have suggested.

A couple of further parts to this potential mechanism can be noted (both discussed here): It seems that high-SES kids have a higher heritability of IQ than low-SES kids, which researchers often interpret as due to environmental thresholds: If your environment is sufficiently good, variation in your environment will have small effects on your IQ. If, however, your environment is poorer, similar variation will have larger effects. Put differently: The IQ of low-SES kids is more affected by changes to their environment than that of high-SES kids.

Also, there is a (somewhat counterintuitive, at first glance) result which shows that average IQ heritability increases with age. One interpretation of this is that our genetic disposition causes us to self-select or be sorted into specific environments as we age. The environment we end up with is therefore more determined by our genetic heritage than our childhood environment, where our family and school were, in some sense, “forced environments.”

In my research article, I refer to various empirical studies supporting these mechanisms and their effects. For instance, past studies that find SES, jailtime, and education to be associated with the rate of change in cognitive abilities at different ages. Putting these pieces together, the risk factors that make you more likely to take up pot smoking in adolescence, and that raise your risk of becoming dependent, also shift you into poorer environments than your initial IQ would predict in isolation. Additionally, these shifts are more likely for kids in lower-SES groups (since the risk factors are correlated with SES), and these also have an IQ more sensitive to environmental changes. Finally, for the same reason, the forced environment of schooling is likely to raise childhood IQ more for the low SES kids (because it is a larger improvement on their prior environments, and because their IQs are more sensitive to environmental influences). SES, then, is in some sense a summary variable that is related to a number of the relevant factors, in that low SES


  1. correlates with risk factors that influence, on the one hand, adolescent cannabis use and dependency and, on the other hand, poorer life outcomes, and
  2. signals a heightened sensitivity to environmental factors (the SES-heritability difference in childhood)
  3. probably reflects the magnitude of the extra cognitive demands imposed by school relative to home environment

For these reasons, SES seemed like a good variable to use in a mathematical model to capture these relationships. However, it should be obvious from my description of this mechanism that we should expect the mechanism to work even within a socioeconomic group: Even within this group, those with high levels of risk factors will experience poorer life outcomes, which may reduce their IQs. They will also most likely have higher probabilities of beginning cannabis smoking. At the same time, we would expect a smaller effect within a specific socioeconomic group than we would across the whole population.

However, I simplified this by using SES in three levels and created a mathematical model with these effects, using effect sizes drawn from past research literature where I could find it. Using the methods used in the original study, I tested my simulated data and found the statistical methods identified the same type and magnitude of effects here as they had in the actual study data. This, of course, does not prove or establish that there is no effect of cannabis on IQ. What it does is to show that the methods they used were insufficient to rule out other hypotheses, that the original effect estimates may be overestimated, and that we need to look more deeply into the matter, using the kind of robustness checks and specification tests I discussed above.
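
The logic of that simulation exercise can be illustrated with a much simpler toy than the model in the paper. The sketch below is not the published model: it replaces the three SES levels with a single hypothetical binary “risk” trait, and every probability and effect size is an arbitrary assumption chosen for illustration. Cannabis use has no effect on IQ anywhere in this code. The risk trait raises the chance of heavy use and, separately, lowers IQ change.

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Simulate IQ changes in a world with NO causal cannabis effect.

    A hypothetical binary risk trait makes heavy cannabis use more likely
    and, independently of use, lowers IQ change (poorer environments).
    All parameter values are arbitrary assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        risk = rng.random() < 0.3
        heavy_use = rng.random() < (0.5 if risk else 0.1)
        # IQ change depends on the risk trait only, never on heavy_use.
        iq_change = rng.gauss(-6.0 if risk else 0.0, 5.0)
        rows.append((risk, heavy_use, iq_change))
    return rows


def gap(rows):
    """Mean IQ change of heavy users minus mean IQ change of everyone else."""
    users = [change for _, used, change in rows if used]
    others = [change for _, used, change in rows if not used]
    return mean(users) - mean(others)


rows = simulate()
naive_gap = gap(rows)                                    # ignores the risk trait
gap_within_risk = gap([r for r in rows if r[0]])         # high-risk people only
gap_within_no_risk = gap([r for r in rows if not r[0]])  # low-risk people only

print(f"naive gap:          {naive_gap:.2f}")
print(f"within risk group:  {gap_within_risk:.2f}")
print(f"within no-risk:     {gap_within_no_risk:.2f}")
```

The naive comparison shows heavy users losing several IQ points relative to the rest, even though cannabis does nothing here. Once the comparison is made within each risk group, the gap is close to zero. In real data the relevant traits are only partly observed, which is why the robustness checks above matter.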

In my mind, this should be just the normal process of science - an ongoing dialogue between different researchers. We know that replications of results often fail, and that acting on flawed results can have negative consequences (see here for an interesting popular science account of one such case). A statistical model by medical researcher Ioannidis (at the centre of this entertaining profile) suggests that new results based on exploratory epidemiological studies of observational data will be wrong 80% of the time. The Dunedin study on cannabis and IQ would, it seems, fit into this category. After all, by the time you’ve published more than 1100 papers on a group of individuals, it seems relatively safe to say that you have moved into “exploratory” mode.
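
For readers who want to see where a number like 80% can come from, here is a minimal sketch of the positive predictive value (PPV) formula from Ioannidis's model: the share of claimed findings that are true, given statistical power, the significance threshold, the pre-study odds that a tested relationship is real, and a bias term. The function name is hypothetical, and the inputs below (power 0.80, threshold 0.05, pre-study odds 1:10, bias 0.30) are illustrative assumptions meant to resemble an adequately powered exploratory epidemiological study. They are not estimates for the Dunedin study.

```python
def positive_predictive_value(power, alpha, prior_odds, bias=0.0):
    """Share of claimed findings that are true.

    power      -- probability of detecting a true relationship (1 - beta)
    alpha      -- probability of a false positive for a null relationship
    prior_odds -- pre-study odds R of true to null relationships
    bias       -- share u of analyses that would not otherwise have been
                  findings but end up reported as findings anyway
    """
    beta = 1.0 - power
    true_findings = power * prior_odds + bias * beta * prior_odds
    all_findings = (prior_odds + alpha - beta * prior_odds
                    + bias - bias * alpha + bias * beta * prior_odds)
    return true_findings / all_findings


# Illustrative inputs only (assumptions, see the paragraph above).
ppv = positive_predictive_value(power=0.80, alpha=0.05, prior_odds=0.10, bias=0.30)
print(f"PPV = {ppv:.2f}, so about {1 - ppv:.0%} of such findings would be wrong")
```

With these inputs roughly one finding in five is true. The result is sensitive to the assumed pre-study odds and bias, so the 80% figure is best read as an order of magnitude rather than a measured error rate.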

In light of this, critically assessing results and methods and proposing alternative explanations and further tests should be an everyday and expected part of research work. Such work is particularly important in cases like the Dunedin study, where the data involved is both costly and time consuming to construct, and thus very rare. As noted recently by Gary Marcus in a G+ comment (second comment here), flawed results based on such data are likely to persist for a “really long time” if we are to wait for other researchers to replicate the analyses on other data.

And that, finally, brings us back to the end. I remain hopeful that the original researchers will return to their data and address my methodological points properly: How robust and credible is the effect, and how sensitive is the effect magnitude to different sets of controls and methodological choices? However, I am wary that the pre-release to the press, and the quick back-and-forth exchanges and position-taking it seems to have caused, have reduced the likelihood of this taking place.