Thursday, January 16, 2014

Is there no racial bias precisely because it seems like there is?

Consider a criminal activity that is equally prevalent in two groups, but police arrest a larger share of group A than group B. Is this evidence for or against discrimination?

In the US debate on drug policy, this is seen as evidence of racial bias. Ezra Klein pointed out recently that similar shares of African-Americans and whites use cannabis, but that African-Americans are arrested far more often for marijuana possession. He sees this as clear evidence of racial bias: more arrests despite equal rates of use.

In the academic economics literature, a 2001 paper by John Knowles, Nicola Persico and Petra Todd in a leading economics journal (ungated here) presents an “empirical test” of “racial bias in motor vehicle searches” that flips this interpretation 100% upside down. I’ll explain the reasoning in detail below, but the model basically presents the same kind of statistical fact as evidence of a lack of discrimination: if the groups are equally law-abiding despite one being searched more often, this means that the police have targeted the “crime prone” group only up to the point where this targeting made them break the law at the same rate as others. The police do not care about color, only arrests, and they have only used color to the extent that it helped them predict the probability of criminal activity (statistical discrimination). Equal underlying crime rates despite more arrests from one group show this clearly.
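To preview the logic with made-up numbers (a toy example of my own, not data or code from the paper): the test looks at “hit rates”, the share of searches that actually turn up contraband, rather than at the raw number of searches or arrests.

```python
# Hypothetical search counts and "hits" (searches that found contraband) by group.
searches = {"group_A": 1000, "group_B": 200}
hits     = {"group_A": 300,  "group_B": 60}

for group in searches:
    hit_rate = hits[group] / searches[group]
    print(f"{group}: searched {searches[group]} times, hit rate {hit_rate:.0%}")

# Group A is searched five times as often, yet both hit rates are 30%.
# In the paper's framework, equal hit rates are read as an *absence* of
# prejudice: officers search each group up to the point where an extra
# search is equally productive.
```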

Why the difference in interpretation? Well – the crucial underlying assumption in the economics paper is that motorists in all groups perceive their risk of being stopped and searched, and that they respond rationally to the risk that their lawbreaking will be observed and punished. Given this, the argument goes through – without it, the whole thing breaks down. The paper is aware of this, writing in a “discussion” section that

Our model assumes that motorists respond to the probability of being searched. This assumption is key to obtaining a test for prejudice that can be applied without data on all the characteristics police use in the search decision. If motorists did not react to the probability of being searched, testing for prejudice would require data on c [defined as  “all characteristics other than race that are potentially used by the officer in the decision to search cars”].

Interestingly, though, while the paper does note the central importance of this assumption, it does not find it necessary to present any empirical evidence for it. This seems odd to me: It may well be a standard theoretical assumption, but this paper presents an empirical test that relies on this assumption being true. Empirical claims consistent with abstract theory, in other words, seem to have gotten a free pass from the referees at a top-5 journal in economics (the Journal of Political Economy). The researchers even turn the tables and put the kind of argument that Ezra Klein represents on trial:

The argument that infers racism from this evidence relies on two very strong assumptions: (1) that motorists of all races are equally likely to carry drugs and (2) that motorists do not react to the probability of being searched. Relaxing these assumptions, as we do in this paper, leads to a very different kind of test.

The free pass that some economists give empirical claims provided they derive from “standard theory” is, admittedly, one of my pet peeves. Still – it seems kind of ballsy to say that they’re merely “relaxing” this assumption when they actually seem to be making two other claims that are equally strong:

  1. Motorists of all groups perceive the levels of, and changes in, the objective probability that they will be stopped, and respond rationally to these, so that, as a result,
  2. Motorists of all races, ages, looks, etc. are equally likely to carry drugs in equilibrium, which is where we as economists should think they are.

Now – a major caveat: the paper has already been cited more than 300 times, and for all I know the large literature may have kicked the tires and empirically tested all kinds of assumptions and implications from this paper. My own prior, however, would be that assumption 1 may be too strong – and I would like to see how robust the argument is to changes in this assumption. I’ll gladly admit to not having deep familiarity with this literature, but if I recall correctly from Reuter and MacCoun’s excellent book “Drug War Heresies”, which I read some ten years ago, people grossly exaggerate their risk of being detected for lawbreaking (speeding violations and so on). It also seems difficult to find evidence that the intensity of drug law enforcement has any strong effect on the prevalence and intensity of use – which has become a common argument against strict drug law enforcement and in favor of decriminalization. The “behavioral” literature and work by economists such as Kip Viscusi likewise suggest that people are poor at perceiving small risks accurately.

Even on an everyday level, it is unclear to me how individuals would get information on the probability that they will be stopped – most of us will be stopped so rarely that it is hard to estimate the risk based on our own experience, and we rarely pool the relevant quantitative information with others (“I drove a total of 70 hours last year, and was stopped by the police zero times. How about you? I’m trying to get enough observations to identify my risk of being stopped – and we seem similar enough that our data can be pooled.”)

As regards the second claim – that all subgroups in the population will break the law at the same rate – this seems too strong, but it is what makes the paper’s “empirical test” for discrimination so simple: when the police face the same cost of stopping and searching individual cars from different groups, then:

If the returns to searching [roughly: probability of a motorist being a lawbreaker] are equal across all subgroups distinguishable by police, they must also be equal across aggregations of these subgroups, which is what we can distinguish in the data. [emphasis in original]

The prediction, in other words, is that any group defined in terms of characteristics observable by the police has the same probability of breaking the law in equilibrium. Middle-aged white dads with a station wagon full of kids, elderly ladies on their way to bingo, etc., would all carry drugs in their car with the same probability as anyone else. This seems highly implausible. (Even less plausible, the baseline model has every individual carrying drugs in their car some of the time, though this implication is “fixed” in the discussion section of the paper by introducing unobservable factors within each group.)
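For what it’s worth, the aggregation step itself is just arithmetic: if every police-distinguishable subgroup carries at the same rate, any mix of those subgroups (say, a racial group we observe in the data) carries at that rate too. A sketch with made-up numbers:

```python
subgroup_rates  = [0.10, 0.10, 0.10]   # carry rate in each (hypothetical) subgroup
subgroup_shares = [0.5, 0.3, 0.2]      # subgroup shares within the observed group

# A weighted average of identical rates is that same rate.
aggregate_rate = sum(rate * share for rate, share in zip(subgroup_rates, subgroup_shares))
print(round(aggregate_rate, 2))  # 0.1
```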

I’m (hopefully obviously) not saying that I’ve in any way disproved this theory based on these remarks, but chalk me up as unconvinced of the validity of the empirical test suggested in this particular instance.

This basically concludes my comments on the paper itself, which I present mostly as an interesting example of how economists’ belief in their theory can make them interpret things completely differently from other people. Most readers may want to quit here (or wish they’d never begun reading, for all I know), but for those who want to understand the theory used by the economists, I’ll try to get the main ideas across in a non-technical way:

The article argues using a model built on a simple basic idea: If you were 100% certain to be stopped and have your vehicle searched, you wouldn’t carry contraband. If you were 100% certain that you would not be stopped, you would always carry contraband (because Homo Oeconomicus: “Hey! Profit opportunity with no risk! Why not?”). Consequently, there is a search probability between 0 and 1 at which you are indifferent between carrying and not carrying contraband, and this can be thought of as a “flip point” where you switch from one choice to the other.
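To spell that out, here is a minimal sketch of the indifference condition in my own notation (the gain and penalty parameters are illustrative, not the paper’s):

```python
# A motorist carries contraband when the expected payoff is positive:
#   (1 - p) * gain - p * penalty > 0,
# where p is the probability of being stopped and searched.
def flip_point(gain: float, penalty: float) -> float:
    """Search probability at which a motorist is exactly indifferent."""
    return gain / (gain + penalty)

# Example: a gain of 100 from carrying and a punishment cost of 400
# gives a flip point of 0.2: carry when p < 0.2, abstain when p > 0.2.
print(flip_point(gain=100, penalty=400))  # 0.2
```

Different gains and penalties give different flip points, which is where the groups come in.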

Different people will belong to different groups, and these groups can obviously be in different situations: the payoff from carrying contraband and the cost of being punished may differ, and as a result the flip points of different groups will differ as well. To keep the language neutral, we’ll call the groups high-risk and low-risk, where the high-risk group needs to face a high search rate before they are discouraged from carrying contraband, whereas the low-risk individuals require only a small search risk before they bow out of the illegal activity.

To see how this model plays out, imagine that you are a police officer whose only goal is to maximize the number of arrests you make. Assume also that the cost and time spent on a search are independent of whom you stop and search. Somewhat unrealistically, we can imagine that we start from a situation with no search risk for either of the two groups.

You begin by searching a little in both groups and find high rates of lawbreaking. In fact, in the model you find that everyone you stop is a lawbreaker and can be arrested. Consequently, you spend ever more of your policing time on stopping drivers. However – this alters the incentives of the drivers. As you stop ever larger shares of motorists, the low-risk motorists stop carrying contraband. The time you spend stopping them becomes worthless, as you find nothing that justifies an arrest. The high-risk group still breaks the law, as your search rate is still below their flip point. Consequently, you don’t increase your search intensity towards low-risk motorists, but keep stopping more and more high-risk motorists instead. This continues until you reach their flip point – at which point you stop increasing your search rate (if you went beyond it, they would all stop carrying and you’d have no arrests).

The outcome is that the only stable equilibrium is one where each group is searched at a rate equal to its flip point. Stop them less than this, and they become more criminal and you get a high probability of an arrest per individual stopped, which would cause you to stop them more often again. Stop them more often than their flip point, and they become less criminal and your efforts yield a low hit rate. The police thus act in a way that causes groups with different “criminal inclinations” to act identically in equilibrium. (This conclusion requires that the cost to the police of searching a car is the same across groups, but this is assumed throughout most of the paper and is needed for their test to work on the data they use.)
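To make this adjustment story concrete, here is a toy simulation (my own sketch with made-up flip points, not the paper’s formal model): the search rate on each group is nudged up whenever that group is still carrying and nudged down whenever it is not, and it settles at the group’s flip point.

```python
# Toy simulation of the adjustment story (hypothetical numbers).
flip_points = {"low_risk": 0.05, "high_risk": 0.40}   # made-up flip points
search_rate = {group: 0.0 for group in flip_points}   # start with no searches

step = 0.1
for _ in range(200):
    for group, flip in flip_points.items():
        carrying = search_rate[group] < flip          # group carries below its flip point
        # Raise the search rate where searches still pay off, lower it where they don't.
        search_rate[group] += step if carrying else -step
        search_rate[group] = min(max(search_rate[group], 0.0), 1.0)
    step *= 0.95                                      # damp the adjustment over time

print(search_rate)  # roughly {'low_risk': 0.05, 'high_risk': 0.40}
```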

A reasonable question about this equilibrium is how the contraband-carrying rate within a group is determined. In the baseline model, each individual will now be indifferent between carrying and not carrying contraband in their car. How do they decide how often to do it?

If the motorists carry drugs too often, the police will find that the value of stopping more of them is positive (value of an arrest times probability of being guilty would be higher than the stop-and-search cost). If they carry drugs too rarely, the police will want to stop them less often and the motorists would basically be leaving money on the table (because Homo Oeconomicus – positive expected return from crime). Consequently, in equilibrium, the motorists in each group carry drugs with a probability that makes the police indifferent between stopping and not stopping them: The expected value of stopping them (i.e., contraband probability times value of arrest) is then equal to the expected cost (marginal stopping-and-searching cost for the relevant group).
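In my own notation again, a minimal sketch of that second indifference condition, this time for the police:

```python
# In equilibrium the carry probability q makes the officer indifferent:
#   q * arrest_value = search_cost.
def equilibrium_carry_rate(search_cost: float, arrest_value: float) -> float:
    """Carry probability that leaves the officer indifferent about one more search."""
    return search_cost / arrest_value

# Example: if a search costs 1 unit of officer time and an arrest is "worth" 10,
# motorists in that group carry drugs 10% of the time in equilibrium.
print(equilibrium_carry_rate(search_cost=1, arrest_value=10))  # 0.1
```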

As noted earlier in the post, the baseline model assumes that all individuals “flip a coin” or “toss a die” every morning to decide whether or not to carry drugs that day, but this can be altered by assuming unobservable characteristics that differ within each group, making some of them more and some of them less crime-prone.