## Monday, May 9, 2011

### How convinced should we be of an economic theory that is “consistent with empirical data”?

What follows is not rocket science, and probably not 100% correct, but: when we say that “empirical tests” support an economic theory, does this mean the theory is probably right? More specifically, I want to explore whether there is a simple way of stating the issue so that we don’t ignore the base rate.

How we state this issue matters, as an example from medical decision-making shows. There are a number of screening programs in place to identify people with potentially harmful medical conditions, and research on medical decision-making shows that doctors seriously misinterpret positive results from such tests. Simply put, test results are “misleading” when a test with even a low error rate is used to search for a rare condition in the general population: the small error rate multiplied by the huge number of healthy people accounts for the lion’s share of those flagged as “positive” by the test.

An example from a nice write-up of this issue shows how difficult the issue is to understand when stated in probabilities:

In one study, Gigerenzer and his colleagues asked doctors in Germany and the United States to estimate the probability that a woman with a positive mammogram actually has breast cancer, even though she’s in a low-risk group […]:

The probability that one of these women has breast cancer is 0.8 percent.  If a woman has breast cancer, the probability is 90 percent that she will have a positive mammogram.  If a woman does not have breast cancer, the probability is 7 percent that she will still have a positive mammogram.  Imagine a woman who has a positive mammogram.  What is the probability that she actually has breast cancer?

Gigerenzer describes the reaction of the first doctor he tested, a department chief at a university teaching hospital with more than 30 years of professional experience:

“[He] was visibly nervous while trying to figure out what he would tell the woman.  After mulling the numbers over, he finally estimated the woman’s probability of having breast cancer, given that she has a positive mammogram, to be 90 percent.  Nervously, he added, ‘Oh, what nonsense.  I can’t do this.  You should test my daughter; she is studying medicine.’  He knew that his estimate was wrong, but he did not know how to reason better.  Despite the fact that he had spent 10 minutes wringing his mind for an answer, he could not figure out how to draw a sound inference from the probabilities.”

When Gigerenzer asked 24 other German doctors the same question, their estimates whipsawed from 1 percent to 90 percent.   Eight of them thought the chances were 10 percent or less, 8 more said 90 percent, and the remaining 8 guessed somewhere between 50 and 80 percent.  Imagine how upsetting it would be as a patient to hear such divergent opinions.

As for the American doctors, 95 out of 100 estimated the woman’s probability of having breast cancer to be somewhere around 75 percent.

The right answer is 9 percent.

The twist in the story comes from how easy this is to get right if you phrase the exact same question in a “natural frequencies” format:

Eight out of every 1,000 women have breast cancer.  Of these 8 women with breast cancer, 7 will have a positive mammogram.  Of the remaining 992 women who don’t have breast cancer, some 70 will still have a positive mammogram.  Imagine a sample of women who have positive mammograms in screening.  How many of these women actually have breast cancer?
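To check the arithmetic, here is a minimal Python sketch that works the mammogram example both ways: once from the probabilities and once from the counts. The function names are my own, chosen for illustration; the numbers are the ones quoted above.

```python
def posterior_from_probabilities(base_rate, sensitivity, false_positive_rate):
    """P(condition | positive test), computed with Bayes' rule."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


def posterior_from_counts(true_positive_count, false_positive_count):
    """The same quantity in natural frequencies: a share of all positives."""
    return true_positive_count / (true_positive_count + false_positive_count)


# Probability format: 0.8% prevalence, 90% hit rate, 7% false-positive rate.
p_prob = posterior_from_probabilities(0.008, 0.90, 0.07)

# Natural-frequency format: 7 true positives and 70 false positives per 1,000.
p_freq = posterior_from_counts(7, 70)

print(f"probability format: {p_prob:.1%}")
print(f"frequency format:   {p_freq:.1%}")
```

The two formats differ slightly only because the counts are rounded to whole women; both come out at roughly 9 percent.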

My question is whether this format can be adapted to the case of empirical testing of a theory. We have three main terms that need to be “adapted”:

• Risk of false negatives – How likely is it that the theory will be rejected if it is actually true? Let us say this is quite unlikely (2%).
• Risk of false positives – How likely is it that the theory will be supported if it is actually false? This depends on how “observationally equivalent” it is to the true theory. Take rational addiction theory as an example: One article argues that consumption with a trend will often test positive for rational addiction even though there is no rational, forward-looking planned change in tastes going on. I find trended consumption far more plausible, so let us put the likelihood of “trended consumption or some other non-rational addiction mechanism is actually present and testing positive by mistake” at 40%.
• “Base-rate” – In medicine, this is the known prevalence of the disease in the population being tested. In our case it is not easily interpretable, but ask yourself, for instance, “how likely do I think it is that real junkies and cigarette smokers are gradually implementing a forward-looking plan for changing their own tastes, and that this is the reason their use of cigarettes, heroin or whatever is gradually increasing?” Let us say we put this at 2%. This does sound both speculative and “science-fiction”ish, but could we interpret this as saying “of all the possible universes that would have unfolded consistently with our current history and experiences, in how many of these do we think real junkies and cigarette smokers […]”?

If we think this sounds OK, we could try something along the lines of:

My feeling/guess is that only 20 out of 1,000 universes we might be living in would have rational addicts. In practically all 20 of these universes rational addiction theory would do well in testing. Of the remaining 980 universes that do not contain rational addicts, some 392 will test positive. Imagine that our current test results indicate that we live in one of the 412 universes that test positive for rational addiction. How likely is it that there really are rational addicts?

This is (I think) quite basic Bayesian updating, so the only “new” thing here is the attempt to rephrase it in a way that makes the base-rate point obvious: After positive test results, the likelihood that we are living in the rational addiction world would be 4.8% – higher than 2% (our starting estimate) – but still very low.
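For anyone who wants to plug in their own guesses, here is a minimal Python sketch of the same updating. It assumes the numbers used above (20 in 1,000 universes with rational addicts, a 2% false-negative rate, a 40% false-positive rate). The function names are hypothetical, made up for this illustration.

```python
def posterior_given_support(prior, false_negative_rate, false_positive_rate):
    """P(theory is true | the tests support it), by Bayes' rule."""
    true_support = prior * (1 - false_negative_rate)
    false_support = (1 - prior) * false_positive_rate
    return true_support / (true_support + false_support)


def universe_counts(n_universes, prior, false_negative_rate, false_positive_rate):
    """The same calculation expressed as counts of universes testing positive."""
    true_worlds = n_universes * prior
    false_worlds = n_universes - true_worlds
    return (true_worlds * (1 - false_negative_rate),
            false_worlds * false_positive_rate)


# Assumed inputs: 2% base rate, 2% false negatives, 40% false positives.
p = posterior_given_support(0.02, 0.02, 0.40)
true_pos, false_pos = universe_counts(1000, 0.02, 0.02, 0.40)

print(f"posterior: {p:.1%}")
print(f"{true_pos:.1f} true positives, {false_pos:.1f} false positives")
```

With these inputs, roughly 19.6 of the roughly 412 positive-testing universes contain rational addicts, which is where the 4.8% comes from. Changing the 40% false-positive guess moves the answer far more than changing the 2% false-negative guess does.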

(Of course, you may quibble with the numbers I put on it – in fact, so would I – but they’re just there to have something to put into the format I was testing.)