Wednesday, May 25, 2011

What graduate school economics did and did not teach some random dude

I’ve got no idea who this guy is – found links to these posts from Tyler Cowen’s blog – but I found his reflection on his graduate economics education (see also part 2) insightful and interesting.

Some highlights (that is to say – things that remind me of my own opinions ;-)

coming as I did from a physics background, I found several things that annoyed me about the course (besides the fact that I got a B). One was that, in spite of all the mathematical precision of these theories, very few of them offered any way to calculate any economic quantity. In physics, theories are tools for turning quantitative observations into quantitative predictions. In macroeconomics, there was plenty of math, but it seemed to be used primarily as a descriptive tool for explicating ideas about how the world might work. At the end of the course, I realized that if someone asked me to tell them what unemployment would be next month, I would have no idea how to answer them.

As Richard Feynman once said about a theory he didn't like: "I don’t like that they’re not calculating anything. I don’t like that they don’t check their ideas. I don’t like that for anything that disagrees with an experiment, they cook up an explanation - a fix-up to say, 'Well, it might be true.'"

That was the second problem I had with the course: it didn't discuss how we knew if these theories were right or wrong. We did learn Bob Hall's test of the PIH. That was good. But when it came to all the other theories, empirics were only briefly mentioned, if at all, and never explained in detail. When we learned RBC, we were told that the measure of its success in explaining the data was - get this - that if you tweaked the parameters just right, you could get the theory to produce economic fluctuations of about the same size as the ones we see in real life. When I heard this, I thought "You have got to be kidding me!" Actually, what I thought was a bit more...um...colorful.

(This absurdly un-scientific approach, which goes by the euphemistic name of "moment matching," gave me my bitter and enduring hatred of Real Business Cycle theory, about which Niklas Blanchard and others have teased me. I keep waiting for the ghost of Francis Bacon or Isaac Newton to appear and smite Ed Prescott for putting theory ahead of measurement. It hasn't happened.)

[…]

DeLong and Summers are right to point the finger at the economics field itself. Senior professors at economics departments around the country are the ones who give the nod to job candidates steeped in neoclassical models and DSGE math. The editors of Econometrica, the American Economic Review, the Quarterly Journal of Economics, and the other top journals are the ones who publish paper after paper on these subjects, who accept "moment matching" as a standard of empirical verification, who approve of pages upon pages of math that tells "stories" instead of making quantitative predictions, etc. And the Nobel Prize committee is responsible for giving a (pseudo-)Nobel Prize to Ed Prescott for the RBC model, another to Robert Lucas for the Rational Expectations Hypothesis, and another to Friedrich Hayek for being a cranky econ blogger before it was popular.

And from the follow-up blog post, which discusses his field courses (which, AFAIK, are courses he chose voluntarily):

The field course addressed some, but not all, of the complaints I had had about my first-year course. There was more focus on calculating observable quantities, and on making predictions about phenomena other than the ones that inspired a model's creation. That was very good.

But it was telling that even when the models made wrong predictions, this was not presented as a reason to reject the models (as it would be in, say, biology). This was how I realized that macroeconomics is a science in its extreme infancy. Basically, we don't have any macro models that really work, in the sense that models "work" in biology or meteorology. Often, therefore, the measure of a good theory is whether it seems to point us in the direction of models that might work someday.

[…]

all of the mathematical formalism and kludgy numerical solutions of DSGE give you basically zero forecasting ability (and, in almost all cases, no better than an SVAR). All you get from using DSGE, it seems, is the opportunity to puff up your chest and say "Well, MY model is fully microfounded, and contains only 'deep structural' parameters like tastes and technology!"...Well, that, and a shot at publication in a top journal.

Finally, my field course taught me what a bad deal the whole neoclassical paradigm was. When people like Jordi Gali found that RBC models didn't square with the evidence, it did not give any discernible pause to the multitudes of researchers who assume that technology shocks cause recessions. The aforementioned paper by Basu, Fernald and Kimball uses RBC's own framework to show its internal contradictions - it jumps through all the hoops set up by Lucas and Prescott - but I don't exactly expect it to derail the neoclassical program any more than did Gali.

Tuesday, May 24, 2011

The source of our policy views – an honest opinion from Steven Levitt

Freakonomics author Levitt recently posted on why he strongly opposed the US ban on internet poker, while weakly preferring drug prohibition (despite the good arguments against it) and legalized abortion.

I’ve never really understood why I personally come down on one side or the other with respect to a particular gray-area activity.  […]

It wasn’t until the U.S. government’s crackdown on internet poker last week that I came to realize that the primary determinant of where I stand with respect to government interference in activities comes down to the answer to a simple question: How would I feel if my daughter were engaged in that activity?

If the answer is that I wouldn’t want my daughter to do it, then I don’t mind the government passing a law against it. I wouldn’t want my daughter to be a cocaine addict or a prostitute, so in spite of the fact that it would probably be more economically efficient to legalize drugs and prostitution subject to heavy regulation/taxation, I don’t mind those activities being illegal.

Some express disappointment in Levitt for this comment:

What's missing in Levitt? The whole idea of tolerance. It's easy to tolerate people doing what you would do and approve of. It's harder to tolerate what you don't approve of. It's even harder to tolerate activities and behaviors that you find disgusting. Levitt has just confessed that he's intolerant or, at least, that he won't object to a government that's intolerant. That's disappointing. I had expected better of him.

Personally, I find this a misreading of his point. I don’t think he’s saying that he believes this is how it should be – just that this seems to be the way it is. If anything, the fact that he has tried to reflect on the source of his opinions and their possible basis in emotions makes me trust the guy more.

Seems to me that we often have a strong feeling or “intuition” that something is good or bad, and that the smarter we are, the better we’re able to convince ourselves that this is due to logical arguments. There’s a host of good stuff on the psychological mechanisms driving our attitudes towards sources of risk in Dan Gardner’s book “The Science of Fear.” There’s a host of good stuff on how easily we trick ourselves in Kurzban’s “Why Everyone (Else) Is a Hypocrite”. Who hasn’t been in a discussion with intelligent, informed people who dig themselves deeper and deeper into a hole while trying to defend some ridiculous opinion? (And who hasn’t at times been that very same person?)

Note: I’m not making the argument that we can’t learn and modify our views when confronted by evidence. But I am making the claim that this is frequently difficult to do, and that someone able to reflect on their feelings and biases (as Levitt does here) seems more open to changing his views than somebody who ignorantly imagines him- or herself to be a rational, evidence-based and principled logic machine.

Monday, May 23, 2011

“As-if behavioral economics” – puzzle: How can an as-if theory be normative?

Although I enjoyed it, I’ve spent the last few days on this blog noting some issues where I disagree with the paper “As-if behavioral economics”. Today I want to reflect on something they touch upon without fully resolving.

Some economists argue that their assumptions can’t be questioned because their models are “as-if” - they are merely tools that allow you to successfully predict market data, and the realism of the assumptions is irrelevant. If that is so - why are there so many norms and criteria apart from prediction that a “good” model should fulfill? And why - if they are mere “as-if” prediction-generating machines - are the neoclassical models held up as a normative ideal we should strive for in our own decision making?

Berg and Gigerenzer touch on this puzzle in a couple of places. For one thing, two of the points they emphasize are that

  • behavioral economics suffers from subscribing to the as-if method, which ignores the realism of the assumptions (similarity of model to the real-world mechanism/process), and that
  • behavioral economics has grown to see behavioral “heuristics” as “biases” that violate the normatively correct neoclassical rules

Later, they also note that

the normative interpretation of deviations as mistakes does not follow from an empirical investigation linking deviations to negative outcomes. The empirical investigation is limited to testing whether behavior conforms to a neoclassical normative ideal.

Consider - if the model is nothing but a black box that spits out impressive predictions:

  • Why is it important that agents inside the model are optimizing and rational?
  • Why is it important that the agents are well informed?
  • Why is it important that preferences are “standard” (thus generating well behaved utility functions and nice indifference curves)?
  • Why does it matter whether or not your prediction is based on an “equilibrium” inside the model?
  • How can the utility and welfare effects of a model imply anything about real people’s welfare?

This is particularly odd since, as far as I can tell, rational optimizers can behave in all sorts of ways depending on their preferences and the choice problem they face. When assumptions don’t need to be supported by empirical evidence, this means that any observable behavior pattern can be modelled as rational behavior given some hypothetical choice problem. If you don't believe me, ask yourself whether you can describe any specific behavior pattern that could not be the result of rational choice. Note that this has to be a pattern, that is to say, it has to be stated in terms of observables without reference to “underlying” but non-observable preferences. You can refer to prices, consumption goods, patterns across time and between goods, etc., and using such categories I don’t think it is possible to find any “non-rationalizable consumption pattern” that would be accepted as such by most economists.
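To make this concrete, here is a minimal toy sketch (my own example, with arbitrary preference parameters and prices - nothing from the literature): two very different observable spending patterns - “always spend 30% of income on good x” and “buy the same quantity of x no matter what income is” - are each exactly what a standard utility-maximizing toy consumer would do, just under different preferences.

```python
import numpy as np

def best_bundle(utility, income, px, py=1.0, n=20000):
    """Brute-force utility maximization over the budget line px*x + py*y = income."""
    x = np.linspace(1e-6, income / px - 1e-6, n)
    y = (income - px * x) / py
    u = utility(x, y)
    i = np.argmax(u)
    return x[i], y[i]

# Toy consumer 1: Cobb-Douglas preferences -> always spends a fixed share (30%) on x.
cobb_douglas = lambda x, y: 0.3 * np.log(x) + 0.7 * np.log(y)

# Toy consumer 2: quasilinear preferences -> buys a fixed quantity of x regardless of income.
quasilinear = lambda x, y: 5.0 * np.log(x) + y

px = 2.0
for income in [50, 100, 200]:
    x1, _ = best_bundle(cobb_douglas, income, px)
    x2, _ = best_bundle(quasilinear, income, px)
    print(f"income={income:>3}: Cobb-Douglas consumer spends {px * x1 / income:.0%} of income on x, "
          f"quasilinear consumer buys x = {x2:.2f}")
```

Both toy consumers are impeccably “rational”; the observed patterns alone would never tell you that either of them is a counterexample to rational choice.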

So what?

Well - if anything can be rationalized by such a theory, and assumptions can be as unrealistic as you want - then any stable pattern can be “explained” by such a “theory.” In actuality, though, you would just be describing the pattern using a different format (the “rational choice model” format). Which raises the question of why it is so important to use that format.

After all - if all you want to do is to predict, then it shouldn’t matter whether you assumed people to behave “as if” they were maximizers or not. Any model would be just as good if it predicted equally well.

Also - if the rational choice model is just a format - a way of describing behavior by identifying some “story” that would generate it - then why should it have normative power?

This is extra puzzling if you consider the old-school Chicago economics that sees all behavior as rational. If this is so, then there is no normative power beyond "do whatever you do, because that’s what’s optimal." Taken at face value, this view of the world would also lead to apathy: There’s no point in criticizing politicians or engaging with the world, because everyone knows what they’re doing and is doing what’s best for themselves. Politicians – that’s public choice. Regulators – they’ve been captured by special interests. Economists? Well – I guess their doings could be made endogenous as well.

I don’t have an answer to this puzzle - but I wonder if it may have something to do with politics. If you claim both that everyone is rational and that this rationality represents the normative ideal for action, then a world of unfettered markets seems like a good idea: It would be a world of informed, self-interested people generating huge benefits for each other through their selfish doings. If so - then behavioral economics becomes the “interventionist” response: Yes - a neoclassical paradise would be great – however, unfortunately, we’re just evolved apes with lots of biases and flaws. With a little carefully designed policy, though, we can regulate and nudge people in the direction of the truly rational agent.

Does anyone know of a survey that would make it possible to correlate policy views and politics with economists’ attitudes towards behavioral and old-school rational choice theory?

Friday, May 20, 2011

Strauss-Kahn and rational assault

Tyler Cowen generated a bit of discussion recently with his blog comment on Dominique Strauss-Kahn:
Dominique Strauss-Kahn has been arrested, taken off a plane to Paris, and accused of a shocking crime.  When I hear of this kind of story, I always wonder how the “true economist” should react.  After all, DSK had a very strong incentive not to commit the crime, including his desire to run for further office in France, not to mention his high IMF salary and strong network of international connections.  So much to lose.
Should the “real economist” conclude that DSK is less likely to be guilty than others will think? 
Let’s try to answer the question:
A bad economist would think: Strauss-Kahn clearly has more to lose and thus less of an incentive to commit sexual assault – which makes it unlikely that he did. So he is probably innocent.
A better economist would go one step further: Strauss-Kahn realizes that we would think this way, which makes crime relatively risk-free for him. This makes it likely that he did perform the crime. So he is probably guilty.
The even better economist would go even further: Since we realize that Strauss-Kahn would realize this, and that he would want to exploit this mechanism, we can conclude that he is probably guilty.
The “real economist,” finally, would realize that this infinite loop would lead Strauss-Kahn to play his part in implementing a randomized, mixed-strategy equilibrium by throwing a die to decide whether or not to run naked down hallways assaulting hotel staff. The economist would then write up the model, derive suitably generalized solutions for various assumptions about payoffs and attitudes towards risk, and publish it in a high-ranking journal, using the Strauss-Kahn story as a motivating example in the introduction.

Wednesday, May 18, 2011

“As if behavioral economics” - flaw 3: Adding a parameter is not all behavioral economists have done

I’m writing through some issues raised by the paper As-if behavioral economics. I have one more annoyance I want to raise with the paper before I move on to some of its strong points.

The annoyance I want to note today is one that disappoints me. Berg and Gigerenzer write:

Behavioral models frequently add new parameters to a neoclassical model, which necessarily increases R-squared. Then this increased R-squared is used as empirical support for the behavioral models without subjecting them to out-of-sample prediction tests.

This is silly. Yes, adding a parameter does increase R-squared (the share of the variation in the data that your statistical model captures), but this way of phrasing it makes it sound as though any variable added to a statistical model would increase R-squared by the same amount. That’s not the case: A randomly picked variable that is irrelevant would (if we ignore time trends and that sort of data) on average have zero explanatory power. The standard test is to check the significance level of the variable. This answers the following question: If the variable actually has no explanatory power for the data - how likely is it that it would “by chance” seem to explain whatever it seems to explain in the current dataset? The normal significance level to test at is 5%, and if you use that significance level the “irrelevant” variable will seem relevant in your data only 5% of the time. I’m pretty sure Berg and Gigerenzer know this.
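A small simulation makes the difference concrete (my own sketch with made-up data, not something from the paper): a pure noise regressor will always nudge in-sample R-squared upward a little, but it only “passes” a 5% significance test about 5% of the time, and it tends to make out-of-sample predictions slightly worse.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def ols(X, y):
    """OLS via least squares; returns coefficients, residuals and R-squared."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, resid, 1 - resid.var() / y.var()

n, reps = 100, 1000
r2_gain, sig, oos_worse = [], 0, 0

for _ in range(reps):
    x = rng.normal(size=n)
    z = rng.normal(size=n)                    # pure noise, unrelated to y
    y = 1 + 2 * x + rng.normal(size=n)

    X_small = np.column_stack([np.ones(n), x])
    X_big = np.column_stack([np.ones(n), x, z])

    b_s, _, r2_s = ols(X_small[:70], y[:70])  # fit both models on the first 70 obs
    b_b, resid, r2_b = ols(X_big[:70], y[:70])
    r2_gain.append(r2_b - r2_s)               # in-sample R2 (weakly) rises every time

    # t-test for the noise regressor's coefficient
    sigma2 = resid @ resid / (70 - 3)
    cov = sigma2 * np.linalg.inv(X_big[:70].T @ X_big[:70])
    t = b_b[2] / np.sqrt(cov[2, 2])
    if 2 * (1 - stats.t.cdf(abs(t), 70 - 3)) < 0.05:
        sig += 1                              # "significant" purely by chance

    # out-of-sample check on the held-out 30 observations
    mse_s = np.mean((y[70:] - X_small[70:] @ b_s) ** 2)
    mse_b = np.mean((y[70:] - X_big[70:] @ b_b) ** 2)
    if mse_b > mse_s:
        oos_worse += 1

print(f"R2 change from adding the noise regressor: mean {np.mean(r2_gain):.4f}, min {min(r2_gain):.1e}")
print(f"noise regressor 'significant' at the 5% level in {sig / reps:.1%} of runs")
print(f"out-of-sample fit got worse when z was included in {oos_worse / reps:.1%} of runs")
```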

A related flaw shows up in their discussion of Fehr and Schmidt’s model of inequality aversion (which assumes that some people dislike inequality, especially inequality to their own disadvantage). Berg and Gigerenzer write:

In addition, the content of the mathematical model is barely more than a circular explanation: When participants in the ultimatum game share equally or reject positive offers, this implies non-zero weights on the “social preferences” terms in the utility function, and the behavior is then attributed to “social preferences.”

This, too, is weak. What Fehr and Schmidt’s model assumes is that there is a specific structure to the inequity aversion: your dislike of being worse (or better) off than someone else is a linear function of the difference in payoffs. And, second, that it is worse being behind someone than being ahead of them, even if you would prefer most of all to be equal. It may be that this model is "wrong," but it is more than circular, and there are a variety of competing models that others have promoted as better ways of capturing typical patterns in experimental data on various economic games (off the top of my head, Charness and Rabin (2002), Bolton and Ockenfels (2000) and Engelmann and Strobel (2004)).
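For reference, the two-player version of the Fehr-Schmidt (1999) utility function is simple enough to write out. A minimal sketch (the functional form is from the published paper; the parameter values and the ultimatum-game numbers below are just illustrative):

```python
def fehr_schmidt_utility(own, other, alpha=0.8, beta=0.4):
    """Two-player Fehr-Schmidt (1999) utility.

    own, other : material payoffs of the player and the opponent
    alpha      : weight on disadvantageous inequality (being behind)
    beta       : weight on advantageous inequality (being ahead), with beta <= alpha

    The linear structure (and alpha >= beta) is what makes the model more than
    circular: it pins down *how* inequality hurts, not just *that* it does.
    """
    envy = max(other - own, 0.0)    # disadvantageous inequality
    guilt = max(own - other, 0.0)   # advantageous inequality
    return own - alpha * envy - beta * guilt

# Example: why a responder might reject a lopsided ultimatum-game offer of 1 out of 10.
print(fehr_schmidt_utility(1, 9))   # accept: 1 - 0.8*8 = -5.4, which is worse than...
print(fehr_schmidt_utility(0, 0))   # reject: both get nothing -> 0.0
```

With these (arbitrary) parameters, rejecting the lopsided offer beats accepting it - and the same parameters then also pin down behavior in other games, which is where the model can fail or succeed.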

Having said that, it might well be that Fehr and Schmidt is a crude model that fails to capture and process the relevant data in the best way. However, it does so well enough to be useful and interesting. If you found a model that did better and that could also predict well for new experiments and in different settings - using less information, in a way that could more credibly be related to actual psychological processes - then I’m pretty sure you would be published quickly in a good journal. That’s not to say that "you shouldn’t criticize unless you can do better," but it is to say that the current model captures something interesting in a simple way - even if it is clearly imperfect. Clarifying its weaknesses is fair game - but Berg and Gigerenzer should do better than brushing it off as though its fit with the data were no better than that of any randomly chosen model.

Tuesday, May 17, 2011

"As if behavioral economics" - flaw 2: "Neglecting the process is always wrong"

I’m writing through some issues raised by the paper As-if behavioral economics. It’s a sprawling paper with some very good arguments and some… not so good ones. I’m hoping to get through both good and bad this week.

The paper opens with a reasonable goal - evaluating whether behavioral economics has achieved its (sometimes) stated goal of improved empirical realism:

Insofar as the goal of replacing these idealized assumptions with more realistic ones accurately summarizes the behavioral economics program, we can attempt to evaluate its success by assessing the extent to which empirical realism has been achieved.

This is an OK idea for a paper: Some tradition has aimed to achieve X, and we want to see how successful it has been. However, Berg and Gigerenzer also imply in much of the paper that this aim (empirical realism in the assumed decision-making process) is always an important aim, and that any economic theory that fails in this regard is wrong. They call behavioral economics a “repair program” for the flaws of neoclassical “rational choice” economics, and have a long section on how “empirical realism” was sold, bought and re-sold (i.e., mainstream economics had it, lost it due to Pareto and his friends, started getting it back with behavioral economics, but then lost it again as behavioral economists strayed from the path):

perhaps after discovering that the easiest path toward broader acceptance into the mainstream was to put forward slightly modified neoclassical models based on constrained optimization, the behavioral economics program shed its ambition to empirically describe psychological process, adopting Friedman’s as-if doctrine.

So why is this empirically accurate process description so important in Berg and Gigerenzer’s view? The reason seems to be that process models and as-if models have different implications for how we can aid and improve human choice. After an (interesting) explanation of how ball-players catch balls through a simple heuristic (“run so that the ball up in the air is at a constant angle to you”) rather than through “intuitive” application of Newtonian mechanics, they write:

Thus, process and as-if models make distinct predictions (e.g., running in a pattern that keeps the angle between the player and ball fixed versus running directly toward the ball and waiting for it under the spot where it will land; and being able to point to the landing spot) and lead to distinct policy implications about interventions, or designing new institutions, to aid and improve human performance.

This is a good and valid argument in its relevant context, but it surely fails to apply to all types of economics. It seems particularly relevant as a criticism of welfare economics, which often involves nothing more substantial than the argument that “all choices are always welfare-maximizing, so any new choice option that is chosen improves welfare.” However, not all of economics is (or should be) dealing with this.

To my mind, at least part of what economics is about is the study of interactions: What happens when many people interact in a given institutional context (market, negotiation or whatever) and there are mechanisms (prices, norms, whatever) that introduce various positive and negative feedback effects? To study this you need a method, and one such method is to create a “toy world” where “toy people” act in a way that captures relevant behavioral regularities in real people. If people tend to buy less of a good when the price rises, then you need a toy person who responds like this. If you think it may be important that people in some market want to buy the same thing as some other person or group (e.g. fashion), then you need a toy person who exhibits this response. However, you don’t need a psychologically realistic model of a person, because all you want (in this context) is to see what the outcome of various interaction effects would be.

Sometimes (usually, I would guess), economists will do this in a closed, simple mathematical model with utility-maximizing agents and profit-maximizing firms. However, since utility-maximizing agents can behave in almost any conceivable way (just change their preferences and introduce state variables as in Becker’s extended utility approach), this “rationality postulate” doesn’t really constrain the kinds of behavior you can study that much. You are likely more constrained by the expectations of other economists that the toy people and firms in the model should have “model-consistent expectations” (i.e., they should expect the consequences of their actions that actually occur), and that it is important and interesting to study the subtle mechanisms that are created when these toy agents consequently marginally adjust their behavior for all sorts of reasons (Hotelling’s rule, the green paradox, smokers responding to expectations of future tax hikes by smoking less today, etc.).

Another way of doing this is agent-based modelling, where you create small “ant people” in a computer program and let them interact based on simple rules. You do this again and again and see what “typically happens” and so on. This is related to evolutionary game theory, where the share of “agents” living by some simple strategy grows or shrinks depending on the average payoff it produces given the current mix of strategies in the population.
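A minimal sketch of the evolutionary-game-theory version (my own toy example with made-up payoffs, not something from the paper): a hawk-dove population where the share playing each simple strategy grows or shrinks with its relative average payoff, and which ends up at the mixed equilibrium share V/C.

```python
import numpy as np

# Toy hawk-dove game: value of the resource V=4, cost of a fight C=6,
# plus a baseline payoff of 2 so that average payoffs stay positive.
#                        vs Hawk          vs Dove
payoffs = np.array([[(4 - 6) / 2 + 2,  4 + 2],        # Hawk row
                    [0 + 2,            4 / 2 + 2]])   # Dove row

p = 0.05   # initial share of hawks in the population
for generation in range(200):
    f_hawk = payoffs[0, 0] * p + payoffs[0, 1] * (1 - p)   # average payoff to a hawk
    f_dove = payoffs[1, 0] * p + payoffs[1, 1] * (1 - p)   # average payoff to a dove
    f_mean = p * f_hawk + (1 - p) * f_dove
    p = p * f_hawk / f_mean      # replicator step: shares grow with relative payoff
    if generation % 50 == 0:
        print(f"generation {generation:3d}: share of hawks = {p:.3f}")

print(f"long-run share of hawks ~ {p:.3f} (analytically V/C = 4/6 = 0.667)")
```

The point is not that this tiny model is realistic, but that the aggregate outcome (a stable mix of strategies) emerges from the interaction rules rather than from any psychological detail about the "ant people".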

Anyway - though none of these ways of studying interaction is sufficient to credibly examine the social world around us, they don’t seem completely valueless. Granted - some (many?) economists do take the welfare of the toy people a bit too seriously as a proxy for real-world consumer welfare, and some seem to think that tweaking a toy person to act like a real person means that the real person “is similar” psychologically to the toy person. But these are errors in interpretation and use, not in the method as such.

In short: If you want to show how simple behavioral patterns at the individual level could combine to create various higher-level patterns in groups, markets and other contexts, then what you want is the simplest, most tractable representation of those behavior patterns. Psychological realism is irrelevant - because your argument is “several people interacting in this specific way, each of whom exhibits these simple behavior patterns, would generate such-and-such aggregate patterns and would - in aggregate - respond in such-and-such a way to various external shocks in the environment”.

Monday, May 16, 2011

"As if behavioral economics" - flaw 1: The “true tradition” argument

I recently read As-if behavioral economics, a paper critical of behavioral economics written by Nathan Berg and Gerd Gigerenzer. Last week I presented the underlying narrative that they seem to imply. This week I hope to have time to discuss some of the more substantial good and bad points of their paper.

However, before we move on to substance I have one annoyance that I want to get off my chest: What I call the “true tradition” argument. I’ve touched on this before - regarding the “Holy Scripture” view that some people seem to have of Smith’s Wealth of Nations - but this paper does it again and I find it silly and annoying.

The “structure” of the argument (if you can even call it an argument) is one of two:
* “Somebody I disagree with has fallen from the true and pure tradition”
* “I may seem to be an outsider, but I’m actually the true carrier of the true and pure tradition”

You see this in religion and in alternative movements such as meditation or NLP – where people trace their guru or Kung-Fu teacher or whatever back to some original figure. “My teacher studied under X, who studied under Y, who studied under Z, in a pure unbroken line back to (idolized figure or text)” – or the long “X begat Y, who begat Z, who begat…” sections of the Old Testament.

You also see this in quasi-scientific practices such as Freudian psychoanalysis. It’s probably even more pronounced in some parts of Austrian economics, where the discussion of what Hayek or Mises or Böhm-Bawerk or Menger “truly” meant seems to be a huge thing. Followers of Ayn Rand are the same or worse. You see it in people who make a big ado about how their claims are foreshadowed in Aristotle or some other ancient philosopher’s speculative musings, as if that should somehow count as relevant evidence for an empirical claim.

Amongst people opposed to “standard economics” there seems to be a similar thing going on - to me, the family tree of the “other canon” project seems a clear example.

And in Berg and Gigerenzer’s paper, the “wrong turn” of economics is identified as the
fundamental shift in economics which took place from the beginning of the twentieth century: the ‘Paretian turn’. This shift, initiated by Vilfredo Pareto and completed in the 1930s and 1940s by John Hicks, Roy Allen and Paul Samuelson, eliminated psychological concepts from economics by basing economic theory on principles of rational choice.
You could choose to ignore this kind of stuff - see it as narratives that help provide groups of people with a feeling of connection to a larger tradition and that place their work and struggles into a larger storyline of good and bad. But seriously… it’s just stupid.

More than stupid, I see this as a real problem, in that it elevates into a significant and important issue something which is irrelevant to the evaluation of scientific claims. Nobody has a hotline to truth! I don’t care how smart you are or how often you’ve been right before - even the smartest people in the world can be misguided and confused and incorrect. Their claims must be evaluated and confronted with evidence, and if they’re wrong they’re wrong and we move on.

Tuesday, May 10, 2011

Is behavioral economics a flawed band-aid on the neoclassical enterprise?

I finally got around to reading the paper “As-if behavioral economics: Neoclassical economics in disguise?” by Nathan Berg and Gerd Gigerenzer this past Easter holiday. I found it enjoyable, often insightful, and somewhat confused. It contained a lot of stuff, so I’ll split this into several parts.
Today I’ll merely go through the overall “story” they seem to be operating from. This isn’t the “storyline” of the paper, but rather the story as I can piece it together from the clues they scatter throughout the paper.
Their story is that economics was a sensible science informed by psychological science until an Italian economist called Pareto turned it into the current neoclassical “monster” we have today.
a fundamental shift in economics which took place from the beginning of the twentieth century: the ‘Paretian turn’. This shift, initiated by Vilfredo Pareto and completed in the 1930s and 1940s by John Hicks, Roy Allen and Paul Samuelson, eliminated psychological concepts from economics by basing economic theory on principles of rational choice.
This new framework assumed that people’s stable preferences can be described by a mathematical utility function such that any good (provided in sufficient quantities) can fully compensate for a reduction in any other good.
If, for example, x represents a positive quantity of ice cream and y represents time spent with one’s grandmother, then as soon as we write down the utility function U(x, y) and endow it with the standard assumptions that imply commensurability, the unavoidable implication is that there exists a quantity of ice cream that can compensate for the loss of nearly all time with one’s grandmother.
In addition, this framework built up an axiomatic, logical theory of normative rationality centered around internal consistency. That is to say, they argued that people should have transitive preferences, conform to expected utility axioms and have Bayesian beliefs.
This was actually just an unsupported (and, in Berg and Gigerenzer’s view, false) assumption, in that they never even attempted to establish that such rules would lead to better outcomes in the real world.
Expected utility violators and time-inconsistent decision makers earn more money in experiments (Berg, Johnson, Eckel, 2009).
Because this theory completely misspecified how people make choices and form beliefs, it became necessary to ignore the realism of the assumptions. For this reason, they turned to the “as-if” methodology that they saw Friedman as having preached: All models are to be evaluated only in terms of how well they predict - and the realism of the assumptions is irrelevant. They describe this as
the Friedman as-if doctrine in neoclassical economics focusing solely on outcomes.
This did not fully solve the underlying problem: Since people do not choose in this way, predictive ability was poor. Behavioral economists initially wanted to tackle the root of the problem by reintroducing realism (psychology) into the description of consumer behavior. After a while, though, they were instead reduced to adding bells and whistles of various kinds to patch up the existing formal framework so that it would better predict in an as-if sense.
Instead of asking how real people—both successful and unsuccessful—choose among gambles, the repair program focused on transformations of payoffs (which produced expected utility theory) and, later, transformations of probabilities (which produced prospect theory) to fit, rather than predict, data. The goal of the repair program appeared, in some ways, to be more statistical than intellectual: adding parameters and transformations to ensure that a weighting-and-adding objective function, used incorrectly as a model of mind, could fit observed choice data.
Their work, by introducing further complications into the choice models, actually made things worse - in that they made the resulting “theory” of human choice even less plausible.
Leading models in the rise of behavioral economics rely on Friedman’s as-if doctrine by putting forward more unrealistic processes—that is, describing behavior as the process of solving a constrained optimization problem that is more complex—than the simpler neoclassical model they were meant to improve upon.
On the normative side, the deviations described by most of the behavioral “epicycles” that were introduced came to be seen as biases and flaws in need of nudging and paternalistic regulation.
To these writers (and many if not most others in behavioral economics), the neoclassical normative model is unquestioned, and empirical investigation consists primarily of documenting deviations from that normative model, which are automatically interpreted as pathological. In other words, the normative interpretation of deviations as mistakes does not follow from an empirical investigation linking deviations to negative outcomes. The empirical investigation is limited to testing whether behavior conforms to a neoclassical normative ideal.
Finally, perhaps in an effort to avoid revealing how poor both the neoclassical and behavioral models actually are, the bar for predictive success was lowered even further by turning it into an exercise in fitting models to existing data rather than an exercise in making successful out-of-sample predictions.
Behavioral models frequently add new parameters to a neoclassical model, which necessarily increases R-squared. Then this increased R-squared is used as empirical support for the behavioral models without subjecting them to out-of-sample prediction tests.
That’s the story as I read it, and the authors continue to describe their view of what they think should be done. But that will have to wait for another time.

Monday, May 9, 2011

How convinced should we be of an economic theory that is “consistent with empirical data”?

What follows is not rocket science, and is probably not 100% correct, but: When we say that “empirical tests” support an economic theory, does this mean the theory is probably right? More specifically, what I want to explore is whether there is a simple way of stating the issue so that we don’t ignore the base rate.

An example of how much it matters how we state such issues comes from medical decision-making: There are a number of screening programs in place to identify people with medical conditions that can be harmful, and research on medical decision-making shows that doctors seriously misinterpret positive results from such tests. Simply put, test results are “misleading” when a test with even a low error rate is used to search for a rare condition in the general population: The small error rate multiplied by the huge number of healthy people gives you the lion’s share of those flagged as “positive” by the test.

An example from a nice write-up of this issue shows how difficult the issue is to understand when stated in probabilities:

In one study, Gigerenzer and his colleagues asked doctors in Germany and the United States to estimate the probability that a woman with a positive mammogram actually has breast cancer, even though she’s in a low-risk group […]:

The probability that one of these women has breast cancer is 0.8 percent.  If a woman has breast cancer, the probability is 90 percent that she will have a positive mammogram.  If a woman does not have breast cancer, the probability is 7 percent that she will still have a positive mammogram.  Imagine a woman who has a positive mammogram.  What is the probability that she actually has breast cancer?

Gigerenzer describes the reaction of the first doctor he tested, a department chief at a university teaching hospital with more than 30 years of professional experience:

“[He] was visibly nervous while trying to figure out what he would tell the woman.  After mulling the numbers over, he finally estimated the woman’s probability of having breast cancer, given that she has a positive mammogram, to be 90 percent.  Nervously, he added, ‘Oh, what nonsense.  I can’t do this.  You should test my daughter; she is studying medicine.’  He knew that his estimate was wrong, but he did not know how to reason better.  Despite the fact that he had spent 10 minutes wringing his mind for an answer, he could not figure out how to draw a sound inference from the probabilities.”

When Gigerenzer asked 24 other German doctors the same question, their estimates whipsawed from 1 percent to 90 percent.   Eight of them thought the chances were 10 percent or less, 8 more said 90 percent, and the remaining 8 guessed somewhere between 50 and 80 percent.  Imagine how upsetting it would be as a patient to hear such divergent opinions.

As for the American doctors, 95 out of 100 estimated the woman’s probability of having breast cancer to be somewhere around 75 percent.

The right answer is 9 percent.

The twist in the story comes from how easy this is to get right if you phrase the exact same question in a “natural frequencies” format:

Eight out of every 1,000 women have breast cancer.  Of these 8 women with breast cancer, 7 will have a positive mammogram.  Of the remaining 992 women who don’t have breast cancer, some 70 will still have a positive mammogram.  Imagine a sample of women who have positive mammograms in screening.  How many of these women actually have breast cancer?

My question is whether this format can be adapted to the case of empirical testing of a theory. We have three main terms that need to be “adapted”:

  • Risk of false negatives – How likely is it that the theory will be rejected if it is actually true? Let us say this is quite unlikely (2%)
  • Risk of false positives – How likely is it that the theory will be supported if it is actually false? This depends on how “observationally equivalent” it is to the true theory. Take rational addiction theory as an example: One article argues that consumption with a trend will often test positive for rational addiction even though there is no rational, forward-looking, planned change in tastes going on. I find trended consumption far more plausible, so let us put the likelihood that “trended consumption or some other non-rational addiction mechanism is actually present and tests positive by mistake” at 40%.
  • “Base-rate” – In medicine, this is the known prevalence of the disease in the population being tested. In our case it is not as easily interpretable – but ask yourself, for instance, “how likely do I think it is that real junkies and cigarette smokers are gradually implementing a forward-looking plan for changing their own tastes, and that this is the reason their use of cigarettes, heroin or whatever is gradually increasing?” Let us say we put this at 2%. This does sound both speculative and “science fiction”-ish, but could we interpret it as saying “of all the possible universes that would have unfolded consistently with our current history and experiences – in how many of these do we think real junkies and cigarette smokers […]”?

If we think this sounds OK, we could try something along the lines of:

My feeling/guess is that only 20 out of 1000 universes we might be living in would have rational addicts. In all 20 of these universes rational addiction theory would do well in testing. Of the remaining 980 universes that do not contain rational addicts, some 392 will test positive. Imagine that our current test-results indicate that we live in one of the 412 universes that test positive for rational addiction. How likely is it that there really are rational addicts?

This is (I think) quite basic Bayesian updating, so the whole “new” thing here is the attempt to rephrase it in a way that makes the base-rate point obvious: After positive test results, the likelihood that we are living in the rational addiction world would be 4.8% – higher than 2% (our starting estimate) – but still very low.
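For what it’s worth, the arithmetic behind both the mammogram example and the rational-addiction version is just Bayes’ rule, and it fits in a few lines (using the numbers already given above):

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Probability the hypothesis is true given a positive test (Bayes' rule)."""
    true_pos = prior * true_positive_rate
    false_pos = (1 - prior) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Mammogram example from the post: 0.8% prevalence, 90% sensitivity, 7% false positives.
print(f"P(cancer | positive mammogram)       = {posterior(0.008, 0.90, 0.07):.0%}")   # ~9%

# Rational addiction example: 2% prior, 98% chance of "passing" if true, 40% if false.
print(f"P(rational addicts | positive tests) = {posterior(0.02, 0.98, 0.40):.1%}")    # ~4.8%
```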

(Of course – you may quibble with the numbers I put on it – in fact, so would I – but they’re just there to have something to put into the format I was testing.)

Wednesday, May 4, 2011

An escape from uncertainty? On the support for peer review and hierarchical journals

Some time ago, after a discussion of peer review, I mailed the first quote from yesterday’s post to a colleague, who responded: “OK, so design a better system, then.”
The challenge bounced around in the back of my head for a while, until one day it hit me that (maybe) it is an impossible task – because the perceived benefits of the current system are illusory, while an important benefit of an alternative system would be that it was more transparent and thus would not provide the illusion of authoritativeness, finality and certainty.
Imagine a place where all articles could be published – an online repository of some sort. There are A LOT of researchers out there, and there would be a flood of papers in any (even narrowly defined) field. You might see which articles other readers have read; you might even have tools in the repository for giving “starred reviews” (as on Amazon) but with scholarly comments, for evaluating the reviewers themselves, and thus maybe even for computing average ratings weighted by how “useful/valid/perceptive” each reviewer has been judged to be, and so on. There could be long comment and discussion threads, the different articles could be cross-referenced by researchers and readers, and the whole thing could sit in a “Facebook-ish” system that made it harder to be an anonymous troll.
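Just to make the reviewer-weighted rating idea concrete, a minimal sketch (an entirely hypothetical scheme with made-up numbers – one of many ways such a repository could do it):

```python
def weighted_rating(reviews):
    """Average star rating weighted by how useful each reviewer has been judged.

    reviews: list of (stars, reviewer_usefulness) pairs, with usefulness in [0, 1].
    Hypothetical scheme, only meant to illustrate the idea.
    """
    total_weight = sum(weight for _, weight in reviews)
    return sum(stars * weight for stars, weight in reviews) / total_weight

# A paper with three scholarly "starred reviews":
reviews = [(5, 0.9),   # a reviewer whose past reviews were judged very useful
           (2, 0.2),   # a reviewer with a weak track record
           (4, 0.6)]
print(f"weighted rating: {weighted_rating(reviews):.2f} stars")
```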
Even so – I think this would prove unsatisfactory to many (most?) researchers: I think there’s a human desire for someone to have the final say and state that “Yes – this is good, important and probably true!” It is a desire to have some external authority that can make the final judgment call that “your work is good!” or that “This result can be cited with confidence!” An open, transparent system makes it hard not to see the apes behind the machine. The “institution” of peer-reviewed, prestigious journals, in comparison, has a somewhat magical aura of authoritativeness and gravitas.
Put differently – the present journal system makes it easy to identify which “giants” we should stand on the shoulders of to see further, and it offers the hope that we can be published in a high-ranking journal and thus be future giants ourselves. A truly open system shows us that we are trying to build on the shoulders of a large, shifting mass of more or less confused fellow ants all scrambling around trying to do the same thing.
I’m not sure how I can test this hunch – but if it’s correct, then it will be difficult to move towards a more open-access approach to science based primarily on post-publication review. Maybe it will change as new generations become more and more comfortable with online tools and evaluation methods, I don’t know. But my guess would be that you can marshal all the evidence you want against peer review and tiered journals and it wouldn’t help. You could show that peer review fails to catch errors, that referees are biased in favor of conclusions they like, that referees agree about as often as two tossed coins, that it is a newfangled thing that was quite unusual even in top journals until the second half of the 20th century (think about it – they didn’t even have photocopiers in the “old days”), that a system of tiered journals creates publication bias in favor of spurious results, provides disincentives to replication studies, and so on and so forth.
Yes – a “top journal” may be just some guy acting as editor who gives two or three researchers access to an enormously impactful “Like-button,” but it doesn’t feel that way.

Tuesday, May 3, 2011

Peer review and transparency

There is some evidence that the status of your name or institution affects the conclusions of peer review:

There have been many studies of bias - with conflicting results - but the most famous was published in Behavioural and Brain Sciences [14]. The authors took 12 studies that came from prestigious institutions that had already been published in psychology journals. They retyped the papers, made minor changes to the titles, abstracts, and introductions but changed the authors’ names and institutions. They invented institutions with names like the Tri-Valley Center for Human Potential. The papers were then resubmitted to the journals that had first published them. In only three cases did the journals realise that they had already published the paper, and eight of the remaining nine were rejected - not because of lack of originality but because of poor quality. The authors concluded that this was evidence of bias against authors from less prestigious institutions.

The solution sometimes proposed is double-blind peer review – where the referee does not know whose article he or she is reviewing – which is seen as a way of ensuring that famous names and well-known colleagues do not have an easier time getting published than others. Daniel Lemire discusses a paper which found that double-blind peer review actually hurt “outsiders” more than it leveled the playing field. Criticism also became harsher, and the quality increase was marginal at best.

Lemire concludes that transparency is better – interestingly, he makes the transparency argument against both of the “blinds” in double-blind review: He seems to argue both that the author should be known to the referee, and that the referee and the review report should be known to the author:

But the best way to limit the biases is transparency, not more secrecy. Let the world know who rejected which paper and for what reasons.