The annoyance I want to note today is one that disappoints me. Berg and Gigerenzer write:

Behavioral models frequently add new parameters to a neoclassical model, which necessarily increases R-squared. Then this increased R-squared is used as empirical support for the behavioral models without subjecting them to out-of-sample prediction tests.

This is silly. Yes, adding a parameter can never lower R-squared (the share of the variation in the data that your statistical model captures), and in practice it will almost always raise it at least a little. But this way of phrasing it makes it sound as though any variable added to a statistical model would increase R-squared by the same amount. That's not the case: a randomly picked variable that is irrelevant would (if we ignore time trends and that sort of data) have no true explanatory power, so whatever it adds to R-squared is a small amount of chance fit. The standard test is to check the significance level of the variable. This answers the following question: if the variable actually has no explanatory power for the data, how likely is it that it would "by chance" seem to explain at least as much as it seems to explain in the current dataset? The normal significance level to test at is 5%, and if you use that significance level the "irrelevant" variable will seem relevant in your data only 5% of the time. I'm pretty sure Berg and Gigerenzer know this.
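A minimal simulation sketch of this point, in Python with NumPy. Everything in it is an illustrative assumption of mine, not something taken from Berg and Gigerenzer: the data-generating process, the sample size, the variable names (`x`, `junk`), and the hard-coded critical value, which is only an approximation of the two-sided 5% cutoff for a t distribution with 197 degrees of freedom.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares; returns (R-squared, coefficients, residuals)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, beta, resid


def simulate(n_sims=2000, n=200, seed=0):
    """Add a pure-noise regressor to a correctly specified model, many times."""
    rng = np.random.default_rng(seed)
    t_crit = 1.972  # assumption: approximate 5% two-sided cutoff, t with 197 df
    gains = np.empty(n_sims)
    significant = np.empty(n_sims, dtype=bool)
    for s in range(n_sims):
        x = rng.normal(size=n)      # the relevant variable
        junk = rng.normal(size=n)   # irrelevant by construction
        y = 1.0 + 2.0 * x + rng.normal(size=n)

        X_small = np.column_stack([np.ones(n), x])
        X_big = np.column_stack([X_small, junk])

        r2_small, _, _ = fit_ols(X_small, y)
        r2_big, beta, resid = fit_ols(X_big, y)
        gains[s] = r2_big - r2_small

        # Classical OLS standard error and t-statistic for the junk coefficient.
        sigma2 = (resid @ resid) / (n - X_big.shape[1])
        cov = sigma2 * np.linalg.inv(X_big.T @ X_big)
        t_stat = beta[2] / np.sqrt(cov[2, 2])
        significant[s] = abs(t_stat) > t_crit
    return gains, significant


gains, significant = simulate()
print("smallest R-squared gain:", gains.min())
print("average R-squared gain: ", gains.mean())
print("share of runs where junk is 'significant':", significant.mean())
```

Under these assumptions the R-squared gain should never be negative (up to floating-point error) but should be tiny on average, and the junk variable should pass the 5% test in roughly one run out of twenty. That is the difference between "R-squared went up" and "the new parameter earned its place."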

A related flaw shows up in their discussion of Fehr and Schmidt's model of inequality aversion (which assumes that some people dislike inequality, especially inequality to their own disadvantage). Berg and Gigerenzer write:

In addition, the content of the mathematical model is barely more than a circular explanation: When participants in the ultimatum game share equally or reject positive offers, this implies non-zero weights on the “social preferences” terms in the utility function, and the behavior is then attributed to “social preferences.”

This, too, is weak. What Fehr and Schmidt's model assumes is that there is a specific structure to the inequity aversion: first, that your dislike of how much better (or worse) off someone else is than you is a linear function of *how much* better (or worse) off than you they are. And, second, that it's worse being behind someone than in front of someone, even if you'd prefer most of all that you were equal. It may be that this model is "wrong," but it is more than circular, and there is a variety of competing models that others have promoted as better ways of capturing typical patterns in experimental data on various economic games (off the top of my head, Charness and Rabin (2002), Bolton and Ockenfels (2000) and Engelmann and Strobel (2004)).
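To see why that structure is more than a relabeling of the behavior, here is a rough sketch of the two-player form of the utility function as I understand it, applied to the responder in an ultimatum game. The function names, the pie of 10, the whole-unit offers, the parameter values and the rule that a responder who is exactly indifferent accepts are all my own illustrative assumptions, not estimates from Fehr and Schmidt or anyone else.

```python
def fehr_schmidt_utility(own, other, alpha, beta):
    """Own payoff minus linear penalties for being behind (alpha) or ahead (beta)."""
    behind = max(other - own, 0.0)  # disadvantageous inequality
    ahead = max(own - other, 0.0)   # advantageous inequality
    return own - alpha * behind - beta * ahead


def lowest_accepted_offer(pie, alpha, beta):
    """Smallest whole-unit offer the responder accepts.

    Rejecting leaves both players with nothing, which gives utility 0.
    Assumption: an indifferent responder accepts.
    """
    for offer in range(0, int(pie) + 1):
        accept_utility = fehr_schmidt_utility(offer, pie - offer, alpha, beta)
        if accept_utility >= 0.0:
            return offer
    return None


# "Behind" hurts more than "ahead" whenever alpha > beta.
for alpha in (0.0, 0.5, 2.0):
    print(alpha, lowest_accepted_offer(10.0, alpha, beta=0.25))
```

With a pie of 10, a responder with `alpha = 0` takes anything, one with `alpha = 0.5` turns down offers below 3, and one with `alpha = 2` turns down offers below 4. In general the sketch implies rejection of any share below alpha / (1 + 2 * alpha), which stays below half however large alpha gets. Those are predictions that could fail in the data, which is exactly what a circular explanation cannot offer.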

Having said that, it might well be that Fehr and Schmidt is a crude model that fails to capture and process the relevant data in the best way. However, it does so well enough to be useful and interesting. If you found a model that did better and that could also predict well for new experiments, as well as in different settings - using less information that could more credibly be related to actual psychological processes - then I'm pretty sure you would be published quickly in a good journal. That's not to say that "you shouldn't criticize unless you can do better," but it is to say that the current model captures *something* interesting in a simple way - even if it is clearly imperfect. Clarifying its weaknesses is fair game - but Berg and Gigerenzer should do better than brushing it off as though its fit with the data were no better than that of any random model thrown together.