Wednesday, March 16, 2011

Does peer review ensure quality?

I’ve written (somewhat unsystematically) on peer review and academic journals lately, especially on the question of why we have a hierarchy of journals (top journals, mid-tier journals, etc.). I suggested some possible justifications for this, but they all rest on the assumption that editors and referees are good at estimating the “quality” of research (in the sense of long-term importance). A reader suggested that having top journals could motivate researchers to do better work than they otherwise would, but this too requires that acceptance in top journals is determined by a relevant sense of scientific quality. (This benefit mainly arises if the system identifies and “marks” quality better than alternative systems focused on citation rates etc. Otherwise, it is mostly an early (noisy) estimate of long-term importance that lets you reap the expected status earlier than you otherwise would.)

Anyway – here’s some stuff I’ve gathered lately on the quality of refereeing. Mostly, this comes from bloggers sceptical of the current system – please add more positive stuff if you know of it.

First, from Cameron Neylon at Science in the Open:

what evidence we do have shows almost universally that peer review is a waste of time and resources and that it really doesn’t achieve very much at all. It doesn’t effectively guarantee accuracy, it fails dismally at predicting importance, and it’s not really supporting any effective filtering. If I appeal to authority I’ll go for one with some domain credibility, let’s say the Cochrane Reviews which conclude the summary of a study of peer review with “At present, little empirical evidence is available to support the use of editorial peer review as a mechanism to ensure quality of biomedical research.” Or perhaps Richard Smith, a previous editor of the British Medical Journal, who describes the quite terrifying ineffectiveness of referees in finding errors deliberately inserted into a paper. Smith’s article is a good entry into the relevant literature as is a Research Information Network study that notably doesn’t address the issue of whether peer review of papers helps to maintain accuracy despite being broadly supportive of the use of peer review to award grants.

I (very briefly) looked at Neylon’s links (I hope to go more in depth another time), but the Cochrane Review mostly notes that there is no evidence that peer review raises quality. That is, it reports a lack of evidence either way rather than evidence in one direction. Smith’s article, on the other hand, is more aggressive:

If peer review is to be thought of primarily as a quality assurance method, then sadly we have lots of evidence of its failures. The pretentiously named medical literature is shot through with poor studies. John Ioannidis has shown how much of what is published is false [4]. The editors of ACP Journal Club search the 100 'top' medical journals for original scientific articles that are both scientifically sound and important for clinicians and find that it is less than 1% of the studies in most journals [5]. Many studies have shown that the standard of statistics in medical journals is very poor [6].

[…]

While Drummond Rennie writes in what might be the greatest sentence ever published in a medical journal: 'There seems to be no study too fragmented, no hypothesis too trivial, no literature citation too biased or too egotistical, no design too warped, no methodology too bungled, no presentation of results too inaccurate, too obscure, and too contradictory, no analysis too self-serving, no argument too circular, no conclusions too trifling or too unjustified, and no grammar and syntax too offensive for a paper to end up in print.'

(BTW: I would recommend this readable article, which provides a nice introduction to Ioannidis.)

The usually interesting Robin Hanson noted a recent study showing that not much of the variability in referees’ ratings is explained by agreement between referees. This tells you that the “signal” of quality, if it is there at all, is very noisy:

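[Figure: reliability estimates and confidence intervals for studies of referee agreement, ordered by estimated reliability]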

The above is from their key figure, showing reliability estimates and confidence intervals for studies ordered by estimated reliability. The most accurate studies found the lowest reliabilities, clear evidence of a bias toward publishing studies that find high reliability. I recommend trusting only the most solid studies, which give the most pessimistic (<20%) estimates.
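To make concrete what a reliability below 20% means, here is a minimal simulation sketch. It assumes a deliberately simple model that is not taken from the study: each referee’s rating is a paper’s true quality plus independent noise, and reliability is the share of rating variance due to true quality. In that two-referee setting, reliability equals the expected correlation between two referees’ ratings of the same papers. The helpers `pearson` and `simulate_referees` are hypothetical names written for this illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate_referees(reliability, n_papers=20000, seed=0):
    """Assumed model: rating = true quality + independent referee noise.

    The noise variance is chosen so that
    var(quality) / var(rating) == reliability.
    """
    rng = random.Random(seed)
    signal_sd = 1.0
    noise_sd = math.sqrt(signal_sd ** 2 * (1 - reliability) / reliability)
    quality = [rng.gauss(0, signal_sd) for _ in range(n_papers)]
    ref_a = [q + rng.gauss(0, noise_sd) for q in quality]  # first referee
    ref_b = [q + rng.gauss(0, noise_sd) for q in quality]  # second referee
    return quality, ref_a, ref_b


quality, ref_a, ref_b = simulate_referees(0.2)
# Agreement between the two referees: should come out close to 0.2.
print(round(pearson(ref_a, ref_b), 2))
# How well one referee tracks true quality: close to sqrt(0.2), about 0.45.
print(round(pearson(ref_a, quality), 2))
```

Under this simplified model, two referees agreeing at a correlation of about 0.2 implies that a single referee’s rating correlates only about 0.45 (the square root of 0.2) with the underlying quality it is supposed to measure.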

Some years ago, Hanson also uncovered a study from the “good old days” when you were allowed to mislead study subjects if this was necessary to get valid results. This study showed that reviewers do not agree. By manipulating the conclusions, creating manuscripts that were methodologically equivalent but supported different conclusions, the authors also showed that referee opinions regarding manuscripts were

strongly biased against manuscripts which reported results contrary to their theoretical perspective.

Ideally, we should have an external measure of quality that is reasonably independent of “bandwagon” effects (e.g., if everyone wants to cite the American Economic Review (AER) because it is the top journal, then citation rates of AER articles would tend to reflect the status of the journal rather than the quality of the articles). It seems to me that such a measure would be easier to find in a more decisively empirical discipline than economics (or sociology, for that matter), where long-term citation rates might more credibly reflect empirically valid and important results and theories.