In addition to showing poor interrater agreement, reviewers were strongly biased against manuscripts that reported results contrary to their own theoretical perspective. Variables such as the author's prestige and institutional affiliation may significantly influence a reviewer's recommendation. Given identical experimental procedures, a manuscript reporting positive results was rated as methodologically stronger than one reporting negative results.
Negative results earned a significantly lower evaluation, with the average reviewer urging either rejection or major revision. In short, (a) referee evaluations may be dramatically influenced by such factors as experimental outcome, and (b) inter-referee agreement may be extremely low on factors relating to manuscript evaluation.
More embarrassing, perhaps, is the realization that we have developed elaborate standards for evaluating various psychological instruments and yet have exempted the most pervasive and critical instrument in science--i.e., the scientist.
Without further scrutiny of the purposes and processes of peer review, we are left with little to defend it other than tradition.
Gillett (1993)
The theoretical arguments against the use of peer review for assessing innovative scientific proposals now date back more than 30 years. In that time little has been published to attempt to refute Kuhn's work, and much has been published to support it. Remarkably, however, these arguments have largely failed to influence the practicalities of peer review. Nevertheless, this does not undermine their validity. Peer review for funding decisions on new science remains philosophically open to question. A convincing case can be made that heavy reliance on peer review is failing the scientific community.
Kassirer and Campion (1994)
Yet we have never tried to define the relative gravity of the various faults detected by peer review, and no one has come to grips with how they should be weighed in the evaluation of manuscripts. We may just have to admit that the process we use to assess sophisticated scientific research is crude.
Godlee, Gale and Martyn (1998)
The mean number of weaknesses commented on was 2 (out of 8). Neither blinding reviewers to the authors and origin of the paper nor requiring them to sign their reports had any effect on the rate of detection of errors. Such measures are unlikely to improve the quality of peer review reports.
Very few statistically reliable differences in study validity were found between studies published in peer reviewed journals and studies not published in such journals.
Rothwell and Martin (2000)
Agreement between reviewers in clinical neuroscience was little greater than would be expected by chance alone.
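Findings like "little greater than chance" typically rest on a chance-corrected agreement statistic such as Cohen's kappa, which subtracts the agreement two raters would reach by accident from the agreement they actually reach. A minimal sketch, using invented accept/reject recommendations rather than data from any of the studies quoted here:

```python
# Chance-corrected inter-reviewer agreement (Cohen's kappa).
# The ratings below are hypothetical, for illustration only.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each rater labelled independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["accept", "reject", "accept", "reject", "accept", "accept"]
b = ["accept", "accept", "accept", "reject", "reject", "accept"]
print(round(cohens_kappa(a, b), 3))  # prints 0.25
```

A kappa of 0 means agreement no better than chance, 1 means perfect agreement; the two hypothetical reviewers above agree on 4 of 6 decisions yet score only 0.25 once chance agreement is removed, which is the kind of gap the quoted studies are pointing at.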
It has been shown that reviewers recommended by authors themselves give much more favourable assessments than those chosen by editors.
Neither blinding reviewers to the authors and origin of the paper nor requiring them to sign their reports appears to have any effect on the quality of peer review. Many researchers already spend as much time participating in peer review as they spend doing research.
Finally, many of the arguments that apply to peer review for journal articles and conference abstracts also apply to peer review of grant applications.
Jefferson et al. (2002)
Editorial peer review, although widely used, is largely untested and its effects are uncertain.
Funding peer review approaches
So in general, research funding proposals have two main components: the technical proposal, which is often quite similar to a paper (albeit without all the results!), and the costs. Most funding agencies are structured a little like a journal, with a set of programme officers (who may be seconded from academia or other research organisations) and a collection of reviewers. A proposal is sent out for review by several people, generally peers of the proposer (i.e. domain experts). Most of their review concerns the novelty of the proposed work, although a small component might cover the track record of the proposer and some comments on the resources requested. The reviews come back and go to a panel, composed of some of those experts and some others, who rank all the proposals, perhaps allowing for inaccuracies in any reviews or discounting unfair comments. The funding agency can then allocate money starting at the top of the ranking and working down the "fundable" proposals until it runs out of proposals (or money, more likely). Any remaining unfunded proposals that are good can be "forwarded" to a later panel.
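The ranking-then-allocation step described above is essentially a greedy algorithm: sort by panel score, fund from the top until the budget is exhausted, and forward what's left. A minimal sketch, with invented proposal names, scores, and costs (real panels of course apply far more judgment than a sort key):

```python
# Greedy sketch of the "rank, then fund down the list" allocation step.
# All names, scores, and costs here are made up for illustration.

def allocate(proposals, budget):
    """Fund ranked proposals greedily until the budget runs out.

    proposals: list of (name, panel_score, cost); higher score = better.
    Returns (funded, forwarded), where 'forwarded' holds proposals that
    missed out and might be passed to a later panel.
    """
    ranked = sorted(proposals, key=lambda p: p[1], reverse=True)
    funded, forwarded = [], []
    remaining = budget
    for name, score, cost in ranked:
        if cost <= remaining:
            funded.append(name)
            remaining -= cost
        else:
            forwarded.append(name)  # good but unaffordable: carry forward
    return funded, forwarded

proposals = [("A", 9.1, 300), ("B", 8.7, 500), ("C", 8.2, 250), ("D", 7.9, 400)]
print(allocate(proposals, 800))  # prints (['A', 'B'], ['C', 'D'])
```

This sketch skips proposals that don't fit the remaining budget rather than stopping outright; a strict top-down scheme would stop at the first unaffordable one, which is one of several policy choices agencies differ on.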
In some systems, the reviews are sent out to the proposers so they can respond (a "rebuttal" phase) to correct any misunderstandings, although, unlike with a journal paper, they usually can't change the proposal (oddly, if you ask me - although some agencies, e.g. DARPA in the US, do sometimes allow modification). Sometimes the funding may be approved at a lower level. Some EU schemes include a discussion phase about modifying the proposal to do more or less work for less or more money!
In my experience (including the DFG in Germany, CNRS in France, the NSF in the US, the ERC in the EU, and quite a few others), this process is fairly uniform.
So far so good.
RCUK (e.g. the EPSRC) differs in some ways which may matter. Firstly, unlike most other agencies, it does not have expertise "in house". Thus the review assignments depend on how good its database is, and there is no filtering of poor-quality reviews (the way an editor might filter them). Secondly, the panel does not consist of the reviewers and does not itself review the proposal, even though it does consist of peers. Reviewers do not see each other's reviews (unlike many conference and journal review systems, where reviewers can learn to calibrate themselves and even correct misunderstandings before feedback goes to authors).
There's no evidence that the proposals selected in the UK differ in quality from proposals selected by other countries' agencies, although there may be questions about the review quality and feedback processes (part of peer review is obviously the peer group's confidence in the system).
Most systems are "single blind", where reviewers know who the proposers (authors) are; some conferences are double blind (common in computer science). There's little evidence either way on whether this affects fairness.
I guess, as with many peer review systems, fairness and transparency are important. However, democracy isn't part of the picture - by definition, this is an elitist system.
What could change?
In publications, we've talked about merging journals and conferences so that there's a constant round of submission/revision/publication/presentation... perhaps conferences (attendees) would vote to decide which papers they want to hear presented, a form of crowdsourcing of peer review. Systems like PNAS, SSRN and ArXiv operate completely openly and unfunded and are an increasing part of the publication landscape. The equivalent for funding would be interesting!
In funding there's also the idea of crowd-funding (Kickstarter for research!), which would crowdsource reviews too. Enlisting expertise at all stages of the process helps. Some conferences, some journals, and some funding agencies (the ERC, for example) have multiple tiers of reviewers to call on, with lists of lots of very specialist people only called on occasionally. Recompensing reviewers happens in some systems (curiously, not in pay-per-view journals!)
There are quite a few other ideas kicking around.