Peer Review: Quality Control?

(Updated August 10, 2002)

The fallibility of peer review can have severe consequences, as demonstrated by Enron’s bankruptcy. How is this related to peer review? Enron’s crisis was hidden from the public by unusual accounting practices that were approved by its accounting firm (“From Sunbeam to Enron, Andersen’s Reputation Suffers,” The New York Times, November 23, 2001). Does anybody supervise these accounting practices? Yes, adherence to accounting standards is supervised through peer reviews by other accounting firms. Now, “the Public Oversight Board (POB) is charged with oversight of the accounting profession's self-regulatory peer review process in the United States.” (NYSSCPA.ORG NEWSBRIEF; is anybody in charge of overseeing the self-regulatory peer-review process in psychology?). Fortunately, bad peer reviews in psychology do not leave thousands of employees without a pension, but they are annoying nonetheless.

We all have our share of stories about bad reviews by our so-called peers (colleagues, friends). And we have all witnessed glaring inconsistencies in reviews of the same paper (Petty, Fleming, & Fabrigar, 1999), which editors happily ignore as long as the reviews give them an opportunity to reject the ms. A survey by Bradley (1981) found that 73% of academic psychologists had received reviews that contained obvious factual mistakes (see also Bornstein, 1990, American Psychologist, Vol. 45, No. 5, 672-673). Although everybody seems to agree that “the manuscript review system in psychology leaves much to be desired” (Bornstein, 1990), none of the recommendations for improvement have been implemented.

A while ago, I received just such a rejection. As the current review process does not allow me to reason with the editor or the anonymous reviewers, I felt the need to share this experience with others in one more attempt to demonstrate the need to revolutionize publishing and to radically change the peer-review process.

These reviews were received in response to the ms. “Pleasure and displeasure in reaction to conflicting picture pairs: Examining appealingness and appallingness appraisals,” which you can read online now, before it is eventually published somewhere in 2 or 3 years (DOC).

Reviewer A: Although this paper is executed in a professional manner (rich bibliography, several experiments, etc.), there is really little more than a tedious demolition of a straw man… Nobody in their right mind would disagree with the basic contentions of this paper (e.g., that the feelings one experiences when viewing contrasting stimuli hinge on the relative attention that is devoted to each).

It is sad and amusing (mixed feelings!) to compare this review to Reviewer B’s comments.

Reviewer B: DATE [the model that makes the obvious predictions, according to Reviewer A] has critical flaws at both the conceptual and empirical levels…. This model cannot account for many findings in the literature, including affective reactions to stimuli that are presented unconsciously and thereby receive no attention (e.g., Ohman & Soares, 1994).

In a rational world, a theory cannot be both so obviously true that any empirical demonstration of it is ridiculous (Reviewer A) and at the same time conceptually flawed and inconsistent with many empirical findings (Reviewer B). However, this did not stop the editor from agreeing with both reviewers and rejecting the ms.

Editor: My reading of your paper places me in general agreement with their [the reviewers] conclusions.

I assume, based on many conversations with colleagues, that we all suffer from similar experiences with the review process (and maybe we are sometimes the thoughtless reviewers ourselves). I personally find these experiences demotivating and alienating. I think we need to reconsider the validity of the review process and implement guidelines for the writing of reviews. (We now have the 5th edition of the publication guidelines, which tells us where to put a comma and where not to, but there exist no guidelines whatsoever that set standards for a review. For example, how often have you received a review that claims you missed important references without mentioning the particular references that you missed?)

New (and not so new) Ideas

1. Authors must be able to defend themselves against false claims by reviewers (see also Bornstein, 1990). If authors can make a convincing argument that a reviewer made a factual error in his or her review, the author should be allowed to request a new review, and the false review should not influence the editor’s decision. Further, editors should keep track records of reviewers’ expertise. If a reviewer accumulates a sufficient number of invalid reviews, the reviewer should no longer be considered an expert and should be excluded from the review process. Like many rules, the mere existence of such a rule should increase the quality of reviews. Of course, it is sometimes difficult to evaluate whether a certain statement is true or false, but I am sure that most of us have examples of evident and blatant mistakes in reviewers’ comments.

2. Create accountability in the review process. Currently, nobody holds reviewers accountable for the quality of their reviews. This needs to change if the quality of reviews is to improve. I suggest the following approach, which can be easily implemented in the current review process. It is already common practice for reviewers to receive the editor’s action letter and the other reviews, presumably to learn from consistencies and inconsistencies. Many of you may have wondered at times whether you were reviewing the same manuscript as the other reviewers, as reviews are rarely consistent (Petty et al., 1999). Many of you may also have noticed glaring differences in the level of detail and quality of the reviews. I suggest using this stage of the review process to add a review of the reviews. Each reviewer evaluates the other reviews on a few dimensions (e.g., detail, correctness, constructiveness, overall quality). The editor maintains a database with evaluations of reviewers (a sketch of such a database follows below). The data can be used to avoid reviewers with bad ratings and to give reviewers feedback about the quality of their reviews. The ratings may even be published, and they may be used to give awards for high-quality reviews. The usual counterargument is that nobody would want to review if reviews were evaluated. In other words, we need to create an incentive for writing good reviews. For example, prospective journal editors could be required to demonstrate a track record of judging peers’ work objectively and fairly, and of discriminating between high-impact and lower-impact manuscripts. Without an incentive for good reviews, we should not wonder why the quality of current reviews is so low. Peer review in its present form is a lot like socialism competing against capitalism. We can all spend an afternoon writing a manuscript, which is rewarded, or writing a review, which is not rewarded. What would you choose?
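To make the proposal concrete, here is a minimal sketch of how such a reviewer-rating database might be organized. Everything in it (the rating dimensions, the names, and the cutoff for flagging reviewers) is a hypothetical illustration of the idea, not a description of any existing editorial system.

```python
# Hypothetical sketch of an editor's reviewer-rating database.
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class ReviewRating:
    """One reviewer's evaluation of another reviewer's review (1 = poor, 5 = excellent)."""
    detail: int
    correctness: int
    constructiveness: int
    overall: int

@dataclass
class ReviewerRecord:
    """All ratings a given reviewer has accumulated across manuscripts."""
    name: str
    ratings: list = field(default_factory=list)

    def add_rating(self, rating: ReviewRating) -> None:
        self.ratings.append(rating)

    def average_overall(self) -> float:
        return mean(r.overall for r in self.ratings) if self.ratings else float("nan")

def reviewers_to_avoid(records, threshold=2.5):
    """Reviewers whose average overall rating falls below a (hypothetical) cutoff."""
    return [rec.name for rec in records if rec.ratings and rec.average_overall() < threshold]

# Example: after each action letter, the editor stores the reviewers' cross-ratings.
a = ReviewerRecord("Reviewer A")
a.add_rating(ReviewRating(detail=2, correctness=1, constructiveness=2, overall=2))
b = ReviewerRecord("Reviewer B")
b.add_rating(ReviewRating(detail=4, correctness=5, constructiveness=4, overall=5))
print(reviewers_to_avoid([a, b]))  # ['Reviewer A']
```

The point of the sketch is simply that the bookkeeping is trivial: a handful of ratings per reviewer per manuscript is enough to give feedback and to identify reviewers whose reviews are consistently judged to be poor.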

In the future, I will post more ideas about how to change the publishing and review process, and I encourage you to share stories about irrational peer reviews and ideas about how to change the process. Ultimately, most of us are genuinely interested in advancing psychology, but it is hard to maintain this goal in the face of superficial and erroneous feedback from one’s peers.

Feedback: uli.schimmack@utoronto.ca

 

Interreviewer Disagreement

Petty, Fleming, and Fabrigar (PSPB, 1999) published the interreviewer agreement of reviewers for Personality and Social Psychology Bulletin, a top journal in the field of personality and social psychology. They report an interreviewer correlation of .29 for the recommendation. Reviewers could not even agree on the quality of the literature review (interrater correlation .25), and they were least likely to agree on the overall importance of a submitted manuscript (interrater agreement, r = .19). These results are consistent with findings for other journals: a review by Marsh and Ball (1989; cited in Petty et al., 1999) reported an average interreviewer agreement of .27. With two reviewers, the average evaluation has a reliability of .42; with three reviewers, the reliability is .52. Typically, a reliability of .70 is regarded as the minimum for the use of tests in personality assessment. The unreliability of the review process has one major implication: you need to send all your manuscripts to the top journals to get in. It takes Terrell Brandon 10 attempts to make 10 free throws; it takes Shaquille O’Neal 20 attempts to make 10 free throws; but you need to get to the foul line first to get a chance.
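For readers who want to check these reliability figures, they can be reproduced (up to rounding) with the Spearman-Brown formula for the reliability of an average of k judges. I am assuming this standard psychometric formula is what underlies the .42 and .52 values above; the snippet below simply carries out the arithmetic for an average interreviewer agreement of .27.

```python
# Spearman-Brown prophecy formula: reliability of the mean judgment of k reviewers,
# given an average interreviewer correlation r.
def spearman_brown(r: float, k: int) -> float:
    return k * r / (1 + (k - 1) * r)

r = 0.27  # average interreviewer agreement (Marsh & Ball, 1989; cited in Petty et al., 1999)
for k in (2, 3):
    print(f"{k} reviewers: reliability = {spearman_brown(r, k):.2f}")
```

Running this gives roughly .43 and .53, matching the cited values up to rounding, and both fall well short of the .70 benchmark that is conventionally required of personality tests.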