Fairness in Reviewing

Many computer science research communities are considering various changes to their reviewing processes to reduce bias, reduce barriers to participation, or accomplish other goals.

This week, Suresh Venkatasubramanian wrote about proposed changes in SODA to allow PC members to submit to the conference. There was a bunch of interesting discussion in the comments; this exchange in particular jumped out. Thomas Steinke said:

I completely disagree with the assertion that double-blinding is “a really easy solution” to conflicts of interest. It’s particularly ridiculous given that you are active in the FAT* and FATML community, which (to the best of my knowledge) fundamentally rejects the idea that bias can simply be removed by blindness to race/gender/etc.

To which Suresh responded:

Why this works differently compared to “fairness through blindness” in automated decision making is something i have to ponder.

I have a few thoughts on this. I originally wrote up a version of this as a comment there, but a wrong button push deleted my comment. So I’ll write it up in more detail here, where I can include figures and have git to save the results.

First, a brief note on terminology: even though the terms are not nearly as widely used, I will refer to double-blind reviewing as ‘mutually anonymous’ and to fairness-through-blindness as fairness-through-unawareness.

Fairness: Imperfect and Contextual

I want to begin with a couple of points about the pursuit of fairness. First, fairness in an unfair world will always be imperfect. As Suresh pointed out elsewhere, mutual anonymity achieves useful but limited outcomes in reducing implicit bias. It is not perfect, even on its own terms (it is often easy for experienced community members to guess authorship, though I expect such guesses are less reliable than many who raise this argument against mutually anonymous reviewing believe). However, given the empirical evidence that mutually anonymous reviewing reduces bias in decision outcomes, and its plausible mechanism of operation, it seems like a worthwhile endeavor. Further, because fairness definitions are often mutually incompatible, in many problem settings we will have arguable unfairness under one definition even if we achieve fairness perfectly under another.

Second, the tradeoffs and possibilities in the pursuit of fairness are contextual. Different problem settings have different causes and costs of unfairness, as well as different affordances for reducing or mitigating bias. The peer review process has significant impact on livelihoods and careers, but it is a different problem than loan decision making or hiring.

So it seems to me that ‘does fairness-through-unawareness work here but not there?’ is not the most productive way to approach the question. Rather, the question is: do the limitations and possibilities (or lack thereof) of fairness-through-unawareness represent an acceptable or optimal tradeoff here, but an unacceptable one elsewhere? I don’t have the answers, but I think reasoning about contextualized tradeoffs will be a better way to pursue clarity than seeking bright-line answers.

Peer Review Fairness Goals

graph LR
  Author --> Quality
  Author --> Relevance
  Author --> Secondary
  Quality --> Acceptance
  Relevance --> Acceptance
  Secondary --> Acceptance
  Author --> Acceptance
Structural equation model of peer review.

To think about what we would like to achieve in making peer review more fair, and what possible interventions are available to us, it helps to look at a path model of the reviewing problem and its relevant variables.

One way to frame the problem of debiasing peer review is that we want acceptance to be independent of authorship. That is, $\mathrm{Pr}[\mathrm{Accept} | \mathrm{Auth}] = \mathrm{Pr}[\mathrm{Accept}]$, or at least that acceptance is independent of protected characteristics of the author(s) such as community connections or institutional prestige.

We can also reframe the goal: a paper should be accepted solely on the basis of its quality and relevance. This leads to a conditional independence view of the issue:

$$\mathrm{Pr}[\mathrm{Accept} | \mathrm{Qual}, \mathrm{Rel}, \mathrm{Auth}] = \mathrm{Pr}[\mathrm{Accept} | \mathrm{Qual}, \mathrm{Rel}]$$
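The gap between these two framings is easy to miss, so here is a small worked example. All of the probabilities are made up for illustration: two hypothetical author groups submit papers whose quality differs (an Author → Quality effect), and a hypothetical acceptance rule looks only at quality. The rule satisfies the conditional independence criterion by construction, yet acceptance rates still differ by group, so it fails the marginal criterion.

```python
from fractions import Fraction as F

# Hypothetical numbers for illustration only: the two groups' submissions
# differ in quality, i.e. there is an Author -> Quality effect.
p_high_quality = {"established": F(6, 10), "newcomer": F(3, 10)}

# Hypothetical acceptance rule that depends ONLY on quality, so
# Pr[Accept | Qual, Auth] = Pr[Accept | Qual] holds by construction.
p_accept_given_quality = {"high": F(8, 10), "low": F(1, 10)}


def p_accept_given_group(group):
    """Pr[Accept | Auth = group], marginalizing over quality."""
    q = p_high_quality[group]
    return q * p_accept_given_quality["high"] + (1 - q) * p_accept_given_quality["low"]


for group in p_high_quality:
    # Marginal acceptance rates differ, so Pr[Accept | Auth] != Pr[Accept].
    print(group, float(p_accept_given_group(group)))
```

Under these assumed numbers the established group is accepted at a rate of 0.52 and the newcomer group at 0.31, even though no reviewer ever looks at authorship.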

OK, great. But what are the paths through which authorship can affect acceptance? Identifying them will help us analyze the possible levers for correcting each one. If we accept my path model as sufficiently complete for useful discussion, there are four:

  • Through quality (Author → Quality → Acceptance). We don’t want to break the Quality → Acceptance link, since it is largely the point of peer review. We cannot do a lot about the Author → Quality link; authors with more experience are likely to write better papers, or at least papers that are perceived as better (though more on this later).

  • Through relevance (Author → Relevance → Acceptance). This has the same basic problems as quality. The author link is probably more pronounced here, though, as authors who have long experience in a particular community have a better read on what the community thinks is relevant, and how to sell their work as relevant, than newcomers. This is perhaps undesirable, but I also think it is likely unavoidable.

  • Through secondary characteristics (Author → Secondary → Acceptance). This is deliberately vague; it can include secondary characteristics that give away author identities, but also includes other things that aren’t quality or relevance but affect reviewer decisions.

  • Directly (Author → Acceptance). This is a clearly problematic effect.
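The count of four paths can be checked mechanically from the figure. The sketch below encodes the figure's edges as an adjacency list (the encoding is mine, not part of the original diagram) and enumerates every directed path from Author to Acceptance with a simple depth-first search, assuming the graph is acyclic as drawn.

```python
# Edges of the path model figure, written as an adjacency list.
EDGES = {
    "Author": ["Quality", "Relevance", "Secondary", "Acceptance"],
    "Quality": ["Acceptance"],
    "Relevance": ["Acceptance"],
    "Secondary": ["Acceptance"],
    "Acceptance": [],
}


def directed_paths(graph, source, target):
    """Return every directed path from source to target (graph assumed acyclic)."""
    if source == target:
        return [[target]]
    paths = []
    for nxt in graph.get(source, []):
        # Extend each path from the successor back to the current node.
        for tail in directed_paths(graph, nxt, target):
            paths.append([source] + tail)
    return paths


for path in directed_paths(EDGES, "Author", "Acceptance"):
    print(" -> ".join(path))
```

This prints the three mediated paths (through Quality, Relevance, and Secondary) followed by the direct Author -> Acceptance edge.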

Debiasing Levers

Mutually anonymous peer review deals with the direct influence of authorship on acceptance. That’s all it can affect; the indirect paths are all still present. It is imperfect, but available empirical data indicates it is useful.
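To make "that's all it can affect" concrete, one can pretend the path model is a linear structural equation model. Under linearity, the total effect of authorship on acceptance is the direct coefficient plus, for each indirect path, the product of the coefficients along it. Both the linearity and every coefficient below are assumptions chosen for illustration, not estimates of anything; mutually anonymous review is modeled, again by assumption, as setting only the direct coefficient to zero.

```python
# Hypothetical path coefficients for a linear version of the path model.
# These numbers are invented; they only illustrate path tracing.
coef = {
    ("Author", "Quality"): 0.4,
    ("Author", "Relevance"): 0.5,
    ("Author", "Secondary"): 0.3,
    ("Author", "Acceptance"): 0.2,  # the direct path
    ("Quality", "Acceptance"): 0.6,
    ("Relevance", "Acceptance"): 0.5,
    ("Secondary", "Acceptance"): 0.2,
}

MEDIATORS = ["Quality", "Relevance", "Secondary"]


def effects(c):
    """Split the Author -> Acceptance effect into direct and indirect parts."""
    direct = c[("Author", "Acceptance")]
    # In a linear model, each indirect path contributes the product of its edges.
    indirect = {m: c[("Author", m)] * c[(m, "Acceptance")] for m in MEDIATORS}
    return direct, indirect, direct + sum(indirect.values())


# Mutually anonymous review, modeled as removing only the direct edge.
anonymous = dict(coef)
anonymous[("Author", "Acceptance")] = 0.0

for label, c in [("open review", coef), ("mutually anonymous", anonymous)]:
    direct, indirect, total = effects(c)
    print(f"{label}: direct={direct:.2f}, "
          f"indirect={sum(indirect.values()):.2f}, total={total:.2f}")
```

With these made-up numbers the total effect drops from 0.75 to 0.55: the direct contribution disappears, while the indirect contribution through quality, relevance, and secondary characteristics is untouched.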

What would a fairness-through-awareness approach to debiasing peer review look like? In an ideal world, it might look like discounting the effects of secondary characteristics while leaving the influence of quality and relevance untouched. I think it is extremely unlikely that such a targeted intervention is possible — fairness-through-awareness would likely affect quality and/or relevance judgements. Ideally, it would debias our assessment of quality or relevance, not change their influence on acceptance, but I also think that is unlikely in practice.

However, mutually anonymous reviewing processes are not the only mechanism change at our disposal. I think clear reviewer instructions and, crucially, structured review forms can help reduce the influence of secondary characteristics. Structured review forms break the review judgement down into individual pieces, encouraging the reviewer to focus on specific aspects of the paper relevant to the decision process. Particularly good ones do this in a way that helps counteract bias, for example by separating the standard to which a contribution should be held from the assessment of whether it meets that standard (CSCW did this in at least one year).
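As a rough illustration of what separating the standard from the assessment might look like in a form's data model, here is a hypothetical structured review record. The field names, the 1 to 5 scales, and the rule that derives a recommendation from the itemized answers are all assumptions made for this sketch; it is not CSCW's form or any real conference system's schema.

```python
from dataclasses import dataclass


@dataclass
class StructuredReview:
    """A hypothetical structured review form (field names are illustrative)."""

    # Step 1: what standard should this kind of contribution be held to?
    contribution_type: str
    expected_standard: str
    # Step 2: judged separately, does the paper meet that standard?
    meets_standard: bool
    # Other decision-relevant aspects, each scored on an assumed 1-5 scale.
    soundness: int
    relevance: int

    def __post_init__(self) -> None:
        for name in ("soundness", "relevance"):
            score = getattr(self, name)
            if not 1 <= score <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {score}")

    def recommendation(self) -> str:
        # Hypothetical rule: the recommendation is derived from the itemized
        # answers rather than entered as a single holistic score.
        if self.meets_standard and self.soundness >= 3 and self.relevance >= 3:
            return "accept"
        return "reject"


review = StructuredReview(
    contribution_type="empirical study",
    expected_standard="sound methodology on a realistic dataset",
    meets_standard=True,
    soundness=4,
    relevance=3,
)
print(review.recommendation())
```

The design choice being illustrated is that there is no free-floating overall score field: a reviewer's gut reaction has fewer places to enter the decision except through the specific questions.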

Quality and relevance are much more difficult, and as I said above, I don’t think we want to affect their influence on the accept/reject decision. However, it may still be possible to affect the influence of author characteristics on quality and relevance: I would love to see some good data, but I think revise-and-resubmit processes may be able to help authors whose initial submission doesn’t meet quality or relevance expectations get their paper over the bar. This isn’t perfect, as experienced authors will need to do less revision for publication and thus will be able to publish more papers with comparable resources, but it may weaken this influence pathway.
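Both halves of that argument show up in a deliberately crude model. In the sketch below, the hypothetical function `rounds_to_accept` assumes perceived quality is an integer score, each revision round adds a fixed number of points, and the venue allows a fixed number of rounds; every number is invented. Revision lets a below-the-bar paper get accepted, but the experienced author still needs fewer rounds to get there.

```python
def rounds_to_accept(initial_quality, bar=70, gain_per_round=10, max_rounds=2):
    """Return how many revision rounds a paper needs to reach the bar, or None.

    Hypothetical model: each round raises perceived quality by a fixed
    number of points, and the venue permits at most max_rounds revisions.
    """
    quality = initial_quality
    for rounds in range(max_rounds + 1):
        if quality >= bar:
            return rounds
        quality += gain_per_round
    return None  # still below the bar after the allowed revisions


print(rounds_to_accept(70))                # experienced author: accepted as is
print(rounds_to_accept(55))                # newcomer: accepted after revision
print(rounds_to_accept(55, max_rounds=0))  # newcomer without revise-and-resubmit
```

Under these assumptions the three calls return 0, 2, and None: revise-and-resubmit turns a rejection into an acceptance for the newcomer, while the two extra rounds are exactly the resource gap the paragraph describes.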

Conclusion

Mutually anonymous peer review is not perfect, but it does block one critical pathway by which author characteristics can affect acceptance decisions. I do not think that fairness-through-awareness offers superior debiasing capabilities in this context. Finally, there are additional changes to the reviewing process that, when combined with mutually anonymous review, can reduce the influence of other undesirable bias pathways.

I remain convinced that mutual anonymity is a better way to structure peer review for computer science conferences, and don’t think this represents a fundamental incompatibility with the known limitations of fairness-through-unawareness.