Dubious Debiasing


Jacy Reese Anthis, Kristian Lum, Michael Ekstrand, Avi Feller, Alexander D'Amour, and Chenhao Tan. 2024. Dubious Debiasing: Inherent Challenges in Achieving Fairness in Large Language Models. In HEAL: Human-centered Evaluation and Auditing of Language Models at CHI 2024.


Researchers have evaluated fairness in machine learning with a variety of technical frameworks, such as group fairness and fair representations. As generative AI interfaces with human society in increasingly complex ways, it is not clear how these frameworks can be extended to general-purpose systems, such as ChatGPT, Gemini, and other large language models (LLMs). Although evaluating LLM fairness is critically important, we articulate inherent challenges in doing so. In some cases, extant frameworks cannot be applied to human-LLM interaction. In others, the notion of a fair LLM is intractable because LLMs are exceptionally flexible: they perform many different types of tasks, with effects on a multitude of diverse stakeholders, including widely varying user populations. We conclude with motivating principles for fairness in LLM systems that foreground the criticality of context, the responsibility of LLM developers, and the need for stakeholder involvement in an iterative process of design and evaluation.