Patterns of Gender-Specializing Query Reformulation


Amifa Raj, Bhaskar Mitra, Michael D. Ekstrand, and Nick Craswell. 2023. Patterns of Gender-Specializing Query Reformulation. Short paper in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '23). Proc. SIGIR '23. DOI 10.1145/3539618.3592034. arXiv:2304.13129. NSF PAR 10423689. Acceptance rate: 25.12%.

This paper was led by my Ph.D student Amifa Raj during her internship at Microsoft.

A scatter plot showing the fraction of woman-related reformulations by total reformulation count, with points colored by the query's gender alignment in a word embedding; there is no discernible pattern such that the query's alignment predicts the gender reformulation.


Users of search systems often reformulate their queries by adding query terms to reflect their evolving information need or to more precisely express their information need when the system fails to surface relevant content. Analyzing these query reformulations can inform us about both system and user behavior. In this work, we study a special category of query reformulations that involve specifying demographic group attributes, such as gender, as part of the reformulated query (e.g., “olympic 2021 soccer results” → “olympic 2021 women‘s soccer results”). There are many ways a query, the search results, and a demographic attribute such as gender may relate, leading us to hypothesize different causes for these reformulation patterns, such as under-representation on the original result page or based on the linguistic theory of markedness. This paper reports on an observational study of gender-specializing query reformulations—their contexts and effects—as a lens on the relationship between system results and gender, based on large-scale search log data from Bing. We find that these reformulations sometimes correct for and other times reinforce gender representation on the original result page, but typically yield better access to the ultimately-selected results. The prevalence of these reformulations—and which gender they skew towards—differ by topical context. However, we do not find evidence that either group under-representation or markedness alone adequately explains these reformulations. We hope that future research will use such reformulations as a probe for deeper investigation into gender (and other demographic) representation on the search result page.

Listed Under