All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness

Michael D. Ekstrand; Mucun Tian; Ion Madrazo Azpiazu; Jennifer D. Ekstrand; Oghenemaro Anuyah; David McNeill; Maria Soledad Pera

Michael D. Ekstrand, Mucun Tian, Ion Madrazo Azpiazu, Jennifer D. Ekstrand, Oghenemaro Anuyah, David McNeill, and Maria Soledad Pera. 2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT* 2018), Feb 23, 2018. PMLR, Proceedings of Machine Learning Research 81:172–186. proceedings.mlr.press/v81/ekstrand18b.html. Acceptance rate: 24%. Cited 389 times. Cited 229 times.

This paper is an extension of our RecSys 2017 poster.

Abstract

In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes; for example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.
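The abstract's central move is to report a per-user effectiveness metric broken down by demographic group rather than only as a single aggregate. The sketch below illustrates that idea under stated assumptions: it uses nDCG@k with binary relevance and a log2 rank discount as the utility-based metric, and the function names (`ndcg`, `group_effectiveness`) and the toy users, items, and groups are hypothetical. It is not the paper's code, data, or exact metric configuration.

```python
import math
from collections import defaultdict
from statistics import mean


def dcg(gains):
    # Discounted cumulative gain: the gain at 0-based position i is divided by log2(i + 2)
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(recommended, relevant, k=10):
    """Binary-relevance nDCG@k for one user's ranked list (assumed metric setup)."""
    gains = [1.0 if item in relevant else 0.0 for item in recommended[:k]]
    ideal_dcg = dcg([1.0] * min(len(relevant), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def group_effectiveness(per_user_scores, user_group):
    """Mean per-user score within each demographic group."""
    by_group = defaultdict(list)
    for user, score in per_user_scores.items():
        by_group[user_group[user]].append(score)
    return {group: mean(scores) for group, scores in by_group.items()}


# Hypothetical toy data: three users in a large group, one user in a small group.
recs = {"u1": ["a", "b", "c"], "u2": ["a", "d", "e"],
        "u3": ["b", "c", "a"], "u4": ["x", "y", "z"]}
relevant = {"u1": {"a"}, "u2": {"a"}, "u3": {"b"}, "u4": {"q"}}
groups = {"u1": "large", "u2": "large", "u3": "large", "u4": "small"}

scores = {u: ndcg(recs[u], relevant[u]) for u in recs}
per_group = group_effectiveness(scores, groups)
overall = mean(list(scores.values()))   # user-weighted aggregate, as usually reported
macro = mean(list(per_group.values()))  # each group weighted equally
print(per_group, overall, macro)
```

With these toy inputs the user-weighted mean is 0.75 while the small group's mean is 0.0: the single aggregate is dominated by the larger group, which is the kind of masking the abstract argues evaluation protocols should expose. Reporting the per-group breakdown (or an equal-weight average across groups, 0.5 here) makes the gap visible.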

Resources

Talk

Work I cited:

Robin Burke. 2017. Multisided Fairness for Recommendation. arXiv [cs.CY]. Retrieved from http://arxiv.org/abs/1707.00093
Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz. 2017. Auditing Search Engines for Differential Satisfaction Across Demographics. In Proceedings of the 26th International Conference on World Wide Web Companion (WWW ’17 Companion), 626–633.
Limits of Social Data
Alejandro Bellogin. 2012. Performance prediction and evaluation in Recommender Systems: an Information Retrieval perspective. Ph.D. thesis, Universidad Autónoma de Madrid, Madrid, Spain. Retrieved from http://ir.ii.uam.es/~alejandro/thesis/
FATREC

Listed Under

Fair Recommender Systems
