All The Cool Kids, How Do They Fit In?
2018. All The Cool Kids, How Do They Fit In?: Popularity and Demographic Biases in Recommender Evaluation and Effectiveness. In Proceedings of the 1st Conference on Fairness, Accountability and Transparency (FAT* 2018). PMLR, Proceedings of Machine Learning Research 81:172–186. Acceptance rate: 24%.
This paper is an extension of our RecSys 2017 poster.
In the research literature, evaluations of recommender system effectiveness typically report results over a given data set, providing an aggregate measure of effectiveness over each instance (e.g. user) in the data set. Recent advances in information retrieval evaluation, however, demonstrate the importance of considering the distribution of effectiveness across diverse groups of varying sizes; for example, do users of different ages or genders obtain similar utility from the system, particularly if their group is a relatively small subset of the user base? We apply this consideration to recommender systems, using offline evaluation and a utility-based metric of recommendation effectiveness to explore whether different user demographic groups experience similar recommendation accuracy. We find demographic differences in measured recommender effectiveness across two data sets containing different types of feedback in different domains; these differences sometimes, but not always, correlate with the size of the user group in question. Demographic effects also have a complex—and likely detrimental—interaction with popularity bias, a known deficiency of recommender evaluation. These results demonstrate the need for recommender system evaluation protocols that explicitly quantify the degree to which the system is meeting the information needs of all its users, as well as the need for researchers and operators to move beyond naïve evaluations that favor the needs of larger subsets of the user population while ignoring smaller subsets.
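The core of the evaluation described above is simple to state: compute a per-user effectiveness metric offline, then aggregate it separately for each demographic group rather than over the whole user base, keeping each group's size alongside its mean. A minimal sketch of that aggregation step, using hypothetical per-user nDCG values and group labels (the function name and all data are illustrative, not taken from the paper's reproduction scripts):

```python
from collections import defaultdict
from statistics import mean

def per_group_effectiveness(user_metrics, user_groups):
    """Aggregate a per-user effectiveness metric (e.g. nDCG) by demographic group.

    user_metrics: dict mapping user id -> metric value
    user_groups:  dict mapping user id -> group label (e.g. an age bracket)
    Returns a dict mapping group label -> (group size, mean metric), so that
    small groups can be inspected alongside large ones instead of vanishing
    into a single aggregate.
    """
    by_group = defaultdict(list)
    for user, score in user_metrics.items():
        by_group[user_groups[user]].append(score)
    return {g: (len(scores), mean(scores)) for g, scores in by_group.items()}

# Hypothetical per-user scores and demographic labels, for illustration only.
metrics = {"u1": 0.25, "u2": 0.75, "u3": 0.125, "u4": 0.375}
groups = {"u1": "A", "u2": "A", "u3": "B", "u4": "B"}
print(per_group_effectiveness(metrics, groups))
```

Reporting the `(size, mean)` pair per group, instead of one dataset-wide mean, is what surfaces the kind of disparity the paper studies: a group's low score can no longer be masked by a larger group's high one.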
- Abstract and PDF at PMLR
- Preprint PDF
- Talk video
- Reproduction scripts
- Talk slides
- MovieLens data sets
- Last.FM data sets
Work I cited:
- Robin Burke. 2017. Multisided Fairness for Recommendation. arXiv [cs.CY]. Retrieved from http://arxiv.org/abs/1707.00093
- Rishabh Mehrotra, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz. 2017. Auditing Search Engines for Differential Satisfaction Across Demographics. In Proceedings of the 26th International Conference on World Wide Web Companion (WWW ’17 Companion), 626–633.
- Limits of Social Data
- Alejandro Bellogin. 2012. Performance prediction and evaluation in Recommender Systems: an Information Retrieval perspective. Ph.D. thesis, Universidad Autónoma de Madrid, Madrid, Spain. Retrieved from http://ir.ii.uam.es/~alejandro/thesis/