NSF CAREER award on recommenders, humans, and data
In 2018, I was awarded the NSF CAREER award to study how recommender systems and our evaluations of them respond to the human messiness of their input data. We computer scientists have long known the principle of ‘garbage in, garbage out’: with bad data, a system will produce bad outputs. But in practice, computing systems can differ a great deal in precisely how they translate such inputs to outputs.
Our goal in this project is to understand that response — to characterize the ‘garbage response curve’ of common recommendation algorithms and surrounding statistical and experimental techniques. For a given type and quantity of garbage (metric/intent mismatch, discriminatory bias, polarized content), we want to understand its impact on recommendations, subsequent human behavior, and the information experiments provide to operators of recommender systems.
Systems that recommend products, places, and services are an increasingly common part of everyday life and commerce, making it important to understand how recommendation algorithms affect outcomes for both individual users and larger social groups. To do this, the project team will develop novel methods of simulating users' behavior based on large-scale historical datasets. These methods will be used to better understand the vulnerabilities that underlying biases in training datasets pose to commonly used machine-learning-based methods for building and testing recommender systems, and to characterize the effectiveness of common evaluation metrics, such as recommendation accuracy and diversity, under different models of how people interact with recommender systems in practice. The team will publicly release its datasets, software, and novel metrics for the benefit of other researchers and developers of recommender systems. The work will also inform the development of course materials about the social impact of data analytics and computing, as well as outreach activities for librarians, who are often in the position of helping information seekers understand how search engines and other recommender systems affect their ability to get what they need.
The work is organized around two main themes. The first will quantify and mitigate the popularity bias and misclassified decoy problems in offline recommender evaluation, which tend to favor popular, already-known recommendations. To do this, the team will develop simulation-based evaluation models that encode a variety of assumptions about how users select relevant items to buy and rate, and will use these models to quantify the statistical biases those assumptions induce in recommendation quality metrics. The team will calibrate the simulations by comparing them with existing datasets covering books, research papers, music, and movies. These models and datasets will help drive the second main theme: measuring the impact of feature distributions in training data on recommender algorithm accuracy and diversity, and developing bias-resistant algorithms. The team will use data resampling techniques along with the simulation models, extended to model system behavior over time, to evaluate how different algorithms mitigate, propagate, or exacerbate underlying distributional biases through their recommendations, and how those biased recommendations in turn affect future user behavior and experience.
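As a rough illustration of the first theme, the sketch below simulates users whose true preferences are independent of item popularity, but whose observed ratings are skewed toward popular items. It is not the project's actual simulation model: the observation model (the chance of a relevant item being rated decays with its popularity rank), the parameter values, and the name `simulate` are all assumptions made for illustration. It compares precision measured against the complete relevance data with precision measured against only the observed data, for a most-popular recommender and a random recommender.

```python
import random


def simulate(n_users=2000, n_items=200, n_relevant=20, k=10, seed=42):
    """Compare true vs. observed precision@k under popularity-biased observation.

    Hypothetical toy model: item 0 is the most popular item, and a relevant
    item with popularity rank i is observed (rated) with probability
    1 / sqrt(i + 1). True relevance is uniform over items.
    """
    rng = random.Random(seed)
    items = list(range(n_items))
    obs_prob = [1.0 / (i + 1) ** 0.5 for i in items]

    # Running totals of hits: name -> [true hits, observed hits].
    totals = {"popular": [0, 0], "random": [0, 0]}

    for _ in range(n_users):
        # True preferences do not depend on popularity.
        relevant = set(rng.sample(items, n_relevant))
        # Observed data is a popularity-biased subset of the truth.
        observed = {i for i in relevant if rng.random() < obs_prob[i]}

        recs = {
            "popular": items[:k],           # k most popular items
            "random": rng.sample(items, k)  # k items chosen uniformly
        }
        for name, rec in recs.items():
            totals[name][0] += sum(1 for i in rec if i in relevant)
            totals[name][1] += sum(1 for i in rec if i in observed)

    slots = n_users * k
    return {
        name: {"true_precision": t / slots, "observed_precision": o / slots}
        for name, (t, o) in totals.items()
    }


if __name__ == "__main__":
    for name, scores in simulate().items():
        print(name, scores)
```

Under these assumptions the two recommenders are about equally good against the complete relevance data, yet the offline metric computed from observed data makes the most-popular recommender look considerably better. That gap is the kind of evaluation bias the simulations are meant to quantify.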
Published Papers and Outputs
2023. Much Ado About Gender: Current Practices and Future Recommendations for Appropriate Gender-Aware Information Access. To appear in ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR '23).
2022. Matching Consumer Fairness Objectives & Strategies for RecSys. Presented at the 5th FAccTrec Workshop on Responsible Recommendation (peer-reviewed but not archived).
2022. Measuring Fairness in Ranked Results: An Analytical and Empirical Comparison. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22). pp. 726–736. Cited 4 times. Cited 3 times.
2022. The Multisided Complexity of Fairness in Recommender Systems. AI Magazine 43(2) (June 2022), 164–176. Cited 4 times. Cited 3 times.
2021. Evaluating Recommenders with Distributions. At Proceedings of the RecSys 2021 Workshop on Perspectives on the Evaluation of Recommender Systems (RecSys '21).
2021. Baby Shark to Barracuda: Analyzing Children’s Music Listening Behavior. In RecSys 2021 Late-Breaking Results (RecSys '21). Cited 1 time. Cited 1 time.
2021. Pink for Princesses, Blue for Superheroes: The Need to Examine Gender Stereotypes in Kids’ Products in Search and Recommendations. In Proceedings of the 5th International and Interdisciplinary Workshop on Children & Recommender Systems (KidRec '21), at IDC 2021. Cited 3 times. Cited 5 times.
2021. Estimation of Fair Ranking Metrics with Incomplete Judgments. In Proceedings of The Web Conference 2021 (TheWebConf 2021). ACM. Acceptance rate: 21%. Cited 20 times. Cited 21 times.
2021. Exploring Author Gender in Book Rating and Recommendation. User Modeling and User-Adapted Interaction 31(3) (February 2021), 377–420. Cited 114* times.
2020. LensKit for Python: Next-Generation Software for Recommender Systems Experiments. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20, Resource track). ACM, pp. 2999–3006. No acceptance rate reported. Cited 30 times. Cited 59* times.
2020. Evaluating Stochastic Rankings with Expected Exposure. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM '20). ACM, pp. 275–284. Acceptance rate: 20%. Nominated for Best Long Paper. Cited 83 times. Cited 89 times.
2020. Estimating Error and Bias in Offline Evaluation Results. Short paper in Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (CHIIR '20). ACM, 5 pp. Acceptance rate: 47%. Cited 5 times. Cited 7 times.
2019. StoryTime: Eliciting Preferences from Children for Book Recommendations. Demo recorded in Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). 2 pp. Cited 1 time. Cited 8 times.
2018. Monte Carlo Estimates of Evaluation Metric Error and Bias. Computer Science Faculty Publications and Presentations 148. Boise State University. Presented at the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems, a workshop at RecSys 2018. Cited 1 time. Cited 1 time.
We have given a tutorial on fairness in IR & recommendation in multiple settings.
2019. Fairness and Discrimination in Recommendation and Retrieval. Tutorial presented at Proceedings of the 13th ACM Conference on Recommender Systems (RecSys '19). 2 pp.
2019. Fairness and Discrimination in Retrieval and Recommendation. Tutorial presented at Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19). 2 pp.
I am one of the organizers of the TREC Fairness Track; my participation in this is funded by the grant.
2019. FACTS-IR: Fairness, Accountability, Confidentiality, Transparency, and Safety in Information Retrieval. SIGIR Forum 53(2) (December 2019), 20–43. Cited 4 times. Cited 14 times.
2019. Workshop on Fairness, Accountability, Confidentiality, Transparency, and Safety in Information Retrieval (FACTS-IR). In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '19). ACM. Cited 6 times. Cited 24 times.
These papers were written prior to the project period, and establish preliminary results that helped secure the grant.
2018. Exploring Author Gender in Book Rating and Recommendation. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18). ACM, pp. 242–250. Acceptance rate: 17.5%. Cited 109 times. Citations reported under UMUAI21*.
2017. Sturgeon and the Cool Kids: Problems with Random Decoys for Top-N Recommender Evaluation. In Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (Recommender Systems track). AAAI, pp. 639–644. No acceptance rate reported. Cited 8 times. Cited 13 times.
I have three planned educational activities as a part of this project:
- Work with Don Winiecki and Boise State CS faculty to incorporate material on ethics and the social impact of technology into graduate artificial intelligence and data science classes.
- Collaborate with Eric Lindquist from the Boise State School of Public Service to develop and teach Big Data in Public Life, an interdisciplinary undergraduate course on the interaction of big data, ethics, and policy as data-driven algorithmic systems are increasingly deployed in our society in both public and private sectors.
- Develop training materials and teach workshops for librarians across Idaho on recommender systems and related technology, so they can make better use of it in working with their communities and provide their patrons with guidance in engaging with recommenders. Meridian Library District will be working with me on the pilot of these workshops.
See the Library Training page for details on library training, and on scheduling a training for your library.
I have given the following presentations:
- Presentation at the Idaho Library Association 2019 Conference