Research Methods and Infrastructure

I have a number of past and ongoing projects to improve the methods and infrastructure used in recommender systems research. The most notable results are LensKit, an open-source toolkit for recommender systems research, and more recently POPROX, a new project to build online infrastructure for user-facing recommender systems research.

Funding

2023–2025: NSF award 2232553, $150K: Collaborative Research: CCRI: New: A Research News Recommender Infrastructure with Live Users for Algorithm and Interface Experimentation
2018–2024: NSF award 1751278, $482,081: CAREER: User-Based Simulation Methods for Quantifying Sources of Error and Bias in Recommender Systems
2015: Texas State University Research Enhancement Program, $8,000: Temporal Evaluation of Recommender Systems

POPROX

POPROX is a new project to develop a news recommendation platform that will serve as shared infrastructure to support academic research on recommender systems with actual user responses. It’s just kicking off, and should be ready for experiments in 2024.

I am actively recruiting a Ph.D. student for this project.

LensKit

LensKit is an open-source toolkit supporting recommender systems research and education. It was originally released for Java in 2010; I rewrote it in Python in 2018. It has been used to support dozens of published papers.

Papers

Tobias Vente, Michael Ekstrand, and Joeran Beel. 2023. Introducing LensKit-Auto, an Experimental Automated Recommender System (AutoRecSys) Toolkit. Demo recorded in Proceedings of the 17th ACM Conference on Recommender Systems (RecSys ’23), Sep 18–22, 2023. pp. 1212–1216. DOI 10.1145/3604915.3610656. Cited 16 times. Cited 11 times.

Michael D. Ekstrand. 2020. LensKit for Python: Next-Generation Software for Recommender Systems Experiments. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20, Resource track), Oct 21, 2020. ACM, pp. 2999–3006. DOI 10.1145/3340531.3412778. arXiv:1809.03125 [cs.IR]. NSF PAR 10199450. No acceptance rate reported. Cited 134 times. Cited 83 times.

Michael D. Ekstrand and Michael Ludwig. 2016. Dependency Injection with Static Analysis and Context-Aware Policy. Journal of Object Technology 15(1) (February 2016), 1:1–31. DOI 10.5381/jot.2016.15.1.a1. Cited 17 times.

Michael D. Ekstrand. 2014. Towards Recommender Engineering: Tools and Experiments in Recommender Differences. Ph.D. thesis, University of Minnesota. HDL 11299/165307. Cited 8 times. Cited 4 times.

Michael D. Ekstrand. 2014. Building Open-Source Tools for Reproducible Research and Education. At Sharing, Re-use, and Circulation of Resources in Cooperative Scientific Work, a workshop at CSCW 2014, Feb 15, 2014.

Michael D. Ekstrand, Michael Ludwig, Joseph A. Konstan, and John T. Riedl. 2011. Rethinking The Recommender Research Ecosystem: Reproducibility, Openness, and LensKit. In Proceedings of the Fifth ACM Conference on Recommender Systems (RecSys ’11), Oct 24, 2011. ACM, pp. 133–140. DOI 10.1145/2043932.2043958. Acceptance rate: 27% (20% for oral presentation, which this received). Cited 263 times. Cited 199 times.

Michael D. Ekstrand, Michael Ludwig, Jack Kolb, and John T. Riedl. 2011. LensKit: A Modular Recommender Framework. Demo recorded in Proceedings of the 5th ACM Conference on Recommender Systems (RecSys ’11), Oct 27, 2011. ACM, pp. 349–350. DOI 10.1145/2043932.2044001. Cited 48 times. Cited 2 times.

Evaluation Practice

I have also done a variety of work on evaluating recommender systems, in addition to our work on fairness.

Papers

Fernando Diaz, Michael D. Ekstrand, and Bhaskar Mitra. 2025. Recall, Robustness, and Lexicographic Evaluation. Transactions on Recommender Systems 4(1) (July 2025; online Apr 3, 2025), 13:1–50. DOI 10.1145/3728373. arXiv:2302.11370. Cited 4 times. Cited 3 times.

Ngozi Ihemelandu and Michael D. Ekstrand. 2024. Multiple Testing for IR and Recommendation System Experiments. Short paper in Proceedings of the 46th European Conference on Information Retrieval (ECIR ’24), Mar 24–28, 2024. Lecture Notes in Computer Science 14610:449–457. DOI 10.1007/978-3-031-56063-7_37. NSF PAR 10497108. Acceptance rate: 24.3%. Cited 3 times.

Ngozi Ihemelandu and Michael D. Ekstrand. 2023. Candidate Set Sampling for Evaluating Top-N Recommendation. In Proceedings of the 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT ’23), Oct 26–29, 2023. pp. 88–94. DOI 10.1109/WI-IAT59888.2023.00018. arXiv:2309.11723 [cs.IR]. NSF PAR 10487293. Acceptance rate: 28%. Cited 6 times. Cited 1 time.

Michael D. Ekstrand, Ben Carterette, and Fernando Diaz. 2024. Distributionally-Informed Recommender System Evaluation. Transactions on Recommender Systems 2(1) (March 2024; online Aug 4, 2023), 6:1–27. DOI 10.1145/3613455. arXiv:2309.05892 [cs.IR]. NSF PAR 10461937. Cited 18 times. Cited 11 times.

Ngozi Ihemelandu and Michael D. Ekstrand. 2023. Inference at Scale: Significance Testing for Large Search and Recommendation Experiments. Short paper in Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23), Jul 23, 2023. pp. 2087–2091. DOI 10.1145/3539618.3592004. arXiv:2305.02461. NSF PAR 10423691. Acceptance rate: 25.1%. Cited 4 times. Cited 1 time.

Ngozi Ihemelandu and Michael D. Ekstrand. 2021. Statistical Inference: The Missing Piece of RecSys Experiment Reliability Discourse. In Proceedings of the Perspectives on the Evaluation of Recommender Systems Workshop 2021 (RecSys ’21), Sep 25, 2021. arXiv:2109.06424 [cs.IR]. Cited 9 times. Cited 7 times.

Michael D. Ekstrand, Ben Carterette, and Fernando Diaz. 2021. Evaluating Recommenders with Distributions. In Proceedings of the RecSys 2021 Workshop on Perspectives on the Evaluation of Recommender Systems (RecSys ’21), Sep 25, 2021. Cited 2 times.

Fernando Diaz, Bhaskar Mitra, Michael D. Ekstrand, Asia J. Biega, and Ben Carterette. 2020. Evaluating Stochastic Rankings with Expected Exposure. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), Oct 21, 2020. ACM, pp. 275–284. DOI 10.1145/3340531.3411962. arXiv:2004.13157 [cs.IR]. NSF PAR 10199451. Acceptance rate: 20%. Nominated for Best Long Paper. Cited 213 times. Cited 181 times.

Mucun Tian and Michael D. Ekstrand. 2020. Estimating Error and Bias in Offline Evaluation Results. Short paper in Proceedings of the 2020 Conference on Human Information Interaction and Retrieval (CHIIR ’20), Mar 14, 2020. ACM, 5 pp. DOI 10.1145/3343413.3378004. arXiv:2001.09455 [cs.IR]. NSF PAR 10146883. Acceptance rate: 47%. Cited 15 times. Cited 11 times.

Mucun Tian and Michael D. Ekstrand. 2018. Monte Carlo Estimates of Evaluation Metric Error and Bias. Computer Science Faculty Publications and Presentations 148, Boise State University. Presented at the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems at RecSys 2018. DOI 10.18122/cs_facpubs/148/boisestate. NSF PAR 10074452. Cited 1 time. Cited 1 time.

Nicola Ferro, Norbert Fuhr, Gregory Grefenstette, Joseph A. Konstan, Pablo Castells, Elizabeth M. Daly, Thierry Declerck, Michael D. Ekstrand, Werner Geyer, Julio Gonzalo, Tsvi Kuflik, Krister Lindén, Bernardo Magnini, Jian-Yun Nie, Raffaele Perego, Bracha Shapira, Ian Soboroff, Nava Tintarev, Karin Verspoor, Martijn C. Willemsen, and Justin Zobel. 2018. The Dagstuhl Perspectives Workshop on Performance Modeling and Prediction. SIGIR Forum 52(1) (June 2018), 91–101. DOI 10.1145/3274784.3274789. Cited 19 times. Cited 20 times.

Nicola Ferro, Norbert Fuhr, Gregory Grefenstette, Joseph A. Konstan, Pablo Castells, Elizabeth M. Daly, Thierry Declerck, Michael D. Ekstrand, Werner Geyer, Julio Gonzalo, Tsvi Kuflik, Krister Lindén, Bernardo Magnini, Jian-Yun Nie, Raffaele Perego, Bracha Shapira, Ian Soboroff, Nava Tintarev, Karin Verspoor, Martijn C. Willemsen, and Justin Zobel. 2018. From Evaluating to Forecasting Performance: How to Turn Information Retrieval, Natural Language Processing and Recommender Systems into Predictive Sciences (Dagstuhl Perspectives Workshop 17442). Dagstuhl Manifestos 7(1) (November 2018), 96–139. DOI 10.4230/DagMan.7.1.96. Cited 20 times. Cited 19 times.

Michael D. Ekstrand and Vaibhav Mahant. 2017. Sturgeon and the Cool Kids: Problems with Random Decoys for Top-N Recommender Evaluation. In Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference (Recommender Systems track), May 29, 2017. AAAI, pp. 639–644. No acceptance rate reported. Cited 25 times. Cited 11 times.

Justin J. Levandoski, Michael D. Ekstrand, Michael J. Ludwig, Ahmad Eldawy, Mohamed F. Mokbel, and John T. Riedl. 2011. RecBench: Benchmarks for Evaluating Performance of Recommender System Architectures. Proceedings of the VLDB Endowment 4(11) (August 2011), 911–920. Acceptance rate: 18%. Cited 23 times. Cited 9 times.