Multiple Testing for IR and Recommendation System Experiments

Ngozi Ihemelandu; Michael D. Ekstrand

doi:10.1007/978-3-031-56063-7_37


Ngozi Ihemelandu and Michael D. Ekstrand. 2024. Multiple Testing for IR and Recommendation System Experiments. Short paper in Proceedings of the 46th European Conference on Information Retrieval (ECIR '24), Mar 24–28, 2024. Lecture Notes in Computer Science 14610:449–457. DOI 10.1007/978-3-031-56063-7_37. NSF PAR 10497108. Acceptance rate: 24.3%. Cited 3 times.



This paper was led by my Ph.D. student Ngozi Ihemelandu.

Abstract

While there has been significant research on statistical techniques for comparing two information retrieval (IR) systems, many IR experiments test more than two systems. This can lead to inflated false discoveries due to the multiple-comparison problem (MCP). A few IR studies have investigated multiple comparison procedures; these studies mostly use TREC data and control the familywise error rate. In this study, we extend their investigation to include recommendation system evaluation data as well as multiple comparison procedures that control the false discovery rate (FDR).
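To make the distinction in the abstract concrete, here is a minimal sketch contrasting one procedure that controls the familywise error rate (Holm's step-down method) with one that controls the FDR (Benjamini-Hochberg). It is an illustration only, not the paper's experimental code: the function names and the p-values are hypothetical, and the paper itself may use different procedures or tooling.

```python
def holm(p_values, alpha=0.05):
    """Holm step-down procedure (controls the familywise error rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank+1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first p-value that fails its threshold
    return reject


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Track the largest rank whose p-value is within alpha * rank / m.
        if p_values[i] <= alpha * rank / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Hypothetical p-values from six pairwise system comparisons (made up for illustration).
p_values = [0.001, 0.012, 0.020, 0.035, 0.300, 0.700]

uncorrected = [p <= 0.05 for p in p_values]
holm_reject = holm(p_values)
bh_reject = benjamini_hochberg(p_values)

print("uncorrected:", sum(uncorrected))
print("Holm (FWER):", sum(holm_reject))
print("BH (FDR):   ", sum(bh_reject))
```

On these made-up values, uncorrected testing flags four comparisons, Holm keeps one, and Benjamini-Hochberg keeps three, which shows the usual trade-off: FDR control tolerates a bounded proportion of false discoveries in exchange for more power than familywise control.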
