Monte Carlo Estimates of Evaluation Metric Error and Bias

Mucun Tian and Michael D. Ekstrand. 2018. Monte Carlo Estimates of Evaluation Metric Error and Bias. Computer Science Faculty Publications and Presentations 148. Boise State University. Presented at the REVEAL 2018 Workshop on Offline Evaluation for Recommender Systems, a workshop at RecSys 2018. DOI10.18122/cs_facpubs/148/boisestate. NSF PAR 10074452.


Traditional offline evaluations of recommender systems apply metrics from machine learning and information retrieval in settings where their underlying assumptions no longer hold. This results in significant error and bias in measures of top-NN recommendation performance, such as precision, recall, and nDCG. Several of the specific causes of these errors, including popularity bias and misclassified decoy items, are well-explored in the existing literature. In this paper we survey a range of work on identifying and addressing these problems, and report on our work in progress to simulate the recommender data generation and evaluation processes to quantify the extent of evaluation metric errors and assess their sensitivity to various assumptions.

Links and Resources

This material is based upon work supported by the National Science Foundation under Grant No. IIS 17-51278. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.