Sturgeon and the Cool Kids: Problems with Top-N Recommender Evaluation
2017. Sturgeon and the Cool Kids: Problems with Top-N Recommender Evaluation. To appear in Proceedings of the 30th International Florida Artificial Intelligence Research Society Conference.and .
Top-N evaluation of recommender systems, typically carried out using metrics from information retrieval or machine learning, has several challenges. Two of these challenges are popularity bias, where the evaluation intrinsically favors algorithms that recommend popular items, and misclassified decoys, where items for which no user relevance is known are actually relevant to the user, but the evaluation is unaware and penalizes the recommender for suggesting them. One strategy for mitigating the misclassified decoy problem is the one-plus-random evaluation strategy and its generalization, which we call random decoys. In this work, we explore the random decoy strategy through both a theoretical treatment and an empirical study, but find little evidence to guide its tuning and show that it has complex and deleterious interactions with popularity bias.