Comparing Recommendation Lists
In my research, I am trying to understand how different recommender algorithms behave in different situations. We’ve known for a while that ‘different recommenders are different’1, to paraphrase Sean McNee. However, we lack thorough data on how they are different in a variety of contexts. Our RecSys 2014 paper, User Perception of Differences in Recommender Algorithms (by myself, Max Harper, Martijn Willemsen, and Joseph Konstan), reports on an experiment that we ran to collect some of this data.
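As a minimal sketch of what "collecting data on how recommenders differ" can mean offline (this is illustrative only, not the method from the paper; the algorithm names and movie lists are hypothetical), one simple measure is the overlap of two algorithms' top-N lists for the same user:

```python
def top_n_jaccard(list_a, list_b):
    """Jaccard similarity of two top-N recommendation lists.

    Returns 0.0 when the lists share no items and 1.0 when they
    recommend exactly the same set of items.
    """
    a, b = set(list_a), set(list_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical top-5 lists from two algorithms for one user.
item_item = ["Fargo", "Heat", "Alien", "Se7en", "Brazil"]
svd       = ["Fargo", "Alien", "Akira", "Gattaca", "Heat"]

# 3 shared items out of 7 distinct items -> 3/7
print(top_n_jaccard(item_item, svd))
```

A low overlap tells you the algorithms behave differently, but not which list users would actually prefer; that is exactly the gap a user experiment has to fill.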
I have done some work on this subject in offline contexts already; my When Recommenders Fail paper looked at contexts in which different algorithms make different mistakes. LensKit makes it easy to test many different algorithms in the same experimental setup and context. This experiment brings my research goals back into the user experiment realm: directly measuring the ways in which users experience the output of different algorithms as being different.

The Tools

To run this experiment, we took advantage of a number of resources that, together, enable some really fun experimental capabilities.

In addition to colleagues at Minnesota, we were privileged to collaborate with Martijn Willemsen at the Eindhoven University of Technology on this project. Protip: if you want to measure what people think, collaborate with a psychologist. It will work out much better than just trying to do it yourself. The quality of our user experiments and survey instruments has increased substantially since we began working with Martijn.

The Questions

The high-level goal of this experiment was to see how users perceive different recommender algorithms to produce different recommendations. To make this goal actionable, we developed a set of research questions.

The Setup

We studied these in the context of selecting a recommender for movies. For a few months, users were invited to try out a beta of the new MovieLens. After they logged in, each user received recommendation lists from 2 different algorithms, along with a survey containing 25 questions.

The Results

The paper contains all the gory details, including some novel analysis techniques that I want to write more about later. Come to my talk Wednesday at RecSys 2014 or read the paper to learn more!

1. I believe I owe this phrasing to my colleague Morten Warncke-Wang.