Mathematical Notation for Recommender Systems
Over years of teaching and research, I have gradually standardized the notation that I use for describing the math of recommender systems. This is the notation that I use in my classes, that Joe Konstan and I have adopted for our MOOC, and that I use in most of my research papers. (And thanks to Joe for helping revise it to its current form.)
If you haven’t already settled on a notation, perhaps you would consider adopting this one. I also welcome feedback on improving it.
I have tried to strike a balance between clarity and clutter. I slightly overload the meaning of some symbols; in particular, I am loose with distinctions between sets and matrices, because it is generally clear from context which is being invoked. I do not overload external referents, however. I have also tried to keep this notation hand-writable, which makes it more useful in teaching but means that I cannot rely as much on typography to distinguish different objects (e.g. separating \(U\) and \(\mathcal{U}\) would be questionable).
Input Data
Our input data, for collaborative filtering, consists of:
\(U\): the set of users
\(I\): the set of items
\(R\): the set (or matrix) of ratings
Within each of these sets, we can refer to individual entries:
\(u, v \in U\): a user
\(i, j \in I\): an item
\(r_{ui} \in R\): user \(u\)’s rating of item \(i\)
One advantage of always using \(u,v\) for users and \(i,j\) for items is that the meaning is clear from looking at a variable or subscripted variable. It also allows the following subset notations:
Scores and Similarities
With this notation, we can write things like the user-user rating prediction formula:
\[\hat r_{ui} = s(i;u) = \frac{\sum_{v \in N(u;i)} w(u,v)(r_{vi}-\bar r_v)}{\sum_{v \in N(u;i)} |w(u,v)|} + \bar r_u\]
\(N(u;i)\) is the neighborhood for user \(u\) for the purpose of scoring item \(i\), \(w(u,v)\) is the weight given to neighbor \(v\) when scoring for \(u\), and \(\bar r_u\) is user \(u\)’s average rating.
I explicitly state the ranges of my summations for two reasons, even if they are implicitly clear to experienced recsys researchers: it is a common tripping point for students, and it is a place where subtle implementation differences get buried. I find that it is not overly cumbersome with this notation.
Matrix Factorization
For basic decomposition of the ratings matrix, I usually use \(P\) and \(Q\) these days:
\[R \approx PQ^{\mathrm{T}}\]
We then have user vectors \(\vec p_u\) and item vectors \(\vec q_i\). I like using the transpose notation for the factorization, so that latent features are on the columns of both \(P\) and \(Q\).
Advanced decompositions that involve more than two matrices will need additional matrix symbols; one unfortunate tradeoff of using \(U\) for the set of users is that it is no longer available for a matrix.
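To make the transpose convention in \(R \approx PQ^{\mathrm{T}}\) concrete, here is a minimal NumPy sketch. The shapes, the random seed, and the use of random rather than learned factors are assumptions for illustration only; the point is that row \(u\) of \(P\) is \(\vec p_u\), row \(i\) of \(Q\) is \(\vec q_i\), and latent features run along the columns of both.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the text).
n_users, n_items, n_features = 4, 5, 2

# Random stand-ins for learned factors; a real model would fit these to R.
rng = np.random.default_rng(0)
P = rng.normal(size=(n_users, n_features))  # row u is the user vector p_u
Q = rng.normal(size=(n_items, n_features))  # row i is the item vector q_i

# Reconstruct the approximate ratings matrix: R is approximated by P Q^T.
R_hat = P @ Q.T

# Each entry of the product is the dot product of a user and an item vector.
u, i = 1, 3
score_ui = P[u] @ Q[i]
```

Because both matrices keep features on their columns, a single predicted entry is just the dot product \(\vec p_u \cdot \vec q_i\), with no transposition needed when indexing rows.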
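The user-user rating prediction formula can also be sketched in code, with the summation range spelled out as explicitly as in the notation. This is a rough illustration under several assumptions that are not part of the notation itself: ratings are stored as a dense users-by-items array with NaN for missing entries, \(w\) is a precomputed similarity matrix, and \(N(u;i)\) is taken to be the `k` most similar other users who rated item \(i\). The function name `predict_user_user` and the parameter `k` are hypothetical.

```python
import numpy as np

def predict_user_user(R, w, u, i, k=2):
    """Score item i for user u with mean-centered user-user CF.

    R: (users x items) float array, np.nan where no rating exists (assumed encoding).
    w: (users x users) similarity matrix, w[u, v] = w(u, v).
    k: neighborhood size (assumed rule: top-k by similarity).
    """
    user_means = np.nanmean(R, axis=1)  # \bar r_v for every user

    # Candidate neighbors: users other than u who have rated item i.
    rated = ~np.isnan(R[:, i])
    rated[u] = False
    candidates = np.flatnonzero(rated)

    # N(u;i): the k candidates with the highest similarity to u.
    order = np.argsort(-w[u, candidates], kind="stable")
    neighbors = candidates[order[:k]]

    weights = w[u, neighbors]
    denom = np.abs(weights).sum()
    if denom == 0:
        # No usable neighbors: fall back to the user's mean rating.
        return user_means[u]

    offsets = R[neighbors, i] - user_means[neighbors]  # r_vi - \bar r_v
    return (weights * offsets).sum() / denom + user_means[u]

# Tiny worked example with made-up ratings and similarities.
R = np.array([
    [4.0, np.nan, 2.0],  # user 0, mean 3.0
    [5.0, 4.0, 3.0],     # user 1, mean 4.0
    [1.0, 2.0, np.nan],  # user 2, mean 1.5
])
w = np.array([
    [1.0, 0.5, -0.5],
    [0.5, 1.0, 0.0],
    [-0.5, 0.0, 1.0],
])
prediction = predict_user_user(R, w, u=0, i=1)
```

Choices such as whether negatively weighted neighbors are allowed into \(N(u;i)\), or what to return when the neighborhood is empty, are exactly the kind of implementation detail that stays hidden unless the summation range is written down.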