Similarity Functions in Item-Item CF
The core of an item-item collaborative filter is the item similarity function: a function of two items \(s(i,j): \mathcal{I}\times\mathcal{I} \to [-1,1]\) that measures how similar those items are. Common choices are vector similarity functions over the vectors of users’ ratings of each item, such as the cosine similarity or Pearson correlation.
Early on, Sarwar et al. tested a few choices:
The Pearson correlation from statistics:
\[ \frac{\sum{(r_{ui} - \mu_i) (r_{uj} - \mu_j)}}{\sqrt{\sum{(r_{ui} - \mu_i)^2}} \sqrt{\sum{(r_{uj} - \mu_j)^2}}} \]
The cosine similarity between raw vectors:
\[ \frac{\vec{r_i}\cdot\vec{r_j}}{\|\vec{r_i}\|_2\|\vec{r_j}\|_2} = \frac{\sum{r_{ui} r_{uj}}}{\sqrt{\sum{r_{ui}^2}} \sqrt{\sum{r_{uj}^2}}} \]
The adjusted cosine similarity between vectors normalized by subtracting the user’s mean rating:
\[ \frac{\vec{\hat{r}_i}\cdot\vec{\hat{r}_j}}{\|\vec{\hat{r}_i}\|_2\|\vec{\hat{r}_j}\|_2} = \frac{\sum{(r_{ui} - \mu_u) (r_{uj} - \mu_u)}}{\sqrt{\sum{(r_{ui} - \mu_u)^2}} \sqrt{\sum{(r_{uj} - \mu_u)^2}}} \]
They found adjusted cosine to work best of the three, and as far as I know it has remained the dominant similarity function for rating-based item-item CF systems.
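To make the three formulas concrete, here is a minimal pure-Python sketch. The toy `ratings` data and all function names are hypothetical, not from Sarwar et al. It also makes two assumptions the formulas above leave open: each sum runs only over users who rated both items, and \(\mu_i\) and \(\mu_u\) are taken over all of an item's or user's ratings. A zero denominator is treated as similarity 0.

```python
import math

# Hypothetical toy data: ratings[user][item] = rating on a 1-5 scale.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 2, "C": 4},
}

def item_mean(item):
    """Mean of all ratings given to `item` (mu_i)."""
    vals = [r[item] for r in ratings.values() if item in r]
    return sum(vals) / len(vals)

def user_mean(user):
    """Mean of all ratings given by `user` (mu_u)."""
    vals = list(ratings[user].values())
    return sum(vals) / len(vals)

def co_raters(i, j):
    """Users who rated both items (assumed summation set)."""
    return [u for u, r in ratings.items() if i in r and j in r]

def normalized_dot(xs, ys):
    """Dot product divided by the product of L2 norms; 0.0 if undefined."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum(x * x for x in xs))
    norm_y = math.sqrt(sum(y * y for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)

def cosine(i, j):
    """Cosine similarity between raw rating vectors."""
    users = co_raters(i, j)
    return normalized_dot([ratings[u][i] for u in users],
                          [ratings[u][j] for u in users])

def pearson(i, j):
    """Pearson correlation: center each rating by its item's mean."""
    users = co_raters(i, j)
    mu_i, mu_j = item_mean(i), item_mean(j)
    return normalized_dot([ratings[u][i] - mu_i for u in users],
                          [ratings[u][j] - mu_j for u in users])

def adjusted_cosine(i, j):
    """Adjusted cosine: center each rating by its user's mean."""
    users = co_raters(i, j)
    return normalized_dot([ratings[u][i] - user_mean(u) for u in users],
                          [ratings[u][j] - user_mean(u) for u in users])

for name, sim in [("cosine", cosine), ("pearson", pearson),
                  ("adjusted cosine", adjusted_cosine)]:
    print(f"{name}: s(A,B)={sim('A', 'B'):.3f}  s(A,C)={sim('A', 'C'):.3f}")
```

In this toy data, the users who like A dislike C and vice versa. Raw cosine still gives the pair a positive similarity because every rating is positive, while the two mean-centered variants give a negative one. That gap is one reason to center the ratings before comparing them.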