# Similarity Functions in Item-Item CF

The core of an item-item collaborative filter is the *item similarity function*: a function of two items \(s(i,j): \mathcal{I}\times\mathcal{I} \to [-1,1]\) that measures how similar those items are. Common choices are vector similarity functions over the vectors of users’ ratings of each item, such as the cosine similarity or Pearson correlation.

Early on, Sarwar et al. tested a few choices:

- The *Pearson correlation* from statistics: \[ \frac{\sum{(r_{ui} - \mu_i) (r_{uj} - \mu_j)}}{\sqrt{\sum{(r_{ui} - \mu_i)^2}} \sqrt{\sum{(r_{uj} - \mu_j)^2}}} \]

- The *cosine similarity* between raw vectors: \[ \frac{\vec{r_i}\cdot\vec{r_j}}{\|\vec{r_i}\|_2\|\vec{r_j}\|_2} = \frac{\sum{r_{ui} r_{uj}}}{\sqrt{\sum{r_{ui}^2}} \sqrt{\sum{r_{uj}^2}}} \]

- The *adjusted cosine similarity* between vectors normalized by subtracting the user’s mean rating: \[ \frac{\vec{\hat{r}_i}\cdot\vec{\hat{r}_j}}{\|\vec{\hat{r}_i}\|_2\|\vec{\hat{r}_j}\|_2} = \frac{\sum{(r_{ui} - \mu_u) (r_{uj} - \mu_u)}}{\sqrt{\sum{(r_{ui} - \mu_u)^2}} \sqrt{\sum{(r_{uj} - \mu_u)^2}}} \]

They found adjusted cosine to work best of the three, and so far as I know, it has since been the dominant similarity function for rating-based item-item CF systems.
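
To make the three formulas concrete, here is a minimal NumPy sketch. It rests on several assumptions that the text above does not spell out: ratings live in a dense user-by-item matrix `R` with `NaN` marking a missing rating, every sum runs only over users who rated both items, the item mean \(\mu_i\) is taken over all ratings of item \(i\), and the user mean \(\mu_u\) over all of user \(u\)'s ratings. The function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def _co_rated(R, i, j):
    """Boolean mask of users who rated both item i and item j (assumption)."""
    return ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])


def _cosine(a, b):
    """Cosine of two equal-length vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def pearson_sim(R, i, j):
    """Pearson correlation: center each item's ratings on that item's mean."""
    mask = _co_rated(R, i, j)
    mu_i = np.nanmean(R[:, i])  # item mean over all ratings of i (assumption)
    mu_j = np.nanmean(R[:, j])
    return _cosine(R[mask, i] - mu_i, R[mask, j] - mu_j)


def cosine_sim(R, i, j):
    """Cosine similarity between the raw rating vectors."""
    mask = _co_rated(R, i, j)
    return _cosine(R[mask, i], R[mask, j])


def adjusted_cosine_sim(R, i, j):
    """Adjusted cosine: center each rating on the rating user's mean."""
    mask = _co_rated(R, i, j)
    mu_u = np.nanmean(R, axis=1)  # per-user mean over the items they rated
    return _cosine(R[mask, i] - mu_u[mask], R[mask, j] - mu_u[mask])


# Toy data: 4 users x 3 items, NaN = not rated.
R = np.array([
    [5.0, 4.0, np.nan],
    [3.0, np.nan, 2.0],
    [1.0, 2.0, 5.0],
    [4.0, 5.0, 1.0],
])

for sim in (pearson_sim, cosine_sim, adjusted_cosine_sim):
    print(sim.__name__, round(sim(R, 0, 1), 3))
```

Note that with non-negative ratings the raw cosine can never be negative, whereas the two mean-centered variants use the full \([-1,1]\) range. The zero-norm fallback of `0.0` is also a choice made for this sketch, not something the formulas prescribe.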