Blog Articles 76–80

Similarity Functions in Item-Item CF

The core of an item-item collaborative filter is the item similarity function: a function of two items, $s(i,j): \mathcal{I} \times \mathcal{I} \to [-1,1]$, that measures how similar those items are. Common choices are vector similarity functions over the vectors of users’ ratings of each item, such as the cosine similarity or Pearson correlation.

Early on, Sarwar et al. tested a few choices:

  • The Pearson correlation from statistics:

    $$\frac{\sum_u (r_{ui} - \mu_i)(r_{uj} - \mu_j)}{\sqrt{\sum_u (r_{ui} - \mu_i)^2} \sqrt{\sum_u (r_{uj} - \mu_j)^2}}$$

  • The cosine similarity between raw vectors:

    $$\frac{\vec{r}_i \cdot \vec{r}_j}{\|\vec{r}_i\|_2 \|\vec{r}_j\|_2} = \frac{\sum_u r_{ui} r_{uj}}{\sqrt{\sum_u r_{ui}^2} \sqrt{\sum_u r_{uj}^2}}$$

  • The adjusted cosine similarity between vectors normalized by subtracting the user’s mean rating:

    $$\frac{\vec{\hat{r}}_i \cdot \vec{\hat{r}}_j}{\|\vec{\hat{r}}_i\|_2 \|\vec{\hat{r}}_j\|_2} = \frac{\sum_u (r_{ui} - \mu_u)(r_{uj} - \mu_u)}{\sqrt{\sum_u (r_{ui} - \mu_u)^2} \sqrt{\sum_u (r_{uj} - \mu_u)^2}}$$

They found adjusted cosine to work better, and so far as I know, it has been the dominant similarity function for rating-based item-item CF systems.
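To make the differences between the three functions concrete, here is a minimal Python sketch. It assumes ratings live in a plain dict of dicts (`ratings[user][item]`), that every sum runs over the users who rated both items (a sparse-vector implementation that treats missing ratings as zero would instead take each norm over all of that item’s raters), and that $\mu_i$ and $\mu_u$ are the means of all of an item’s or a user’s ratings. The toy data and function names are hypothetical, not taken from Sarwar et al. or from any library.

```python
import math
from typing import Dict, List

# ratings[user][item] = rating; a hypothetical toy data set on a 1-5 scale.
Ratings = Dict[str, Dict[str, float]]

RATINGS: Ratings = {
    "u1": {"a": 5, "b": 4, "c": 1},
    "u2": {"a": 4, "b": 5, "c": 2},
    "u3": {"a": 1, "b": 2, "c": 5},
}


def co_raters(ratings: Ratings, i: str, j: str) -> List[str]:
    """Users who rated both i and j; every sum below runs over these users."""
    return [u for u, r in ratings.items() if i in r and j in r]


def item_mean(ratings: Ratings, i: str) -> float:
    """mu_i: mean of all ratings of item i."""
    vals = [r[i] for r in ratings.values() if i in r]
    return sum(vals) / len(vals)


def user_mean(ratings: Ratings, u: str) -> float:
    """mu_u: mean of all ratings made by user u."""
    vals = list(ratings[u].values())
    return sum(vals) / len(vals)


def _normalized_dot(xs: List[float], ys: List[float]) -> float:
    """sum(x*y) / (sqrt(sum x^2) * sqrt(sum y^2)); 0 when a norm is zero."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return num / den if den > 0 else 0.0


def pearson(ratings: Ratings, i: str, j: str) -> float:
    """Center each rating on its *item's* mean (mu_i, mu_j)."""
    users = co_raters(ratings, i, j)
    mi, mj = item_mean(ratings, i), item_mean(ratings, j)
    return _normalized_dot([ratings[u][i] - mi for u in users],
                           [ratings[u][j] - mj for u in users])


def cosine(ratings: Ratings, i: str, j: str) -> float:
    """Use the raw ratings with no centering."""
    users = co_raters(ratings, i, j)
    return _normalized_dot([ratings[u][i] for u in users],
                           [ratings[u][j] for u in users])


def adjusted_cosine(ratings: Ratings, i: str, j: str) -> float:
    """Center each rating on its *user's* mean (mu_u)."""
    users = co_raters(ratings, i, j)
    means = {u: user_mean(ratings, u) for u in users}
    return _normalized_dot([ratings[u][i] - means[u] for u in users],
                           [ratings[u][j] - means[u] for u in users])


if __name__ == "__main__":
    for sim in (pearson, cosine, adjusted_cosine):
        print(sim.__name__, round(sim(RATINGS, "a", "c"), 3))
```

In this toy data, items a and c are perfectly opposed (c = 6 - a for every user). Pearson gives -1 and adjusted cosine is negative, but raw cosine is still about 0.51: two vectors of positive ratings can never point in opposite directions, so some form of mean-centering is what lets the similarity go negative.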

Tips for Personal Computer Security

For the good of yourself and your friends, family, and neighbors, it’s important to keep your computer (and phone) as secure as you practically can. But how do you do this? There is a lot of security advice floating around; a lot of it is confusing, and some of it is inaccurate. I get things wrong, too!

I’m pretty excited for Decent Security; it’s very much a work in progress, but as Taylor Swift continues to fill it out, I expect that it will be a very good resource.

But until then, and perhaps as something of a CliffsNotes version, here are some of my top suggestions: basic things that I’d suggest to any friend or family member.

These also aren’t limited to your desktop or laptop PC; some of them pertain to online accounts and mobile devices. This guide is also more of a ‘what to do’ than a ‘how to do it’; it assumes you are comfortable with clicking through settings pages, but don’t know which settings to check.

Sautéed Satay Seitan (or something)

Tried a culinary experiment tonight!

Here’s the recipe (serves 2-3):

  • 1 medium onion, chopped
  • ½ medium jalapeño, diced
  • 2 tsp Satay spices, ground
  • Dash of paprika
  • 1 tsp salt (I didn’t measure; this is a guess)
  • 8oz seitan strips
  • Frozen broccoli (1–2C)
  • ½C water or stock
  • 4 large-ish baby bella mushrooms, sliced
  • Oil (I used vegetable)

Why Microsoft?

This is a joint post by Michael and Jennifer.

We each started using Linux more than a decade ago, and for our entire married life, we have been a primarily Linux-based household.

This spring, we decided to finally get smartphones. In the course of making this decision, and selecting our phones, we reevaluated many aspects of our technology use. This has resulted in a number of changes that many may find surprising:

  • We carry Nokia phones running Windows Phone 8.1.
  • E-mail service for elehack.net is now hosted by Microsoft, via their hosted Exchange service as a part of Office 365 business subscriptions.
  • We are running mainly Windows on our personal laptops.
  • We use Outlook for our e-mail, contacts, and calendars.
  • We use OneDrive for Business and SharePoint to ferry data between our devices and coordinate shared data for our household.

Old Papers on Recommender Systems

There’s a lot of research on recommender systems. There’s a lot of other research that, while not directly mentioning recommenders, is very relevant, including research from decades ago.

A few of my favorite old papers that I think recommender systems researchers would do well to read (and perhaps cite):

  • Back to Bentham? Explorations of experienced utility (Kahneman et al., 1997): how people experience and remember pain and pleasure. It has strong implications for what ratings mean and what kind of utility our recommenders should optimize for.

  • User Modeling via Stereotypes (Rich, 1979): the first computer-based recommender system that I know about.

  • A searching procedure for information retrieval (Goffman, 1964): this early IR paper has the crucial insight that the relevance of an item in a search result list (or recommendation list) is not independent of the items that appear before or after it. Rather, an item may be less relevant if it is (partially) redundant with a previous item.
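Goffman’s observation suggests scoring each position in a list given what is already above it. One way to sketch that idea is a greedy re-ranker in the spirit of the later maximal marginal relevance approach, swapped in here for illustration and not Goffman’s own procedure: pick items one at a time, discounting each candidate’s relevance by its similarity to the items already chosen. The `rerank` function, the penalty weight `lam`, and the toy scores are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List


def rerank(candidates: List[str], relevance: Dict[str, float],
           similarity: Callable[[str, str], float],
           lam: float = 0.5, k: int = 10) -> List[str]:
    """Greedily build a list of up to k items, penalizing redundancy.

    Each step picks the remaining candidate that maximizes
    relevance[item] - lam * (max similarity to any item already chosen).
    """
    chosen: List[str] = []
    remaining = list(candidates)

    def marginal_score(item: str) -> float:
        # An item is worth less if something similar is already in the list.
        redundancy = max((similarity(item, c) for c in chosen), default=0.0)
        return relevance[item] - lam * redundancy

    while remaining and len(chosen) < k:
        best = max(remaining, key=marginal_score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy example: x2 is a near-duplicate of x1.
CANDIDATES = ["x1", "x2", "y"]
RELEVANCE = {"x1": 0.9, "x2": 0.85, "y": 0.6}
SIMILAR: Dict[FrozenSet[str], float] = {frozenset({"x1", "x2"}): 1.0}


def sim(a: str, b: str) -> float:
    """Symmetric lookup; pairs not listed are treated as unrelated."""
    return SIMILAR.get(frozenset({a, b}), 0.0)


order = rerank(CANDIDATES, RELEVANCE, sim, lam=0.5, k=3)
print(order)  # prints ['x1', 'y', 'x2']
```

Ranking purely by relevance would give x1, x2, y; once x1 is placed, its near-duplicate x2 scores only 0.85 - 0.5 = 0.35, so the less relevant but novel y moves up to second place. Setting `lam` to 0 recovers the independent-relevance ordering that Goffman argued against.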