Sunday, October 01, 2006

Putting the "folk" back in folksonomy

Or ... The fat belly and recommendation systems



Since the beginning of Web 2.0 time, "folksonomy" has been synonymous with tagging. It's time to fill out the picture. As readers of this blog know, folksonomy involves tags, tagged-items, and tagger-users. This post digs deeper re: the role of users in the "holy trinity" of user-tag-item. And examines the relationship of users to recommendation systems, ... and to the "fat belly".

Yes, that does sound like a whole lot of ground to cover but

a) I have been gone for a while so need to catch up in a hurry - what can I say?
b) It's not that much ground to cover when we see the interesting relationships
c) The notation described in the previous post makes it possible to cover a lot of ground without too much verbiage.

So without further ado, here goes.

In a typical folksonomy system we have users attaching tags to items. As the system evolves we have, given an item 'i', the sets: -
T(i), the tags associated with i and
U(i) the users who use the item i.

Typical folksonomy apps have focused on navigating the various relationships with a focus on T(i). Recommendation systems that suggest 'related items' are also most often based on T(i), as follows. Given an item we find all tag related items via I(T(i)). Then we use some algorithm to trim this down to the "best" 5 or 10 by some definition of "best". Then we use these as recommendations. Given a user of item i, these are the recommended other items, or 'related items' based on tags.

For the rest of this discussion, we denote this set of recommendations as Rt(i) i.e given an item i, the recommended other items based on tags.

Consider now, the other way to get related items, i.e. user-related items.
This is the famous "users who bought this item also bought ...." approach that we know and love.

Given an item i we get U(i) all the users of i, and then I(U(i)), all the items used by those users. Again we use some way to trim this down to the best 5 to 10 or so and recommend these. Given a user of item i, these are the recommended other items, based on users.
We denote this set of recommendations as Ru(i) i.e the recommendations based on users of item i.

Now comes the interesting part derived from work done at Odeo and Greenplum over the last year or so. Experiments suggest the following two major results, which need much more qualification by further work and study. This is only an indicator of interesting areas for research, not a formal proof of anything.

a) Empirical results suggest that for even a small set of users Ru(i) gives better recommendations than Rt(i), i.e. using user-related items gives better recommendations than using tag-related items.

b) Empirical results suggest that the "algorithm" we use to go from I(U(i)) to Ru(i) makes a lot of difference to the relevance and 'interestingness' of recommendations.

Ok, b) was really cryptic so we'll take the rest of this post to unpack it into useful results and pretty pictures.

Step by step,

I(U(i) is the raw set of user related items for item i (people who bought item i also bought a whole ton of other shtuff namely I(U(i)) )

But that is too huge a set to use as recommendations - it could have anywhere from tens to tens of thousands of items depending on what data we are operating on. So we need to trim this down with a filter that filters out and keeps the best recommendations.

So I(U(i)) ---> Filter ---> Ru(i) ie. after filtering the raw set of user-related items we get user-related recommendations.

Now we need to decide how to filter. Lets do the simple thing first.

First we sort the collection I(U(i)) by count, i.e. how many times does some item turn up in this collection.

The temptation is to take the top 10 items by count and use these as the recommendation. This is what I did in practice and found that the recommendations that are generated are only mildly customized i.e they are interesting in general but not necessarily interesting to me. Most of the times they are almost identical to the "most popular" items on the front page.

Why is this?

Because I *took* the most popular ones by count, I sampled the head of the distribution and didn't get anything new.

So then I decided to go the other way - I looked at the lower end of the counts and picked reco's from there, i.e. the proverbial "long tail". Now I got some strange and freaky recommendations - if you had subscribed to the Catholic podcast on Odeo you would have been recommended the Open Source Sex podcast. Not quite what we have in mind, when we say "recommendations".

This led me by accident to explore the remaining area of the range of counts, the middle, recently named the "fat belly" by Robert Young in a recent post on GigaOm.

Here is where things got very, very interesting in the recommendations generated. For example,
Evan Williams who has an interest in modern furniture got a recommendation for a podcast related to furniture although none of his current subscriptions had anything to do with furniture!

This was very exciting and stimulated further exploration which confirmed that the best recommendations came from the fat belly.


So

I(U(i)) ----> Sort by count, filter from the head ----> "popular (i.e. obvious) "

I(U(i)) ----> Sort by count, filter from the long tail ----> "freaky (i.e. too different)"

I(U(i)) ----> Sort by count, filter from the fat belly ----> "relevant and interesting"


Recommendation systems and the powerlaw curve


Now the other interesting observation was that using similar techniques on I(T(i)) did not give such crisp recommendations, where I(T(i)) are all the tag-related items for a given item. i.e. collections of tags are not as useful as collections of users in creating a recommendation engine.

Why might this be and how do we understand it from first principles? Here's my little theory.

Let's think about this in terms of gestures, primary and secondary gestures. Users express interest in an item by various gestures. One of them is tagging an item, but prior to tagging an item is the act of focusing on an item and picking it out of the vast universe of items.
This primary selection process appears to be far more powerful an indication of interest than the secondary act of tagging or describing the already selected item. Hence, I hypothesize, a recommendation system based on user-related items is more crisp than one basedon tag-related items.

The bigger picture here suggests that the user or people dimension in folksonomy is just as or more interesting than just the tag dimension. We need to look more deeply at the "folk" and not just the "..sonomy".

(This subject was discussed in a talk I gave at FooCamp where present were and some very smart people like Hal Varian of Google, DeWitt Clinton ex of Amazon, Luke Lonergan CTO of Greenplum, Mary Hodder of Dabble, James Levine of SimplyHired, and Todd "the SEO Guy" .... who participated in a very energetic discussion and helped me refine these ideas. Thanks for that, guys.)

10 Comments:

Anonymous Chris Renner said...

Excellent post, and I'm glad to see you back. The archives have been very useful to me in the past few months, and I look forward to more great stuff. (no pressure)

3:05 AM  
Anonymous Michael Hamann said...

I like your posts!

But one question: With U(i) you mean users that have voted (digged) that item? Or what do you mean when you say "users who use the item i"?

3:39 AM  
Anonymous Kent Bye said...

QUESTION: Given a power law distribution, how do you determine the thresholds between the Long Tail Head & Fat Belly and the Fat Belly and Long Tail?

I wrote up a post describing a Tag Cloud Font Distribution Algorithm" a while back and was able to quantize the count distribution into X number of different font sizes.

You must have a similar way of automatically determining the Head range, Fat Belly range and Long Tail range. Can you share any insights on this?

8:46 AM  
Blogger Nitin said...

Hi Michael, Kent,

The meaning of U(i) depends on the web site - on delicious it is a user who has bookmarked the item, on flickr it is the person who uploaded the item, on digg it is a user who dugg that item, yes.

As for the boundaries between the three regions, this is, in the current state o f the research, somewhat heuristic and arbitrary. Nevertheless, it's clear that it depends on the distribution and one will need to come out with some clear definitions with some formulas. We are not there yet.

But I am curious how one would start.
Perhaps find the 33rd and 66th percentiles and call everything in between, the "fat belly"? Worth a shot.

More importantly, one has to ask whether one needs such a formula since we are in a "fuzzy" regime anyway, logically that is. Do crisp well defined boundaries give better recommendations? How do we differentiate between two marginally different recommendations? Does it matter ? Is it worth it to spend a lot of time if we can't quantify the improvement over the heuristic guess?

Frankly, I am not sure it's that important. The huge insight here is that if we stay away from the head and the tail, we get really useful correlations == recommendations. My gut feeling is that spending too much time on making the boundaries crisp will not yield proportionately better recommendations.

But only more experiments will tell.

11:41 AM  
Anonymous Kent Bye said...

Thanks for the reply on the boundaries, and I figured that it was a fuzzy issue.

One thing that I realized with the tag cloud distribution is that was immediately easy to visually determine the thresholds, but a harder issue to generalize and automate within an algorithm.

It looks like a similar problem that seems like the first step would be to see how much variance there is within different types of distributions. Another thought is to somehow consider the volume underneath the Head versus Fat Bell versus Long Tail.

Finally, I thought that this post was relevant to the annoucement of a Million dollar prize for Netflix to improve their recommendation system:

The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.

Thinking of applying? :)

2:12 PM  
Anonymous Anonymous said...

Hey, interesting article.
I believe there is a way to boost both the performance and 'usefulness' of the recommendations, by using Bayesian logic, because you can datawarehouse a lot of the probabilities and speed up your queries that way.
Have you looked into that ?

12:28 PM  
Anonymous Simon Reinhardt said...

You can find this curve in use in a quite popular social software: last.fm. Apart from supporting tags they also have a very strong recommendation system based on similar artists, user related. They allow their users to slide a horizontal line on this curve to actually select the point on the x-axis themselves where to cut off.

5:47 AM  
Anonymous bitbutter said...

Thanks for the interesting articles. Apart from the "users who bought this item also bought ...." pages on amazon there is also the "We have recommendations for you" page.

I'd be interested to hear your thoughts on approaches to finding recommended items taking a User as a starting point rather than a specific Item.

10:59 AM  
Anonymous Robert Young said...

Nitin,

This is an excellent follow up to my Fat Belly thesis... and very glad to see you running with the ball. Keep up the great work!

8:13 AM  
Blogger Phillip said...

Hey Nitin.

I must apologize that I just now read your post. Excellent! I guess your post just hit me at the wrong time but now I got back to your blog and read this post.

Have you any actual results of your thesis, because it sounds really interesting. Even if you don't have exact numbers, any further information would be very appreciated!

12:14 PM  

Post a Comment

<< Home