Putting the "folk" back in folksonomy
Or ... The fat belly and recommendation systems
Since the beginning of Web 2.0 time, "folksonomy" has been synonymous with tagging. It's time to fill out the picture. As readers of this blog know, folksonomy involves tags, tagged items, and tagger-users. This post digs deeper into the role of users in the "holy trinity" of user-tag-item, and examines the relationship of users to recommendation systems ... and to the "fat belly".
Yes, that does sound like a whole lot of ground to cover but
a) I have been gone for a while so need to catch up in a hurry - what can I say?
b) It's not that much ground to cover when we see the interesting relationships
c) The notation described in the previous post makes it possible to cover a lot of ground without too much verbiage.
So without further ado, here goes.
In a typical folksonomy system we have users attaching tags to items. As the system evolves we have, given an item 'i', the sets:
T(i), the tags associated with i, and
U(i), the users who use the item i.
Typical folksonomy apps have focused on navigating the various relationships via T(i). Recommendation systems that suggest 'related items' are also most often based on T(i), as follows. Given an item, we find all tag-related items via I(T(i)). Then we use some algorithm to trim this down to the "best" 5 or 10, by some definition of "best", and use those as recommendations. Given a user of item i, these are the recommended other items, or 'related items', based on tags.
For the rest of this discussion, we denote this set of recommendations as Rt(i), i.e. given an item i, the recommended other items based on tags.
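Just to make the notation concrete, here is a toy sketch in Python of Rt(i). The sample data and helper names are made up for illustration - this is not Odeo's actual code, and the "trim to the best k" step is the crudest possible one (count shared tags):

```python
from collections import Counter

# Toy folksonomy: item -> set of tags (hypothetical sample data)
tags_of = {
    "item_a": {"jazz", "music"},
    "item_b": {"jazz", "history"},
    "item_c": {"music", "tech"},
    "item_d": {"cooking"},
}

def items_with_tag(tag):
    """All items carrying a given tag."""
    return {i for i, ts in tags_of.items() if tag in ts}

def tag_related(i, k=5):
    """Rt(i): the top-k other items, ranked by how many tags they share with i."""
    counts = Counter()
    for t in tags_of[i]:               # walk T(i)
        for j in items_with_tag(t):    # collect I(T(i))
            if j != i:
                counts[j] += 1
    return [j for j, _ in counts.most_common(k)]

print(sorted(tag_related("item_a")))  # ['item_b', 'item_c']
```

item_b and item_c each share exactly one tag with item_a, so both make the cut; item_d shares none and is never recommended.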
Consider now, the other way to get related items, i.e. user-related items.
This is the famous "users who bought this item also bought ...." approach that we know and love.
Given an item i we get U(i), all the users of i, and then I(U(i)), all the items used by those users. Again we use some way to trim this down to the best 5 or 10 or so and recommend these. Given a user of item i, these are the recommended other items, based on users.
We denote this set of recommendations as Ru(i), i.e. the recommendations based on users of item i.
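The user-based path has exactly the same shape, just pivoting on U(i) instead of T(i). Again a toy sketch with made-up data and a naive count-based trim (the filtering question is exactly what the rest of this post is about):

```python
from collections import Counter

# Toy usage data: item -> set of users (hypothetical sample data)
users_of = {
    "item_a": {"ann", "bob"},
    "item_b": {"ann", "cat"},
    "item_c": {"bob"},
    "item_d": {"dan"},
}

def items_of(user):
    """All items a given user uses."""
    return {i for i, us in users_of.items() if user in us}

def user_related(i, k=5):
    """Ru(i): the top-k other items, ranked by how many users of i also use them."""
    counts = Counter()
    for u in users_of[i]:          # walk U(i)
        for j in items_of(u):      # collect I(U(i))
            if j != i:
                counts[j] += 1
    return [j for j, _ in counts.most_common(k)]

print(sorted(user_related("item_a")))  # ['item_b', 'item_c']
```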
Now comes the interesting part derived from work done at Odeo and Greenplum over the last year or so. Experiments suggest the following two major results, which need much more qualification by further work and study. This is only an indicator of interesting areas for research, not a formal proof of anything.
a) Empirical results suggest that for even a small set of users Ru(i) gives better recommendations than Rt(i), i.e. using user-related items gives better recommendations than using tag-related items.
b) Empirical results suggest that the "algorithm" we use to go from I(U(i)) to Ru(i) makes a lot of difference to the relevance and 'interestingness' of recommendations.
Ok, b) was really cryptic so we'll take the rest of this post to unpack it into useful results and pretty pictures.
Step by step,
I(U(i)) is the raw set of user-related items for item i (people who bought item i also bought a whole ton of other shtuff, namely I(U(i))).
But that is too huge a set to use as recommendations - it could have anywhere from tens to tens of thousands of items, depending on what data we are operating on. So we need to trim it down with a filter that keeps only the best recommendations.
So I(U(i)) ---> Filter ---> Ru(i), i.e. after filtering the raw set of user-related items we get the user-related recommendations.
Now we need to decide how to filter. Let's do the simple thing first.
First we sort the collection I(U(i)) by count, i.e. how many times does some item turn up in this collection.
The temptation is to take the top 10 items by count and use these as the recommendations. This is what I did in practice, and I found that the recommendations generated are only mildly customized, i.e. they are interesting in general but not necessarily interesting to me. Most of the time they are almost identical to the "most popular" items on the front page.
Why is this?
Because I *took* the most popular ones by count, I sampled the head of the distribution and didn't get anything new.
So then I decided to go the other way - I looked at the lower end of the counts and picked recommendations from there, i.e. the proverbial "long tail". Now I got some strange and freaky recommendations - if you had subscribed to the Catholic podcast on Odeo, you would have been recommended the Open Source Sex podcast. Not quite what we have in mind when we say "recommendations".
This led me, by accident, to explore the remaining area of the range of counts, the middle, named the "fat belly" by Robert Young in a recent post on GigaOm.
Here is where the recommendations generated got very, very interesting. For example, Evan Williams, who has an interest in modern furniture, got a recommendation for a podcast related to furniture, although none of his current subscriptions had anything to do with furniture!
This was very exciting and stimulated further exploration which confirmed that the best recommendations came from the fat belly.
I(U(i)) ----> Sort by count, filter from the head ----> "popular (i.e. obvious)"
I(U(i)) ----> Sort by count, filter from the long tail ----> "freaky (i.e. too different)"
I(U(i)) ----> Sort by count, filter from the fat belly ----> "relevant and interesting"
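The three strategies above can be sketched as slices over the count-sorted raw set. The band boundaries here (quartiles) are a hypothetical guess of mine, not the thresholds used at Odeo - in practice you would tune where the belly starts and ends:

```python
from collections import Counter

def recommend(raw_items, strategy="belly", k=5):
    """Filter the raw multiset I(U(i)) down to Ru(i) by slicing the count-sorted list.

    raw_items: I(U(i)) with repeats, e.g. ["a", "a", "b", ...].
    strategy:  'head' (popular/obvious), 'tail' (freaky),
               or 'belly' (the middle band).
    """
    # Rank items from most- to least-frequently co-used.
    ranked = [item for item, _ in Counter(raw_items).most_common()]
    n = len(ranked)
    if strategy == "head":
        band = ranked[:k]          # sample the head: most popular
    elif strategy == "tail":
        band = ranked[-k:]         # sample the long tail: rarest
    else:
        lo, hi = n // 4, 3 * n // 4  # the "fat belly": a middle slice (tunable)
        band = ranked[lo:hi][:k]
    return band
```

For example, with raw counts a:5, b:4, c:3, d:2, e:1, the head strategy with k=2 returns [a, b], the tail returns [d, e], and the belly returns [b, c] - the middling items that the anecdotes above suggest are the relevant-and-interesting ones.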
Now the other interesting observation was that using similar techniques on I(T(i)), the set of all tag-related items for a given item, did not give such crisp recommendations. I.e. collections of tags are not as useful as collections of users in building a recommendation engine.
Why might this be and how do we understand it from first principles? Here's my little theory.
Let's think about this in terms of gestures - primary and secondary gestures. Users express interest in an item via various gestures. One of them is tagging an item, but prior to tagging an item is the act of focusing on an item and picking it out of the vast universe of items.
This primary selection process appears to be a far more powerful indication of interest than the secondary act of tagging or describing the already-selected item. Hence, I hypothesize, a recommendation system based on user-related items is crisper than one based on tag-related items.
The bigger picture here suggests that the user or people dimension in folksonomy is just as or more interesting than just the tag dimension. We need to look more deeply at the "folk" and not just the "..sonomy".
(This subject was discussed in a talk I gave at FooCamp, where present were some very smart people like Hal Varian of Google, DeWitt Clinton, ex of Amazon, Luke Lonergan, CTO of Greenplum, Mary Hodder of Dabble, James Levine of SimplyHired, and Todd "the SEO Guy" ...., who participated in a very energetic discussion and helped me refine these ideas. Thanks for that, guys.)