Metrotwin Recommends
We’ve been using our new Acts As Recommendable plugin on metrotwin.com and it’s been interesting to see how it’s performing in a real-world situation.
Bookmarks (places) are integral to Metrotwin, and a user can associate themselves with a bookmark by ‘Loving it’, saving it to their profile, or by stating they’ve been there.
So there was potentially a lot of information that could be collected about users’ preferences from their association with bookmarks. That information could then be used to improve the overall experience, for example by recommending bookmarks to people and showing similar bookmarks – a great example of a practical application of Collective Intelligence.
The screenshot shows Acts As Recommendable in action – displaying a list of tailored recommendations on Metrotwin. What you’re seeing is basically ‘people who are associated with some of your bookmarks (AKA similar users) are also associated with the following bookmarks’.
Metrotwin Recommends is tailored to the specific individual, and where we don’t have any recommendation data for that user we show a generic list of the top 5 bookmarks.
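To make the fallback behaviour concrete, here is a minimal Ruby sketch. It is not the plugin’s actual code: the method name `recommend`, its arguments, and the idea of summing per-bookmark similarity scores are all assumptions made for illustration.

```ruby
# Hypothetical inputs (none of these names come from the plugin):
#   user_bookmarks - ids the user is already associated with
#   similarity     - bookmark id => { other bookmark id => similarity score }
#   top_bookmarks  - generic "most popular" list used as the fallback
def recommend(user_bookmarks, similarity, top_bookmarks, limit = 5)
  scores = Hash.new(0.0)

  user_bookmarks.each do |id|
    (similarity[id] || {}).each do |other_id, score|
      next if user_bookmarks.include?(other_id) # skip what the user already has
      scores[other_id] += score
    end
  end

  # Highest combined score first; ties are broken by id for a stable order.
  ranked = scores.select { |_, s| s > 0 }
                 .sort_by { |id, s| [-s, id] }
                 .map(&:first)

  # No usable data for this user: fall back to the generic list.
  ranked.empty? ? top_bookmarks.first(limit) : ranked.first(limit)
end
```

A user with no associations gets the first five entries of `top_bookmarks`; everyone else gets the bookmarks that score highest against what they already have.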
We had to do a lot of tuning to Acts As Recommendable to make sure it would scale to the amount of data required. Most of this revolved around two things: the amount of memory consumed and the speed at which the dataset was generated. We found that ActiveRecord was too memory-intensive to build the initial user/bookmark matrix (it would crash Ruby!), so we used raw SQL to build an array of integers. We then found Ruby too slow to perform the Pearson calculation needed, so I rewrote that in C and called it from Ruby, which sped things up considerably.
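For reference, the Pearson correlation itself is short to write down. The version below is a plain-Ruby sketch of the standard formula, not the C extension described above; the method name `pearson` and the choice to return 0.0 when a vector has no variance are assumptions.

```ruby
# Pearson correlation between two equal-length arrays of numbers,
# e.g. 0/1 flags saying whether each user is associated with a bookmark.
# Returns 0.0 when either array has no variance, where the coefficient
# is mathematically undefined (an assumption made for this sketch).
def pearson(xs, ys)
  raise ArgumentError, "arrays must be the same length" unless xs.size == ys.size

  n = xs.size
  return 0.0 if n.zero?

  sum_x  = xs.sum.to_f
  sum_y  = ys.sum.to_f
  sum_xy = xs.zip(ys).sum { |x, y| x * y }
  sum_x2 = xs.sum { |x| x * x }
  sum_y2 = ys.sum { |y| y * y }

  numerator = sum_xy - (sum_x * sum_y / n)
  var_x = sum_x2 - (sum_x**2 / n)
  var_y = sum_y2 - (sum_y**2 / n)
  return 0.0 if var_x <= 0 || var_y <= 0

  numerator / Math.sqrt(var_x * var_y)
end
```

Running this for every pair of bookmarks is quadratic in the number of bookmarks, which is why the inner loop is a natural candidate for moving into C.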
We can’t generate the recommendations on the fly – so we generate a very large similarity dataset of all the bookmarks offline, once a day.
This similarity matrix greatly reduces the number of calculations and SQL queries we have to make at run time; without it the whole process wouldn’t be viable. Each bookmark has a row in the dataset where it is compared to every other bookmark, and this row is stored in memcached (so your web servers can share the memory, without having to generate the dataset for every Mongrel process).
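The row-per-bookmark layout can be sketched as follows. This is only an illustration under stated assumptions: a plain Hash stands in for the memcached client, the key format `similarity_row:<id>` is made up, and Jaccard overlap is used as a simple stand-in for the Pearson score so the example stays short.

```ruby
require "set"

# Hypothetical data: bookmark id => set of user ids associated with it.
ASSOCIATIONS = {
  1 => Set[10, 11, 12],
  2 => Set[10, 11],
  3 => Set[12, 13],
  4 => Set[14]
}.freeze

# Jaccard overlap, used here only as a stand-in for the Pearson score.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Offline step: one row per bookmark, comparing it to every other bookmark.
def build_similarity_rows(associations)
  associations.each_with_object({}) do |(id, users), rows|
    rows[id] = associations.each_with_object({}) do |(other_id, other_users), row|
      next if other_id == id
      row[other_id] = jaccard(users, other_users)
    end
  end
end

# Store each row under its own key; the Hash stands in for memcached.
def store_rows(cache, rows)
  rows.each { |id, row| cache["similarity_row:#{id}"] = row }
end

# Run-time step: a single cache read, then sort the row by score.
def similar_bookmarks(cache, id, limit = 5)
  row = cache["similarity_row:#{id}"] || {}
  row.select { |_, score| score > 0 }
     .sort_by { |other_id, score| [-score, other_id] }
     .first(limit)
     .map(&:first)
end

cache = {}
store_rows(cache, build_similarity_rows(ASSOCIATIONS))
```

With the rows stored this way, showing similar bookmarks at request time costs one cache lookup per bookmark instead of a pass over the whole association table.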
I’ve also been testing the plugin on a real set of users/movies, where the recommendations are perhaps clearer.
The list below shows the movies most similar to the film ‘Terminator (1984)’, as calculated by our algorithm.
- Terminator 2: Judgment Day (1991)
- Raiders of the Lost Ark (1981)
- Empire Strikes Back, The (1980)
- Alien (1979)
- Aliens (1986)
- True Lies (1994)
- Jurassic Park (1993)
- Indiana Jones and the Last Crusade (1989)
- Die Hard (1988)
- Star Trek: The Wrath of Khan (1982)
I think those results are pretty accurate. I’m also trying to get my hands on the Netflix Prize dataset, to see how the plugin responds to a much larger amount of data (and also to get some newer movies).
So this shows you don’t need the massive resources of a company like Amazon or Google to deliver accurate and tailored recommendations to people – and this plugin provides a production-tested solution that you can easily drop into an existing Rails application.
Comments (5)
Have you done any RMSE testing with it yet?
Also, I’m not sure just how large the dataset you’re using is, but consider that if you’re intending to continue holding the whole thing in memory… It’s not going to scale.
Say 100,000 items, each one compared to all of the others. That gives a matrix with 10,000,000,000 nodes. If every node is an 8-bit number, that’s ~10 gigs of memory.
So, 100,000 is a lot, but doable… but what about a million bookmarks?
Not that it’s even reasonable to start optimizing for that yet, but I’m curious what your take on it is.
Tyler
October 9, 2008
at 7:28 pm
Tyler,
No, we haven’t done any RMSE testing yet. The testing we have done is to construct a small dataset manually, and then test our assumptions against that dataset.
You’re right that this would have trouble scaling to millions of bookmarks – but we’re only expecting to have about 1000 (the idea is to keep quality up) and the public can’t make bookmarks. However, I guess if we needed to scale to that size we’d have to start clustering – probably using K-means. There’s a good article on how Amazon scales up here:
http://agents.csie.ntu.edu.tw/~yjhsu/courses/u2010/papers/Amazon%20Recommendations.pdf
I also see you’ve made a recommendation plugin too – how much can it scale to?
alex
October 10, 2008
at 8:41 am
Not very much! It grinds through about 1.5 million similarity comparisons with each run at the moment. So, it’s not really being pushed all that hard.
I started looking through your ActsAsRecommendable last night. I really like the API you’ve exposed with it… I think the biggest flaw with mine is the difficulty of use.
But try using RMSE… You may find that with a few tweaks you can improve the results of your system tremendously.
Tyler
October 11, 2008
at 3:56 am
Make that 15 million. Math is hard.
Tyler
October 11, 2008
at 4:00 am
I know it’s not as complex as this plugin, but I created a plugin that does really basic similarity matching. Check it out:
http://www.freezzo.com/2009/06/04/acts_as_similar-a-basic-similarity-activerecord-plugin/
Randy
June 5, 2009
at 3:03 pm