Recommending items to more than a billion people (facebook.com)
116 points by antoinec on June 3, 2015 | 21 comments


The technical name for this is "collaborative filtering". I think they are basing their work on this paper

- http://www.jmlr.org/papers/volume10/takacs09a/takacs09a.pdf

EDIT: Actually it looks like Eq. (15) from

- http://public.research.att.com/~volinsky/netflix/BellKorICDM...

Anyway, there are lots of papers on the topic.


They do mention that in the 2nd paragraph, and refer to it as CF throughout the article. I found the article to be a nice primer on the topic, as it lists the common approaches to the problem.


Lots of research here, but the most interesting part is their distributed graph approach. Scaling up CF is a challenging problem, and the current open source implementations have problems at really large scales.

I hope they open source their Giraph implementation, or at least part of it.


This is pretty cool. The scale is one reason that almost any time Facebook publishes something on a "big data" subject, it's worth reading.


I guess it is the only reason, because most of these problems are relatively easy to solve for small data.


You would be surprised. Several companies have a big-data problem even with a small amount of it. :)


FWIW, I gave a talk about the Alternating Least Squares algorithm mentioned here (and linked in several comments) and how we implemented it at Spotify:

Slides: http://www.a1k0n.net/spotify/ml-madison/

Video (for the extremely patient): https://www.youtube.com/watch?v=MX_ARH-KoDg


> To solve the matrix equation A × X = B we need to find the inverse A^-1

Huh? Isn't Gaussian elimination more straightforward?


Since they're using collaborative filtering, the matrix they're solving for is very sparse (i.e. it's an underdetermined system). So they're fitting a nonlinear regression model by minimizing the regularized squared error. Since the vectors they're trying to model (x and y) are both unknown, the optimization problem is not convex, or in other words, can't be solved exactly.

http://www2.research.att.com/~volinsky/papers/ieeecomputer.p...

Edit: rewrote everything, then added the AT&T research paper link
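
For reference, the regularized squared-error objective from the AT&T paper linked above is (x_u are user factors, y_i are item factors, λ is the regularization weight, and κ is the set of observed (u, i) ratings):

    \min_{x_*, y_*} \sum_{(u,i) \in \kappa} \left( r_{ui} - x_u^T y_i \right)^2
        + \lambda \Big( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \Big)

Fixing either set of factors turns this into an ordinary least-squares problem, which is exactly what alternating least squares exploits.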


That's not actually true. The matrix they're talking about inverting here is just a small square matrix, 8x8 or 128x128 in their graphed example, used to solve the alternating least squares problem from the paper you linked.

What you do is, for each user, sum up the vectors of the items they 'used' and the outer product of each item vector with itself, then solve a straightforward linear system. Then you flip it around and sum up the new user vectors and their outer products for each item. And alternate.

And so yeah, you don't need to invert the matrix; you use a Cholesky decomposition or something similar, since the matrix is guaranteed to be positive definite.
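
To make that concrete, here's a minimal numpy/scipy sketch of the per-user half of one ALS sweep. The names and the explicit-ratings setup are mine, not from the article:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def als_update_user(Y, item_ids, ratings, lam=0.1):
        # Y: (n_items, k) item-factor matrix; k is small (e.g. 8 or 128).
        k = Y.shape[1]
        A = lam * np.eye(k)            # regularization keeps A positive definite
        b = np.zeros(k)
        for i, r in zip(item_ids, ratings):
            A += np.outer(Y[i], Y[i])  # sum of outer products y_i y_i^T
            b += r * Y[i]              # sum of the user's rated item vectors
        # Solve A x = b via Cholesky -- no explicit inverse needed.
        return cho_solve(cho_factor(A), b)

The item-side update is symmetric: fix the user factors and solve the same kind of k x k system per item.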


Come again? A nonlinear matrix equation of the form AX=B?


Yeah, I have no idea why I said that. Answer's been fixed.


Gaussian elimination finds the inverse of a matrix. However, it is not efficient for large matrices, so they use a different algorithm.


Maybe there are some optimized ways to calculate inv(A)? I could see Gaussian elimination taking a very long time if your A and B matrices are HUGE.


Enhanced versions of Gaussian elimination are used for quite large matrices (with many millions of non-zero coefficients), but for even larger matrices, iterative algorithms such as conjugate gradient are used. These converge toward the desired solution rather than computing it directly, as Gaussian elimination does.
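
A small scipy sketch of the difference; the tridiagonal system here is just a stand-in for a large, sparse, symmetric positive-definite matrix:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg, spsolve

    # A large, sparse, symmetric positive-definite system.
    n = 100_000
    A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    x_direct = spsolve(A, b)   # direct, elimination-style sparse solve
    x_iter, info = cg(A, b)    # iterative solve; converges toward x
    print(info == 0)                           # 0 means CG converged
    print(np.max(np.abs(x_direct - x_iter)))   # difference is tiny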


Depending on your matrix there might be faster methods: http://au.mathworks.com/help/releases/R2015a/matlab/ref/mldi...


I hoped for a minute that they had shared their complete implementation. Is anyone aware of a recommendation system that can scale to millions of items, be updated as soon as new items come in (no full graph recalculation), and take multiple inputs (ratings, saved to library, etc.)?


Have a look at Google's MinHash algorithm. While it's a probabilistic solution, you can run it as a mapreduce and will at no point need to have the full data set in memory/on a single machine. So it does scale pretty well.

http://www2007.org/papers/paper570.pdf

EDIT: I see you changed your comment to include "no full graph recalculation". Incremental recommendations are possible with MinHash, but I think you can't easily handle the decay of old data.
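
Roughly, the idea in the linked paper is to cluster users by MinHash signatures of their item sets. A toy Python sketch of the signature step (illustrative only, not the paper's actual pipeline):

    import random

    def minhash_signature(item_ids, num_hashes=64, seed=1):
        # For each of num_hashes salted hash functions, keep the
        # minimum hash value over the user's item set.
        rng = random.Random(seed)
        salts = [rng.getrandbits(32) for _ in range(num_hashes)]
        return [min(hash((s, i)) for i in item_ids) for s in salts]

    # The fraction of matching signature positions estimates the Jaccard
    # similarity of two users' item sets, so users can be bucketed by
    # signature bands instead of compared pairwise.
    u1 = minhash_signature({"a", "b", "c", "d"})
    u2 = minhash_signature({"a", "b", "c", "e"})
    print(sum(x == y for x, y in zip(u1, u2)) / len(u1))  # ~0.6 expected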


Please don't do it. However great a technical feat it is, the truth is, it sucks. Those recommendations are what I hate the most on Facebook.


You would rather have ads that are not tailored to your interests?


Well of course. Tailored ads just make sense to companies because they increase click/conversion rates.

I would rather not be convinced to buy things I don't have a need for. Thus, I don't see how more effective ads are any better for me.




