Recommending items to more than a billion people (facebook.com)
116 points by antoinec on June 3, 2015 | 21 comments


The technical name for this is "collaborative filtering". I think they are basing their work on this paper

- http://www.jmlr.org/papers/volume10/takacs09a/takacs09a.pdf

EDIT: Actually it looks like Eq. (15) from

- http://public.research.att.com/~volinsky/netflix/BellKorICDM...

Anyway, there are lots of papers on the topic.


They do mention that in the 2nd paragraph, and refer to it as CF throughout the article. I found the article to be a nice primer on the topic, as it lists the common approaches to the problem.


Lots of research here, but the most interesting part is their distributed graph approach. Scaling up CF is a challenging problem, and the current open source implementations have problems at really large scales.

I hope they open source their Giraph implementation, or at least part of it.


This is pretty cool. The scale is one reason that almost any time Facebook publishes something on a "big data" subject, it's worth reading.


I guess it is the only reason, because most of these problems are relatively easy to solve for small data.


You would be surprised. Several companies have a big-data problem even with a small amount of it. :)


FWIW, I gave a talk about the Alternating Least Squares algorithm mentioned here (and linked in several comments) and how we implemented it at Spotify:

Slides: http://www.a1k0n.net/spotify/ml-madison/

Video (for the extremely patient): https://www.youtube.com/watch?v=MX_ARH-KoDg


> To solve the matrix equation A × X = B we need to find the inverse A^-1

Huh? Isn't Gaussian elimination more straightforward?


Since they're using collaborative filtering, the matrix they're solving for is very sparse (i.e. it's an underdetermined system). So they're fitting a nonlinear regression model by minimizing the regularized squared error. Since the vectors they're trying to model (x and y) are both unknown, the optimization problem is not convex, or in other words, can't be solved exactly.

http://www2.research.att.com/~volinsky/papers/ieeecomputer.p...

Edit: rewrote everything, then added the AT&T research paper link
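
For reference, the regularized squared-error objective from the AT&T paper linked above is (x_u are user factors, y_i are item factors, λ is the regularization weight, and κ is the set of observed (u, i) ratings):

    \min_{x_*, y_*} \sum_{(u,i) \in \kappa} \left( r_{ui} - x_u^T y_i \right)^2
        + \lambda \Big( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \Big)

Fixing either set of factors turns this into an ordinary least-squares problem, which is exactly what alternating least squares exploits.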


That's not actually true. The matrix they're talking about inverting here is just a small square matrix, 8x8 or 128x128 in their graphed example, used to solve the alternating least squares problem from the paper you linked.

What you do is, for each user, sum up the vectors of the items they 'used' and the outer product of each item vector with itself, then solve a straightforward linear system. Then you flip it around and sum up the new user vectors and their outer products for each item. And alternate.

And so yeah, you don't need to invert the matrix; you use a Cholesky decomposition or something similar, since the matrix is guaranteed to be positive definite.
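
To make that concrete, here's a minimal numpy/scipy sketch of the per-user half of one ALS sweep. The names and the explicit-ratings setup are mine, not from the article:

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    def als_update_user(Y, item_ids, ratings, lam=0.1):
        # Y: (n_items, k) item-factor matrix; k is small (e.g. 8 or 128).
        k = Y.shape[1]
        A = lam * np.eye(k)            # regularization keeps A positive definite
        b = np.zeros(k)
        for i, r in zip(item_ids, ratings):
            A += np.outer(Y[i], Y[i])  # sum of outer products y_i y_i^T
            b += r * Y[i]              # sum of the user's rated item vectors
        # Solve A x = b via Cholesky -- no explicit inverse needed.
        return cho_solve(cho_factor(A), b)

The item-side update is symmetric: fix the user factors and solve the same kind of k x k system per item.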


Come again? A nonlinear matrix equation of the form AX=B?


Yeah, I have no idea why I said that. Answer's been fixed.


Gaussian elimination finds the inverse of a matrix. However, it is not efficient for large matrices, so they use a different algorithm.


Maybe there are some optimized ways to calculate inv(A)? I could see Gaussian elimination taking a very long time if your A and B matrices are HUGE.


Enhanced versions of Gaussian elimination are used for quite large matrices (with many millions of non-zero coefficients), but for even larger matrices, iterative algorithms such as conjugate gradient are used. These converge toward the desired solution rather than computing it directly, as Gaussian elimination does.
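
A small scipy sketch of the difference; the tridiagonal system here is just a stand-in for a large, sparse, symmetric positive-definite matrix:

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import cg, spsolve

    # A large, sparse, symmetric positive-definite system.
    n = 100_000
    A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
    b = np.ones(n)

    x_direct = spsolve(A, b)   # direct, elimination-style sparse solve
    x_iter, info = cg(A, b)    # iterative solve; converges toward x
    print(info == 0)                           # 0 means CG converged
    print(np.max(np.abs(x_direct - x_iter)))   # difference is tiny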


Depending on your matrix there might be faster methods: http://au.mathworks.com/help/releases/R2015a/matlab/ref/mldi...


I hoped for a minute that they had shared their complete implementation. Is anyone aware of a recommendation system that can scale to millions of items, be updated as soon as new items come in (no full graph recalculation), and take multiple inputs (ratings, saved to library, etc.)?


Have a look at Google's MinHash algorithm. While it's a probabilistic solution, you can run it as a mapreduce and will at no point need to have the full data set in memory/on a single machine. So it does scale pretty well.

http://www2007.org/papers/paper570.pdf

EDIT: I see you changed your comment to include "no full graph recalculation". Incremental recommendations are possible with MinHash, but I think you can't easily handle the decay of old data.
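
Roughly, the idea in the linked paper is to cluster users by MinHash signatures of their item sets. A toy Python sketch of the signature step (illustrative only, not the paper's actual pipeline):

    import random

    def minhash_signature(item_ids, num_hashes=64, seed=1):
        # For each of num_hashes salted hash functions, keep the
        # minimum hash value over the user's item set.
        rng = random.Random(seed)
        salts = [rng.getrandbits(32) for _ in range(num_hashes)]
        return [min(hash((s, i)) for i in item_ids) for s in salts]

    # The fraction of matching signature positions estimates the Jaccard
    # similarity of two users' item sets, so users can be bucketed by
    # signature bands instead of compared pairwise.
    u1 = minhash_signature({"a", "b", "c", "d"})
    u2 = minhash_signature({"a", "b", "c", "e"})
    print(sum(x == y for x, y in zip(u1, u2)) / len(u1))  # ~0.6 expected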


Please don't do it. However great a technical feat it is, the truth is, it sucks. Those recommendations are what I hate the most on Facebook.


You would rather have ads that are not tailored to your interests?


Well of course. Tailored ads just make sense to companies because they increase click/conversion rates.

I would rather not be convinced to buy things I don't have a need for. Thus, I don't see how more effective ads are any better for me.




