Hacker News

  MUBI's curatorial approach stands in sharp contrast to major streaming services
  like Netflix, Hulu, and Amazon, which have amassed large libraries and deliver
  personalized recommendations based on algorithms. MUBI's human-curated selection
  takes a lot of that choice away, and Cakarel thinks viewers are better off
  for it.

  "Think about your own Netflix experience and how frustrating it is — how
  long it takes you to find a film that you want to watch," says Cakarel.
  "It doesn't work. It categorically doesn't work."
While I get what they're saying, I still wonder whether it would be interesting to try out some personalized recommendation algorithms on their user-ratings dataset. After reviewing their T&C, I crawled through their data some time ago (more precisely, their backend exposes neat endpoints that aren't part of any formal API; officially they don't have one), and I'm still curious to just try out some things from scikit-learn.

Has anyone else thought of or tried the same? When I contacted them, the MUBI team (IIRC) basically said they weren't interested in that for now.



Little-known fact, but we did start out with the goal of amassing a large library, and we even put significant effort, together with some former Netflix Prize winners, into building a recommendation engine.

The first big problem, which you see on Netflix as well, is that a recommendation engine becomes much less effective when your selection is thin, and the original dream of streaming every film is never going to happen because of a price squeeze from the content side (hence Netflix moving into production and focusing on exclusive content).

The second, perhaps bigger problem (particularly for our audience) is that there is only so much metadata that you can pump into a film database that will tell you categorically if someone wants to watch it. People like to be delighted by discovering a new film, not because it is somehow akin to 20 other films they liked, but because of where they are in their life now, and the contemporary meaning of that film. I know curation has become possibly the most annoying buzzword in SV circles lately, but it has been our approach since 2007, real actual curation by human beings—not algorithms that purport to be "curatorial" on a pitch deck.


Regarding your second point, I really understand this position, and it's fair enough (re: the zeitgeist, and where in life someone looking for and rating films happens to be, etc.). I just wonder whether it would be interesting to spend some time looking at recommendation systems that aren't simply about intersecting users' votes and presenting "just what you've recently liked." But this would be playful experimentation, hobbyist in nature, for sure.


What, so like, "as an early thirty-something who just found out you're about to become a father, you may enjoy films X, Y, Z"? To really get people's life chapters and such you'd have to have massive metadata about them beyond the scope of a video service. Google/FB could do it, though.


It was more a comment about how considerations such as this are exactly what make collaborative recommendation systems less useful. Regarding recommendation systems that move beyond naive rating/like intersections between users (e.g. the Jaccard similarity index), I think interesting algorithms could be developed that combine assigned/curated item tags with collaborative filtering (something last.fm has considered, or has been considering; there's an entire field of research on music recommendation systems alone).
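To make the contrast concrete, here is a minimal pure-Python sketch of both ideas: a naive collaborative score built from the Jaccard index of users' liked-film sets (|A ∩ B| / |A ∪ B|), and a hybrid that linearly blends it with overlap on curated tags. The toy data, tag names, helper functions and the blend weight `alpha` are all hypothetical; this is not how last.fm, MUBI or anyone in particular does it.

```python
# Hypothetical toy data: each user's set of liked film IDs.
likes = {
    "ana": {"f1", "f2", "f3"},
    "ben": {"f2", "f3", "f4"},
    "cleo": {"f5"},
}

# Hypothetical curated tags per film (the "assigned/curated item tags").
tags = {
    "f1": {"slow-cinema", "iran"},
    "f2": {"slow-cinema"},
    "f3": {"noir"},
    "f4": {"noir", "french"},
    "f5": {"iran", "documentary"},
}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cf_score(user: str, film: str) -> float:
    """Naive collaborative score: sum of the Jaccard similarities
    of the other users who liked `film`."""
    return sum(
        jaccard(likes[user], likes[other])
        for other in likes
        if other != user and film in likes[other]
    )


def tag_score(user: str, film: str) -> float:
    """Content score: Jaccard overlap between the film's tags and the
    union of tags on films the user already liked."""
    profile = set().union(*(tags[f] for f in likes[user]))
    return jaccard(profile, tags[film])


def hybrid_scores(user: str, alpha: float = 0.5) -> dict:
    """Blend both signals for every film the user hasn't liked yet."""
    candidates = set(tags) - likes[user]
    return {
        f: alpha * cf_score(user, f) + (1 - alpha) * tag_score(user, f)
        for f in candidates
    }


print(hybrid_scores("ana"))
```

Here `f5` gets no collaborative signal at all for `ana` (nobody similar to her liked it), but the shared `iran` tag still gives it a nonzero score, which is the kind of gap a tag-aware hybrid is meant to fill.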


I would pay $10 a month for a recommendation engine only. I'm able to find the actual video files easily enough. (And I often do so, even when Netflix has it, as Netflix's player sucks, quality is iffy, subtitles are spotty, etc.)



> "Think about your own Netflix experience and how frustrating it is — how long it takes you to find a film that you want to watch," says Cakarel. "It doesn't work. It categorically doesn't work."

I don't really think this describes my Netflix experience well enough to be "categorically" true.


Netflix recommendations work really well for some people, not so much for others.

Even after using it for a few months, I found the success rate not much better than just trawling through IMDb new-release lists. Yet a lot of my friends say the recommendations are pretty spot-on and usually something they enjoy.


Netflix recommendations have essentially never worked for me. It occasionally identifies a category I might be interested in, but then drops the ball on actual content. I think the best film I've seen recommended in a genre was Charlie Victor Romeo.

There are still some excellent movies (Primer, Metropolis, Ronin, Barton Fink, The Man Who Wasn't There, etc.), particularly classics. And the documentaries are probably underappreciated (Genius on Hold, Cave of Forgotten Dreams, both Cosmos series, etc.). So someone is probably paying attention. I still have to search for specific things at random from time to time, though. Sometimes sites like https://reddit.com/r/bestofnetflix help.

One of the bigger problems I've noticed, though, is that some movies tend to be cut short. It's almost never the director's cut.


In reply to throwaway7767, who I can't seem to reply to directly: I think recommendations depend on how thorough people are about using separate accounts for each user in the house.

I ended up just creating an Everyone profile after being asked once too often why Netflix wasn't keeping track of where people had got to in a series, and the recommendations now are utter garbage, as you'd expect when looking for the intersection of Breaking Bad, Period Dramas, and Fireman Sam. I have a feeling they'd be much better if each person's tastes were segregated.
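A toy illustration of why a shared "Everyone" profile degrades, assuming a simple content-vector model in which a profile is the average of its viewers' taste vectors. The three taste axes, the member names, the catalog vectors and the "bland" compromise title are all hypothetical, and Netflix's actual model is not described anywhere in this thread.

```python
import math

# Hypothetical 3-axis taste space: (crime drama, period drama, kids' TV).
Vector = tuple[float, float, float]

ITEMS: dict[str, Vector] = {
    "Breaking Bad": (1.0, 0.0, 0.0),
    "a period drama": (0.0, 1.0, 0.0),
    "Fireman Sam": (0.0, 0.0, 1.0),
    "bland middle-of-the-road title": (0.4, 0.4, 0.4),
}

# Hypothetical household members, each with a sharply defined taste.
MEMBERS: dict[str, Vector] = {
    "parent_a": (1.0, 0.0, 0.0),
    "parent_b": (0.0, 1.0, 0.0),
    "child": (0.0, 0.0, 1.0),
}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_profile(vectors: list[Vector]) -> Vector:
    """A shared profile modeled as the average of its viewers' tastes."""
    n = len(vectors)
    return tuple(sum(axis) / n for axis in zip(*vectors))


def top_item(profile: Vector) -> str:
    """The catalog item most similar to the given profile."""
    return max(ITEMS, key=lambda title: cosine(profile, ITEMS[title]))


shared = mean_profile(list(MEMBERS.values()))
for name, taste in MEMBERS.items():
    print(f"{name}: {top_item(taste)}")
print(f"Everyone: {top_item(shared)}")
```

Each individual's best match is their own genre, but the averaged profile's best match is the compromise title that nobody in particular asked for, which is roughly the effect described above.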


The www.netflixprize.com data from 2009 is quantitative evidence about whether the recommender works (it is much better than random). I'm open to the idea that a curated collection might delight more people, but in my experience and based on this data, I can't agree that Netflix's algorithm is useless. Personally I find it's the unavailability of streaming content that makes Netflix frustrating.
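"Much better than random" can be made concrete the way the Netflix Prize scored entries: root-mean-square error (RMSE) on withheld ratings. The sketch below compares a simple global-mean-plus-user/item-offset baseline against the root of the expected squared error of guessing uniformly from 1 to 5. The toy ratings and helper names are hypothetical, and the baseline is a common textbook starting point, not Netflix's Cinematch or any Prize entry.

```python
import math
from collections import defaultdict

# Hypothetical toy ratings on a 1-5 scale: (user, film, rating).
TRAIN = [
    ("u1", "i1", 5), ("u1", "i2", 4), ("u1", "i3", 1),
    ("u2", "i1", 4), ("u2", "i2", 5), ("u2", "i3", 2),
    ("u3", "i1", 5), ("u3", "i3", 1),
    ("u4", "i1", 2), ("u4", "i2", 1),
]
TEST = [("u3", "i2", 4), ("u4", "i3", 1)]


def fit_baseline(train):
    """Global mean plus per-item and per-user mean offsets (no regularization)."""
    mu = sum(r for _, _, r in train) / len(train)
    item_devs = defaultdict(list)
    for _, i, r in train:
        item_devs[i].append(r - mu)
    b_item = {i: sum(d) / len(d) for i, d in item_devs.items()}
    user_devs = defaultdict(list)
    for u, i, r in train:
        user_devs[u].append(r - mu - b_item[i])
    b_user = {u: sum(d) / len(d) for u, d in user_devs.items()}
    return mu, b_user, b_item


def predict(model, user, item, lo=1.0, hi=5.0):
    """Baseline prediction clipped to the rating scale; unseen IDs get zero offset."""
    mu, b_user, b_item = model
    raw = mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)
    return min(hi, max(lo, raw))


def rmse(model, test):
    """Root-mean-square error over held-out ratings."""
    se = sum((predict(model, u, i) - r) ** 2 for u, i, r in test)
    return math.sqrt(se / len(test))


def random_guess_rmse(test):
    """Root of the expected mean squared error of a uniform guess g in {1..5}:
    E[(g - r)^2] = Var(g) + (E[g] - r)^2 = 2 + (3 - r)^2."""
    return math.sqrt(sum(2 + (3 - r) ** 2 for _, _, r in test) / len(test))


model = fit_baseline(TRAIN)
print(f"baseline RMSE:     {rmse(model, TEST):.3f}")
print(f"random-guess RMSE: {random_guess_rmse(TEST):.3f}")
```

Even this crude model beats random guessing by a wide margin on the toy data; the Prize data let people make the same comparison at scale.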


Please describe some neat things you'd like to do.


For starters, I'd like to

* index every piece of data properly, for easy reference to (a) films, (b) user ratings (per user per film) and (c) users (seeing the latter as simply sets of ratings on films)

* then run a few collaborative recommendation algorithms on the indexed dataset - basically taking things from the scikit-learn python package and running them

* then see (naively and heuristically) whether any recommendations for myself and for a friend who's interested in this make sense

* then do a proper machine-learning split into a "training set" vs. a "test set", to see whether any of those algorithms can predict which films a particular user would be interested in watching and how much they'd like them

* and then to move on to something that may (I think) provide insights for this particular dataset, e.g. attempting to classify users into clusters, to see whether there are any more homogeneous clusters of users, with some users acting as connecting "bridges" between clusters; I'd start with the kNN algorithm here, for example

* it would also be interesting to try to classify films too, and see whether any curious non-intuitive/non-stereotypical clusters emerge (something beyond well-known genre categories, etc.). I'm not sure what I'd be looking for in particular, but this basically assumes that we really do trust the collective ratings of users. That may be very problematic: for starters, I'm quite sure it'd be difficult to "normalize" the different ways users vote (a 4/5 rating means one thing to one person and another to someone else). Collective averages may help here, and that's one of the things about this MUBI community in particular that drew my attention: overall (subjectively, and with bias), the community seems to rate films quite responsibly and with a degree of (let's say) signal.

* visualize the resulting clusters, with sliders that alter the classification parameters (including kNN's simple "n", but others too), etc.
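The first few steps of that list (index, split, run a collaborative algorithm, evaluate) might look roughly like this with NumPy and scikit-learn. It is a hedged toy sketch, not something run against MUBI data: the rows, IDs, `k=2`, the 25% test split, and the choice of user-based kNN with cosine similarity on per-user mean-centered ratings are all assumptions. Mean-centering is one simple answer to the "4/5 means different things to different people" problem. `cosine_similarity` and `train_test_split` are real scikit-learn functions; `build_index`, `to_matrix`, `center`, `predict` and `evaluate` are hypothetical helpers.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

# Hypothetical toy rows standing in for crawled data: (user_id, film_id, rating 1-5).
ROWS = [
    ("u1", "f1", 5), ("u1", "f2", 4), ("u1", "f3", 1), ("u1", "f4", 2),
    ("u2", "f1", 4), ("u2", "f2", 5), ("u2", "f3", 2), ("u2", "f4", 1),
    ("u3", "f1", 1), ("u3", "f2", 2), ("u3", "f3", 5), ("u3", "f4", 4),
    ("u4", "f1", 2), ("u4", "f2", 1), ("u4", "f3", 4), ("u4", "f4", 5),
    ("u5", "f1", 5), ("u5", "f2", 5), ("u5", "f3", 1), ("u5", "f4", 1),
]


def build_index(rows):
    """Map user and film IDs to contiguous matrix indices."""
    users = sorted({u for u, _, _ in rows})
    films = sorted({f for _, f, _ in rows})
    return ({u: k for k, u in enumerate(users)},
            {f: k for k, f in enumerate(films)})


def to_matrix(rows, user_ix, film_ix):
    """Dense user x film matrix, NaN where unrated (a real crawl would want scipy.sparse)."""
    m = np.full((len(user_ix), len(film_ix)), np.nan)
    for u, f, r in rows:
        m[user_ix[u], film_ix[f]] = r
    return m


def center(m):
    """Subtract each user's mean rating; users with no ratings fall back to the global mean."""
    rated = ~np.isnan(m)
    counts = rated.sum(axis=1)
    sums = np.nansum(m, axis=1)
    means = np.divide(sums, counts, out=np.full(len(m), np.nanmean(m)), where=counts > 0)
    centered = np.where(rated, m - means[:, None], 0.0)
    return centered, means, rated


def predict(centered, means, rated, user, film, k=2):
    """User-based kNN: the user's mean plus the similarity-weighted centered
    ratings of the k most similar other users who rated `film`."""
    sims = cosine_similarity(centered[user:user + 1], centered)[0]
    candidates = [v for v in np.argsort(-sims) if v != user and rated[v, film]]
    top = candidates[:k]
    weights = sims[top]
    denom = np.abs(weights).sum()
    if denom == 0:
        return float(means[user])
    return float(means[user] + weights @ centered[top, film] / denom)


def evaluate(rows, test_size=0.25, seed=0):
    """Hold out a random share of ratings and report RMSE on them."""
    user_ix, film_ix = build_index(rows)
    train, test = train_test_split(rows, test_size=test_size, random_state=seed)
    centered, means, rated = center(to_matrix(train, user_ix, film_ix))
    errors = [predict(centered, means, rated, user_ix[u], film_ix[f]) - r
              for u, f, r in test]
    return float(np.sqrt(np.mean(np.square(errors))))


print(f"held-out RMSE: {evaluate(ROWS):.3f}")
```

For the clustering bullets, note that kNN is a neighbor-lookup method rather than a clustering algorithm; scikit-learn's clustering estimators live in `sklearn.cluster` (e.g. `KMeans`), and the centered matrix above could be fed to either.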


What you want isn't an API; it's a data dump. Thank you, though. Definitely helpful to hear.


Hm, I suppose you're right, yes.

I could probably come up with interesting uses for the API endpoints too :) but my initial thoughts were about something else, I agree.



