Please describe some neat things you'd like to do.

wfn · on Oct 8, 2015

For starters, I'd like to

* index every piece of data properly, for easy reference to (a) films, (b) user ratings (per user per film) and (c) users (seeing the latter as simply sets of ratings on films)

* then run a few collaborative filtering (recommendation) algorithms on the indexed dataset - basically taking things from the scikit-learn Python package and running them

* then see (naively and heuristically) whether any recommendations for myself and for a friend who's interested in this make sense

* then do a more proper machine learning split of the dataset into a "training set" vs. a "test set", to see if any of those algos can predict which films a particular user would be interested in watching and how much they'd like them

* and then move on to something which may (I think) provide insights for this particular dataset, e.g. attempting to group users into clusters, to see if there are any more homogeneous clusters of users, with some users acting as connecting "bridges" between clusters; I'd start with a nearest-neighbour (kNN) approach here, for example

* it would also be interesting to try to cluster films as well, and see whether any curious, non-intuitive/non-stereotypical clusters emerge (something beyond well-known genre categories, etc.). I'm not sure what I'd be looking for in particular, but basically this assumes that we really do trust the collective ratings of users. That assumption may be very problematic: for starters, I'm quite sure it'd be difficult to "normalize" the different ways users vote (a 4/5 rating means one thing for one person and another for someone else). Collective averages may help here, of course, and that's one of the things about this MUBI community in particular that drew my initial attention: subjectively, and with bias, it seems that the community as a whole rates films quite responsibly and with a degree of (let's say) signal.

* visualize the resulting clusters, with sliders which alter the clustering parameters (including kNN's simple "k", but beyond that, too), etc.
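
To make the first few steps a bit more concrete, here's a rough sketch in plain Python (no scikit-learn, so it runs without dependencies). It indexes ratings by user and by film, mean-centers each user's ratings as a crude way of "normalizing" voting habits, finds the nearest users by cosine similarity, and ranks unseen films. The ratings, user names and function names are all made up for illustration; nothing here reflects MUBI's actual data or any real API.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (user, film, rating on a 1-5 scale).
RATINGS = [
    ("ann", "Stalker", 5), ("ann", "Persona", 4), ("ann", "Alien", 2),
    ("bob", "Stalker", 4), ("bob", "Persona", 5), ("bob", "Alien", 1),
    ("bob", "Ran", 5),
    ("cat", "Stalker", 1), ("cat", "Alien", 5), ("cat", "Ran", 2),
]


def build_indexes(ratings):
    """Index ratings by user and by film, so a user is just a set of ratings."""
    by_user = defaultdict(dict)
    by_film = defaultdict(dict)
    for user, film, score in ratings:
        by_user[user][film] = score
        by_film[film][user] = score
    return dict(by_user), dict(by_film)


def mean_center(by_user):
    """Subtract each user's mean rating from all of that user's ratings."""
    centered = {}
    for user, films in by_user.items():
        mean = sum(films.values()) / len(films)
        centered[user] = {film: score - mean for film, score in films.items()}
    return centered


def cosine(a, b):
    """Cosine similarity over the films both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = sqrt(sum(a[f] ** 2 for f in shared))
    norm_b = sqrt(sum(b[f] ** 2 for f in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_users(user, centered, k=2):
    """Return (similarity, other_user) pairs for the k most similar users."""
    scores = [(cosine(centered[user], centered[other]), other)
              for other in centered if other != user]
    return sorted(scores, reverse=True)[:k]


def recommend(user, by_user, centered, k=2):
    """Rank films the user hasn't rated by similarity-weighted neighbour ratings."""
    totals = defaultdict(float)
    for sim, other in nearest_users(user, centered, k):
        if sim <= 0:
            continue  # skip users whose tastes don't correlate positively
        for film, score in centered[other].items():
            if film not in by_user[user]:
                totals[film] += sim * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


by_user, by_film = build_indexes(RATINGS)
centered = mean_center(by_user)
print(nearest_users("ann", centered))
print(recommend("ann", by_user, centered))
```

On a real dataset I'd expect to swap the hand-rolled similarity search for something like scikit-learn's `NearestNeighbors` with `metric="cosine"`, and to hold out a test set before trusting any of the rankings.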

Arnt · on Oct 8, 2015

What you want isn't an API, it's a data dump... thank you though. Definitely helpful to hear.

wfn · on Oct 9, 2015

Hm, I suppose you're right, yes.

Could probably come up with interesting things to use API endpoints for, too. :) But my initial thoughts were about something else, I agree.