
Metadata is in fact useful, though not the metadata that you might expect. One of the biggest wins for many teams came when they started ranking similarity by the edit distance between titles.


Levenshtein distance for predictions? Haha.

I'd be really curious to know which teams in particular use movie metadata. Yehuda Koren (the first commenter in the blog post) has explicitly stated many times that in his humble opinion movie titles and any non-explicit info have been useless.

BellKor, BigChaos, Gravity, and Gavin Potter (just a guy in a garage) are going to be presenting at Yehuda's KDD workshop next week. I'm sure other teams will also be represented. I'll ask them if they use movie metadata, and I'm pretty sure the answer will be no.


Edit distance of titles?! Do you have a source? I'm very curious about how and why that would help.


Indiana Jones and the _______________.


"Heat" and "WALL-E" have a shorter edit distance between them than any two Indiana Jones movies do.


Not if you normalize, e.g. by the length of the shorter string.
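
To make the normalization point concrete, here is a minimal sketch, not anything a team is known to have used. It assumes plain case-sensitive Levenshtein distance and divides by the length of the shorter title; the function names are made up for illustration. The raw distance favors the short unrelated pair "Heat" / "WALL-E", while the normalized score favors the two Indiana Jones titles.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Hypothetical helper: edit distance over the shorter string's length."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        # Two empty strings are identical; otherwise treat as maximally far.
        return 0.0 if len(a) == len(b) else float("inf")
    return levenshtein(a, b) / shorter


if __name__ == "__main__":
    pairs = [
        ("Heat", "WALL-E"),
        ("Indiana Jones and the Temple of Doom",
         "Indiana Jones and the Last Crusade"),
    ]
    for a, b in pairs:
        print(f"{a!r} vs {b!r}: raw={levenshtein(a, b)}, "
              f"normalized={normalized_distance(a, b):.2f}")
```

Dividing by the longer string's length, or lowercasing the titles first, would be equally reasonable choices; the ranking of these two pairs flips relative to the raw distance either way.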


Here's a paper on the BellKor solution, from one of the top teams:

http://research.att.com/~volinsky/netflix/ProgressPrize2007B...


Yehuda later wrote (http://glinden.blogspot.com/2008/03/using-imdb-data-for-netf... and http://hunch.net/?p=331) that using movie metadata produced no measurable improvement in RMSE.


The crux of the argument, though, is that if you have a strong CF model with many, many ratings, you don't seem to get much benefit from metadata with their approach (a linear combination of models). That doesn't mean that metadata can't be useful with a different approach. It also doesn't mean that metadata isn't useful for sparse data: in fact, it's incredibly useful there, because you don't have much of anything else.


I cannot dispute that metadata can be useful. But it appears, at least for prediction tasks similar to the prize, that an ounce of weak or strong explicit user input is worth a ton of rich implicit data (including item metadata).



