Hacker News

I was just browsing through the classic "Mining of Massive Datasets" book (which is free!) when I noticed this apt passage in its introduction that explains the difference between data mining and machine learning:

http://infolab.stanford.edu/~ullman/mmds.html

> There are some who regard data mining as synonymous with machine learning. There is no question that some data mining appropriately uses algorithms from machine learning. Machine-learning practitioners use the data as a training set, to train an algorithm of one of the many types used by machine-learning practitioners, such as Bayes nets, support-vector machines, decision trees, hidden Markov models, and many others.

> There are situations where using data in this way makes sense. The typical case where machine learning is a good approach is when we have little idea of what we are looking for in the data. For example, it is rather unclear what it is about movies that makes certain movie-goers like or dislike it. Thus, in answering the “Netflix challenge” to devise an algorithm that predicts the ratings of movies by users, based on a sample of their responses, machine-learning algorithms have proved quite successful. We shall discuss a simple form of this type of algorithm in Section 9.4.

> On the other hand, machine learning has not proved successful in situations where we can describe the goals of the mining more directly. An interesting case in point is the attempt by WhizBang! Labs to use machine learning to locate people’s resumes on the Web. It was not able to do better than algorithms designed by hand to look for some of the obvious words and phrases that appear in the typical resume. Since everyone who has looked at or written a resume has a pretty good idea of what resumes contain, there was no mystery about what makes a Web page a resume. Thus, there was no advantage to machine-learning over the direct design of an algorithm to discover resumes.
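To make the "designed by hand" approach concrete, here is a minimal sketch of such a rule-based detector: it scores a page by which obvious resume phrases it contains and flags it once the score passes a threshold. The cue phrases, weights, and threshold below are hypothetical placeholders chosen for illustration; the book does not say what words, weights, or rules the hand-written algorithms actually used.

```python
import re

# Hypothetical cue phrases and weights, picked by hand for illustration only.
RESUME_CUES = {
    "objective": 1.0,
    "work experience": 2.0,
    "education": 1.0,
    "skills": 1.0,
    "references available upon request": 3.0,
    "curriculum vitae": 3.0,
}


def resume_score(text: str) -> float:
    """Sum the weights of the cue phrases that appear at least once in the text."""
    # Lowercase and collapse runs of whitespace so "Work   Experience" still matches.
    normalized = re.sub(r"\s+", " ", text.lower())
    return sum(weight for phrase, weight in RESUME_CUES.items() if phrase in normalized)


def looks_like_resume(text: str, threshold: float = 3.0) -> bool:
    """Classify a page as a resume when its cue score reaches a hand-picked threshold."""
    return resume_score(text) >= threshold


if __name__ == "__main__":
    page = "Jane Doe\nObjective: data engineer\nWork   Experience\nAcme Corp"
    print(looks_like_resume(page))
    print(looks_like_resume("A blog post about education policy"))
```

Every rule here is something a person who has read a few resumes could write down directly, which is the passage's point: when the target is this well understood, there is little left for a learned model to discover.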

Will you need to change that definition if I show you a machine learning algorithm capable of significantly outperforming the best human algorithms on the resume classification problem?