
Well, it's an HN comment, not a scientific communication.

Scenario (i) is mostly considered uninteresting and "solved" because it assumes that the particular stochastic process generating the data is known (this was the pre-Vapnik model). In most practical cases, however, one does not know the measure from which the data was generated (and will be generated in the future). The old-school thinking was that one should estimate the generating distribution and then solve for the optimal decision based on that estimate. Vapnik showed that it is far better to solve the decision problem directly, without bothering to estimate the distribution, which is itself a non-trivial problem.
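To make the contrast concrete, here is a toy sketch (the Gaussian data and the threshold classifier are my own illustrative choices, not Vapnik's) of the two routes: estimate the distribution and derive the decision from it, versus pick the decision rule directly by minimizing empirical error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two overlapping 1-D Gaussian clouds.
n = 500
X0 = rng.normal(loc=-1.0, scale=1.0, size=n)   # class 0
X1 = rng.normal(loc=+1.0, scale=1.0, size=n)   # class 1
X = np.concatenate([X0, X1])
y = np.concatenate([np.zeros(n), np.ones(n)])

# Old-school route: estimate the generating distribution (fit a Gaussian
# per class), then derive the optimal decision rule from the estimate.
mu0, mu1 = X0.mean(), X1.mean()
threshold_generative = (mu0 + mu1) / 2.0  # assumes equal variances/priors

# Direct route (Vapnik's point): choose the decision rule that minimizes
# empirical classification error, never modeling the distribution at all.
candidates = np.sort(X)
errors = [np.mean((X > t) != y) for t in candidates]
threshold_direct = candidates[int(np.argmin(errors))]

print(threshold_generative, threshold_direct)  # both land near 0 here
```

On this toy problem both routes agree, but the direct route never needed the Gaussian assumption to be true; that is the whole appeal.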

So (ii) is the more interesting case that machine learners care to handle, and Vapnik's was one of the first breakthrough fundamental results in this formalism. It applies to all algorithms that do something called structural risk minimization, not just one specific algorithm. His bound applies to all generating distributions, all samples, and all structural risk minimization algorithms! However, in this setup we still need to assume independence; some form of it, even a weak one, needs to be present. But it is quite remarkable that you can give guarantees on performance regardless of what actual distribution generated the data. I should also mention Leslie Valiant: look up PAC learning and you will understand the basic premise of this style. One genuine criticism is that it is overly conservative (and it has to be, because it is giving guarantees that apply for the worst distribution, worst training sample, worst algorithm...).
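For flavor, one commonly quoted form of the distribution-free guarantee (treat the exact constants as indicative; different books state slightly different versions): with probability at least 1 - delta over an i.i.d. sample of size n, every classifier in a class of VC dimension h has true risk at most empirical risk plus a confidence term. A quick sketch:

```python
import math

def vc_bound(emp_risk, h, n, delta=0.05):
    """One commonly quoted Vapnik-style generalization bound:
    with probability >= 1 - delta over an i.i.d. sample of size n,
    for every classifier in a class of VC dimension h,
        true_risk <= emp_risk + sqrt((h*(ln(2n/h) + 1) + ln(4/delta)) / n).
    It holds for *any* generating distribution; only independence is assumed.
    """
    confidence_term = math.sqrt((h * (math.log(2 * n / h) + 1)
                                 + math.log(4 / delta)) / n)
    return emp_risk + confidence_term

# The bound is conservative by design: even with 100k samples and a
# modest class (h = 10), noticeable slack is added to the training error.
print(vc_bound(0.05, h=10, n=100_000))  # ≈ 0.0837, vs 0.05 training error
```

Note how the distribution never appears as an argument: that is exactly the "all generating distributions" claim above, and also why the numbers come out so pessimistic.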

(iii) removes the need for independent draws. If you want to read up on this, search for "regret minimization." In this setup the guarantees hold for every sequence of data, even one adversarially generated by someone who knows the algorithm you will employ. Some geometric assumptions are necessary on the domain, and some function-analytic ones on the function you are trying to minimize (typically convexity; non-differentiable functions are allowed).
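A minimal sketch of the flavor of (iii), using online gradient descent, a standard regret-minimization algorithm (the quadratic losses and the random ±1 "adversary" here are my own toy choices): the per-round gap between the algorithm and the best fixed decision in hindsight shrinks like 1/sqrt(T), with no independence assumption anywhere.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
x = 0.0                          # play in the bounded domain [-1, 1]
plays, targets = [], []
for t in range(1, T + 1):
    z = rng.choice([-1.0, 1.0])  # target revealed only after we commit
    plays.append(x)
    targets.append(z)
    # convex loss f_t(x) = (x - z)^2; take a gradient step with rate ~1/sqrt(t)
    grad = 2 * (x - z)
    x = float(np.clip(x - grad / np.sqrt(t), -1.0, 1.0))

plays, targets = np.array(plays), np.array(targets)
alg_loss = np.sum((plays - targets) ** 2)
# best single fixed play in hindsight, found by grid search over the domain
best_fixed = min(np.sum((u - targets) ** 2) for u in np.linspace(-1, 1, 201))
regret = alg_loss - best_fixed
print(regret / T)  # per-round regret; shrinks toward 0 as T grows
```

The guarantee compares against the best *fixed* decision in hindsight, which is what makes a worst-case statement possible without any probabilistic model of the data.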

    -- Shalizi's Review --
Shalizi writes good stuff, but this review of Vapnik's book is surprisingly mean-spirited. The field of course has progressed since Vapnik, and there are legitimate criticisms that one can raise, but the review exhibits bad form, I think. Prior to Vapnik there were very few results that one could state outside of the statistical-physics community (which worked with strong assumptions, typically Gaussian noise, etc.).

    -- Aside --
No, you did not actually say "yo bitches," but that was the attitude you demonstrated. Before casting judgments as you did, at least make an effort to familiarize yourself with the background. And that is exactly what it is: background. A machine learner will not even mention these things, because you are expected to know them if you are in this field.

    -- Re: Analysis being omitted --
Think of it this way: one doesn't really need to know the precise chemistry of hydrocarbon combustion to drive a car; conversely, knowledge of such chemistry alone is useless unless it is teamed up with good engineering, in this case an algorithm. There is a huge difference between showing that minimizing a particular function has nice properties and actually coming up with a piece of code that efficiently minimizes such a function, not just in theory, where constants can be conveniently swept under the rug, but also in practice. So don't be too quick to dismiss the "naive coders".

Most importantly, practice almost always precedes explanation and analysis in engineering. Usually practitioners find the algorithms in their hacky ways; the analyst/theorist then smooths the kinks and explains why the method works so well after it has been demonstrated to work so well. Not always, but almost always.


