It looks to me like most of the space is taken up by a plot of the sine function and the Python code to generate the plot. Maybe it's a little fluffy, but it might be good for somebody self-taught, or a young person learning all of this for the first time who wants a quick reference.
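(For reference, here's a minimal sketch of the kind of snippet such a page typically contains - this assumes the standard NumPy/matplotlib approach; the book's actual code may differ.)

    import numpy as np
    import matplotlib.pyplot as plt

    # Sample one period of the sine function and plot it.
    x = np.linspace(0, 2 * np.pi, 200)
    plt.plot(x, np.sin(x))
    plt.xlabel("x")
    plt.ylabel("sin(x)")
    plt.show()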
There's a lot to critique, but this is a really weird one (page 49, if anyone is following along). The whole thing is 5 sentences, and all the space is taken up by a diagram and a code block. The 5 sentences should be the thing to complain about.
I only skimmed but I get the impression that sort of thing is common in the text.
I think it's got the problem that deep learning "isn't really math" - in the sense that deep learning does indeed use very elaborate computational structures that can be specified mathematically, but it doesn't prove theorems about them - not theorems that characterize what's happening. The theorems are just hints about what might be happening.
The key deep learning knowledge is in papers that basically just show that approach X works best on task Y (plus maybe a suggestive theorem) - for example, Attention Is All You Need.
> The theorems are just hints about what might be happening.
Isn't this true everywhere? It's just, in Asimov's words, the relativity of wrong. I mean, even physics is "just a hint", despite being an incredibly strong one. Maybe a lot of people won't agree, but I think many aren't aware of all the research that still goes on in everyday physics - studying ocean waves/currents, wind, explosions, materials, and so much more that is not quantum or relativity. But quantum and relativity get far more attention, so there's a perception bias.
> The key deep learning knowledge is in papers that basically only show that X approach works best on Y (plus maybe some suggestive theorem)
I very much disagree. Those are certainly the most visible papers, but not the most foundational. Actually, I believe this approach is holding us back, and diffusion is my best example. Big steps in diffusion and GANs were made around the same time, but GANs were easier to implement and less resource-heavy. Sohl-Dickstein is certainly a key player despite Ho being more well known. Same with Aapo Hyvarinen. I think we got so captivated by GANs that it became harder to publish anything else.

I've had some experience with this personally: I've given up trying to publish in Normalizing Flows because reviewers will ask why my works are not better than GANs (or now diffusion), despite being better than other Flows - I even got this on a distillation-based paper (multiple times, before we abandoned it). If there's too heavy a concern with metrics (using them not as guides/hints, but as targets), then how can other things advance in a normal way? You'd have to take leaps and bounds instead of incrementalism (which we've established is fine for popular paths: formerly GANs, now diffusion). Leaps and bounds, because the community sizes are exceptionally disproportionate, which means far more time and research being put into one path than another.

I'd argue we have pretty good evidence for the hypothesis that, counterfactually, diffusion would have emerged as a strong player sooner if this (SOTA chasing) weren't how we measured publication criteria. I believe this problem has only become worse. But this is how technology always advances: it isn't one technology getting better and better; rather, we see the composition of different technologies, where the replacement almost always starts out significantly worse than the existing status quo. So I'd argue we're leaving a lot of good work on the table by doing this. We certainly have enough people working in ML to adequately do both, which would be much better. You need both, but the problem is we just compare <new thing, or not-as-established thing> to <current popular and SOTA thing> as if benchmarks were the only component of the story.
> > The theorems are just hints about what might be happening.
> Isn't this true everywhere? Certainly it is just, in the words of Asimov,
> the relativity of wrongness. I mean even physics is "just a hint" despite
> being an incredibly strong one.
Well, sure, there is a relativity of wrongness, but the relativity is to a context, and in a given context an agent (say, you or I) has to judge whether the relative difference in the wrongness of two things means they're the same or different. In the context of the ideal, the laws of physics are limited. Relative to astrology or other new-age theories, they're essentially true.
So, expanding my point: relative to many contexts, the distinction between a system you can reason about and one you can't tends to be a big one, even if you have mathematical analogies. A rocket can be sent to the moon because we can reason about the laws of physics. A self-driving car, after likewise many years of trying, and with an interactive map etc., can often but not always get to the other side of town.
>...where I've given up trying to publish in Normalizing Flows because reviewers will ask why my works are not better than GANs...
Your efforts seem like the exception that proves the rule.