Maybe this is just my soft, theory-laden pure math brain talking, but I'd be a lot less impressed with machine learning if we had a decent formal understanding of it. As it is, these models are way weirder than I think most engineering types give them credit for. But then again, that's how I feel about a lot of applied stuff; it all feels a little magic. I can read the papers, I can mess around with it, but somehow it's still surprising how well it can work.
Ultimately it comes down to gradient descent (which is pretty magical in its own right), but what's most surprising to me is that the loss landscape is actually organized enough for such a simple local method to yield impressive results. The difficulties of training large NNs are well documented, obviously, but I'm surprised it's even that easy.
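For anyone who hasn't seen it spelled out: the whole mechanism really is just "nudge the parameters downhill," as in this toy sketch (a made-up one-dimensional loss, nothing like a real NN landscape, where you'd have millions of dimensions and automatic differentiation):

```python
# Toy gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3.
# Illustrative only: the loss and its gradient are hand-picked here,
# whereas real training differentiates a neural net via autodiff.

def grad(w):
    # Analytic gradient of (w - 3)^2
    return 2.0 * (w - 3.0)

w = 0.0    # start far from the minimum
lr = 0.1   # learning rate (step size)
for _ in range(100):
    w -= lr * grad(w)  # step downhill along the negative gradient

print(w)  # should land very close to 3.0
```

The surprising part isn't this loop; it's that essentially the same loop, run on a wildly non-convex landscape with billions of parameters, reliably finds good minima instead of getting hopelessly stuck.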