This could be shortened, and probably simplified, significantly. For example, there is a lot of redundancy in the layers, and that could be pulled out into one function per block. I often do this with deep networks: it helps avoid coding errors and shortens everything considerably, at the potential cost of being a little harder to grok at first glance.
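To make the "function per block" idea concrete, here is a rough, framework-free sketch. Everything in it is an assumption for illustration: the `conv_block` and `build_network` names, the `(kind, params)` layer tuples, and the filter counts are hypothetical and not taken from the code being discussed. In a real project the tuples would be actual layer objects from whatever library is in use.

```python
# Hypothetical layer spec: each layer is a (kind, params) tuple.
# In practice these would be real layer objects from your framework.

def conv_block(filters, n_convs=2, kernel=3):
    """Return the layers for one block: n_convs conv+relu pairs, then a pool."""
    layers = []
    for _ in range(n_convs):
        layers.append(("conv", {"filters": filters, "kernel": kernel}))
        layers.append(("relu", {}))
    layers.append(("maxpool", {"size": 2}))
    return layers


def build_network(block_filters=(64, 128, 256), n_classes=10):
    """Stack one conv_block per entry in block_filters, then a classifier head."""
    layers = []
    for filters in block_filters:
        layers.extend(conv_block(filters))
    layers.append(("flatten", {}))
    layers.append(("dense", {"units": n_classes}))
    return layers


network = build_network()
for kind, params in network:
    print(kind, params)
```

The repeated conv/relu/pool pattern now lives in one place, so changing the kernel size or adding a layer to every block is a one-line edit instead of a copy-paste across the whole definition.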

That said, many DNN concepts (and ML concepts in general) can be described in a few lines of pseudocode. CNNs, RNNs, and so on each boil down to a handful of lines.
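As one illustration of how little code the core idea needs, here is a minimal sketch of the forward pass of a vanilla RNN in plain Python. The function name `rnn_forward` and the list-of-lists weight layout are my own assumptions, and this covers only the forward recurrence, not training.

```python
import math


def rnn_forward(xs, W_x, W_h, b):
    """Vanilla RNN forward pass: h_t = tanh(W_x x_t + W_h h_{t-1} + b).

    xs  : list of input vectors (each a list of floats)
    W_x : hidden_size x input_size weights (list of rows)
    W_h : hidden_size x hidden_size weights (list of rows)
    b   : bias vector of length hidden_size
    Returns the list of hidden states, one per time step.
    """
    h = [0.0] * len(b)  # initial hidden state is all zeros
    states = []
    for x in xs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[i], x))
                + sum(w * hj for w, hj in zip(W_h[i], h))
                + b[i]
            )
            for i in range(len(b))
        ]
        states.append(h)
    return states
```

The whole recurrence is the single expression inside the loop; everything else is bookkeeping.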

It's really quite amazing: most of the work goes into first building the network from the theory, then training and tuning it until you get good results.