I am thinking long term: as neural nets scale up, we'll have to move compute into memory. Going the other way is problematic, since small caches work poorly with neural net workloads. In fact the fastest Transformer implementations rely on FlashAttention, which is built on principled use of the SRAM cache, because memory movement between HBM and SRAM is the bottleneck.