You mean moving memory closer to compute. That's already happening in lots of places, even though it adds complexity because software now has to account for NUMA.
I'm thinking long term: as neural nets scale up, we'll have to move compute into memory. The other way around is problematic, since small caches don't work well with neural nets. In fact, the fastest attention implementation (FlashAttention) is based on principled use of the GPU's SRAM, precisely because data movement between slow memory and compute, not the compute itself, is the bottleneck.
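If it helps, here's a toy NumPy sketch of that idea: process K/V in tiles small enough to fit in fast on-chip memory, carrying running softmax statistics so the full N x N score matrix never has to be materialized in slow memory. The function name and tile size are made up for illustration; the real thing is a fused CUDA kernel, but the algorithm is the same.

```python
import numpy as np

def tiled_attention(q, k, v, tile=128):
    """Attention for one query. q: (d,); k, v: (N, d).
    Returns softmax(k @ q / sqrt(d)) @ v, computed one tile at a time."""
    d = q.shape[0]
    m = -np.inf          # running max of scores (numerical stability)
    l = 0.0              # running sum of exp(score - m)
    acc = np.zeros(d)    # running weighted sum of v rows
    for start in range(0, k.shape[0], tile):
        kt, vt = k[start:start + tile], v[start:start + tile]
        s = kt @ q / np.sqrt(d)      # scores for this tile only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)    # rescale old stats to the new max
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ vt
        m = m_new
    return acc / l

# Quick check against the naive version:
# q, k, v = np.random.randn(64), np.random.randn(1000, 64), np.random.randn(1000, 64)
# s = k @ q / np.sqrt(64); w = np.exp(s - s.max()); w /= w.sum()
# assert np.allclose(tiled_attention(q, k, v), w @ v)
```

The point of the rescaling trick is that you only ever touch each K/V tile once, so the traffic to slow memory is linear in N instead of quadratic, which is exactly the kind of memory-hierarchy-aware design I'm talking about.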