
We tried to communicate the key ideas in the video released with the blog post. It shows how DeepSpeed and the ZeRO optimizer save memory and walks through exactly what happens during each training iteration. It is quite different from standard data or model parallelism.

The ZeRO optimizer helps scale large models regardless of model topology: it works equally well for wide or deep models. Please let us know if you have specific questions we can address.
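Roughly, enabling ZeRO in a training script looks something like the sketch below (simplified and illustrative; the exact config keys and initialize arguments are in the DeepSpeed docs and may differ slightly across versions). A run like this would normally be launched with the deepspeed launcher across multiple GPUs.

    import torch
    import deepspeed

    # Plain PyTorch model; ZeRO does not require changes to the model itself.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    )

    # Illustrative config: ZeRO partitions optimizer states (and, in later
    # stages, gradients and parameters) across data-parallel workers instead
    # of replicating them on every GPU.
    ds_config = {
        "train_batch_size": 32,
        "fp16": {"enabled": True},
        "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
        "zero_optimization": {"stage": 1},
    }

    model_engine, optimizer, _, _ = deepspeed.initialize(
        model=model,
        model_parameters=model.parameters(),
        config=ds_config,
    )

    # One training iteration: forward, backward, step. The engine handles the
    # gather/scatter of the partitioned states under the hood.
    x = torch.randn(32, 1024, device=model_engine.device, dtype=torch.half)
    y = model_engine(x)
    loss = y.float().pow(2).mean()  # dummy loss for illustration
    model_engine.backward(loss)
    model_engine.step()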



Oh, sorry, I didn't make it to the video; the blog post intro sent me straight to the paper. I agree the video is a big help compared to what's in the paper.

It looks like your approach plays the 'pebble counting' game described in the OpenAI article I linked. Or maybe you'd like to explain what's different.

What would really help in the video (and the paper) is a grounded example (ResNet-10, AlexNet, or even just a 2-layer MLP) that draws the connection between GPU buffers and layers. I feel the video covers the details of the memory savings in far too much precision, while the intuition behind the method (and how it maps onto a graphical model of a NN) is essentially absent.
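For example, even a back-of-envelope accounting like my own sketch below, using the per-parameter byte counts from the paper (2 bytes fp16 weights, 2 bytes fp16 gradients, 12 bytes fp32 Adam states), would make the buffers-to-layers connection concrete:

    # Back-of-envelope per-GPU memory for a tiny 2-layer MLP, connecting each
    # layer's parameters to the buffers that hold them during training.
    d_in, d_hidden, d_out = 1024, 4096, 1024
    layers = {
        "fc1.weight": d_in * d_hidden,
        "fc1.bias":   d_hidden,
        "fc2.weight": d_hidden * d_out,
        "fc2.bias":   d_out,
    }
    n_params = sum(layers.values())

    bytes_weights = 2 * n_params   # fp16 weights
    bytes_grads   = 2 * n_params   # fp16 gradients
    bytes_optim   = 12 * n_params  # fp32 master weights + Adam momentum + variance

    n_gpus = 8  # data-parallel degree

    # Plain data parallelism replicates all three buffers on every GPU.
    baseline = bytes_weights + bytes_grads + bytes_optim

    # ZeRO stage 1 partitions only the optimizer states across GPUs;
    # the later stages additionally partition gradients and weights.
    zero_stage1 = bytes_weights + bytes_grads + bytes_optim / n_gpus
    zero_stage3 = (bytes_weights + bytes_grads + bytes_optim) / n_gpus

    mb = 1024 ** 2
    print(f"params: {n_params:,}")
    print(f"per-GPU, data parallel: {baseline / mb:8.1f} MB")
    print(f"per-GPU, ZeRO stage 1 : {zero_stage1 / mb:8.1f} MB")
    print(f"per-GPU, ZeRO stage 3 : {zero_stage3 / mb:8.1f} MB")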



