Can you explain how this is possible? Sorry, I haven't gotten this low-level with any of these models before, and I'd really appreciate some help understanding it.
To run inference, you need to load a model from disk into RAM. Usually the model is stored in a disk format that is convenient to write but must be parsed at runtime into a RAM-resident C data structure.
In this case, it looks like jart@ modified malloc() to capture the memory produced by the loading process and serialized that to disk. At runtime, the application calls mmap() to create a virtual-memory association with the bytes on disk, so any time you access a page that isn't yet loaded, it gets faulted in from disk. At that point the kernel keeps it in the page cache, and since the file on disk doesn't change, those pages can outlive the process. So when the process restarts, all those memory accesses resolve immediately to the already-cached pages rather than reading from disk again.
The inference library here supports a data pointer that would point to the memory mapped location.
This is faster than relying on the kernel's disk read cache; in that case, you'd still need to convert the data from the disk format to the in-memory format.
Normally the data-build process is run as an external program that writes the mmap-ready structure to disk (an example is the BLAST program, which writes DNA sequence data into an index structure that is mmapped at runtime). But in this case it looks like using an instrumented malloc() simplifies the process of building the disk structure.
Thank you for taking the time to write this out. Very helpful for understanding. I remember using malloc to build out C data structures in my coursework but I must admit I haven't really done much practical work at this level. Thanks again, you are a scholar.
This has nothing to do with the models; it's just standard *nix stuff. If you mmap the file read-only, the pages can be shared by multiple processes without duplication, since they are guaranteed to be identical.