
> memory mapped i/o is orders of magnitude faster than sequential calls to read()

That’s not something I’ve generally seen. Any source for this claim?

> You can have gigabytes of JSON to parse, and the JSON might be available over the network, and your service might be running on a small node with limited memory. Memory mapping here adds quite a lot of latency and cost to the system

Why does mmap add latency? I would think that mmap adds more latency for small documents, because setting up and tearing down the mapping is expensive (page-table changes, plus a cross-CPU TLB shootdown on unmap) and there’s no chance to amortize it. Relatedly, there’s little to no relation between SAX- vs DOM-style parsing and mmap: you can use either with mmap. In case you’re not aware, mmap does have some knobs (e.g. madvise) for hinting to the OS how the mapping will be used, although it’s very unwieldy to configure them to work well.
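
For anyone who hasn't used those hinting knobs, here's a minimal sketch in Python (chosen only for brevity). It assumes Python 3.8+ on a platform that exposes `madvise`, and `count_newlines_mmap` is a made-up helper name, not anything from the thread. It maps a file read-only, tells the kernel the access pattern is sequential, and scans it:

```python
import mmap
import os


def count_newlines_mmap(path):
    """Count newline bytes in a file through a read-only memory mapping."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return 0  # an empty file cannot be mapped
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The hint is optional and platform-dependent, so guard it.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                m.madvise(mmap.MADV_SEQUENTIAL)
            count = 0
            pos = m.find(b"\n")
            while pos != -1:
                count += 1
                pos = m.find(b"\n", pos + 1)
            return count
```

Whether the hint helps at all depends on the kernel and the workload, which is part of why tuning it is unwieldy.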



Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try it yourself.

The latency comes from the fact you need to have the whole file. The use case I'm talking about is a JSON document you need to pull off the network because it doesn't exist on disk, might not fit there, and might not fit in memory.


> Experience? Last time I made that optimization it was 100x faster, ballpark. I don't feel like benchmarking it right now, try it yourself.

I have. Many times. There's definitely not a 100x difference given that normal file I/O can easily saturate NVMe throughput. I'm sure it's possible to build a repro showing a 100x difference, but you have to be doing something intentionally to cause that (e.g. using a very small read buffer so that you're doing enough syscalls that it shows up in a profile).
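
To make the small-buffer effect concrete, here's a rough sketch (Python for brevity; `read_all` is a hypothetical helper, not from any library) that reads a file with plain `read()` calls and reports how many calls it took for a given buffer size. It measures call counts rather than wall-clock time, so it illustrates the mechanism and is not a benchmark:

```python
import os


def read_all(path, buf_size):
    """Read a whole file with os.read(); return (total_bytes, read_calls)."""
    # O_BINARY only exists on Windows; elsewhere the flag is 0.
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
    try:
        total = 0
        calls = 0
        while True:
            chunk = os.read(fd, buf_size)
            calls += 1
            if not chunk:  # empty result means end of file
                break
            total += len(chunk)
        return total, calls
    finally:
        os.close(fd)
```

On a 1 MiB file, a 64-byte buffer needs at least 16,384 calls, while a 1 MiB buffer typically needs two (one for the data, one to see end of file). That per-call overhead is the kind of thing that can make plain reads look far slower than they need to be.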

> The latency comes from the fact you need to have the whole file

That's a whole other matter. But again, if you're pulling it off the network, you usually can't mmap it anyway unless you're using a remote-mounted filesystem (which will add more overhead than mmap vs buffered I/O).


I think you misunderstood my point, which was to highlight exactly when mmap won't work....


In my experience mmap is at best 50% faster compared to good pread usage on Linux and macOS.
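
By "good pread usage" I mean something along these lines: large reads at explicit offsets, so the descriptor has no shared file position to fight over. This is only a sketch, assuming a Unix-like system where Python exposes `os.pread`; `iter_chunks_pread` and the 1 MiB default chunk size are illustrative choices, not tuned values:

```python
import os


def iter_chunks_pread(path, chunk_size=1 << 20):
    """Yield a file's contents in chunks using positional reads."""
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            chunk = os.pread(fd, chunk_size, offset)
            if not chunk:  # empty result means end of file
                break
            yield chunk
            # Advance by what was actually read, which may be a short read.
            offset += len(chunk)
    finally:
        os.close(fd)
```

Because each call carries its own offset, several threads could read different ranges from the same descriptor without seeking.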



