Hacker News

I didn't dig too deep into this blog, but right off the bat I suspect that there is caching going on. Since this is a virtualized environment, that's not surprising: you don't know whether the entire volume you are working with is being cached in the hypervisor, or possibly the disk controller or the disk itself.

You don't get fraction-of-a-millisecond random reads from spinning disks without caching... period. So whatever you think you're measuring, you aren't.

My experience with performance measurements is that the vast majority of people don't measure what they think they are measuring, or don't measure what is actually relevant for the use case (and all related use cases that need to be considered). And if you don't know what the expected result is, your measurements can be equally useless, because you can't tell that your results don't fit reality, or the speed of light, or whatever.

I wish I had a great answer for how you can get started doing performance work, but it starts with understanding the orders of magnitude of various operations. Flash seeks are hundreds of microseconds for crappy flash; spindle seeks are ~4 milliseconds, and that is with fast spindles. If you are working in memory you should know the difference between instructions (branching and stalled execution vs. continuous execution), cache misses and the various kinds of cache misses (different tiers, remote NUMA), as well as what happens when you have contended mutual exclusion or CAS primitives.
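To make those orders of magnitude concrete, here's a rough sketch. The figures are approximate ballpark numbers of the kind quoted above, not measurements from any particular hardware:

```python
# Rough, order-of-magnitude latencies in nanoseconds.
# These are approximate ballpark figures, not measurements
# from any specific machine.
LATENCY_NS = {
    "L1 cache hit":              1,
    "main memory (local NUMA)":  100,
    "main memory (remote NUMA)": 200,
    "flash random read (slow)":  200_000,    # hundreds of microseconds
    "spinning disk seek":        4_000_000,  # ~4 ms, and that's a fast spindle
}

def ratio(a, b):
    """How many times slower operation a is than operation b."""
    return LATENCY_NS[a] / LATENCY_NS[b]

# A spindle seek costs on the order of 20x even a slow flash read...
print(f"spindle vs flash: {ratio('spinning disk seek', 'flash random read (slow)'):.0f}x")
# ...and millions of L1 cache hits.
print(f"spindle vs L1:    {ratio('spinning disk seek', 'L1 cache hit'):.0f}x")
```

If a benchmark reports a number wildly outside these ranges (like sub-millisecond spindle seeks), some cache is answering instead of the device.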



I agree with your general assessment and skepticism of performance tests, particularly on top of virtualization.

Minor point though: Digital Ocean at least advertises SSDs for their servers, not spinning disks. Fraction-of-a-millisecond random reads seem within the realm of possibility.


I came here to say this; glad to see other Hacker News readers picking up on the impossible disk seek times as well.


Looks like -D will make ioping use direct I/O.


Won't disable either the disk cache or the RAID controller cache, though.


If you're using a RAID controller that's worth what you paid for it, then it has disabled the on-board cache on the disks it's managing. Otherwise, the following scenario becomes possible:

  1. OS issues write to RAID HBA, write is stored to NVRAM (or battery-backed RAM on older cards).
  2. RAID HBA issues write command to disk.
  3. Disk accepts write into onboard buffer, acknowledges it as committed.
  4. RAID controller releases cached pages.
  5. Power loss.
...and you've lost data.

EDIT: Notice that the write never actually touches the disk platters in this scenario. Once a disk drive acknowledges a write, the RAID controller releases the "written" data from its cache. Disk writes take milliseconds, while memory writes take microseconds, usually just nanoseconds. That leaves a relatively huge window during which the power can go out before the write has actually been persisted to disk.
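Back-of-the-envelope, the mismatch in the scenario above looks like this (a sketch with assumed round numbers, not figures from any specific drive or controller):

```python
# Assumed round numbers: the drive acknowledges the write out of its
# volatile on-board buffer in microseconds, but physically committing
# the bits to the platter takes milliseconds.
ACK_FROM_BUFFER_US = 10     # ack from on-board cache: ~microseconds
MEDIA_WRITE_US = 4_000      # actually reaching the platter: ~milliseconds

# Window during which the RAID controller has already released its
# NVRAM copy (step 4) but the bits exist only in volatile drive RAM.
vulnerable_window_us = MEDIA_WRITE_US - ACK_FROM_BUFFER_US
print(f"data exists only in volatile drive RAM for ~{vulnerable_window_us} us")
```

Under these assumptions the data is unprotected for nearly the entire duration of the media write, which is why a power loss in that window loses it.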


The on-disk cache can still cache data for reads, and it may actually cache data for writes as well if write barriers are being used.

There is no reason for the RAID controller not to let the on-disk cache and scheduling work while it is doing writeback; it only needs acknowledgement at the end, before it flushes its non-volatile cache.

This could also be something that I don't know about. Maybe in the world of disks, write barriers are weaker than an explicit disable-write-caching command? Can controllers issue writes large enough to make up for the lack of caching plus write barriers? I have no idea how SATA works.

Again, this gets into why it is hard to know what to expect from a disk I/O benchmark: you have to know how the caches are operating, and there are many of them, and their configuration can vary.


RAID controllers will usually have backup batteries for just this reason.


Yes, or NVRAM, exactly as step 1 in my scenario mentions.

The problem is that the disks they're managing don't. (EDIT: barring SSDs with supercaps, but that's an entirely other discussion.)

If a write has been accepted by the disk and acknowledged as written — but in reality has only been stored in the disk's on-board cache — and you suffer a power loss before the write can be flushed to permanent storage (be it spinning rust or NAND cells), then you have lost that data.

This is exactly why a RAID controller worth using will disable a drive's onboard cache. Because disks lie.

Was my first comment somehow unclear?



