To further back this up with some math - L2 cache hits (what an L1D miss caused by flushing the L1D will hit) are still in the mid/low single-digit nanosecond range[1]. Say flushing the L1D causes another 1000 L1D misses, each serviced from L2[2] - maybe we got really lucky and the next thread was hashing all the exact same data at the exact same time, or something equally unlikely? That'd still put us in the mid/low single-digit microsecond range. On par with DDR4-1600 (12.8GB/s)'s 3.75us to read 48KB[3][4]. Let's more than double that and say it takes 10 microseconds = 0.01 milliseconds = 0.01% of a 100 millisecond timeslice.
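The arithmetic above can be sanity-checked in a few lines. All the figures here are the assumptions from the text (5ns as a stand-in for a mid single-digit-ns L2 hit, 1000 extra misses, one DDR4-1600 channel), not measurements:

```python
# Back-of-envelope check of the cache-flush overhead estimate.
L2_HIT_NS = 5            # assumed L2 hit latency, mid single-digit ns [1]
EXTRA_MISSES = 1000      # assumed extra L1D misses after a flush [2]
DDR4_1600_BPS = 12.8e9   # DDR4-1600 bandwidth, bytes/s [3]
L1D_BYTES = 48e3         # 48KB L1D cache

refill_us = EXTRA_MISSES * L2_HIT_NS / 1000      # time to service all misses from L2
dram_us = L1D_BYTES / DDR4_1600_BPS * 1e6        # time to stream 48KB from DRAM [4]
overhead_pct = 10e-6 / 100e-3 * 100              # padded 10us vs. a 100ms timeslice

print(f"{refill_us} us to service {EXTRA_MISSES} L2 hits")
print(f"{dram_us:.2f} us to read 48KB at DDR4-1600 speed")
print(f"{overhead_pct}% of a 100ms timeslice")
```

Even with the pessimistic round-up to 10us, the per-timeslice cost stays four orders of magnitude below the timeslice itself.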
Any noticeable perf overhead is going to come from the act of cache flushing taking some super-slow path for some reason, or from context switching much more frequently than 100ms timeslices.
[1] https://stackoverflow.com/a/4087331
[2] 1000x 32-128B cachelines = 32-128KB, definitely in the ballpark to completely refill a 48KB L1D cache.
[3] https://en.wikipedia.org/wiki/DDR4_SDRAM#Modules
[4] https://www.wolframalpha.com/input/?i=48KB+%2F+12800+MB%2Fs