
TFA lists WebKit as a project that "does it wrong".

The author should read https://webkit.org/blog/6161/locking-in-webkit/ so that they understand what they are talking about.

WebKit does it right in the sense that:

- It has an optimal amount of spinning

- Threads wait (instead of spinning) if the lock is not available immediately-ish

And we know that the algorithms are optimal based on rigorous experiments.



The author (me) actually read this long ago

> - It has an optimal amount of spinning

No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

> Threads wait (instead of spinning) if the lock is not available immediately-ish

They use parking lots, which is one way to implement futex-style waiting (in fact, WaitOnAddress is implemented similarly). And no, if you read the code, they do spin. Worse, they actually yield the thread before properly parking.
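For readers unfamiliar with the term: a parking lot is a global side table keyed by memory address, so any word of memory can have a wait queue without the lock itself carrying one — the same idea a futex or WaitOnAddress exposes. A toy sketch of the concept (illustrative only, not WebKit's ParkingLot API; real implementations hash the address into a fixed set of buckets rather than using one big map):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// parkingLot: a side table mapping addresses to wait queues, so any
// int32 can be waited on without carrying its own queue.
type parkingLot struct {
	mu      sync.Mutex
	waiters map[*int32][]chan struct{}
}

func newParkingLot() *parkingLot {
	return &parkingLot{waiters: make(map[*int32][]chan struct{})}
}

// park blocks the caller until unparkOne is called on addr, but only
// if *addr still holds expected. The check-then-sleep happens under
// the lot's lock, which is what prevents lost wakeups.
func (p *parkingLot) park(addr *int32, expected int32) {
	p.mu.Lock()
	if atomic.LoadInt32(addr) != expected {
		p.mu.Unlock()
		return // the value already changed; don't sleep
	}
	ch := make(chan struct{})
	p.waiters[addr] = append(p.waiters[addr], ch)
	p.mu.Unlock()
	<-ch
}

// unparkOne wakes the oldest waiter on addr, if there is one.
func (p *parkingLot) unparkOne(addr *int32) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if q := p.waiters[addr]; len(q) > 0 {
		close(q[0])
		p.waiters[addr] = q[1:]
	}
}

func main() {
	p := newParkingLot()
	var flag int32 = 0
	p.park(&flag, 1) // returns immediately: flag != 1
	fmt.Println("no sleep when the value already changed")
}
```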


> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

You say this with zero data.

I know that yielding 40 times is optimal for WebKit because I measured it. In fact it was re-measured many times because folks like you would doubt that it could be optimal, suggest something different, and then again the 40 yields would be shown to be optimal.

> And no if you read the code, they do spin. Worse, they actually yield the thread before properly parking.

Threads wait if the lock is not available immediately-ish.

Yes, they spin by yielding. Spinning by pausing or doing anything else results in worse performance. We measured this countless times.

I think the mistake you’re making is that you’re imagining how locks work, whereas what I am doing is running rigorous experiments that involve putting WebKit through larger-scale tests.


>> No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs

> You say this with zero data.

Wouldn't the null hypothesis be that the same program behaves differently on different CPUs? Is "different people require different amounts of time to run 100m" a statement that requires data?
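This is also easy to check directly: time a fixed number of yields and compare the result across machines. A quick sketch (the measured numbers will of course vary by CPU, OS, and load, which is exactly the point):

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// timeYields measures how long n cooperative yields take on the
// current machine. The result is hardware- and OS-dependent, so a
// fixed yield count is not a fixed duration.
func timeYields(n int) time.Duration {
	start := time.Now()
	for i := 0; i < n; i++ {
		runtime.Gosched()
	}
	return time.Since(start)
}

func main() {
	fmt.Printf("40 yields took %v on this machine\n", timeYields(40))
}
```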


>You say this with zero data.

Or so you assume

> Spinning by pausing or doing anything else results in worse performance. We measured this countless times.

And I've seen the issue in hundreds of captures using a profiler. I suppose we just have different definitions of what "worse performance" is.

> Whereas what I am doing is running rigorous experiments that involved putting WebKit through larger scale tests

Or perhaps the signal was drowned in the stats, or again, different metrics.


> Or so you assume

You're not including data in your discussion of this topic. Your post included zero data.

My post on WTF locks has tons of data.

So, I'm not assuming; I'm observing.

> And I've seen the issue in hundreds of captures using a profiler. I suppose we just have different definitions of what "worse performance" is.

Nobody cares what you saw in the profiler.

What matters is the performance users experience.

By any metric of observable performance, yielding is the optimal way of spinning.


I guess you mean this regarding spin locks? https://web.archive.org/web/20250219201712/https://www.intel...

The direct link to Intel 404s.


For reference, golang's mutex also spins up to 4 times before parking the goroutine on a semaphore. A lot less than the 40 times in the webkit blogpost, but I would definitely consider spinning an appropriate amount before sleeping to be common practice for a generic lock. Granted, as they have a userspace scheduler, things do differ a bit there, but most concepts still apply.

https://github.com/golang/go/blob/2bd7f15dd7423b6817939b199c...

https://github.com/golang/go/blob/2bd7f15dd7423b6817939b199c...


The guy you replied to wrote the locking code. If you’re so certain they’re doing it wrong, would it not be easier to just prove it? It’s only one file, and they already have benchmarking set up.


I mean my "No it isn't, it has a fixed number of yields, which has a very different duration on various CPUs" can be verified directly by having a look at the table in my article showing different timings for pause.

For the yield part, I already linked to the part of the code that shows that. Yes, it doesn't call yield if it sees other threads are parked, but during quick lock/unlock cycles it can see nobody parked, fail the acquisition, and yield directly to the OS. This is not frequent, but it is frequent enough to introduce delay issues.


This is an incredible blog post. Super educational, and I think directly applicable to my work. Thanks for sharing!



