
Sadly V8's dev team has yet to address the other problem that "use asm" solves: falling out of JIT sweet spots due to broken heuristics. If you maintain a large machine-generated JS codebase (like I do by proxy, with my compiler), it is a regular occurrence that new releases of Chrome (and Firefox, to be fair) knock parts of your code out of the JIT sweet spot and suddenly start optimizing other parts. Sometimes code that was fast becomes slow for no apparent reason; other times slow code becomes fast, and you look at profiles and realize you need to remove caching logic, or that your code would be faster if you removed an optimization.

The arms race never ends, and keeping up with it is a full-time job. asm.js fixes this, by precisely specifying the 'sweet spot' and giving you a guarantee that if you satisfy its requirements, all your code will be optimized, unless the VM is broken. This lets you build a compiler that outputs valid asm.js code, verify it, and leave it alone.
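For anyone who hasn't looked at it, here is a minimal hand-written sketch of the kind of code asm.js pins down (illustrative only, not actual JSIL output):

    function MyModule(stdlib, foreign, heap) {
      "use asm";

      function add(x, y) {
        x = x | 0;           // parameter declared as int
        y = y | 0;
        return (x + y) | 0;  // result coerced back to int
      }

      return { add: add };
    }

Every coercion is explicit, so a validating VM can type-check and AOT-compile the whole module up front instead of guessing types at runtime; Firefox's console tells you whether validation succeeded.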

These days I don't even have time to keep up with the constant performance failures introduced by new releases, but JSIL is a nearly two-year-old project now and they cropped up regularly the whole time. Ignoring the performance failures isn't an option because customers don't want slow applications (and neither do I).



> The arms race never ends, and keeping up with it is a full-time job. asm.js fixes this, by precisely specifying the 'sweet spot' and giving you a guarantee that if you satisfy its requirements, all your code will be optimized, unless the VM is broken.

I'm not convinced that 'use asm' helps with that at all. Static compilation of the sort that asm.js gives you is also full of heuristics. Even in C, you've got that sort of effect.

Tweak the inlining cost model, and suddenly your frequently called accessor function has function call overhead, slowing everything down. Or maybe the compiler decides that the cost of padding vectors is too high and autovectorization shouldn't be applied. Or any of dozens of similar heuristics.

It doesn't matter if you have a 'use asm'; tweaking the compiler will change the heuristics and boot you out of the sweet spot.


In theory, you're right: even C compilers make decisions about what to optimize and how. But in practice, the performance variance of C compiler output is tiny compared to that of dynamic-language VMs.

With C you might get a 2-3x difference in common cases, but in a JS engine if it decides incorrectly to stay in the interpreter, you've lost 100x. If it enters a vicious cycle of deopts, it can be even worse. If it does a long GC all of a sudden, you can lose 300ms in an unpredictable way.

Again though, in principle all this is solvable in dynamic VMs. We know that theoretically and based on logical arguments; it would be great to see it proven in practice, but it hasn't yet.


Yes, but that isn't changing continuously.

You get one build that is targeted to a specific compiler. Sure, upgrading the compiler might change what you need to optimize, but that happens when you, the developer, decide to do it, as opposed to when the user decides to upgrade their browser.

You can say "the JS sitting on the server, being served to clients, is fast with asm.js". That won't change until you re-compile it. With a JS JIT, you can't know that.


> "the JS sitting on the server, being served to clients, is fast with asm.js"

Which asm.js compiler? The one in Firefox today? The one tomorrow with different inlining heuristics? The one that Opera decides to put in? The one for ARM or for x86?


A broken inlining heuristic is not going to have the kind of performance consequences for an asm.js app that you get when you get dropped into the interpreter/baseline JIT in a JavaScript app today.

A function not getting inlined means the addition of function call overhead, and that's it, in most cases. Dropping into an interpreter or baseline JIT can literally produce a 100x slowdown, or worse.

Any conforming asm.js implementation that does AOT compilation will produce predictable performance. Yes, individual implementations will differ, but the same is true when you compile applications against MSVC, Clang, and GCC. You don't see anyone arguing that Clang and MSVC should have standardized inlining heuristics, do you?

It's important to remember, also, that when we say things like 'JIT sweet spot' we're not just referring to what code you wrote. It also matters what code you run, in which order, with which arguments. Something as simple as changing the order in which you call some functions (without changing the actual code in those functions) can cause them to deopt in modern runtimes, because the heuristics are so sensitive to minor changes in data. Those kinds of variance can be caused by something as simple as an HTTP request taking a bit longer to complete and changing the startup behavior of your app.
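To make that concrete, here's a contrived sketch (not real JSIL output) of how call order alone can flip a function in and out of the optimized tier:

    function getX(p) {
      return p.x;
    }

    function makeA() { return { x: 1 }; }
    function makeB() { return { x: 2, y: 3 }; }  // different shape / hidden class

    // Warm up on a single shape and the JIT specializes getX for it...
    for (var i = 0; i < 100000; i++) getX(makeA());

    // ...then one call with a different shape (say, because a network response
    // arrived earlier than usual and changed startup order) can invalidate
    // that assumption and deoptimize getX.
    getX(makeB());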


Does that happen measurably often on hot paths in reality, or is it just a theoretical worry? Because it sounds like you're describing a JIT with broken heuristics.


Yes, it happens measurably often on hot paths in reality. I have test cases that produce it against modern builds of v8 and spidermonkey. Naturally, they are not all real-world examples, but they're based on problems from real apps.


I know it's probably a little much to ask, but do you have any examples of the types of performance regressions you experienced over the last few years in V8 that hit you the hardest?


The vast majority of them are cases where some code starts getting deopted by v8 and dropping into the low-performance interpreter-equivalent mode in the runtime. There are hundreds of ways to trigger this and the list changes regularly.

In practice when it's big enough to demonstrate with a test case, the reports look like this:

http://code.google.com/p/chromium/issues/detail?id=261468

"It's slower than it used to be, I can't tell why."

Part of the problem is that the profiling tools aren't precise or reliable enough to show you what got slower (in fact, opening them changes performance characteristics dramatically). So I end up having to attach a debugger and launch Chrome with special undocumented v8 flags to get it to dump things like traces of optimization attempts and failures, and then try to figure out what cryptic deoptimization causes like 'check-maps' mean in practice for my code.
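For the curious, the incantation is along these lines (flag names have changed across V8 versions, so treat this as a rough sketch), launching Chrome from a terminal so the renderer's stdout is visible:

    chrome --js-flags="--trace-opt --trace-deopt"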

There have been a few specific cases where the VM started relying on particular characteristics of objects, and that caused my code to deopt. I remember that at some point putting a method named 'apply' or 'call' on any object would cause calls to those methods to become incredibly slow in V8; for some reason at that time X.apply() and X.call() were special-cased to always assume that they were Function.apply or Function.call, so if they weren't, the optimizer bailed out. Funnily enough, this also applied to Function.bind(): if you called .bind() on a function, the result would have special .apply() and .call() methods, so using the result deoptimized your code. I don't know if they ever fixed that problem. I renamed all my methods to avoid it and removed most uses of Function.bind.
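Roughly the shape of code that was affected, as I understood it at the time (names here are invented for illustration):

    // An ordinary object with its own 'apply' method, unrelated to
    // Function.prototype.apply:
    var dispatcher = {
      apply: function (target, arg) {
        return target(arg);
      }
    };

    function double(x) { return x * 2; }

    // At the time, call sites like this one seemed to deopt, so renaming the
    // method (to something like 'invoke') kept callers on the fast path:
    dispatcher.apply(double, 21);

    // Bound functions appeared to be affected too, so patterns like this were
    // replaced with plain closures where possible:
    var boundDouble = double.bind(null);
    boundDouble.call(null, 21);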


> In practice when it's big enough to demonstrate with a test case, the reports look like this:

I would like to note that unfortunately there does not seem to be a single V8 person on that bug. I highly recommend filing all JavaScript-related issues (including performance ones) directly on the V8 bug tracker at http://code.google.com/p/v8. This guarantees that they will be seen immediately by the V8 team.

[To be completely precise: anything related to the correctness or performance of the language described in ECMA-262 can go directly to the V8 issue tracker. On the other hand, things like DOM APIs are implemented in Blink bindings, so those should go into the Chromium one. If in doubt, file a Chromium issue :-)]

Do you still observe the performance issues reported in that bug? If so, I will undo the WontFix and CC the relevant folks.

> for some reason at that time X.apply() and X.call() were special-cased to always assume that they were Function.apply or Function.call

To the best of my knowledge there was never an assumption like this in V8. The optimizer would detect Function.prototype.apply using a token attached to the closure value itself, and it still does not optimize Function.prototype.call in any special way.

It would be quite interesting to figure out what was going wrong for you and whether it is fixed or not. One possibility is a clash of CONSTANT map transitions, but honestly I don't see how that could occur.


Yeah, I agree that V8 is probably the right tracker to go with. It's a mess since most of my test cases are not 'just JS' that can run in the d8 shell; they are applications and the perf issues appear when running the complete application. I'll take a look at the bug again and see if there's still a regression; I haven't checked in a while. I don't really have a way to pull out old Chrome builds, though...

The apply/call thing is from an old email thread with you, so I must have misunderstood. My understanding still led to performance improvements though, so that's quite mysterious :)


> The apply/call thing is from an old email thread with you, so I must have misunderstood. My understanding still led to performance improvements though, so that's quite mysterious :)

Ah, I remembered the thread. If I am not mistaken, the problem was that you were adding properties to function objects, and those function objects were flowing into an f.apply(obj, arguments) site, making it polymorphic. At that point Crankshaft would fail to recognize the apply-arguments pattern and disable optimization for the function.

Situation was similar to http://jsperf.com/apply-argument-poly

(microbenchmark warning :-))
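In code, the pattern would have looked something like this (a contrived reconstruction for illustration, not your actual code):

    function Wrapper(fn) { this.fn = fn; }

    Wrapper.prototype.invoke = function () {
      // Crankshaft special-cases the fn.apply(obj, arguments) pattern, but
      // only while the value flowing into fn stays monomorphic at this site.
      return this.fn.apply(this, arguments);
    };

    function plain(x) { return x; }
    function tagged(x) { return x; }
    tagged.someMetadata = 42;  // extra property gives this function object a different map

    // Once both plain and tagged have flowed through the same call site, it
    // becomes polymorphic and optimization of invoke() is disabled.
    new Wrapper(plain).invoke(1);
    new Wrapper(tagged).invoke(2);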


Shouldn't someone on the V8 team watch Chromium's "Cr-Content-JavaScript" bugs, especially bugs with the word "performance" in the summary?


I think what happened here is that Cr-Content-JavaScript is the wrong/outdated label, so the V8 team was not automatically CCed. It should be Cr-Blink-JavaScript these days.

I have pinged relevant people.


V8's 'interpreter equivalent' mode is the non-Crankshaft JIT. It's not interpreted in the slightest.


Yes, that is why I said 'interpreter equivalent' and not 'interpreter'. What matters is that the performance is awful because it's the lowest common denominator :) It is faster than an interpreter!


Is it faster than JSC's new interpreter? That thing is quite fast.



