// We play this game because we want this to be callable even from places that
// don't have access to CallFrame* or the VM, and we only allocate so little
// memory here that it's not necessary to trigger a GC - just accounting what
// we have done is good enough. The sort of bizarre exception to the "allocating
// little memory" is when we transfer a backing buffer into the C heap; this
// will temporarily get counted towards heap footprint (incorrectly, in the case
// of adopting an oversize typed array) but we don't GC here anyway. That's
// almost certainly fine. The worst case is if you created a ton of fast typed
// arrays, and did nothing but caused all of them to slow down and waste memory.
// In that case, your memory footprint will double before the GC realizes what's
// up. But if you do *anything* to trigger a GC watermark check, it will know
// that you *had* done those allocations and it will GC appropriately.
Kind of interesting, really. The method is clearly somewhat sarcastically named but usefully so: they know what they're doing is not optimal but still do it anyway for the reasons outlined. It's not like this is laziness or nefariousness.
That comment doesn’t explain the reason for the function’s existence; it explains the details of how the function interacts with GC.
The reason for the function’s existence is that it allows typed arrays to dynamically switch between a fast/compact representation for the case that the JSVM owns the data, and a slightly slower and slightly less compact version for when the JSVM allows native code to share ownership of the data.
This function, slowDownAndWasteMemory, switches to the less efficient version that allows aliasing with native code.
Of course the name is sarcastic. The actual effect of having this is that JSC can handle both the owned case and the aliased case, and get a small win if you’re in the owned case while being able to switch to the aliased case at any time. Since there’s no way to support the aliased case without some slightly higher memory/time usage, we are sort of literally slowing down and wasting memory when we go aliased.
Source: I’m pretty sure I wrote most of this code.
I often wonder how much more productive I could be if I didn't need to split changes like this into 50 small commits, where every intermediate commit is working and has passing tests...
I understand the desire to make larger patches, but how do you effectively manage them in the review process? For super large commits in the past, I’ve had other engineers hop on a screen share to review the diff together and answer questions, but it feels inefficient.
> how do you effectively manage them in the review process
The overriding principle is "no surprises at final review time."
For a big impactful change, discuss the work in progress (specifically: architecture stuff, database changes, the impact this will have on other code) with your reviewers as much as possible before final review time. There's no other sane way to do it.
Make sure you're all in agreement on the approach you're taking or, at least, make sure they're aware of it. Also a good way to (hopefully) make sure you're not going to conflict with somebody else's work!
Dumping a big commit full of surprises on reviewers is AT BEST poor practice and is at worst kind of shady and a sign you don't want to collab, don't respect your coworkers, and/or are hiding something.
I definitely prefer avoiding surprises. In this case the review was upgrading and migrating from Vue 2 to Vue 3 and, while the rest of the team was in the loop and aware of coming changes, the change set itself was massive. I definitely would do it differently the next time around, and this is an edge case. I will say that that position saw several “larger than any reasonable person should submit” change sets; glad it’s behind me.
Y'all are very thorough. Just make sure the function has tests and won't blow up prod. You needn't waste forever reviewing. It's code. It can be iterated upon.
In this particular case, we're talking some tricky code around array handling in the JS implementation of a major browser. It's pretty much the archetypical location for a VM escape bug. That's most definitely not something you want to be cavalier about.
In general, the large, sprawling diffs we're talking about here, the sort that actually justify commits of this size, are almost always going to also be the sort that justify closer scrutiny at review time.
No. In any actually complex piece of code, even adding significant amounts of testing over new code is not going to cover every possible code path.
It's also immensely difficult to write tests to find errors in code you're writing - most patches I see with "extensive tests" test all the correct code paths, and _maybe_ a few error cases. It's a very easy trap to get yourself into.
The purpose of review is not to catch obvious issues - though it hopefully will - but the subtle ones.
I’ve had both thorough and cursory reviews of changes like this.
It makes no difference for finding actual bugs.
Reviews are great for spreading knowledge about the change and making sure the style is dialed in. Reviews aren’t great for ensuring test coverage or finding security bugs.
Test coverage should be ensured by your coverage tools in the CI. (At least basic test coverage.)
You are right that reviews aren't necessarily there to catch bugs. Reviews are there to tell you to write code that's simple and understandable. If your coworkers can't understand what's going on, how could you expect anyone else to do so in a year or so?
Coverage tools only tell you that you have tests that hit each line, but most serious bugs aren't of the "you didn't hit one line" kind; they're the result of specific interactions between multiple paths. e.g. (note this is a super simple and trivial example):
int size = 0;
char *buffer;
if (a) {
    buffer = new char[1];
    size = 1;
} else if (b) {
    buffer = new char[2];
    size = 2;
}
...
if (b) {
    buffer[1]++;
}
...
Coverage tests will happily show that every line is hit, without tests necessarily hitting the buffer overflow if a and b are both true.
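To make that concrete, here's a hedged sketch (the process wrapper and the specific asserts are invented for illustration): two tests that each exercise one branch reach 100% line coverage over the snippet above, yet the a && b combination that actually overflows is never run.

#include <cassert>

// Hypothetical wrapper around the snippet above, just so it can be tested.
int process(bool a, bool b) {
    int size = 0;
    char* buffer = nullptr;   // nullptr instead of uninitialized, to keep the sketch tame
    if (a) {
        buffer = new char[1];
        size = 1;
    } else if (b) {
        buffer = new char[2];
        size = 2;
    }
    if (b) {
        buffer[1]++;          // overflows the 1-byte allocation when a && b
    }
    delete[] buffer;
    return size;
}

int main() {
    assert(process(true, false) == 1);   // hits the 'a' branch and the false side of 'if (b)'
    assert(process(false, true) == 2);   // hits the 'else if (b)' branch and 'buffer[1]++'
    // Every line above is now covered, but process(true, true) -- the one
    // combination that corrupts memory -- was never exercised.
    return 0;
}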
Indeed. I always push my teams into not hunting for spelling mistakes. Instead look for things that will be hard to fix in the future ('did you realise that this bootloader config stops us doing field updates??') or where the overall direction is wrong ('Not sure about os.system('grep') for this XML - why can't we just use the xml.etree here?').
I worked for a while with Henrik Frystyk Nielsen, who worked on the early HTTP spec and httpd. We would give him a hard time for this — among other things.
I know that Phillip Hallam-Baker probably first used that spelling in a document
This is code running on client machines which you have no control over. Once it's out there good luck getting everyone to upgrade to your latest version because "hey, we didn't want this one to behave like that. oops". Especially when you're google and don't want certain freedoms in your software.
You make sure that more than one person is aware of the changes and design, and have all people frequently discuss the progress before the mega patch hits PR.
Then the mega patch PR is "one last look", and not thousands of changes storming the gates of your sanity.
Yeah, it's best to go over something like this in a meeting/zoom or a series of meetings. And most likely not every single line will be gone over; at some point the engineer writing this code is likely pretty senior and shouldn't need that many eyes on their boilerplate, and it's the nasty stuff that really needs talking about and focusing on.
Code review doesn't magically catch all bugs even with tiny PRs. At best it's a basic sanity check. In fact I'd say it's more of a design sanity check than a bug catching one.
To prevent bugs you need tests, fuzzing, static analysis, static & strong type checking, etc.
We missed an OS update cycle and had to consume all the changes in one giant patch. But upstream had migrated from Python 2 to 3 in-between and rewrote half their build scripts to accommodate. Each of those changes needed to be reviewed manually because we had modified things to support reproducible builds and the resulting merge conflict was monstrous. I contributed maybe 2300 of those before my sanity failed and called for help.
Now we have a whole team to do that job more regularly.
I find this to be more of a reflection of how well aligned you and the code reviewer are. When both of you know the code base inside out and trust each other, it's fine. But sometimes the author is a noob and the code reviewer has to review every little detail, much like an examiner looking at a student's paper. Or sometimes they might have a personal beef.
> I often wonder how much more productive I could be if I didn't need to split changes like this into 50 small commits, where every intermediate commit is working and has passing tests...
This is down to a trade-off between ease of writing and ease of review.
Yes, if you didn't have to make your work presentable, you could bang it out faster. But then no one could read it.
Native code wants to think it’s a ref counted buffer that could be accessed from any thread including ones that don’t participate in GC. So it’s gotta be allocated in the malloc heap and it’s gotta have a ref count.
JS code wants the buffer to be garbage collected. If native code doesn’t ref it (count is zero) and it’s not reachable via JS heap then it’s gotta die. So, it needs a GC cell and associated machinery (JSC object headers, mark bits, possibly other stuff).
So, the reason why the aliased case is not as awesome as the owned case is that you need double the things. You need some malloc memory and you need a GC cell. You need a ref count and you need a GC header and some mark bits.
Having both malloc/RC and cell/GC overhead isn’t like the end of the world or anything, but it means that allocations are slower and waste more memory than if it was either just a buffer owned by native code or just an object in the GC heap.
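For readers who want the two shapes spelled out, here is a rough schematic with invented names (GCCell, FastTypedArray, SharedBuffer, and WastefulTypedArray are stand-ins, not the actual JSC classes) of the owned and aliased representations described above.

#include <atomic>
#include <cstddef>

struct GCCell { /* JSC-style object header: structure pointer, mark bits, ... */ };

// Owned / "fast" case: one GC allocation; the payload lives with the cell and
// dies whenever the collector decides the object is unreachable.
struct FastTypedArray : GCCell {
    size_t length;
    // element storage sits contiguously after the header, inside the GC heap
    char* inlineData() { return reinterpret_cast<char*>(this) + sizeof(FastTypedArray); }
};

// Aliased / "slow" case: the payload moves to a ref-counted malloc buffer so
// native code on any thread can keep it alive without participating in GC...
struct SharedBuffer {
    std::atomic<unsigned> refCount { 1 };
    size_t length;
    void* data;               // malloc'd storage in the C heap
};

// ...while a GC cell still represents the object on the JS side. Two
// allocations, two headers, two lifetime systems: the "waste" in the name.
struct WastefulTypedArray : GCCell {
    SharedBuffer* buffer;
};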
With a project that could live on for many years, I’d object to this code, but only because it doesn’t really explain _why_ it’s necessary despite wasting time/memory. The humorous name is great, though. It might be explained in the VCS history, but people aren’t likely to look it up there and I’ve noticed the history tends to get lost for a lot of old projects.
The name isn’t just funny. It’s also accurate. And it’s quite evidently stood the test of time.
In a VM, often the most important kind of documentation is about labeling what is the fast path and what is the slow path. This method name alerts the code reader that they’re looking at one of the slow paths.
Isn't part of the goal with deliberately weird names like this to alert callers that HERE BE DRAGONS, so, like, don't call this unless you've read _and understood_ the source code?
I can't find the reference right now, but IIRC the Glasgow Haskell Compiler had a function called "unsafeDirectIo" which people were using too flippantly. So they renamed it to "evilUnholyDirectIo"
Big fan of these “sarcastically named” functions, there are a lot of good reasons to use them (mainly, every time you go to use it you’ll think about whether it’s really needed) and not many reasons to not use it (naming it “fastConvert” doesn’t make it any faster than naming it “slowConvert”).
Some of our internal tools have command line options along the lines of:
--yolo Yes, I really want to overwrite the production database and get us all fired.
where "--yolo" is our standard flag for such things. When I find myself about to type that, I go back and double, triple, quadruple check that I really intend to do the thing that I'm asking it to do.
Love it! Ceph has --yes-i-really-mean-it for dangerous fs changes, and --yes-i-really-really-mean-it for pool rm. The annoying-to-type property is pretty effective.
This kind of thing is not so unpopular in some kinds of user interfaces. I think GitHub has a repo deletion confirmation that asks you to type the name of the repo. It's not quite the same as flag naming, but it's a big flashing light signalling to you that you're doing something important.
I really like that UX! Even Google dropped the ball there, I accidentally deleted our production gke cluster a handful of years back, through the bloodshot eyes of a forced all-nighter, because their deletion UX was two button clicks (may be different now).
At my company I have repositories that are meant to be open source and repositories that are proprietary. We have an internal tool that asks you to type a sentence for confirmation the first time you push to open source.
But to make sure you actually type it in, the sentence you’re instructed to type is shown in lowercase, while the instructions tell you to type it in as uppercase.
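A minimal sketch of that trick, assuming a plain terminal prompt (the sentence and everything else here is made up, not the actual internal tool): show the sentence in lowercase, accept only the uppercase form, so pasting the prompt text back verbatim fails.

#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

int main() {
    std::string sentence = "i understand this push will be publicly visible";
    std::string expected = sentence;
    std::transform(expected.begin(), expected.end(), expected.begin(),
                   [](unsigned char c) { return std::toupper(c); });

    std::cout << "Type the following sentence IN UPPERCASE to continue:\n"
              << "  " << sentence << "\n> ";
    std::string typed;
    std::getline(std::cin, typed);
    if (typed != expected) {
        std::cerr << "Confirmation failed; aborting.\n";
        return 1;
    }
    std::cout << "Confirmed.\n";
    return 0;
}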
MySQL of course has (had? Not used it for an eternity) the --i-am-a-dummy flag, which felt like it should be reversed (it prevents update/delete without an explicit where clause, but would be more appropriate if you had to use it to allow them).
Fast doesn't always mean good. For example, in cryptography a function named fastCompare could be a non-constant-time comparison, which is faster but not safe for cryptography :D (though you'd probably want to name it unsafeButFastCompare I guess ^^)
Or it only accepts certain inputs, or has an expensive first invocation but amortized cost, or is not bug-compatible with the old `convert`, or it requires a large cache, or it's optimized for parallelism, or uses instructions only available in newer CPUs, or it was a prototype that stuck around, or...
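To make the fastCompare point above concrete, here's a hedged sketch (neither function is from any real library): the "fast" version returns at the first mismatch, so its running time leaks how many leading bytes matched, while the constant-time version always touches every byte.

#include <cstddef>
#include <cstdint>

bool fastCompare(const uint8_t* a, const uint8_t* b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return false;     // early exit: runtime depends on secret data
    }
    return true;
}

bool constantTimeCompare(const uint8_t* a, const uint8_t* b, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= a[i] ^ b[i];  // accumulate differences, never branch on them
    return diff == 0;
}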
To try to prevent others from joining this argument: twitter is currently not allowing you to read tweets without being logged in. Hence, it can be both available and not available.
If it cannot be read without logging in, then it is inaccessible. That's a choice twitter gets to make, but to claim tweets are accessible because some people can read them is no different from posting links to a private facebook group and saying "you just need to be a member of that group".
That said, I was saying it is inaccessible because I am getting error messages trying to load it.
That's far from true and simply indicates the direction (and intensity) of your own political passions. This is one of the most reliable phenomena on HN: everyone thinks their team is the one being repressed. People with opposite passions think the site is overrun by your team and that the mods are in your pocket.
Yeah, but based on his musk comment I'm going to assume anything we say isn't relevant.
He's accusing people of lying because they hate musk, when they're saying content is not accessible on a site that is currently documented to be suffering basic response problems. It's the kind of reactive BS you get from idiots who aren't interested in facts or reality so shrug.
One of those irrational times of my year is when I inevitably hit something that really deserves a good descriptor but there just isn’t a good word or two for it. So I end up with one of these monsters.
It’s not a real problem, but it’s a moment that doesn’t bring me joy.
I personally love these kinds of names. On a long enough timeline, using them either:
- prompts someone to tell you the actual name for the thing, and now you know a name for something
- calls out poorly considered dependencies/design
- leads to naming a thing that deserves a name
- calls out complexity which is inherent either to the domain or other aspects of the underlying architecture
Every one of those is a positive outcome, and a less verbose name will never* be a better tradeoff.
* There are many apparent real-world exceptions, but they always fall under the “naming a thing that deserves a name” category, even if the resulting name is less clear than the verbose one. And they’re almost always directly involved in core abstractions (i, acc, cons) or core domain concepts.
Edited to make this less mistakenly ignorant of lisp.
We used to make fun of Java / Spring for names like this, but it turns out it was just being used for absolutely gigantic systems before other languages.
(Yes I know your great uncle Bobert coded 644 million lines of Cobol for a massive app where all function names were less than 4 characters. I'm talking general trends here)
I like https://surf.suckless.org (which uses WebKit) but it seems like every time I use it and leave a window open for a while, that process (WebKitWebProcess) will end up pegging a core for no obvious reason. It's not site dependent, and I think it's when JS is enabled.
Oh I agree. I suppose I should read https://www.webkit.org/debugging-webkit/.
It's a problem I can't trivially reproduce on command, so (reading your comment above) I wonder how you would approach it.
I would just run "sudo perf record" to do a systemwide performance trace, followed by "sudo perf report" and you'll find the function name where it's spending all the CPU cycles.
Another option is to just "sudo gdb -p 12345" to attach to the process, then "thread apply all bt" to get stack traces of each thread, and find whichever isn't sat in "wait". That has the benefit you can inspect local variables in every frame etc (for opensource stuff, symbols are usually available automatically via debuginfod).
Chromium/blink has much better built in performance tracing things if you can reproduce there - see chrome://tracing/ for tracing internal browser components, or the Performance tab in devtools if you think it is javascript gobbling all the CPU time.
Some of the identifier names are kind of casual here. I was amused by the class called DeferGCForAWhile. I'm guessing just from reading that it's a typical RAII pattern so the "while" that will be deferred may be the scope of the object, i.e. until its destructor is called.
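For what it's worth, a sketch of the RAII shape being guessed at here, assuming hypothetical deferGC/stopDeferringGC hooks (this is not JSC's actual implementation): collection is deferred in the constructor and allowed again when the guard leaves scope, so "a while" is exactly the enclosing block.

class Heap;   // stand-in for the real collector interface

class DeferGCForAWhile {
public:
    explicit DeferGCForAWhile(Heap& heap) : m_heap(heap) { deferGC(m_heap); }
    ~DeferGCForAWhile() { stopDeferringGC(m_heap); }

    DeferGCForAWhile(const DeferGCForAWhile&) = delete;
    DeferGCForAWhile& operator=(const DeferGCForAWhile&) = delete;

private:
    static void deferGC(Heap&) { /* hypothetical: bump the heap's defer count */ }
    static void stopDeferringGC(Heap&) { /* hypothetical: drop the defer count */ }
    Heap& m_heap;
};

// Usage: no collection can start between construction and the end of the scope.
// {
//     DeferGCForAWhile deferGC(heap);
//     ... do the thing that must not be interrupted by a collection ...
// }   // destructor runs here; GC may proceed again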
// Consider the following objects and prototype chains:
// O (of global G1) -> A (of global G1)
// B (of global G2) where G2 has a bad time
//
// If we set B as the prototype of A, G1 will need to have a bad time.
// See comments in Structure::mayInterceptIndexedAccesses() for why.
//
// Now, consider the following objects and prototype chains:
// O1 (of global G1) -> A1 (of global G1) -> B1 (of global G2)
// O2 (of global G2) -> A2 (of global G2)
// B2 (of global G3) where G3 has a bad time.
//
// G1 and G2 does not have a bad time, but G3 already has a bad time.
// If we set B2 as the prototype of A2, then G2 needs to have a bad time.
// Note that by induction, G1 also now needs to have a bad time because of
// O1 -> A1 -> B1.
//
// We describe this as global G1 being affected by global G2, and G2 by G3.
// Similarly, we say that G1 is dependent on G2, and G2 on G3.
// Hence, when G3 has a bad time, we need to ensure that all globals that
// are transitively dependent on it also have a bad time (G2 and G1 in this
// example).
//
// Apart from clearing the VM structure cache above, there are 2 more things
// that we have to do when globals have a bad time:
// 1. For each affected global:
// a. Fire its HaveABadTime watchpoint.
// b. Convert all of its array structures to SlowPutArrayStorage.
// 2. Make sure that all affected objects switch to the slow kind of
// indexed storage. An object is considered to be affected if it has
// indexed storage and has a prototype object which may have indexed
// accessors. If the prototype object belongs to a global having a bad
// time, then the prototype object is considered to possibly have indexed
// accessors. See comments in Structure::mayInterceptIndexedAccesses()
// for details.
//
// Note: step 1 must be completed before step 2 because step 2 relies on
// the HaveABadTime watchpoint having already been fired on all affected
// globals.
//
// In the common case, only this global will start having a bad time here,
// and no other globals are affected by it. So, we first proceed on this assumption
// with a simpler ObjectsWithBrokenIndexingFinder scan to find heap objects
// affected by this global that need to be converted to SlowPutArrayStorage.
// We'll also have the finder check for the presence of other global objects
// depending on this one.
//
// If we do discover other globals depending on this one, we'll abort this
// first ObjectsWithBrokenIndexingFinder scan because it will be insufficient
// to find all affected objects that need to be converted to SlowPutArrayStorage.
// It also does not make dependent globals have a bad time. Instead, we'll
// take a more comprehensive approach of first creating a dependency graph
// between globals, and then using that graph to determine all affected
// globals and objects. With that, we can make all affected globals have a
// bad time, and convert all affected objects to SlowPutArrayStorage.
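To restate the quoted comment's step 1 in code form, a heavily simplified schematic (GlobalObject, the dependents map, and this haveABadTime are invented stand-ins, not JSC's types): starting from the global that just got a bad time, walk the reverse dependency edges so everything transitively dependent on it gets a bad time too, before any objects are converted.

#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct GlobalObject {
    bool hasBadTime = false;
};

// For each global, the globals whose prototype chains depend on it
// (e.g. dependents[G3] = {G2}, dependents[G2] = {G1} in the example above).
using Dependents = std::unordered_map<GlobalObject*, std::vector<GlobalObject*>>;

void haveABadTime(GlobalObject* root, const Dependents& dependents) {
    // Step 1: mark every transitively dependent global (stands in for firing
    // HaveABadTime watchpoints and converting array structures).
    std::unordered_set<GlobalObject*> affected;
    std::queue<GlobalObject*> work;
    work.push(root);
    while (!work.empty()) {
        GlobalObject* g = work.front();
        work.pop();
        if (!affected.insert(g).second)
            continue;                        // already handled
        g->hasBadTime = true;
        auto it = dependents.find(g);
        if (it == dependents.end())
            continue;
        for (GlobalObject* dependent : it->second)
            work.push(dependent);
    }
    // Step 2 would then scan the heap and switch every affected object to the
    // slow kind of indexed storage, relying on step 1 having already finished.
}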
Love that printInternal method. If the caller wants to map the enum to the string, make them know about a PrintStream that they have to instantiate.
(And possibly write their own derivation, if there isn't one that captures output into a string.)
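For anyone who hasn't seen the pattern, a simplified schematic of what's being grumbled about, with made-up stand-ins (this PrintStream, TransitionKind, and StringCapturePrintStream are not the actual WTF/JSC classes): the enum's only "to string" facility is an overload that writes into a stream, so a caller who just wants a std::string has to supply a stream that captures its own output.

#include <string>

enum class TransitionKind { AllocateUndecided, AllocateInt32, SwitchToSlowPutArrayStorage };

struct PrintStream {
    virtual ~PrintStream() = default;
    virtual void print(const char*) = 0;
};

// The only conversion provided: write the name into whatever stream you have.
void printInternal(PrintStream& out, TransitionKind kind) {
    switch (kind) {
    case TransitionKind::AllocateUndecided:           out.print("AllocateUndecided"); break;
    case TransitionKind::AllocateInt32:               out.print("AllocateInt32"); break;
    case TransitionKind::SwitchToSlowPutArrayStorage: out.print("SwitchToSlowPutArrayStorage"); break;
    }
}

// ...so a caller who wants a string ends up writing the capturing stream the
// comment alludes to.
struct StringCapturePrintStream : PrintStream {
    std::string captured;
    void print(const char* s) override { captured += s; }
};

std::string toString(TransitionKind kind) {
    StringCapturePrintStream out;
    printInternal(out, kind);
    return out.captured;
}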
I think pedantically anchors are the markers in the document, and the URL fragments refer to the anchors to tell the browser to scroll there. If I'm wrong, someone will be along to correct me...
Hi dang. Sorry to ask you here but I wasn't sure how to get in touch with the staff. Have you had the chance to review the email I sent to hn@ycombinator.com? I know you're very busy so I don't mean to rush you. I just want to make sure the spam filter didn't get it first hahaha