Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Nice post, Dmitry!

Two suggestions: it’s not immediately obvious whether subsequent benchmark result tables/rows show deltas from the original approach or from the preceding one (it’s the latter, which is more impressive). Maybe call that out the first couple of times?

Second, the “using the single Vec but mutating it” option would presumably benefit from a reserve() or with_capacity() call. Since in that approach you push to the vector in both erroring and non-erroring branches, it doesn’t have to be exact (though you could do a bfs search to find maximum depth, that doesn’t strike me as a great idea) and could be up to some constant value since a single block memory allocation is cheap in this context. (Additionally, the schema you are validating against defines a minimum depth whereas the default vec has a capacity of zero, so you’re guaranteed that it’s a bad choice unless you’re validating an empty, invalid object.)

But I agree with the sibling comment from @duped that actually parsing to JSON is the biggest bottleneck and simply parsing to the minimum requirements for validation would be far cheaper, although it depends on if you’ll be parsing immediately after in case it isn’t invalid (make the common case fast, assuming the common case here is absence of errors rather than presence of them) or if you really do just want to validate the schema (which isn’t that rare of a requirement, in and of itself).

(Edit: I do wonder if you can still use serde and serde_json but use the deserialize module’s `Visitor` trait/impl to “deserialize” to an enum { Success, ValidationError(..) }` so you don’t have to write your own parser, get to use the already crazy-optimized serde code, and still avoid actually fully parsing the JSON in order to merely validate it.)

If this were in the real world, I would use a custom slab allocator, possibly from storage on the stack rather than the heap, to back the Vec (and go with the last design with no linked lists whatsoever). But a compromise would be to give something like the mimalloc crate a try!



Thanks!

> Two suggestions: it’s not immediately obvious whether subsequent benchmark result tables/rows show deltas from the original approach or from the preceding one (it’s the latter, which is more impressive). Maybe call that out the first couple of times?

Agree!

> Second, the “using the single Vec but mutating it” option would presumably benefit from a reserve() or with_capacity() call. Since in that approach you push to the vector in both erroring and non-erroring branches, it doesn’t have to be exact (though you could do a bfs search to find maximum depth, that doesn’t strike me as a great idea) and could be up to some constant value since a single block memory allocation is cheap in this context. (Additionally, the schema you are validating against defines a minimum depth whereas the default vec has a capacity of zero, so you’re guaranteed that it’s a bad choice unless you’re validating an empty, invalid object.)

Oh, this is a cool observation! Indeed it feels like `with_capacity` would help here

> But I agree with the sibling comment from @duped that actually parsing to JSON is the biggest bottleneck and simply parsing to the minimum requirements for validation would be far cheaper, although it depends on if you’ll be parsing immediately after in case it isn’t invalid (make the common case fast, assuming the common case here is absence of errors rather than presence of them) or if you really do just want to validate the schema (which isn’t that rare of a requirement, in and of itself).

My initial assumption was that usually the input is already parsed. E.g. validating incoming data inside an API endpoint which is then passed somewhere else in the same representation. But I think that is a fair use case too and I was actually thinking of implementing it at some point via a generic `Json` trait which does not imply certain representation.

> (Edit: I do wonder if you can still use serde and serde_json but use the deserialize module’s `Visitor` trait/impl to “deserialize” to an enum { Success, ValidationError(..) }` so you don’t have to write your own parser, get to use the already crazy-optimized serde code, and still avoid actually fully parsing the JSON in order to merely validate it.)

Now when I read the details, it feels like a really really cool idea!

> If this were in the real world, I would use a custom slab allocator, possibly from storage on the stack rather than the heap, to back the Vec (and go with the last design with no linked lists whatsoever). But a compromise would be to give something like the mimalloc crate a try!

Nice! In the original `jsonschema` implementation the `validate` function returns `Result<(), ErrorIter>` which makes it more complex to apply that approach, but I think it still should be possible.




Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: