OK. I totally prefer ordered JSON, because it is so much easier to eyeball: visually comparing JSON with differently ordered keys is quite a lot more difficult (O() complexity?) than when the keys are in the same order. It also lets diff show where the differences are (diagnosis, not just a binary identical-or-not).

And, in fact, I do use ordered JSON for comparison in testing, as you describe.
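
A minimal sketch of that kind of test comparison, assuming Python's standard `json` and `difflib` modules (the sample documents and the `canonical` helper are made up for illustration). Serializing both sides with sorted keys means a line diff only reports real differences:

```python
import difflib
import json

def canonical(obj):
    # Sorted keys plus fixed indentation give a stable, line-oriented form.
    return json.dumps(obj, sort_keys=True, indent=2)

# Same keys in a different order; only one array element differs.
a = json.loads('{"name": "x", "id": 1, "tags": ["a", "b"]}')
b = json.loads('{"tags": ["a", "c"], "id": 1, "name": "x"}')

diff = list(difflib.unified_diff(
    canonical(a).splitlines(), canonical(b).splitlines(),
    fromfile="a", tofile="b", lineterm=""))
print("\n".join(diff))

# Lines that actually changed, ignoring the ---/+++ file headers.
changed = [line for line in diff
           if line[:1] in "+-" and not line.startswith(("---", "+++"))]
```

Without `sort_keys=True` the diff would also flag every key that merely moved.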

However... comparison of JSON as objects (i.e. in memory) is order independent. Hashcode is also order independent (the trick is to sum the elements' hashcodes e.g. https://docs.oracle.com/javase/7/docs/api/java/util/Set.html...
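
A rough sketch of the summing trick, in Python rather than Java, and not what any particular library does. `json_hash` and the 32-bit mask are assumptions for illustration; object entries are summed so key order drops out, while arrays stay order-sensitive:

```python
MASK = 0xFFFFFFFF  # keep results in 32 bits, like a Java int

def json_hash(value):
    """Order-independent hash of a parsed JSON value (illustrative only)."""
    if isinstance(value, dict):
        total = 0
        for key, child in value.items():
            # Hash each (key, value) entry, then sum. Addition is commutative,
            # so the iteration order of the keys does not matter.
            total = (total + hash((key, json_hash(child)))) & MASK
        return total
    if isinstance(value, list):
        # Arrays are ordered in JSON, so position must affect the result.
        return hash(tuple(json_hash(item) for item in value)) & MASK
    return hash(value) & MASK  # str, int, float, bool, None

a = {"id": 1, "name": "x", "tags": ["a", "b"]}
b = {"tags": ["a", "b"], "name": "x", "id": 1}
print(a == b, json_hash(a) == json_hash(b))
```

Note that Python randomizes string hashes per process, so the values are only comparable within one run.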

Diff for ordered JSON is possible using longest common subsequence for trees, but has terrible complexity, and lacks diff's clever optimizations both general and specific to typical input.



+1 for all reasons above for ordered JSON, highly convenient in practice.

And if it doesn't impact performance significantly, these are all pretty good reasons for JSON outputters to default to sorting objects deterministically by keys, or at least to provide a flag to do so. (Even if there's no canonical sort order between JSON libraries, all that matters is it's deterministic for each library.)

BUT... I can't imagine any scenario where you'd want to validate that JSON content was ordered on the input, which is what was enabled in this article. Why does IBM even have that as an option?!

Be strict in what you emit and liberal in what you accept, and all that...


I guess because OrderedJSONObject is ordered, not sorted. Like java's LinkedHashSet, it maintains the order keys are added.
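
To make the ordered-versus-sorted distinction concrete, here is a small Python illustration (not IBM's OrderedJSONObject). Python's `dict` keeps insertion order, much like LinkedHashSet, and `json.loads` inserts keys in document order:

```python
import json

text = '{"zeta": 1, "alpha": 2, "mid": 3}'
obj = json.loads(text)

insertion_order = list(obj)   # order the keys appeared in the document
sorted_order = sorted(obj)    # alphabetical order, a different thing

print(insertion_order)
print(sorted_order)
```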

If you're going to rely on a specific order for comparisons, it makes sense to alert the user to any JSON in a different order (instead of silently, liberally accepting it), or you'll get false negatives elsewhere. Easier to check for a sorted order, but also possible to define a specific order. IDK what IBM did here.
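
One way such a check could look, as a sketch only; this says nothing about how IBM implemented it. It assumes Python's `json.loads` with its `object_pairs_hook` parameter, and `KeyOrderError`, `require_sorted_keys`, and `loads_strict` are hypothetical names:

```python
import json

class KeyOrderError(ValueError):
    """Raised when an object's keys are not in sorted order (hypothetical)."""

def require_sorted_keys(pairs):
    # The hook receives each object's (key, value) pairs in document order,
    # including nested objects.
    keys = [key for key, _ in pairs]
    if keys != sorted(keys):
        raise KeyOrderError(f"keys not in sorted order: {keys}")
    return dict(pairs)

def loads_strict(text):
    return json.loads(text, object_pairs_hook=require_sorted_keys)

ok = loads_strict('{"a": 1, "b": {"c": 2, "d": 3}}')

try:
    loads_strict('{"b": 1, "a": 2}')
    rejected = False
except KeyOrderError:
    rejected = True
```

Checking against a specific expected order instead of sorted order would only change the comparison inside the hook.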

funfact: jq used to sort keys; now it retains ordering.


Indeed, and remember this JSON message is going to a mainframe. Mainframes don't have much memory and typically process record-by-record, or event-by-event. So the implementation probably streams the JSON in and constructs a COPYBOOK from the payload before continuing to invoke the COBOL.

So the rework time might be to write a general-purpose re-order layer that can re-order any incoming message.
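
A sketch of what such a re-order layer might look like for flat messages. The field names in `FIELD_ORDER` are invented, and unlike a streaming consumer this version holds the whole message in memory, which is exactly why it would sit in front of the mainframe rather than on it:

```python
import json

# Hypothetical record layout; a real one would come from the copybook.
FIELD_ORDER = ["account_id", "amount", "currency"]

def reorder(message, field_order=FIELD_ORDER):
    missing = [name for name in field_order if name not in message]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    # Dicts keep insertion order, so inserting in layout order fixes the output.
    ordered = {name: message[name] for name in field_order}
    # Keep any extra fields after the known ones, in their original order.
    extras = [(k, v) for k, v in message.items() if k not in ordered]
    ordered.update(extras)
    return ordered

incoming = json.loads('{"currency": "EUR", "account_id": 42, "amount": 9.5}')
outgoing = json.dumps(reorder(incoming))
print(outgoing)
```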


> any scenario where you'd want to validate...

Because it's a precondition for something else down the line (a dependency)


Please XOR your hashes instead of adding them! If you add them, you're losing bits on the low end. EDIT: No you're not. It feels like you should be, but with unsigned overflow, this actually works just fine.

This assumes of course that you're using proper hashes that make use of the full domain of the output type (a proper hash will have a 50% chance of any arbitrary bit being flipped by any change to the input). But if you're not using proper hashes, you're doing something wrong.


Can you please explain this assertion? ;)

If I have a 32 bit current hash value-- for any possible 32 bit value I add, I get a different 32 bit value out.

XORing is effectively adding each bit and throwing away the carry bit. Adding just cascades carries to the left.
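
The relationship can be checked directly. In this Python sketch (`add_via_xor` is just an illustrative name), XOR gives the carry-less sum, AND shifted left gives the carries, and feeding the carries back in until none remain reproduces ordinary 32-bit addition:

```python
MASK = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic

def add_via_xor(a, b):
    while b:
        carry = (a & b) << 1   # positions where both bits are set carry left
        a = (a ^ b) & MASK     # per-bit sum with the carry thrown away
        b = carry & MASK       # carries past bit 31 fall off the end
    return a

samples = [(0, 0), (1, 1), (0xFFFFFFFF, 1), (0x12345678, 0x9ABCDEF0), (7, 9)]
results = [add_via_xor(a, b) == ((a + b) & MASK) for a, b in samples]
print(all(results))
```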


You know what, you're right. I made a knee-jerk comment but I didn't think it through all the way. From any arbitrary unsigned 32-bit integer, every other unsigned 32-bit integer is reachable with a single addition. Therefore addition works just fine here.

It still feels wrong to say this, it feels like since adding will effectively shove bits off the high end and drop them on the floor that you're losing information, but I can't actually justify that feeling with reasoning.
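
A small check of the reachability argument, scaled down to 8 bits so it can be done exhaustively (the 32-bit case works the same way). Adding a fixed value modulo 2^n is a bijection, and it can be undone by subtracting, so nothing is lost:

```python
BITS = 8
SIZE = 1 << BITS  # 256 possible values

def reachable(start):
    # Every value obtainable from `start` with a single wrapping addition.
    return {(start + x) % SIZE for x in range(SIZE)}

every_value_reachable = all(len(reachable(h)) == SIZE for h in range(SIZE))

# Wrapping addition is reversible: subtracting the same amount restores h.
reversible = all(((h + x) % SIZE - x) % SIZE == h
                 for h in range(SIZE) for x in range(SIZE))
print(every_value_reachable, reversible)
```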


Adding is actually considerably better. For high quality hashes XOR is just as good; but if there's any distributional problems at all in the hash, adding mixes stuff more.

(XORing is effectively adding with all of the carry information lost/falling off).
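
A toy illustration of the distributional point, using a deliberately bad "hash" that only ever sets the low 4 bits (an assumption made up for the demo). XOR can never leave those 4 bits, while addition carries into higher bits, and XOR also cancels duplicate elements entirely:

```python
from functools import reduce
from itertools import combinations
from operator import xor

MASK = 0xFFFFFFFF

def combine_add(hashes):
    return sum(hashes) & MASK

def combine_xor(hashes):
    return reduce(xor, hashes, 0)

weak_hashes = range(16)  # a poor hash: only the low 4 bits are ever used

# Combine every 3-element subset and count the distinct results.
xor_values = {combine_xor(c) for c in combinations(weak_hashes, 3)}
add_values = {combine_add(c) for c in combinations(weak_hashes, 3)}
print(len(xor_values), len(add_values))

# Duplicates vanish under XOR but not under addition.
print(combine_xor([7, 7]), combine_add([7, 7]))
```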


Please note that using a sum of hashes almost certainly weakens any cryptographic guarantees you may expect from your hashes. Of course, that may be fine depending on your use case.



