Most applications read JSON over the network, where what you actually have is a stream. Buffering the whole request in memory and fiddling with it there adds a lot of latency, even if your JSON is smallish.
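For what it's worth, here's a minimal sketch of that in Go (the URL and the Order struct are made up for illustration): decode straight from the response body instead of reading the whole thing into a buffer first and unmarshalling that.

    package main

    import (
        "encoding/json"
        "fmt"
        "net/http"
    )

    // Order is a made-up shape just for the example.
    type Order struct {
        ID    string  `json:"id"`
        Total float64 `json:"total"`
    }

    func main() {
        resp, err := http.Get("https://example.com/order.json") // hypothetical endpoint
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close()

        var o Order
        // json.Decoder consumes the body as it arrives; no io.ReadAll,
        // no extra copy of the whole request sitting in memory first.
        if err := json.NewDecoder(resp.Body).Decode(&o); err != nil {
            panic(err)
        }
        fmt.Println(o.ID, o.Total)
    }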
Because you work at or for some bureaucratic MegaCorp that does weird things with no real logic behind them, other than clueless Dilbert managers making decisions based on LinkedIn blogs. Or, alternatively, desperate IT consultants trying to get something to work with too small a budget and/or no access to do things the right way.
Be glad you have JSON to parse, and not EDI, or some custom delimited data format (with outdated documentation or none at all) - or, *shudders*, you work in the airline industry with SABRE.
If you're building a library, you either need to explicitly call out your limits or do streaming.
I've pumped gigs of JSON data, so a streaming parser is appreciated. Support for streaming also shows the author has thought about the engineering and is aware of the various use cases.
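To give a concrete idea of the pattern (a sketch in Go; the Record type and file name are assumptions): for a multi-gigabyte top-level array you consume the opening bracket, then decode one element at a time, so memory use stays flat no matter how big the file is.

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
    )

    // Record is a placeholder for whatever each array element looks like.
    type Record struct {
        ID int `json:"id"`
    }

    func main() {
        f, err := os.Open("huge.json") // e.g. [{"id":1},{"id":2}, ...]
        if err != nil {
            panic(err)
        }
        defer f.Close()

        dec := json.NewDecoder(f)
        if _, err := dec.Token(); err != nil { // consume the opening '['
            panic(err)
        }
        for dec.More() {
            var r Record
            if err := dec.Decode(&r); err != nil { // one element at a time
                panic(err)
            }
            fmt.Println(r.ID) // process it and drop it; nothing accumulates
        }
        if _, err := dec.Token(); err != nil { // consume the closing ']'
            panic(err)
        }
    }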
People here confidently keep repeating "streaming JSON". What do you mean by that? I'm genuinely curious.
Do you mean an XML SAX-like interface? If so, how do you deal with repeated keys in "hash tables"? Do you first translate the JSON into intermediate objects (i.e. arrays, hash tables) and then transform those into application-specific structures, or do you try to skip the intermediate step?
I mean, streaming tokens is kind of worthless on its own. If you are going for a SAX-like interface, you want to be able to go all the way with streaming, i.e. no layer of the code that reads JSON "accumulates" data (especially not indefinitely) before it can be handed to the layer above.
This works fine under a SAX-like interface in a streaming JSON parser - your 'event handler' code will execute once for a given key, and a second time for the duplicate.
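A sketch of what that looks like with a token-level API (Go's encoding/json here, with a hard-coded object standing in for the stream): the per-key "handler" simply runs once per key/value pair, so a duplicate key is seen twice and the caller decides what that means.

    package main

    import (
        "encoding/json"
        "fmt"
        "strings"
    )

    func main() {
        dec := json.NewDecoder(strings.NewReader(`{"a": 1, "a": 2}`))

        if _, err := dec.Token(); err != nil { // consume '{'
            panic(err)
        }
        for dec.More() {
            key, err := dec.Token() // object key
            if err != nil {
                panic(err)
            }
            val, err := dec.Token() // its value (a primitive in this example)
            if err != nil {
                panic(err)
            }
            // Fires twice for "a": once per occurrence in the stream.
            fmt.Printf("key=%v value=%v\n", key, val)
        }
    }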
This is a very strange way of using the word "fine"... What if the value that lives under that key triggers some functionality in the application that should never have happened, given the semantics you just botched by executing it anyway?
So you are saying it's "fine" for an application to act on the first value and then the second, even though the semantics of the above are that only the second one should have an effect?
Sorry, I have to disagree with your "works fine" assessment.
You're layering application semantics into the transport format.
It's fine, in the sense that a JSON document with duplicate keys is already invalid - but the parser might still handle it, and I suggested one way it could (just from reading the Stack Overflow answer).
It's the same "fine" that you get from undefined C compiler behaviour.
Why do you keep inventing stuff... No, JSON with duplicate keys is not invalid. The whole point of streaming is to be able to process data before it has completely arrived. What "layering semantics" are you talking about?
This has no similarity with undefined behavior. This is documented and defined.
A JSON object with duplicate keys is something the spec explicitly leaves up to the individual implementation to decide what to do with. It's neither valid nor invalid.
If you can live with "fits on disk", mmap() is a viable option? Unless you truly need streaming (early handling of early data, like a stream of transactions/operations arriving in a single JSON file?)
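Roughly what that looks like (a Unix-only sketch using Go's syscall package; the file name is made up): map the file and hand the mapped bytes to the parser, so the kernel pages data in on demand instead of you reading the whole file into a buffer first. The decoded value still ends up in memory, of course; mmap only spares you the explicit read/copy of the raw bytes.

    package main

    import (
        "encoding/json"
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        f, err := os.Open("big.json")
        if err != nil {
            panic(err)
        }
        defer f.Close()

        fi, err := f.Stat()
        if err != nil {
            panic(err)
        }

        // Map the file read-only; works on Linux/macOS, not Windows.
        data, err := syscall.Mmap(int(f.Fd()), 0, int(fi.Size()),
            syscall.PROT_READ, syscall.MAP_PRIVATE)
        if err != nil {
            panic(err)
        }
        defer syscall.Munmap(data)

        var v map[string]any
        if err := json.Unmarshal(data, &v); err != nil {
            panic(err)
        }
        fmt.Println(len(v), "top-level keys")
    }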