Lots of comments here about XML vs. JSON... but there are areas where these two don't collide. I'm thinking about text/document encoding (real annotated text, things like books, etc).
Even though XML is still king here (see TEI and other norms), some of its limitations are a problem. Consider the following text:
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify a part of it:
Lorem ipsum <sometag>dolor sit amet</sometag>, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Now say you want to qualify another part, but it's overlapping with previous part:
Lorem ipsum <sometag>dolor sit <someothertag>amet</sometag>, consectetur</someothertag> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Of course, this is illegal XML... so we have to do dirty hacks like this:
Lorem ipsum <start someid="part1"/>dolor sit <start someid="part2"/>amet<end someid="part1"/>, consectetur<end someid="part2"/> adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Which means rather inefficient queries afterwards :-/
A strategy I've seen for dealing with the inability of XML to handle overlapping tags is to treat the tagging as an annotation layer on top of the node with the data:
<doc>
  <data type="text">
    This is some sample text.
  </data>
  <annotations>
    <tag1 start="1" end="3" comment="foo"/>
    <tag2 start="2" end="4" type="bar" />
  </annotations>
</doc>
The start and end are usually byte offsets from the start of the text content in the data node. It still sucks, but at least you could apply the same general strategy to more than just text data - I've seen it used with audio/video where the offsets are treated as time offsets into the media.
In my experience, the traditional solution for editing with these kinds of hacks is to write a buggy piece of shit custom GUI so people can edit documents. That way, the complaints shift away from your lousy data format to your lousy UI. Problem solved!
I would argue that the inline way of annotating things in XML is actually ok-ish if one absolutely needs human edit-ability, but otherwise bad design.
{text: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
annotations: [{tag: "sometag", ranges: [{from: 12, to: 26}]},
{tag: "sometothertag", ranges: [{from: 21, to: 39}]}
Note that this also removes the limitation that an annotation has to cover a single contiguous span.
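To make the shape concrete, here's a minimal TypeScript sketch of reading such a structure back out; the interface and function names are invented for illustration, not taken from any library:

  // Standoff annotations over a plain string, mirroring the JSON shape above.
  // All names here are illustrative, not from any particular library.
  interface Range { from: number; to: number }
  interface Annotation { tag: string; ranges: Range[] }
  interface AnnotatedText { text: string; annotations: Annotation[] }

  // Return the substring covered by each annotated range.
  function extractSpans(doc: AnnotatedText): { tag: string; span: string }[] {
    return doc.annotations.flatMap(a =>
      a.ranges.map(r => ({ tag: a.tag, span: doc.text.slice(r.from, r.to) }))
    );
  }

  const doc: AnnotatedText = {
    text: "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
    annotations: [
      { tag: "sometag",      ranges: [{ from: 12, to: 26 }] },
      { tag: "someothertag", ranges: [{ from: 21, to: 39 }] },
    ],
  };
  // Overlapping ranges are fine, because nothing here has to nest.
  console.log(extractSpans(doc));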
crafty, but for your consideration: that places the burden upon every library author to be "accounting accurate" to any edits, and the only way anyone would know that it's not correct is to visually inspect the output text
also, as I get older I have a deeper and deeper appreciation that "offset" and "text" are words that are fraught with peril
What about using inline "floating" checkpoints for the ranges, instead of character indexes?
{text: "Lorem ipsum {{1}}dolor sit {{2}}amet{{3}}, consectetur{{4}} adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.",
annotations: [{tag: "sometag", ranges: [{from: 1, to: 3}]},
{tag: "sometothertag", ranges: [{from: 2, to: 4}]}
Fair critique, but shouldn't strings be immutable anyways? Once you bring editing into play, you'll probably either want something like a rope or some CRDT and then you have more effective means of tracking positions, than manual offset computations, as part of the data-structure.
You are absolutely right that XML is better for document structures.
My current theory is that Yjs [0] is the new JSON+XML. It gives you both JSON and XML types in one nested structure, all with conflict free merging via incremental updates.
Also, you note the issue with XML and overlapping inline markup. Yjs has an answer for that with its text type: you can apply attributes (for styling or anything else) via arbitrary ranges. They can overlap.
Obviously I'm being a little hyperbolic suggesting it will replace JSON - the beauty of JSON is its simplicity - but for many systems, building on Yjs or similar CRDT-based serialisation systems is the future.
Yjs isn’t a document structure is it? It seems to be a library for collaborative editing, but I’m not seeing something suitable for marking up a document, or am I missing something obvious?
This is actually one of the things processing instructions are useful for - but you would need to define the data within the PI, since they don't have attributes.
Nobody deserves XML! In all seriousness I get the idea behind XML and I have used a couple of SOAP services which were absolutely brilliant, but as someone who has spent a decade “linking” data from various sources in non-tech enterprise… Well… let’s just say that I’m being kind if I say that 5% of the services which used XML were doing it in a way that was nice to work with.
Which is why JSON’s simplicity is such a win for our industry. Because it’s always easy to handle. Sure, you can build pretty terrible things with it, but you’re not going to do this: <y value=x> and then later do <y>x</y>, which I’m not convinced you didn’t do in XML because you’re chaotic evil. And you’re not going to run into an issue where some Adobe LiveCycle schema doesn’t work with .NET because of reasons I never really found out, because why wouldn’t an XML schema work in .NET? Anyway, I’m sure the intentions behind XML were quite brilliant, but in my anecdotal experience it just doesn’t work for the Thursday-afternoon programmer mindset, and JSON does.
>and I have used a couple of SOAP services which were absolutely brilliant
makes me doubt the first part of the statement.
If I were to guess, what it means is that you understand the point of SOAP, and also understand its limitations and problems, especially as they relate to the use of XSD in SOAP and the various stack of Web Service specs. But you probably have not had much experience with non-XSD-based validation of XML, you do not have any experience with document formats as opposed to data formats, you probably are not familiar with larger international standards like UBL, and you are not familiar with XML formats that are not so much data- or document-oriented - SVG, XSL-FO (which admittedly sucks more than is reasonable), GraphML and so forth...
A lot of the commenters here are standing up for the value of XML, and that's not what I'm doing with this comment; there are a lot of benefits to using JSON, especially when you are using JavaScript all over the place. But saying XML sucks because XSD and SOAP suck indicates a potential lack of knowledge about the whole subject (perhaps only caused by infelicitous phrasing).
I’m simply trying to say that I think XML sucks because the people who implement it suck, and not because of any technical reasons. Hell, I’m saying that I think XML sucks because it allows people to suck when they use it, myself included.
A catchy but meaningless phrase. JSON is a dumpster on fire. Probably in even more ways than XML is. Maybe you deserve it... I feel like I'm being punished by the stupid people who make me use it, in a way similar to the sham court hearings from Planet of the Apes.
There are multiple contradictory requirements to different things you could want from communication formats. Below are some examples:
* You could want to have a universal tool that can examine and understand the contents of the message (for debugging purposes), but you could also want not to send meta-information about the message (such as types or sizes) that is essential for parsing it. And you cannot have both at the same time.
* You could want a message format that maps onto the primitive types of a particular language well, but at the same time you may want it to be universal and map to many other languages well. But this is impossible because different languages will have different primitive types and the need to be generic will act against the need of being specific.
* You may want to be able to stream data, but this works against hierarchic data organization.
* You may want to be able to write messages into pre-allocated memory buffers w/o having to re-calculate the amount of memory necessary to encode a message, but this makes it very hard / impossible to add custom fields and types.
---
Given all this, I don't think that JSON is a good match for any use-case it's currently commonly used for. If I want data transfer, I'd go for something like SQL. If I want configuration, I'd go for Datalog. But then I see value in optimizing the transfer of multiple similar records, whereas someone else may see value in optimizing transfer of hierarchically structured data, which isn't necessarily repetitive. I tried many formats of this kind, and am yet to find a good one. I'm inclined to think that maybe trying to arrange hierarchical data with different constraints on its organization is just a bad approach to data transfer, that the organization and constraints of such data shouldn't be encoded as part of the format, but interpreted by the users of the format. But, if I really had to do this the best way I can imagine, I'd still go for Datalog.
Whenever two or more are gathered together, they shall argue about JSON vs XML.
Personally I like the simplicity of JSON and also the expressive power of XML. But then I tend to only use each for the task it was primarily intended: application data-on-the-wire in JSON and "documents" in XML. It seems like a lot of the recurrent discussion around these technologies happens when they're pushed to do things outside their comfort zone. And I wonder if some of this is down to siloing of developer knowledge.
There was a comment on HN a few days ago (not by me, and I can't find it now) to the effect that web development has historically attracted self-taught developers or those who have come to it by routes like bootcamps. It went on to say that they perhaps consequently lack some knowledge of existing techniques and solutions, and therefore tend to recreate solutions that may already exist (and not always well). And this drives the well-known churn in webdev tech: of which bolting schemas onto JSON is arguably an example.
I wonder what people think of this? Personally I think it has some merit, but that the "churn" has also generated (along with much wheel-reinvention) some great innovations. And I say that as someone who works mainly on back-end stuff.
I'd extend this "X developers are mostly self-taught" onto all of computer development. They say, "Every developer Of a Certain Age's first programming language was BASIC" and my experience of (eventually) getting a CS degree is that there is the expectation of students to already know how to do the thing that they are trying to teach; a certain level of "self taught" is expected.
To that end, I can see how in The Age of Teh Internets the standard of self-taught has moved on from BASIC to HTML/CSS/JS (or Unity or whatever sparked the young mind's attention).
---
What I'm not certain of is that "self taught" means that work will be duplicated because the self-taught developer doesn't know the technology that exists. I think that someone who is extremely online will very likely be more abreast of what technologies exist. I think that a formal education is better at establishing the fundamentals underlying a programming method or paradigm... but not necessarily at exposing new programmers to what the state of the art is.
The primary reason why JSON does not support comments is that its creator, Douglas Crockford, deliberately removed them from the format to prevent misuse and keep it as a pure data-only format.
Crockford observed that some people were using comments to store parsing directives, which could break compatibility between different systems. Hence, the decision to remove comments to maintain the simplicity and consistency of the format across various programming languages and environments.
Except for many yaml implementations either not supporting 1.2 at all (pyyaml, ruby stdlib) or being a weird mix (goyaml) so as to keep working with older files.
So when you’re dealing with objective reality, this is still an issue today.
JSON is (arguably) too strict. YAML is (arguably) too loose. One is better for machines, the other is (usually) better for writing by humans by hand. There's no perfect compromise for every use case.
It is interesting that people love json (now with schema), but hate XML while loving HTML at the same time. It is all pretty boring and largely the same imo.
The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags? Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:
<some>true</some>
versus
<some>1</some>
Some systems require the token "true", others will only treat 1 as the boolean true.
For example, MS claims that for Exchange, XSD boolean values must be integer 1 or 0 [0], but then links to a W3C spec that also allows the tokens true and false [1].
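For what it's worth, the xs:boolean lexical space in XML Schema is exactly the four forms "true", "false", "1", "0", so a tolerant consumer usually normalises all of them. A minimal sketch in TypeScript (the function name is made up):

  // Accept the four lexical forms that XSD defines for xs:boolean.
  // Anything else is rejected rather than guessed at.
  function parseXsdBoolean(text: string): boolean {
    switch (text.trim()) {
      case "true":
      case "1":
        return true;
      case "false":
      case "0":
        return false;
      default:
        throw new Error(`not a valid xs:boolean: ${JSON.stringify(text)}`);
    }
  }

  parseXsdBoolean("true"); // true
  parseXsdBoolean("1");    // true -- both <some>true</some> and <some>1</some> are legal xs:boolean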
At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.
XML is only concerned with whether a document is well-formed, not its conformity to a given schema. Schemas like XSD, DTD, etc can be plugged in later. Many systems just have an ad hoc schema.
> At least with JSON and HTML, you don't need a separate definition file for basic, primitive data types.
Unless I’m missing your meaning, this seems like an apples-to-oranges comparison. HTML is not a general-purpose format like JSON. It’s a very complicated document format that is validated with reference to an external spec.
I think XML is a great fit for a document format that can become arbitrarily complex yet still easy to author and validate. It’s obviously a really poor fit for a wire transport protocol.
I don't see XML and json as different at all - they are both markup languages that describe trees. As long as what you are describing is a tree, either is fine. Of course, I would never do an XML endpoint on a rest http service since no one would expect that and they would assume that you don't know what you are doing and your service sucks.
Well, it's the same in XML, more or less. [1] The difference is that XML cleanly separates types and data, all the type information is in the schema and there is no type information in the data, so without a schema you can not properly type the data.
JSON on the other hand does not separate types and data, the types are implicitly contained in the data. So you can get type information from the data without a schema, at least up to the point where JSON's simple type system is no longer expressive enough, then you need - just as with XML - a schema to get the correct type information, for example to distinguish actual strings from dates.
If you really need this, nobody stops you from including type information in XMLs, <have type="boolean">true</have> attributes on elements or <quote>"strings"</quote> but not numbers <numbers>123</numbers> and use that. You will of course have to do this on your own, that is just not the way XML is supposed to be used.
[1] Let me clarify this a bit. If you handle XML, you usually have a schema and therefore the type information. If you have a non-trivial JSON, you also need a schema for the types. You can only get away without a schema for simple JSONs where the implicit type information is good enough. But then you could do almost the same with XML, just parse the content and see like what type it looks. You will not get quite to what JSON can do in a sane way but it might be good enough just as JSON without a schema is sometimes good enough.
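As a rough illustration of the "just parse the content and see what type it looks like" idea, here is a heuristic sketch in TypeScript; nothing about it is specified by XML itself:

  // Guess a JSON-ish type from an XML text node, roughly what schemaless JSON
  // gives you implicitly. Purely heuristic; a real schema is still the right answer.
  function guessType(text: string): boolean | number | null | string {
    const t = text.trim();
    if (t === "") return null;
    if (t === "true") return true;
    if (t === "false") return false;
    if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(t)) return Number(t);
    return t; // fall back to string; e.g. dates stay ambiguous either way
  }

  guessType("123");        // 123
  guessType("true");       // true
  guessType("2022-01-01"); // "2022-01-01" -- still needs a schema, same as in JSON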
> a schema to get the correct type information, for example to distinguish actual strings from dates.
Which, in practice, is a terrible oversight. I've honestly never seen a JSON store/transport/serde in practice without dates and/or times in them. There's always some updated_at or captured_on somewhere in the API or dataset.
Of all the data-types needed, I'd say dates are amongst the most important. At least more important in practice than floats, which JSON does support, for odd reasons. Especially with dates being ambiguous at best and inconsistent at worst.
Pub quiz: when was it (or will it be) paid, and how much? { currency: "THB", paid_at: "04-03-2566", amount: 13.37 }¹ - JSON is neat for simple use-cases, but utterly impractical when precision and correctness are required. Yet here we are, building around and on top of it to get that correctness and precision.
¹I'm messing a bit, 'cause this calendar isn't used in practice for such use-cases anymore. Hardly. But I've seen this with Hijri calendars. And those silly US date-formats. I've seen it "solved" with complex structs like { created_at: { year: 2022, day: ..., timezone: xxx } }. I've myself "fixed" floating-point precision issues in financial applications that used JSON by using all strings: { currency: "USD", amount_in_cents: "1337" } and such.
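For what it's worth, a sketch of the kind of convention this argues for: pin the date format and keep money out of floats. The field names and shape are invented for illustration:

  // One common convention: ISO-8601 UTC timestamps, and money as integer minor
  // units plus a currency code. Names and shape here are illustrative only.
  interface Payment {
    currency: string;         // ISO 4217 code, e.g. "THB"
    amountMinorUnits: number; // integer satang/cents, never a float
    paidAt: string;           // ISO-8601, e.g. "2023-03-04T10:15:00.000Z"
  }

  const payment: Payment = {
    currency: "THB",
    amountMinorUnits: 1337,
    paidAt: new Date(Date.UTC(2023, 2, 4, 10, 15)).toISOString(), // Date.toJSON is ISO-8601 too
  };

  JSON.stringify(payment);
  // => {"currency":"THB","amountMinorUnits":1337,"paidAt":"2023-03-04T10:15:00.000Z"}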
> You're complaining about people encoding dates in string and at the same time you encode numbers in string. That's funny.
No. I'm complaining that JSON is too limited. And that its "type system" is lacking so much that I have to resort to hacks like encoding numbers in strings. Which I think is embarrassing for an industry.
> There's standard to encode dates to string. It's called ISO-8601 and it's supported everywhere.
It's not. Too many servers and services use formats other than ISO-8601. Should I call Visa and tell them their export formats suck? Or that Random API that their JSON date fields should be changed to ISO-8601?
It's supported in most languages. But e.g. something widely used as Google Sheets doesn't support this: If you get a JSON or CSV with ISO-8601 into Google sheets, a lot of string parsing and even regexes are needed to turn it into a proper date.
Saying "we use ISO-8601 and that solved everything" only works if you never need any service outside of yours and never interop or exchange data with other services. Which in practice is never for anything remotely successful.
> Which in practice is never for anything remotely successful.
I have yet to see Time be easy. Anywhere. At all. From daylight savings being state-dependent, system times resetting to rand, right down to CPU monotonic timing.
Using Time handling as a criticism to JSON's architecture doesn't hold water.
This would've been useful if you knew what kind of number it was...
As for what goes into separate elements and what goes into attributes: a typical answer to this is that simple types (as per XSD) go into attributes, complex types go into elements.
Compare this to JSON's screwed-up definition of "hash-tables" (the things in curly braces) which doesn't require that "keys" be unique.
XML wasn't perfect. But JSON isn't really better. It sucks in a slightly different way because people keep inventing these formats w/o much thinking, and once discover problems, don't fix them.
> This would've been useful if you knew what kind of number it was...
JSON isn’t ambiguous when it comes to this. Numbers are arbitrary precision decimal numbers[1].
I’m guessing your issue is with how Javascript interprets JSON numbers as 32 bit floats. But that is a (mis-)feature of JavaScript and switching your serialization format to XML would not help, because JavaScript represents all numbers as 32-bit floats.
fun fact: that's JavaScript. JavaScript only supports double-precision 64-bit binary format IEEE 754.
But JSON doesn't disallow arbitrary precision numbers, that's up to the parser implementation.
From the JSON grammar:
  number
    integer fraction exponent
In fact not all implementations support IEEE 754 doubles, and, from my experience, when dealing with money and rounding errors, many decide to serialize numbers as exact strings and use custom code for deserialization.
> when dealing with money and rounding errors, many decide to serialize numbers as exact strings and use custom code for deserialization
That's exactly what I'm doing. And indeed another reason why I feel embarrassed by JSON. I mean, we -the industry- have been doing financial data transport over computer networks, for how long now? fifty years? And we keep "inventing" transport formats that unsolve issues that have long been solved and done. XML had this solved[1]. Hell, even the ancient MT940[2] had this solved.
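A minimal sketch of that string-encoding workaround in TypeScript; the helper names are made up, the point is only that the exact digits survive the trip through JSON:

  // Keep exact integer amounts as strings on the wire and convert at the edges.
  // BigInt round-trips the digits exactly; a plain number would not past 2^53.
  function encodeAmount(minorUnits: bigint): string {
    return minorUnits.toString();
  }

  function decodeAmount(wire: string): bigint {
    if (!/^-?\d+$/.test(wire)) throw new Error(`not an integer amount: ${wire}`);
    return BigInt(wire);
  }

  const body = JSON.stringify({ currency: "USD", amount: encodeAmount(123456789012345678901n) });
  const parsed = JSON.parse(body) as { currency: string; amount: string };
  decodeAmount(parsed.amount); // 123456789012345678901n, no precision lost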
> The absolute worst bit of XML is the confused implementations. What should be an attribute on a tag, and what should go between tags?
XML is a language for marking up text. SVG uses attributes for all vector data, because the vector points are not meant to be presented to a user as raw data.
If I embed a SVG into a XHTML document and the browser does not understand SVG, the text within the graphic is still presented to the user.
> Even worse, nothing is sanely typed without an xsd. Different systems will treat the following differently:
This is not a responsibility of XML, which deals in a common well-formed markup format for various document formats.
It sounds like you are dealing with a tool that has defined an XML-based data interchange format, and that they may have inconsistent tooling for their format.
I'm not certain I understand the point you're trying to make regarding `undefined`, or indeed the expectation that `jq` and `JSON.stringify()` follow the same rules.
`JSON.stringify()` is documented to behave exactly as you demonstrate, so there's no surprises:
undefined, Function, and Symbol values are not valid JSON values. If any such values are encountered during conversion, they are either omitted (when found in an object) or changed to null (when found in an array). JSON.stringify() can return undefined when passing in "pure" values like JSON.stringify(() => {}) or JSON.stringify(undefined).
Expecting `jq` to somehow understand that its input came from Javascript's `JSON.stringify()` and so should be parsed on that basis seems ... odd? I don't see any problem with what `jq` is doing there, but anyway I don't see a problem with JSON itself in these examples.
> undefined, Function, and Symbol values are not valid JSON values
that's the point.
The official JSON serializer from every browser vendor and every Node installation produces invalid JSON.
Which for the JavaScriptObjectNotation is kinda hilarious.
> Expecting `jq` to somehow understand that its input came from Javascript's `JSON.stringify()`
I would expect `JSON.stringify` to give an error if trying to serialize something that naturally does not map to JSON, like many other libraries do.
You have to provide a manual override for those situations.
But JavaScript and ECMA (`JSON.stringify` is defined in the standard) decided that no, they can ignore the specs for some reason.
Problem is they can't fix it now, because too many applications rely on those wrong assumptions.
Here is the reason why you can find <flag>true</flag> and <flag>1</flag>.
The difference being that XML was born to standardize the document format, while JSON aspired to be a data format but failed miserably at it, even at the most basic level, like telling an int from a float. The spec is simply too vague and ambiguous to give any guarantee of interoperability beyond numbers and strings.
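For what it's worth, you can get the erroring behaviour yourself with a replacer. A small sketch (this wrapper is not a standard API):

  // Sketch of a stricter JSON.stringify: reject values JSON cannot represent
  // instead of silently dropping them or emitting null.
  function strictStringify(value: unknown): string {
    return JSON.stringify(value, (key, val) => {
      const t = typeof val;
      if (t === "undefined" || t === "function" || t === "symbol") {
        throw new TypeError(`"${key}": ${t} has no JSON representation`);
      }
      return val;
    });
  }

  JSON.stringify({ a: 1, b: undefined });  // '{"a":1}' -- b silently vanishes
  strictStringify({ a: 1, b: undefined }); // throws TypeError instead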
> The official JSON serializer from every browser vendor and every Node installation produces invalid JSON.
I don't think I agree with this. `JSON.stringify` isn't producing JSON when it returns `undefined`. Instead...
> I would expect `JSON.stringify` to give an error if trying to serialize something that naturally does not map to JSON, like many other libraries do.
> You have to provide a manual override for those situations.
... `undefined` is an error. As in, there's no meaningful difference between catching an exception and providing "a manual override" for `undefined`, is there?
> But JavaScript and ECMA (`JSON.stringify` is defined in the standard) decided that no, they can ignore the specs for some reason.
What part of what spec is being ignored? `JSON.stringify` conforms to its own spec, as you say; and when it returns JSON, the JSON is valid. Meanwhile the JSON spec itself is very explicit about not declaring rules for serialisation/deserialisation:
The goal of this specification is only to define the syntax of valid JSON texts. Its intent is not to provide any semantics or interpretation of text conforming to that syntax. It also intentionally does not define how a valid JSON text might be internalized into the data structures of a programming language. There are many possible semantics that could be applied to the JSON syntax and many ways that a JSON text can be processed or mapped by a programming language. Meaningful interchange of information using JSON requires agreement among the involved parties on the specific semantics to be applied. Defining specific semantic interpretations of JSON is potentially a topic for other specifications.
>> What should be an attribute on a tag, and what should go between tags?
Are you ok with <a href="..">link</a>?
That was kind of my original point, people are fine with html but don't like XML. I think the real reason people don't like XML is it reminds them of Steve Ballmer.
JSON is a much better serialization format, since XML was designed as a document format. For example, there is no standardized way to serialize a string with a null character in XML, even if you escape it (such strings are allowed in many programming languages). JSON just says escape it as "\u0000" and calls it a day. I'm not sure if it's better for users, but it's certainly easier to work with as a dev.
HTML isn’t trying to serialize abstract data and is doing what XML does best in being a document/GUI format. It doesn’t matter all that much that it can’t represent null characters in a standard way because it isn’t a printable character.
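Concretely, JSON escapes NUL as \u0000 rather than putting a literal zero byte in the text, so it round-trips:

  // JSON can round-trip a NUL inside a string via the \u0000 escape.
  const s = JSON.parse('"\\u0000"') as string; // the JSON text is the 8 characters "\u0000"
  s.length;          // 1
  s.charCodeAt(0);   // 0
  JSON.stringify(s); // escapes it again as "\u0000" on the way out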
Try using UTF-8 encoding for XML, and your problems with zero byte encoding will go away.
Your understanding of "easier" is oversimplified to the point that it's wrong. It's easier to do the wrong thing in JSON, it's harder to do the right thing in JSON (compared to XML).
JSON is a poorly thought-out format. Its problems become progressively more difficult to deal with the more you expect of your program.
JSON and XML both support UTF-8. Neither supports embedding arbitrary binary data directly, especially if that data is not valid in your current character set
You can't safely send zero bytes over XML even with UTF-8 encoding. Not in practice:
echo '<zero>&#0;</zero>' | xmllint -
-:1: parser error : xmlParseCharRef: invalid xmlChar value 0
<zero>&#0;</zero>
^
printf '<?xml version="1.0" encoding="utf-8"?><zero>\0</zero>' | xmllint -
-:1: parser error : Premature end of data in tag zero line 1
<?xml version="1.0" encoding="utf-8"?><zero>
^
I explained how to do it, and yet you didn't follow instructions, have done something irrelevant, and are now complaining that it doesn't work... well sucks being you, I guess.
There are many ways in which something can be simple. I believe that the most relevant metric for simplicity of something like JSON isn't the number of language elements it has (that would mean that, e.g., Brainfuck is simpler than JavaScript), but the amount of work necessary to produce a correct program. JSON is an endless pit of various degrees of difficulty when it comes to writing real-world programs. It's far from simple in that latter sense.
E.g. learning about namespaces would take a programmer a couple of hours, including a foosball match and a coffee break, but working around JSON's bad decisions when it comes to number serialization or sequence serialization will probably take days in the best case, with the side effect that this work will most likely have to be done on an existing product after a customer complained about corrupting or losing their data...
>E.g. learning about namespaces would take a programmer a couple of hours, including a foosball match and a coffee break
It's not about the time it takes to learn about namespaces. I'm talking about the complexity that namespaces and entities add to the data model and the requirement to actually handle them throughout the entire stack.
You can normalise and compare arbitrary pieces of JSON using only information available locally in that same sequence of UTF-8 bytes. You cannot do that with XML. You have to consider the whole document context and resolve all namespaces and entities before actually comparing anything.
The JSON specification is ~5 pages and most of that is diagrams. The XML specification is ~40 pages long and it imports ~60 pages of URI specification.
I'm not saying that it's impossible to use only the simple parts of XML unless and until you actually need what namespaces have to offer. But that's culture, and you have no control over other people's culture.
> I'm talking about the complexity that namespaces and entities add to the data model
I've worked a lot with XML, and I have no idea what complexity you are talking about. It just wasn't complex or difficult. Once you'd learned what this was about, it was second nature. E.g. I spent a lot of time working with MXML -- that is, an XML format for Adobe Flex markup, similar to XAML and a bunch of others of the same kind. It used XML namespaces a lot. But that was the least of my problems using it...
Again, I've never had anyone who learned how and why to use XML namespaces complain about it. All complaints about this feature were coming from people discovering it for the first time.
> You can normalise and compare arbitrary pieces of JSON
Dream on. No, you cannot. It depends on parser implementation. For example, you have two 20-digit numbers where 15 most significant digits are the same. Are these numbers the same number or a different number in JSON?
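To make the 20-digit example concrete in a JavaScript engine (a sketch of the problem, not a recommendation):

  // Two 20-digit integers that differ only in their last digit...
  const a = JSON.parse("12345678901234567890");
  const b = JSON.parse("12345678901234567891");
  a === b; // true -- both round to the same 64-bit double, the final digits are gone
  // So whether two documents containing these numbers are "equal" depends on the parser.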
The fact that it's 5 pages means nothing... it's 5 pages that define a bad language that creates a lot of problems when used. So what if it only took 5 pages to write it? You can probably squeeze Brainfuck definition into half a page? -- So what, it's still a lot harder to use than JavaScript.
I worked with XML extensively for many years starting back in the 1990s. When I'm saying that namespaces add complexity to the data model I'm not complaining about them being difficult to use or understand.
>Dream on. No, you cannot. It depends on parser implementation. For example, you have two 20-digit numbers where 15 most significant digits are the same. Are these numbers the same number or a different number in JSON?
That's just a mildly interesting interoperability edge case that can be worked around. I agree that it's not good, but it is a problem on a wholly different level. XML elements not being comparable without non-local information is not an edge case and not an oversight that can be fixed or worked around. It's by design.
I'm not criticising XML for being what it is. XML tries to solve problems that JSON doesn't try to solve. But in order to do that, it had to introduce complexity that many people now reject.
Edit: I think we're talking past each other here. You are rightly criticising the JSON specification for being sloppy and incomplete. I don't dispute that. I'm comparing the models as they are _intended_ to work. And that's where XML is more complex because it tries to do more.
Here's a thing that happened in the wild. Neo4j database encodes ids of stored entities as 128-bit integers and it has a JSON interface. When queried from Python, the Python client interprets digit sequences longer than what could possibly fit into 2^32 as floats (even though the native kind of integer in Python is of arbitrary size).
So, for a while there weren't too many objects, ids appeared to be all different... until they weren't. It's easy to see how this led to data corruption, I suppose?
---
Here's a hypothetical example: few people are aware that JSON allows key duplication in "hash-tables", also, even if they consider such a possibility they might not know that JSON doesn't prescribe which key should win, should there be many of them. They might assume that the definition requires that the first chronologically wins, or last, or... maybe some other rule, but they hope that it's going to be consistent across implementations.
Obviously, to screw with developers, JSON doesn't define this. So, it's possible that two different parsers will parse the same JSON with the same fields differently. Where this could theoretically explode? -- Well, some sort of authentication which sends password with other data that can be added by user, and the user intentionally or accidentally adds a "password" field, which may or may not later be overriden and may or may not later be interpreted on the other end as an actual password.
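For concreteness, here is what one mainstream parser does with the password example; ECMAScript's JSON.parse keeps the lexically last duplicate, while other parsers may keep the first or reject the document outright:

  // Duplicate keys are syntactically legal JSON; which one "wins" is up to the parser.
  const wire = '{"user":"alice","password":"expected","password":"injected"}';
  JSON.parse(wire).password; // "injected" -- the last duplicate wins in JS engines
  // A different parser on the other side may keep "expected" or reject the document,
  // which is exactly the kind of mismatch described above.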
---
There are many other things, like, for example, JSON has too many of the "false" values. When different languages generate JSON they may interpret things like "missing key" and "key with the value null" as the same thing or as different things. Similarly, for some "false" and "null" are the same thing, while for others they are not.
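On the receiving side the three cases are at least distinguishable, if the consumer bothers to check. A small TypeScript sketch (the parsed document is invented):

  // "missing key", "key: null" and "key: false" are three different things,
  // but only if the consumer distinguishes them explicitly.
  const parsed = JSON.parse('{"a":null,"b":false}') as Record<string, unknown>;
  "c" in parsed;         // false -- key absent
  "a" in parsed;         // true  -- present, explicitly null
  parsed["a"] === null;  // true
  parsed["b"] === false; // true  -- present, boolean false
  // Mapping layers that collapse these into one "falsy" bucket are where the trouble starts.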
>few people are aware that JSON allows key duplication in "hash-tables"
I would say it's the other way around. Many people seem to think that duplicate keys are allowed in JSON, but the spec says "An object is an unordered set of name/value pairs". Sets, by definition, do not allow duplicates.
>There are many other things, like, for example, JSON has too many of the "false" values. When different languages generate JSON they may interpret things like "missing key" and "key with the value null" as the same thing or as different things. Similarly, for some "false" and "null" are the same thing, while for others they are not.
I don't see how this is a JSON issue. There's only one false value in JSON. If some application code or mapping library is hellbent on misinterpreting all sorts of things as false then there is no way to stop that on a data format level.
What I do agree with is your criticism of how the interpretation of long numbers is left unspecified in the JSON spec. This is just sloppy and should be fixed.
If you are making JSON data to use in your own language and application, you will probably not have any problem. But as with anything, there can be interoperability issues between implementations and programming languages, especially if your JSON is being generated and consumed by your JavaScript site.
Neither did XML originally. XML schema was sort of bolted on via some conventions of defining a schema in the root element. The XML 1.0 spec doesn't mention those. XML Schema is a separate standard that came later. Likewise namespaces are a separate specification as well and not part of the XML specification.
The XML specification does have Document Type Definitions (DTDs), which were sort of inherited from SGML. This is an optional declaration with its own syntax. I don't think they were that widely used. XML Schema started out as an attempt to redefine those in XML.
The nice thing with XML Schema was that you could usually ignore them and just use them as documentation of stuff that you might find in a document. Typically, schema urls wouldn't even resolve and throw a 404 instead. More often than not actually. My go-to tool was xpath in those days. Just ignore the schema and cherry pick what comes back using xpath. Usually not that hard.
The culture around Json is that it emerged out of dynamic language communities (Javascript, Ruby, Python, etc.) with a long tradition of not annotating things with types and a natural aversion against using schemas when they are not needed. Also, they had the benefit of hindsight and weren't looking to rebuild the web services specs on top of json but were actively trying to get away from that.
>XML schema was sort of bolted on via some conventions of defining a schema in the root element.
I know, and I'm not talking about XML Schema at all (partly because it hurts my brain to even mention the absolute worst specification ever written).
I mean just the complexity of the XML data model itself, including namespaces, entity references and the ridiculously convoluted URI spec. That's more than enough to make XML far more complex than JSON.
To be fair, XML solves problems that JSON doesn't solve. JSON is not a better XML. JSON's creators simply decided that many of problems that XML solves don't need solving or should not be solved by a data format specification.
I like DTDs. For all their weirdness, they solve a problem rather simply. Reminiscent of BNF. (Altho admittedly they are clueless about namespaces and other fancy bolt-ons.)
Yeah, culture is a big one.
See .NET vs Java. The latter picked up many C# features over the years but is still much more verbose, e.g. because their developers still abhor var.
It's very easy to understand why people prefer JSON.
95% of developers know exactly what JSON is without ever having read anything technical about it. It's obvious.
XML on the other hand... Who here can say they actually know anything substantial about XML besides the syntax? My guess is <10%.
XML suffers from too many options and useless bells and whistles.
E.g. the attribute-vs-element topic is a source of confusion without adding much value, especially if the source and target are object-oriented and/or a relational DB. What's the point?
Then there are namespaces, sure there are probably lots of places where you need to use them. But I never encountered a place where they are really needed, but because they are the default you need to work with them or your queries do not work. Super confusing for beginners and annoying as heck.
Why is "how hard it is for beginners to understand a concept without reading a reference" a useful metric for measuring anything? So what if it's hard? -- Spend an hour with the reference document, and your problems will go away.
In the days when XML was popular I've been more active in several Web forums that helped novice users with particular technology (and that included XML). Not a single confusion about XML namespaces came from someone who read the reference. Quoting the reference would be also a very efficient way to clear the confusion.
Bottom line: it's not a problem worth mentioning. In the grand scheme of things, the hour you'd have to spend reading the specification is a drop in the bucket compared to all the time you'd have to work with XML. It's a fixed-size effort that you have to make once. Compare this to having to deal with bad "number" serialization, which you have to deal with in JSON every time, in every new program that touches JSON.
> Why is "how hard it is for beginners to understand a concept without reading a reference" a useful metric for measuring anything?
Two reasons:
1) Because it's unnecessary complexity. When you add unnecessary complexity to fundamental technology that everything uses, you've made everything worse. It's like polluting the lake, and then ignoring the fact that beginners now need to learn how to boil the water properly before drinking it.
2) Because that prevents the technology from being adopted. Whether you think it's justified or not, beginners will choose the tech that's easier to use, and it will succeed.
The market of technology adoption forces us to make things simple for beginners, and in the end, that's good for all of us.
The issue there is that the complexity is necessary for some cases that just don't come up in trivial use, e.g. when including element names from two sources. That is not a common need in JSON, but with schemas you will come across it.
Isn't that what XML did? You only put the namespace in if it was needed.
As for multiple ones in the same object it makes sense if you want to reuse a definition used elsewhere e.g. to add an address using a predefined address type. It is like using structures/records in programming languages but with no pointers for composition.
> Why is "how hard it is for beginners to understand a concept without reading a reference" a useful metric for measuring anything? So what if it's hard? -- Spend an hour with the reference document, and your problems will go away.
Or spend 0 hours reading the JSON reference to reach the same result.
I manage a team of reporting analysts who look at XSLT transforms all day. None of them have programming backgrounds and they have never found XML namespaces to be a problem.
Isn't this very easy? Short, succinct, simple, no-frills string? Put it in an attribute. Big, long, arbitrary-length data? Put it in the content of the element. Now gimme money.
These are actually what IntelliJ uses to validate all sorts of config files behind the scenes.
For work, we even do code generation off of the Meltano (ETL tool) spec and use it to validate reads and writes to the file (which we edit at application runtime) to catch errors as close to when they actually occur.
Most languages have some code generation tool requiring a compile step, but most of the specs in here change infrequently enough that you can just do it once and commit the result to version control. I personally have a use case where I modify the Meltano (ETL tool) spec at runtime and use a generated schema to validate reads and writes to the file, helping catch bugs early.
You could use this[0] package, but you would need to download the schema first into a folder, say "schemas", and then add a build step as a script in your package.json ("compile-schemas": "json2ts -i schemas -o types") to emit the generated types into a "types" folder.
There are some YAML based schemas there too. How does this work, is there a canonical YAML->JSON transformation, or does JSON schema spec have explicit YAML support?
YAML is effectively a superset of JSON although the syntax used in YAML is often different. So you can't translate all YAML to JSON, but all JSON can be represented as YAML.
But XML allowed an easy way to have distributed schemas without needing a central place: the schema URI could be made a resource that actually existed at that URI.
The wording gets complex because the URI does not need to exist on the web, and that need for exact wording is, I suspect, one reason XML is perceived as complex.
Can JSON Schema be used to describe, say, the schema of an RDBMS table?
Is there some standardization here, so I might use a JSON schema that already covers a lot of the fields needed to describe columns, constraints, etc.?
Its autocomplete story is a mess, but https://github.com/Kong/insomnia#readme at least allows one to visualize any schema authored in the document (it generates examples as well as a schema browser). It's possible that other OpenAPI tools behave similarly; I just happen to have the most hands-on experience with Insomnia.
for example:
openapi: 3.0.0
info:
  title: this is my title
  description: a long description goes here
  version: v1
servers:
  - url: http://127.0.0.1:9090
    description: the local server
paths:
  /thingy:
    get:
      responses:
        "200":
          description: ok
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Thingy'
components:
  schemas:
    Thingy:
      type: object
      properties:
        alpha:
          type: boolean
as for the "create automatically," I'd guess that's a genuinely hard problem although if your example documents are simple/homogeneous enough you may get away with it