Does JSONC have a specification or formal definition? People have suggested[1] u...

mananaysiempre · on Nov 5, 2023

Unfortunately, JSON5 says keys can be ES5 IdentifierName[1]s, which means you must carry around Unicode tables. This makes it a non-option for small devices, for example. (I mean, not really, you technically could fit the necessary data and code in low single-digit kilobytes, but it feels stupid that you have to. Or you could just not do that but then it’s no longer JSON5 and what was the point of having a spec again?)

[1] https://es5.github.io/x7.html#x7.6

debugnik · on Nov 5, 2023

Or take the SQLite route, if you don't need to reject strictly invalid JSON5. When they extended SQLite's JSON parser to support JSON5, they specifically relaxed the definition of unquoted keys (in a compatible way) to avoid Unicode tables[1]:

> Strict JSON5 requires that unquoted object keys must be ECMAScript 5.1 IdentifierNames. But large unicode tables and lots of code is required in order to determine whether or not a key is an ECMAScript 5.1 IdentifierName. For this reason, SQLite allows object keys to include any unicode characters greater than U+007f that are not whitespace characters. This relaxed definition of "identifier" greatly simplifies the implementation and allows the JSON parser to be smaller and run faster.

[1]: https://www.sqlite.org/json1.html
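
For illustration, here is a minimal sketch of what such a relaxed check could look like. It is not SQLite's code: the function name `is_relaxed_key` is made up, the ASCII rules (letters, `_`, `$`, digits after the first character) are an assumption based on the ASCII part of the ES5 identifier grammar, and Python's `str.isspace()` only approximates whatever set of whitespace characters SQLite actually rejects.

```python
import string

# Assumption: ASCII identifier characters as in the ASCII subset of ES5.
ASCII_START = set(string.ascii_letters + "_$")
ASCII_CONT = ASCII_START | set(string.digits)


def is_relaxed_key(key: str) -> bool:
    """Return True if `key` could be an unquoted key under the relaxed rule."""
    if not key:
        return False
    for i, ch in enumerate(key):
        if ord(ch) > 0x7F:
            # Relaxed rule: anything above U+007F passes unless it is
            # whitespace. No Unicode category tables are consulted.
            if ch.isspace():
                return False
        elif ch not in (ASCII_START if i == 0 else ASCII_CONT):
            return False
    return True
```

The only per-character work above U+007F is the whitespace test, which is why this kind of rule stays small compared to a full IdentifierName check.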

lifthrasiir · on Nov 6, 2023

Or take the XML route. Unicode offers several strategies to avoid continuously updating the Unicode table [1], which have been adopted by XML 1.1 and also by later editions of XML 1.0. The actual syntax can be as simple as the following (based on XML, converted to ABNF as in RFC 8259):

    name = name-start-char *name-char
    name-start-char =
        %x41-5A / %x5F / %x61-7A / %xC0-D6 / %xD8-F6 / %xF8-2FF / %x370-37D /
        %x37F-1FFF / %x200C-200D / %x2070-218F / %x2C00-2FEF / %x3001-D7FF /
        %xF900-FDCF / %xFDF0-FFFD / %x10000-EFFFF
    name-char = name-start-char / %x30-39 / %xB7 / %x300-36F / %x203F-2040

In my opinion, this approach is so effective that every programming language willing to support Unicode identifiers but nothing more complex (e.g. case folding, normalization or confusable detection) should use this XML-based syntax. You don't even need to narrow it down, because Unicode explicitly avoided identifier characters outside of those ranges due to the very existence of XML identifiers!

[1] https://unicode.org/reports/tr31/#Immutable_Identifier_Synta...
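
As a rough sketch of how little code the grammar above needs, the ranges can be transcribed into two small tables and checked directly. The names `NAME_START_RANGES`, `NAME_EXTRA_RANGES` and `is_name` are hypothetical, and the tables follow the ABNF in this comment rather than the full XML production.

```python
# (low, high) code point ranges, inclusive, copied from name-start-char.
NAME_START_RANGES = (
    (0x41, 0x5A), (0x5F, 0x5F), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6),
    (0xF8, 0x2FF), (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D),
    (0x2070, 0x218F), (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF),
    (0xFDF0, 0xFFFD), (0x10000, 0xEFFFF),
)

# Additional ranges allowed by name-char after the first character.
NAME_EXTRA_RANGES = ((0x30, 0x39), (0xB7, 0xB7), (0x300, 0x36F), (0x203F, 0x2040))


def _in_ranges(cp: int, ranges) -> bool:
    return any(lo <= cp <= hi for lo, hi in ranges)


def is_name(s: str) -> bool:
    """Return True if `s` matches: name = name-start-char *name-char."""
    if not s or not _in_ranges(ord(s[0]), NAME_START_RANGES):
        return False
    return all(
        _in_ranges(ord(ch), NAME_START_RANGES)
        or _in_ranges(ord(ch), NAME_EXTRA_RANGES)
        for ch in s[1:]
    )
```

The whole table is 19 fixed ranges, so it never has to be regenerated when a new Unicode version is released.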

tubthumper8 · on Nov 6, 2023

Yeah, definitely for small devices. For things like VS Code's configuration file format (the parent comment) or other such use cases, I don't see a problem.