Useful Old Technologies: ASN.1 (2013) (thanassis.space)
138 points by rdpintqogeogsaa on Dec 26, 2021 | 93 comments


A warning: ASN.1 is disliked by many people in software security. It often features in security vulnerabilities involving TLS and LDAP/Active Directory. They would recommend against adopting ASN.1 for new projects.

I get the impression this is partially due to a) most implementations being ad hoc, of varying completeness, and written in C or C++ (e.g. OpenSSL); and b) the sheer complexity of ASN.1 and other standards from that family/era.

Sources: following software security researchers/practitioners on Twitter, a few podcasts, here, etc.


This root cause analysis matches up with my experience. You can absolutely have safe ASN.1 with ASN.1 compilers, but the implementations that star in these vulnerabilities tend to be generic or ad-hoc DER/BER parsers.

You'd be hard-pressed to be able to blame ASN.1 with XER (XML Encoding Rules) encoding since then you'd throw an XML parser at the problem, side-stepping the bit twiddling and exchanging it for XML parsing bugs instead. Similarly, if all you have is fixed-length items, OER (Octet Encoding Rules), you essentially get a formally specified struct with a well-defined way to go from there to XML or JSON with JER (JSON Encoding Rules).
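
To make that concrete, here's a toy example of my own (not from the parent) of how a single value comes out under a few of the encoding rules:

    Port ::= INTEGER (0..65535)

    -- the value 443, roughly:
    -- BER/DER:  02 02 01 BB          (tag, length, two content octets)
    -- XER:      <Port>443</Port>
    -- JER:      443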

I imagine we'll see this happen with CBOR (whose ASN.1 equivalent is technically CDDL) a few years down the road.


I think the root cause of this is that ASN.1 is deemed uncool and it is difficult to implement a standards-compliant FOSS compiler for all the encodings. I was surprised, when doing research into binary XML and JSON, how far you can get with ASN.1. However, the ecosystem seems so decoupled from anything else that there is rarely an implementation path for a quick win beyond the aforementioned hacky parsers, which are both error-prone and inefficient, or purchasing some obscure commercial compiler, as I remember.

Edit: would be really happy to see some good open source compilers with extensible encoding support being listed here


> Edit: would be really happy to see some good open source compilers with extensible encoding support being listed here

I think that's the reason of ASN.1 being trapped in a very restrictive space. From Olivier Dubuisson's ASN.1 -- Communication between Heterogeneous Systems, p. 86 et seq. talking about use of ASN.1 by the IETF:

> the definition of many macros and macro instances to represent semantic links instead of information object classes and information objects although no ASN.1 compiler properly takes into account the macro concept (on the other hand, no compiler of the public domain does with the information object class concept unfortunately);

In essence, ecosystem failure due to a non-existent overlap of the telecom industry (full of commercial and expensive solutions) and the Internet community (working in a standards committee that tracks pre-existing permissionless innovation distributing the standards at no cost, strongly affiliated with free software ideas) caused ASN.1 to fall by the roadside.

Now that ASN.1 is considered uncool, the ecosystem certainly won't be coming anymore, so we'll be forced to reinvent ASN.1, probably re-learning every lesson along the way.


> In essence, ecosystem failure due to a non-existent overlap of the telecom industry (full of commercial and expensive solutions) and the Internet community (working in a standards committee that tracks pre-existing permissionless innovation distributing the standards at no cost, strongly affiliated with free software ideas) caused ASN.1 to fall by the roadside.

Yep. I think this is the crux of the matter.

I don't even think it is an "uncool" thing. In a word, IMHO it's too "closed". XML is not exactly hip, but I can find high-scoring Python XML libraries on snyk.io. I know that is not a very scientific metric, but it's a good proxy for a bunch of things that are hard to measure.

I think even if it were "cool", it would be a struggle to get a culture to nucleate. It has the same vibe as codecs: complex, kinda arcane, too closely associated with closed source, litigious, corporate culture. It doesn't help that it is guilty by association with numerous poorly written libraries. Unless that is what you mean by "uncool".

Now that there is cbor, msgpack, cap'n proto, protobuf, thrift, and a bunch others (just in the binary space), asn.1 would have to make a really compelling argument to win out. It would need a lot more than coolness.


> we'll be forced to reinvent ASN.1

I think this already happened with Google Protocol Buffers


> would be really happy to see some good open source compilers with extensible encoding support being listed here

This is a shameless plug, and not a compiler per se, but it's pretty close[0]. It's a codec framework for ASN.1 in Rust similar to `serde` for other formats, if you're aware of that. It uses Rust's traits to define separate layers for the codec model and the encoding rules, allowing you to have one model used with many encodings, and share one decoder & encoder for all models. Extensibility just uses the regular Rust `#[non_exhaustive]` attribute. It doesn't support formats like A/UPER yet, but that is coming in the next year. :)

[0]: https://github.com/XAMPPRocky/rasn


Thanks, rasn looks really promising!! What are the projects this is used in? How easy is it to use as a codec layer from other programming languages?


> What are the projects this is used in?

I don't know of any third-party projects that are using it, though from talking to users and looking at the download numbers there are quite a few; they're just closed-source. You can look at the dependent downloads[0] to get an idea of who's using the implementations included with rasn.

> How easy is it to use as a codec layer from other programming languages?

It's as easy as using any Rust code from other programming languages. Rasn is completely IO-agnostic, it works entirely using byte slices (`&[u8]`). So all that should be required is writing a C-friendly layer if doing this by hand, or using a language specific binding library like PyO3 to generate more friendly bindings.

[0]: https://crates.io/crates/rasn/reverse_dependencies


> a) most implementations being adhoc, of varying completeness,

Years ago, while attempting to implement clean, auditable code that would accept all signatures OpenSSL accepted and reject all that it rejected, I fell down a rabbit hole of trying to discover whether there were any complete and correct open-source implementations of BER. I audited dozens of them and was not able to find a single one. They all implement different subsets and have various flaws in the more weird/useless ways of encoding things.
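
To illustrate what those weird/useless encoding options look like, a hand-worked example of my own (not from the audit): the same OCTET STRING value has several legal BER forms, and implementations differ on which ones they actually accept.

    -- the OCTET STRING "abc":
    04 03 61 62 63                      -- primitive, definite length (the only valid DER form)
    24 80 04 03 61 62 63 00 00          -- constructed, indefinite length, one segment
    24 80 04 01 61 04 02 62 63 00 00    -- constructed, indefinite length, two segments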

This effort eventually resulted in an OpenSSL CVE, and we never got a consistent implementation: the exact set of messages accepted was far too irregular and dependent on the implementation.

(OpenSSL eventually 'fixed' the issue by restricting the set of accepted inputs ... in an uncoordinated manner guaranteed to create a vulnerability for any system where consistent validation is security critical.)

ASN.1 isn't unique in this though. Almost all other complex parsed formats with multiple implementations have troublesome inconsistencies too. The gap between "make it work" and "make it always work correctly" is too big. People make code that is sufficient for the purposes they care about and then it gets deployed into places where it's not sufficient but good enough to look like it is for a little while.


> It often features in security vulnerabilities

As the saying goes, "don't shoot the messenger". There's more quantity than quality when it comes to ASN.1 implementations, especially free ones, and that's largely where the impression is coming from. DER in particular is not hard to parse, but people don't seem to bother with thinking carefully about the edge cases and such.
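
For a taste of those edge cases (hand-picked hex examples of my own), most are of the "legal in BER, forbidden in DER" variety:

    01 01 FF        -- BOOLEAN TRUE, the only form DER permits
    01 01 01        -- also TRUE under BER; a strict DER decoder must reject it
    02 81 01 05     -- INTEGER 5 with a long-form length octet: fine in BER, not in DER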


ASN.1 Complete by Prof. John Larmouth[0] contains the following epiphany on page 366:

If you were to wave a magic wand and eliminate from the world all messages that are encodings of ASN.1-defined values, disaster would certainly strike on a scale far beyond any that the most pessimistic have described for possible effects of the Y2K computer bugs. Aircraft would collide, mobile phones would cease to work, virtually all telecoms and network switches would be unmanageable and unmaintainable and would gradually die, electric power distribution systems would cease to work, and ... smart-card-based electronic transactions would fail to complete and your washing machine might fail to work ... and your life would become a misery!

--

Suffice it to say that them security people have proven him right on more than one occasion, and unfortunately are likely to keep doing so.

[0] https://www.oss.com/asn1/resources/books-whitepapers-pubs/la...


You could make the same sort of argument about C and yet we know for certain that it is a particularly insecure and error prone language.


I am using ASN.1 because Erlang has a pretty awesome implementation of it. The documentation[1][2] is great too.

[1] https://www.erlang.org/doc/apps/asn1/asn1.pdf

[2] https://www.erlang.org/doc/apps/asn1/asn1_getting_started.ht...


What do the security experts recommend?


Not using hand-written shitty parsers, having a spec and validating it too.

Or day drinking and quitting your job, depending on the mood.

ASN.1 done right is pretty easy to secure, and much less error prone than pretty much any text format


Fuzz for low hanging fruits.

Analyze code: peer review, bounty, ...

Harden through invariant checks and logic checks.

Make your system/code antifragile by detecting wrong encodings, reporting them (uh-huh, yeah, remember that any code that even reports/logs is an attack surface, hint hint log4j) and treating future traffic from attack sources as a way to train your defenses.

Create a culture of security: simple architecture, design workshops, threat modeling, thinking about what someone with an attacker mindset would do, value experience (coming defense source) AND involve newbies (they will try again what didn't work in the past, and sometimes it works)


XDR perhaps? Less ambitious than ASN1. Used by many systems.

https://en.wikipedia.org/wiki/External_Data_Representation


Generally speaking, from a security standpoint, simpler is better. It of course depends on your requirements, but if you just need a structured data exchange format, JSON is an obvious choice. But again, it depends on what your requirements are.


Ugh, no. JSON is garbage, especially from a security standpoint. For security purposes you want an unambiguous format with a well-defined canonical representation that's easy to parse. Not only is JSON none of these things, many people labor under the mistaken assumption that it's saner than it really is and even quite experienced people get bitten in the ass by that: https://github.com/endojs/endo/issues/115

Quoting http://seriot.ch/projects/parsing_json.html:

In conclusion, JSON is not a data format you can rely on blindly. I've demonstrated this by showing that the standard definition is spread out over at least seven different documents (section 1), that the latest and most complete document, RFC-8259, is imprecise and contradictory (section 2), and by crafting test files that out of over 30 parsers, no two parsers parsed the same set of documents the same way (section 4).
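
The divergent inputs are mundane-looking, too; a few illustrative cases (my paraphrase of the sort of thing that survey exercises):

    {"a": 1, "a": 2}     duplicate keys: keep the first, keep the last, or reject?
    [1e400]              out-of-range number: Infinity, null, an error, or arbitrary precision?
    ["\uDEAD"]           lone surrogate escape: accept, replace, or reject?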

Here's an example of an actually simple format where it's easy to define a canonical representation: https://en.wikipedia.org/wiki/Canonical_S-expressions; it was used for SPKI (https://theworld.com/~cme/html/spki.html) which sadly never went anywhere but is much more pleasant than the ASN.1/X.509/JWT etc mess that actually won.


> Not only is JSON none of these things, many people labor under the mistaken assumption that it's saner than it really is and even quite experienced people get bitten in the ass by that: https://github.com/endojs/endo/issues/115

That link is not an issue with JSON, but with JavaScript (JS treats keys named __proto__ specially). It's one of the most famous JS security bugs, and it also applies to XML and whatever else if you don't implement the parser carefully.


> That link is not an issue with JSON, but with JavaScript (JS treats keys named __proto__ specially).

JSON is meant to be (and was widely advertised as) a JavaScript subset, but isn't, and almost no one is aware of this. So of course it's also an issue with JSON. It's also not the only divergence that leads to security problems; in fact, if you follow the first link in the comment you will see that this was posted in the context of trying to fix another misspecification in an attempt to actually make JSON a proper subset (which turned out to be impossible to accomplish completely, although at least the paragraph separator issue was fixed in 2019). See here:

https://v8.dev/features/subsume-json


JSON is very widely used, so if its properties were bad for security, we should expect different JSON-based attacks popping up every week, right? How often do we see these kinds of attacks?


It's not hard to find quite recent json parsing vulnerabilities, e.g.

https://www.trustwave.com/en-us/resources/blogs/spiderlabs-b...

A fun historical one was remote code execution due to a json parser bug in Google Chrome. Oops. If you try a simple search on mitre, you'll see some false positives, but also a bunch of actual vulnerabilities each year:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=json

Here is a nice blog post walking through various JSON based exploits:

https://bishopfox.com/blog/json-interoperability-vulnerabili...


Interesting. I hadn't heard of any of those.


JSON is not a simple format. I don't feel confident enough to recommend anything in particular, but I believe one of the well-designed binary formats would be a much better choice.

I'll just leave this here.

https://news.ycombinator.com/item?id=28826600

https://seriot.ch/projects/parsing_json.html


IMO Json is more complex than the popular DER/BER encoding of ASN.1 and suffers from an ambiguous specification.


I'd rather use CBOR instead of JSON when doing anything outside the browser that doesn't need to be human editable


IMHO: CBOR is a worse encoding than msgpack, despite being a copy-paste clone of msgpack: indefinite lengths, weird FP formats (decimal floats?), IIRC CBOR also "improved" msgpack's int encoding by making it annoying to decode and encode, the whole tagging system is superfluous, etc.

Just because something has an RFC number doesn't make it good.


Robust methods of parsing untrusted inputs.


How does ASN.1 compare to something like binary protobuf?

I have no data to back this up, but I was under the impression ASN.1 is one of those specifications that is so large and complicated it’s essentially impossible to implement correctly; and that ASN.1 parsers are a pretty infamous source of security bugs (although I mean, most parsers seem to be).


> How does ASN.1 compare to something like binary protobuf?

Protobuf is probably easier to work with in many cases, but there are a few issues that I've run across that makes protobuf a greater challenge in production:

1. Backwards compatibility.

proto2 vs. proto3 for example introduced changes that you can work around but could cause issues if you're upgrading. Among those changes was the handling of optional vs. required elements; this may have changed since, but IIRC proto3 makes all elements optional.

ASN.1 also supports explicit extensibility and message versioning.

2. ASN.1 supports canonical encodings (CER and DER for sure, and I think there are canonical encodings for PER and XML), which means that the same data are encoded exactly the same way across the wire. Protobuf doesn't guarantee element ordering. When hashing messages, this is important.

3. ASN.1 supports length and value constraints, and compilers can take advantage of these in message validation and (in some cases) message compression. Protobuf, last I checked, did not.
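
As a rough sketch of what such constraints look like in the notation (hypothetical definitions, not from any real module):

    RetryCount ::= INTEGER (0..10)
    Payload    ::= OCTET STRING (SIZE (0..1400))

A compiler can reject out-of-range values at the API boundary, and PER can use the bounds to drop tags/lengths and encode RetryCount in just a handful of bits.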

ASN.1 predates open source software, though, and that means that most (all?) of the viable solutions are commercial (Objective Systems, OSS Nokalva, Marben, etc), or else rolled in-house (e.g., erlang). My guess is that the older standards were designed in part by LISPers: I'm not sure how else you would produce a general solution for MACRO constructions in their absence.


It's also much easier to construct a safe parser if you can use metaprogramming and effectively build a parser from generic blocks.

Consider also how ASN.1 heavily uses specifications that are supposed to be read by compiler generators, something that also is replicated in Protobuf, but with less expressive power.


ASN.1 has a much richer type system than protobufs, and is more general in that it accepts several encodings, some of which are optimised for size, some for speed, some for simplicity.

I believe we could prune all the unnecessary 90s fad from the specs and end up with something actually pretty nice. I wish Google had done this as ASN2 instead of protobuf.


> ASN.1 has a much richer type system than protobufs, and is more general in that it accepts several encodings, some of which are optimised for size, some for speed, some for simplicity.

You say that like it's a good thing.


Protobuf doesn't have a date/datetime/time type, which means you get dates encoded as strings and all the fun that entails.

Not saying that either is better but just sometimes you actually do want more types instead of less.
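
For what it's worth, ASN.1 has time types built in, so a schema can say this directly; a hypothetical sketch:

    Event ::= SEQUENCE {
        name     UTF8String,
        occurred GeneralizedTime   -- built-in; UTCTime exists too, and I believe newer editions add DATE/DATE-TIME
    }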


Well, there is Timestamp, defined as a well-known type, which is available to all implementations despite not being a primitive type [1]. Plus one is obviously able to define any other custom types if necessary, e.g. as seen in [2].

[1] https://developers.google.com/protocol-buffers/docs/referenc...

[2] https://github.com/googleapis/googleapis/blob/master/google/...


Well known types strike an interesting balance between IDL and compiler simplicity vs ergonomics.

For example, maps were added in proto3. They use the same binary encoding as the proto2 pattern that you could use to implement mappings (a repeated field of key/value pair messages). Adding maps as a first-class citizen in the IDL dealt with the lack of type parameters that would have been needed to implement a Map well-known type.


OK, my bad, I was looking too quickly through the base types in the proto reference.


Time doesn't have to be hard though? Just pick an international standard and stick to it. Reject anything that doesn't comply with your chosen format. It's better to fail than to continue running after you receive garbage data. I admit we have a closed system at work, but we never have time issues.


How could a richer type system or more choices regarding the encoding be a bad thing? I don't understand your remark.


A big design goal of Protobuf was long-term evolvability of message definitions with backward and forward compatibility. How rich is ASN.1 in that aspect?


Protobuf has known issues with backward/forward compatibility that require special attention; ASN.1 makes it IMHO remarkably easier (also consider Kerberos, which uses ASN.1, defining each extensible piece of data with an extra tag declaring whether it must be parsed or can be ignored and passed on).


Can you point us to which well known backward compatibility issues protobuf has?


This is indeed important for a serialization format to permit extensions, and ASN.1 types can accordingly be extended in a backward-compatible way (for instance, enumerations can be declared to also accept other constructors to be defined in the future).
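
For example (a made-up type), the extension marker does exactly that:

    Status ::= ENUMERATED { ok, degraded, failed, ... }
    -- the "..." declares the type extensible: later versions can add values after the
    -- marker, and old decoders know an unknown value may legitimately appear there.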


If you're interested in learning more about ASN.1, two great books on the subject are:

- ASN.1 Communication between Heterogeneous Systems[0]

- ASN.1 Complete[1]

Both are available online for free. Last time I checked, they may not cover the latest fads like JER etc., but they will provide you with a solid understanding of ASN.1 nonetheless.

[0] https://www.oss.com/asn1/resources/books-whitepapers-pubs/la... [1] https://www.oss.com/asn1/resources/books-whitepapers-pubs/du...


Some ISO ASN.1 standards (unfortunately not ISO 8824 and 8825) can be downloaded for free from https://standards.iso.org/ittf/PubliclyAvailableStandards/.


All standards about ASN.1 (not including standards that use ASN.1 like MMS) are available for free from the ITU under the X.680 to X.697 series of standards.[0]

[0]: https://www.itu.int/rec/T-REC-X/en


Indeed, thanks for the hint.


The bittorrent protocol uses a very simple binary data representation system: bencoding.

It can be described in a single paragraph: https://www.bittorrent.org/beps/bep_0003.html#bencoding

Although it supports only dictionaries, lists, ints and strings, it may be enough for your use case and is easily extendable.
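
For a flavour of it, a tiny example of my own following the BEP-3 rules linked above:

    d3:bar4:spam3:fooi42ee

which decodes to the dictionary {"bar": "spam", "foo": 42}: strings are length-prefixed, integers are wrapped in i...e, and dictionary keys must appear in sorted order.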

Edit: also, ASN.1 does not solve future extensibility out of the box. In JSON/bencoding/XML, it's trivial to add another key/element to a dictionary/array/element and allow applications to use it if they can and ignore it otherwise. Okay, TLV looks like it could handle that, but there's nothing about it in the description language itself.


> also, ASN.1 does not solve future extensibility out of the box. In JSON/bencoding/XML, it's trivial to add another key/element to a dictionary/array/element and allow applications to use it if they can and ignore it otherwise. Okay, TLV looks like it could handle that, but there's nothing about it in the description language itself.

This is wrong. ASN.1 has always had support for "extensibility" out of the box. See section 52 of X.680[0] for complete details, but I've included a couple of examples below. Also, ASN.1, unlike other IDLs, allows encodings to optionally take advantage of whether a type is considered extensible. For example, in PER a type has a smaller encoding when it is not extensible.

    A ::= INTEGER (0..10, ..., 12) -- A is 0 to 10, or 12.

    -- V1 of B contains `a, b`, V2 contains `a, b, c`
    B ::= SEQUENCE {
        a INTEGER,
        b BOOLEAN,
        ...,
        c OCTET STRING
    }
[0]: https://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-X.68...


Thanks for the information! I forgot to mention the only "research" of ASN.1 I ever did is the linked post (:


ASN.1 was great. Another old specification language still in use today is EXPRESS, one of the foundations of the huge ISO 10303 (STEP) standard series, as defined in ISO 10303-11. It even has algebraic data types. The development of STEP started in 1984 and has been one of the largest efforts ever undertaken by ISO.


They should have mentioned protobuf which it is closer to.

To my knowledge, there is no bad nullability in ASN.1, unlike protobuf. Good job!


Among those binary formats, this one benefits from the ubiquity of OpenSSL (see asn1parse) for debugging.


ASN.1 is all over the place in low-level networking protocols. That is where I learned it, two decades ago. While I disliked it initially, it was thoughtfully designed albeit obviously from a prior era.

Also, turns out that if you work with it long enough you can read the binary encodings almost like it is text.


> can read the binary encodings almost like it is text.

I'm curious what encoding rules that applies to: BER/DER I could easily believe, aligned PER a bit less so, and unaligned PER with lots of extensions and optional parts would be impressive.


It's the underpinnings of SNMP too. ASN.1 is one of those things that's everywhere you've never looked.


All of the OSI stack used ASN.1 extensively, something that would have made building protocol implementations much easier than the text-based IETF style.

SNMP is, IIRC, inspired by the much more capable CMIP, which was part of the OSI stack of protocols and is still used in some high-end WAN network gear for monitoring and control (whereas the "write" aspects of SNMP have mostly disappeared).


The cellular network standards (GSM/xG) use it heavily too.


I worked on an SNMP Go package many years ago [0] and it was really fun learning about ASN.1 and figuring out how to write encoders and decoders for various types.

[0] https://github.com/Cistern/snmp


Does ASN.1 handle versioning of the messages?

If I add a field to the message I expect the new binaries to read the messages serialized by the old binaries and vice versa.

Obviously the old binaries are not expected to understand the added field but they should not break when handed the upgraded message.


You might be interested in how Kerberos uses ASN.1, with extension messages having a field that describes whether understanding the extension is required (this is important if the extension covers information that might change the result of an authorization operation, for example)


Yes


If you run into ASN.1, run away (if you can). It's so insanely, hopelessly overengineered, with so many intersecting standards, different representations and knobs, that it can easily become a nightmare. Most parsers also only support an opinionated subset for a specific purpose, simply rejecting everything outside the supported subset even if by specification they should support it. This might sound fine, until you actually run into it in production and two supposedly compatible systems turn out to be incompatible with no easy workaround.


Its greatest weakness is its 10 different string encodings. I've always wondered what sort of bugs you could expose with a cert using VideotexString for the CN fields.


Well, you can constrain what kinds of string encodings are acceptable in your structs. The reason it has so many is that so many were actually in use, sometimes with important semantic differences.

You can't just hope that a stream of 8-bit bytes won't contain anything outside of what the target system understands as an acceptable string.
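
Concretely, the schema pins a specific string type (and size) per field, something like this hypothetical, loosely X.509-flavoured sketch:

    Subject ::= SEQUENCE {
        commonName   PrintableString (SIZE (1..64)),
        emailAddress IA5String (SIZE (1..255)) OPTIONAL
    }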


Like Rust?


As a Rust dev it's news to me that the language has 10 different strings. Care to elaborate?


Not the OP, but off the top of my head, we have:

* str and String, which are the ones we just call “strings”

* CStr and CString, for interop with libraries that work with null-terminated strings

* OsStr and OsString, for interop with system libraries that expect non-UTF-8 encodings

* Path and PathBuf, which are also just strings

* [u8] and Vec<u8>, which are encoding-neutral binary strings (&[u8] is the type of b"Hello world") and are useful for interop with e.g. C++’s std::string

That’s 10.

(For non-Rust people who are curious: In these pairs, the first one is the type of a view of a string that’s stored elsewhere, and the second is the type that actually owns the string data. All of these 10 have their uses, but outside of interfacing with non-Rust libraries, you mostly work with str/String and Path/PathBuf.)


OK, technically correct, but I still disagree that they are actually 10, because aside from `str` (which should really only be used for constants) and `String`, everything else is strictly optional and you're not required to work with it unless you have a specific scenario (like CStr / CString).

Compare this to Haskell, where you have lazy and non-lazy / Unicode and non-Unicode strings, which is at least 4 (I think there were more but I forget now), and you are practically forced into using any of them depending on which library you're interacting with -- and I don't mean C libraries in this instance.

I personally never even used anything beyond `str` and `String` for any app development, including a bit lower level stuff (though not embedded or tight integration with system libraries).


Haskell also has `String`, which is a lazy linked list of boxed UTF-32 characters. Yes, it's as horrible for performance as it sounds. Also, there are `ByteString`s which wrap strings in other encodings.


I don’t think we actually disagree, see my last paragraph.


True, I overreacted a little, sorry. Thanks for being a good sport.


this is cool. i’ve only ever used asn1 as content for metadata in x509 certs and it makes a tiny bit more sense now why this is better than a basic string encoding


I've been using ASN.1 for 20 years now, and I find it absolutely fabulous. I still learn something new from time to time, when something doesn't parse or encode correctly :)


It's very clear the writer has not used ASN.1 or BER/DER encoding. There's no discussion of which standard for bit fields to use, constructed encodings, the practical requirement of an ASN.1 compiler, or any security concerns.

Heck, why not just use CORBA?


Oh hey! There's quite a chunk of code in the Linux kernel to support this protocol. I hope that people here find some use for this "useful" old technology, otherwise it's just taking up space.


ASN.1: I can only imagine that one day someone said "Strings are bad for security" and some bold lad overheard and said "Hold my Beer!".


Nice to see someone actually put the TL;DR at the top for a change..

Never understood the logic of people putting them at the end of an article.. If I wasn't gonna read the whole thing, what makes you think I'm gonna scroll to the bottom?


the phenomenon of "TL;DR" is satirical idiot culture made real


It's a slightly humorous term for an executive summary.


A TL;DR is a standard part of any scientific paper, with a nicer name (abstract, summary).


I use ASN.1 between Kotlin <> Erlang, where Kotlin is the client and Erlang is the server, of course.


How is it better than Protocol Buffers?


please, don't!


  SEQUENCE OF ANY -- there, I maximally compressed all the comments


Useful? Hell, no. Like most technology derived from OSI, it's garbage that has been the source of a huge number of software vulnerabilities.


~98% of the ASN.1 parsing bugs could have been prevented by generating the parser instead of handwriting yet another recursive descent parser "with a few clever optimisations".


But think about that 1us lost. /s


Reminds me of all the Trotskyists who say Communism has never been discredited because no true Communist regime ever existed, only forms of State Capitalism. Why oh why has no actual ASN.1 parser ever hewed to the self-evident Platonic ideal of machine-generated purity?


ASN.1 parser generators exist though...


The question around this is whether vulnerabilities show up because the technology is bad, or because it's widely used, triggers interest from researchers, and any implementation will have its share of issues.



