> You can totally have tooling that exposes a GeneralString's blob payload to the application and lets the app handle the codeset switching aspects of GeneralString.
You can do that for all types, and my own implementation does just expose the payloads for all types. It also has functions to encode and decode many of them, but those functions must be called separately (e.g. asn1_decode_number and asn1_decode_date).
For some applications, there is no need to decode the payload anyway, and you can just treat it as opaque data if you do not need to display it (the same is true for many other types).
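As a rough illustration of "just expose the payload", here is a minimal Python sketch (not my actual implementation; the name `read_tlv` is made up for this example). It assumes DER input with a single-byte tag and a definite length, and returns the payload bytes without interpreting them, so decoding stays a separate step that the application can skip.

```python
def read_tlv(data: bytes, offset: int = 0):
    """Return (tag_byte, payload, next_offset) for one DER element.

    Hypothetical helper for illustration only. It handles single-byte
    tags (tag number below 31) and definite lengths, nothing else.
    """
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not handled in this sketch")
    length = data[offset + 1]
    pos = offset + 2
    if length & 0x80:
        # Long form: the low 7 bits give the number of length bytes.
        count = length & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count
    end = pos + length
    if end > len(data):
        raise ValueError("truncated element")
    # The payload is returned as an opaque blob; no decoding happens here.
    return tag, data[pos:end], end


# GeneralString has universal tag number 27, i.e. tag byte 0x1B.
tag, payload, next_offset = read_tlv(b"\x1b\x05hello")
```

The caller can compare, store, or copy `payload` as-is, and only hand it to a string decoder if it actually needs to display it.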
> I want to object that anyone who really has to handle multi-codeset GeneralString values would want a better library
I did start to write such a thing (it is not published yet). The library for ISO 2022 is separate from the library for ASN.1, although they can be used together, and it is intended to be usable for multiple purposes. (I might also add support for character codes other than ISO 2022, such as the encodings of Unicode and TRON code, although it is mainly intended to support ISO 2022.)
> ...what would that be if not a converter to/from Unicode?
For one thing, not all of the codes can be correctly converted to/from Unicode (especially control characters). Even if they can be, the conversion does not preserve some of the details, because the way the character set works is different from Unicode (e.g. some things might be considered to match in Unicode but not in other character sets, and vice versa; the same is true of case-folding, character properties, etc.). So some information may be lost when converting to Unicode.
(This does not necessarily mean that converting to/from Unicode is never useful, although you should consider whether you can do it in a better way. For example, if your program accepts input that is meant to be added to a GeneralString (or GraphicString) in a DER file, then it would be better to accept ISO 2022 input directly if possible. Converting to TRON code (especially for CJK text) might be better, but even that depends on what you are using it for; some uses do not require conversion at all.)
If you really want to store Unicode text anyway, you should consider whether a UTF-8 string type (UTF8String) is valid according to the schema you are using, and use that type instead if so (note that some schemas might not care so much about the type in some cases). If not, consider prepending the three bytes <1B 25 47> when encoding it as an ISO 2022 string. (As far as I can tell, this is not really supposed to be allowed in ASN.1, but I suppose it is possible if you really need to. You might also check whether the text is only ASCII and avoid adding this prefix if so.)
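The prefix idea can be sketched in a few lines of Python. This is only an illustration under the assumptions stated above; the function name `encode_unicode_for_iso2022` is made up, and whether a given consumer accepts the result is a separate question.

```python
# The three bytes <1B 25 47> mentioned above (ESC % G).
UTF8_PREFIX = b"\x1b\x25\x47"


def encode_unicode_for_iso2022(text: str) -> bytes:
    """Encode text as UTF-8, adding the prefix only if it is not pure ASCII.

    Hypothetical helper for illustration; it does not check whether the
    target schema or the reader of the file permits this.
    """
    encoded = text.encode("utf-8")
    if text.isascii():
        # ASCII-only text needs no prefix.
        return encoded
    return UTF8_PREFIX + encoded


plain = encode_unicode_for_iso2022("example")
prefixed = encode_unicode_for_iso2022("\u00e9")
```

Here `plain` is just the ASCII bytes, while `prefixed` starts with the three prefix bytes followed by the UTF-8 encoding.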
> But what if one really wants an array/list of {codeset, string} pairs? At that point open-coding support for those escapes is probably just as well since that one might have the only application in the world that wants that!!
I think that what you will want will depend on the specific application. Some will want that, some will want something else, and there are also different ways that you might want to handle control characters, etc.
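For the application that does want something like a list of pairs, a very simplified Python sketch is below. It is not a real ISO 2022 decoder: it only cuts the payload at escape sequences (ESC, then any intermediate bytes 0x20-0x2F, then a final byte 0x30-0x7E) and returns (escape sequence, following bytes) pairs. It does not interpret which set is designated, does not track G0-G3 or shift functions, and treats every escape sequence as a boundary whether or not it is a designation. The name `split_on_escapes` is made up for this example.

```python
ESC = 0x1B


def split_on_escapes(payload: bytes):
    """Split a byte string into (escape_sequence, run) pairs.

    The first pair has an empty escape sequence if the payload starts
    with ordinary bytes. Illustration only; nothing is interpreted.
    """
    segments = []
    current_escape = b""
    run = bytearray()
    i = 0
    while i < len(payload):
        if payload[i] != ESC:
            run.append(payload[i])
            i += 1
            continue
        # Skip intermediate bytes, then require one final byte.
        j = i + 1
        while j < len(payload) and 0x20 <= payload[j] <= 0x2F:
            j += 1
        if j >= len(payload) or not 0x30 <= payload[j] <= 0x7E:
            raise ValueError("malformed escape sequence at offset %d" % i)
        if run or current_escape:
            segments.append((current_escape, bytes(run)))
        current_escape = payload[i:j + 1]
        run = bytearray()
        i = j + 1
    if run or current_escape:
        segments.append((current_escape, bytes(run)))
    return segments


pairs = split_on_escapes(b"abc\x1b$B0!\x1b(Bxyz")
```

Consecutive escape sequences produce pairs with an empty run, which is one of the places where different applications will want different behaviour.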