Hacker News

I'm not even sure why ASCII has a grave accent. There are no combining marks, so you could never write it over another letter.

Edit: I forgot HTAB was actually part of ASCII. Oh well!



On a teletype, ALL characters are combining marks because you can backspace (another ASCII character derived directly from teletype codes) and type another character overtop it.


Are you old enough to remember printing your code on a teletype, where blank spaces were represented by a "b" with a slash through it? I hated that.

Even worse, I remember one shop where the teletypes didn't have question marks, so people used capital P's instead.


In ASCII, instead of combining marks, you have to write three characters:

* The unaccented letter

* The backspace character

* The accent character

If this makes no sense to you, picture a literal, physical typewriter. Windows line terminators work on a similar principle: CR (carriage return) goes back to the start of the line, then LF (line feed) advances to the next one.
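A minimal Python sketch of the three-character sequence above, turning ASCII overstrikes into the precomposed Unicode letters they stood for. The accent table (apostrophe as acute, double quote as umlaut, and so on) and the name `overstrike_to_unicode` are assumptions for illustration; the reversed order (accent, backspace, letter) that comes up later in the thread is accepted too.

```python
import unicodedata

BS = "\b"  # ASCII backspace, 0x08

# Assumed mapping from ASCII spacing characters to Unicode combining marks.
ACCENTS = {
    "'": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    '"': "\u0308",  # diaeresis / umlaut
    "~": "\u0303",  # tilde
}


def overstrike_to_unicode(text):
    """Replace 'letter, BS, accent' (or 'accent, BS, letter') with one composed letter."""
    out = []
    i = 0
    while i < len(text):
        triple = text[i:i + 3]
        if len(triple) == 3 and triple[1] == BS:
            base, accent = triple[0], triple[2]
            if accent not in ACCENTS:  # maybe the accent was typed first
                base, accent = accent, base
            if accent in ACCENTS and base.isalpha():
                # NFC folds base + combining mark into a precomposed letter where one exists.
                out.append(unicodedata.normalize("NFC", base + ACCENTS[accent]))
                i += 3
                continue
        out.append(text[i])  # anything else passes through unchanged
        i += 1
    return "".join(out)


print(overstrike_to_unicode("caf" + "e\b'"))  # café
```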


From what I recall from my childhood, physical typewriters worked slightly differently: the accent keys were non-advancing ("dead") keys. You pressed the "acute" key followed by the "e" key for an é, for instance. If you wanted a bare accent, you pressed the accent key followed by the space bar.
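The dead-key behaviour can be simulated in a few lines of Python. This is only a sketch: the `DEAD_KEYS` layout and the `type_keys` function are hypothetical, and the result is expressed as Unicode text rather than ink on paper.

```python
import unicodedata

# Hypothetical dead-key layout: accent key -> combining mark it stands for.
DEAD_KEYS = {
    "\u00b4": "\u0301",  # ´ acute
    "`": "\u0300",       # grave
    "^": "\u0302",       # circumflex
    "\u00a8": "\u0308",  # ¨ diaeresis / umlaut
}


def type_keys(keys):
    """Simulate non-advancing accent keys: the accent waits for the next key."""
    out = []
    pending = None  # accent struck, carriage not yet advanced
    for key in keys:
        if pending is not None:
            if key == " ":
                out.append(pending)  # accent + space bar -> bare accent
            else:
                out.append(unicodedata.normalize("NFC", key + DEAD_KEYS[pending]))
            pending = None
        elif key in DEAD_KEYS:
            pending = key
        else:
            out.append(key)
    if pending is not None:
        out.append(pending)  # trailing accent with nothing after it
    return "".join(out)


print(type_keys("caf\u00b4e"))  # café
```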

(The typewriters I recall also didn't have a 0 or 1 key; you used uppercase O or I for those digits.)


Yes. I consider it a mistake in Unicode that combining characters follow rather than precede the base character. If they preceded it, most dead keys could simply generate the appropriate combining character, rather than requiring complicated input-method support. (And finding the end of a sequence of multiple combining characters wouldn't require lookahead.)
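To make the lookahead point concrete, here is a Python sketch that groups characters into base-plus-marks clusters under both orders. It treats any character with a nonzero canonical combining class as a mark, a simplification of full grapheme segmentation (Unicode UAX #29); both function names are made up for illustration.

```python
import unicodedata


def iter_clusters(chars):
    """Unicode's real order: marks FOLLOW the base character."""
    current = ""
    for ch in chars:
        if current and unicodedata.combining(ch):
            current += ch  # still inside the same cluster
            continue
        # Only on seeing the next base character do we know `current` is finished.
        if current:
            yield current
        current = ch
    if current:
        yield current  # end of input also ends the last cluster


def iter_clusters_prefix(chars):
    """Hypothetical order: marks PRECEDE the base, so the base ends the cluster."""
    current = ""
    for ch in chars:
        current += ch
        if not unicodedata.combining(ch):
            yield current  # complete immediately; no need to peek ahead
            current = ""
    if current:
        yield current  # dangling marks with no base


print(list(iter_clusters("e\u0301\u0323x")))
```

The first generator cannot hand back a cluster until it has read one character past it; the second can emit each cluster the moment its base arrives.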


The Unicode way makes sorting easier. Your way would require special knowledge about the characters to know that ä should sort directly after a, rather than directly before ë.
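A quick Python check of the sorting claim: plain code-point order of decomposed (NFD) strings, first in Unicode's real suffix order, then in the hypothetical prefix order. `prefix_form` is an invented helper that moves each run of combining marks in front of its base.

```python
import unicodedata


def suffix_form(word):
    """Unicode's actual decomposed order: base character, then its marks."""
    return unicodedata.normalize("NFD", word)


def prefix_form(word):
    """Hypothetical encoding: each run of combining marks moved before its base."""
    clusters = []  # (base, marks) pairs
    for ch in unicodedata.normalize("NFD", word):
        if clusters and unicodedata.combining(ch):
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            clusters.append((ch, ""))
    return "".join(marks + base for base, marks in clusters)


words = ["b", "\u00eb", "a", "e", "\u00e4"]  # b, ë, a, e, ä
print(sorted(words, key=suffix_form))  # ['a', 'ä', 'b', 'e', 'ë']
print(sorted(words, key=prefix_form))  # ['a', 'b', 'e', 'ä', 'ë']
```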


In fact, this requires special language-specific knowledge anyway (which Unicode actually provides in some tables and algorithms). In some languages ä should sort exactly as if it were 'a': "aa", "äb", "ac". In others it should sort as a distinct letter (but not necessarily between 'a' and 'b'). Different Latin-alphabet languages sort differently, and I'm not sure exact UTF-8 (or UTF-16 or UTF-32) byte ordering is an appropriate collation for any of them.
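A crude accent-insensitive key in Python reproduces the "aa", "äb", "ac" example. This is a sketch, not the Unicode Collation Algorithm: real software would use UCA with locale tailoring (for example through ICU), and stripping marks is only right for languages that treat ä as a variant of a.

```python
import unicodedata


def strip_marks(s):
    """Decompose (NFD) and drop combining marks, so 'ä' becomes 'a'."""
    return "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))


def accent_insensitive_key(s):
    # Primary key ignores accents and case; the original string breaks ties.
    return (strip_marks(s).casefold(), s)


words = ["ac", "\u00e4b", "aa"]
print(sorted(words))                              # ['aa', 'ac', 'äb']  (code-point order)
print(sorted(words, key=accent_insensitive_key))  # ['aa', 'äb', 'ac']
```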

But I do suspect it had something to do with ASCII compatibility, though I don't recall what. Very little of Unicode is accidental; there's usually a reason for whatever is in it.


Some languages even sort:

aa ah az ba bh bz ca cz ch

treating "ch" as a single letter that comes between c and d.

or

ab ah az b c ... z aa

treating "aa" as a separate letter at the end of the alphabet.
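Both orderings above can be sketched in Python with a custom alphabet whose "letters" may be digraphs. `make_sort_key` is a hypothetical helper; the "ch" alphabet imitates traditional Spanish and the "aa"-at-the-end alphabet imitates how Danish files "aa" as å, but neither is a complete locale definition.

```python
import string


def make_sort_key(alphabet):
    """Return a key function ordering words by `alphabet`, a list of letters
    in collation order; a letter may be a digraph such as "ch" or "aa"."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    # Try longer letters first so "ch" wins over "c" followed by "h".
    letters_by_length = sorted(alphabet, key=len, reverse=True)

    def key(word):
        ranks = []
        i = 0
        while i < len(word):
            for letter in letters_by_length:
                if word.startswith(letter, i):
                    ranks.append(rank[letter])
                    i += len(letter)
                    break
            else:
                raise ValueError(f"{word[i]!r} is not in the alphabet")
        return ranks  # lists compare element by element, shorter prefix first

    return key


latin = list(string.ascii_lowercase)
with_ch = latin[:3] + ["ch"] + latin[3:]  # a b c ch d ... z
aa_last = latin + ["aa"]                  # a ... z aa

print(" ".join(sorted("aa ah az ba bh bz ca ch cz".split(), key=make_sort_key(with_ch))))
# aa ah az ba bh bz ca cz ch
print(" ".join(sorted("aa ab ah az b c z".split(), key=make_sort_key(aa_last))))
# ab ah az b c z aa
```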

Then there are other sorting rules that aren't strictly alphabetic, such as treating names beginning with "Mc" as if they began with "Mac", or "St " as "Saint ".
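Rules like these amount to rewriting the name before comparing it. A minimal Python sketch follows; the rewrite list (including accepting "St." with a period) is an assumption, since real filing rules differ between libraries and directories.

```python
import re

# Hypothetical filing rewrites applied before comparison.
_REWRITES = [
    (re.compile(r"^Mc(?=[A-Z])"), "Mac"),  # McDonald -> MacDonald
    (re.compile(r"^St\.? "), "Saint "),    # St Louis -> Saint Louis
]


def filing_key(name):
    for pattern, replacement in _REWRITES:
        name = pattern.sub(replacement, name)
    return name.casefold()  # compare case-insensitively after rewriting


names = ["Mackay", "McDonald", "Madison", "Saint Paul", "St Louis", "Sable"]
print(sorted(names))
# ['Mackay', 'Madison', 'McDonald', 'Sable', 'Saint Paul', 'St Louis']
print(sorted(names, key=filing_key))
# ['McDonald', 'Mackay', 'Madison', 'Sable', 'St Louis', 'Saint Paul']
```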

"10 cats" should sort after "2 cats", not before it.

Anyone who tries to sort by raw character codes alone is doing it wrong.
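A common "natural sort" key in Python handles the "10 cats" case by comparing digit runs as integers. `natural_key` is an illustrative name, not a standard-library function.

```python
import re


def natural_key(s):
    """Split into text and digit runs and compare the digit runs as integers."""
    # With a capturing group, re.split alternates text, digits, text, ...,
    # so odd indices are always digit runs and types never get mixed up.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


cats = ["10 cats", "2 cats", "1 cat"]
print(sorted(cats))                   # ['1 cat', '10 cats', '2 cats']
print(sorted(cats, key=natural_key))  # ['1 cat', '2 cats', '10 cats']
```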


Except not really, since Swedish sorts them as xyzåäö, not aåäbc. Also, w and v used to be treated as equivalent for sorting, so words with either letter were mixed together.
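A Python sketch of the Swedish ordering described above, with an optional switch for the older v/w merging. The alphabet string and `swedish_key` are assumptions and handle only the letters a-z, å, ä, and ö.

```python
import string

# Assumed Swedish letter order: a-z, then å, ä, ö.
_ALPHABET = string.ascii_lowercase + "\u00e5\u00e4\u00f6"
_RANK = {ch: i for i, ch in enumerate(_ALPHABET)}


def swedish_key(word, merge_v_w=False):
    """Sort key for words made only of a-z, å, ä, ö (any case)."""
    ranks = []
    for ch in word.lower():
        if merge_v_w and ch == "w":
            ch = "v"  # older convention: w filed as if it were v
        ranks.append(_RANK[ch])
    return (ranks, word)  # tie-break on the original word so order is deterministic


print(sorted(["\u00f6", "a", "z", "\u00e5", "\u00e4"], key=swedish_key))
# ['a', 'z', 'å', 'ä', 'ö']
print(sorted(["wa", "vb", "vc"], key=swedish_key))
# ['vb', 'vc', 'wa']
print(sorted(["wa", "vb", "vc"], key=lambda w: swedish_key(w, merge_v_w=True)))
# ['wa', 'vb', 'vc']
```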


I believe that depended on the manufacturer and country convention. Most US keyboards didn't have an accent character. For example, here's one from the 1950s:

http://www.typewriters101.com/uploads/1/7/6/6/17660651/s7662...

For acute or umlaut, you could type a + backspace + ' or u + backspace + " (or the reverse order). For grave or circumflex, I don't think there was a solution. Write it in by hand?


When I was in school back in the 1990s, that was certainly the approach taken for the Vietnamese edition of the school newsletter.



