A short introduction to the undeciphered Aegean writing systems (itsallgreektoanna.wordpress.com)
126 points by bigbillheck on Jan 31, 2022 | 34 comments


John Chadwick's "The Decipherment of Linear B" is a fascinating book. Michael Ventris came to a sad end, dying in a car accident at 35.

I see it's on Kindle. I must get it, since I can't find my paperback.

https://www.amazon.com/Decipherment-Linear-Canto-Classics-eb...


You can also borrow/read it for free from internet archive:

https://archive.org/details/deciphermentofli00chad


The same book is also available without an account (although the images are lower quality).

https://archive.org/details/ChadwickJohnTheDeciphermentOfLin...


There is a sort of romantic fascination to dedicating yourself to the study of what is probably an impossible puzzle.

We'd learn so much if only we had a key. So much history inaccessible to us.


> We'd learn so much if only we had a key. So much history inaccessible to us.

Well, as this post notes, one of the things that makes decipherment difficult is a lack of material.

The other side of that coin is that we aren't missing a lot.


Fair point.


Why are (almost) all these writing systems from Crete? Were people not writing things down on the mainland at that time, or have their writing systems all been deciphered? Or is there some other explanation?


Not a historian, but the obvious explanation is that Crete was the centre of the Minoan civilisation, ‘the first advanced civilisation in Europe’ [0]. The only other scripts in use at the time were various forms of cuneiform in Mesopotamia and Anatolia (for Sumerian, Akkadian, Hittite etc.) and hieroglyphics for Egyptian, all of which have been deciphered. We have far more evidence for those scripts than we do for the various Aegean ones.

[0] https://en.wikipedia.org/wiki/Minoan_civilization


Makes sense. Another dimension to this is that the Minoan language does not appear to bear any resemblance to any form of Greek. The most ancient cultural artifacts from Mycenae seem to be of Minoan inspiration, while Crete itself seems to have had intrusive Mycenaean characteristics until it was completely supplanted by Mycenaean culture.


Asking as a total layperson: have any automated machine translation techniques been applied to undeciphered scripts such as the ones discussed in this article? If so, what was the outcome?


Just what would the models be trained on? Machine learning requires you to have a corpus of mappings between pieces of text in two languages, each of which has been established to be close in meaning, which means the decipherment problem would already have to be solved beforehand.

There are a series of problems in the decipherment of the Minoan scripts, from what I understand:

1. Linear A and the Cretan Hieroglyphic script are undeciphered in the sense that we do not know what phonetic values the symbols stand for.

2. The Minoan language hypothesized to be written in these scripts is undeciphered in terms of vocabulary (given that we do not know the phonetics of the symbols) and, by extension, its semantics. At best a kind of crude syntax can be proposed for it, based on patterns of the word-symbols in texts of the hypothesized language written in those scripts.
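
For illustration, here is a toy sketch of that last point, i.e. how one might tally positional patterns of sign-groups without knowing any sound values. All the sign labels and lines below are invented placeholders, not real Linear A data.

    from collections import Counter

    # Made-up transliterated lines; sign labels and "NUM" are placeholders.
    lines = [
        ["A-01", "NUM", "A-02", "NUM", "A-81-02"],
        ["A-03", "NUM", "A-81-02"],
        ["A-01", "NUM", "A-04", "NUM", "A-81-02"],
    ]

    first = Counter(line[0] for line in lines)
    last = Counter(line[-1] for line in lines)
    after_num = Counter(line[i + 1]
                        for line in lines
                        for i, tok in enumerate(line[:-1]) if tok == "NUM")

    print("line-initial:", first)        # which sign-groups open a line
    print("line-final:", last)           # which close one
    print("after numerals:", after_num)  # which tend to follow numerals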


> Just what would the models be trained on? Machine learning requires you to have a corpus of mappings between pieces of text in two languages, each of which has been established to be close in meaning, which means the decipherment problem would already have to be solved beforehand.

I was thinking of unsupervised machine translation specifically[1].

[1]: https://paperswithcode.com/task/unsupervised-machine-transla...


Unsupervised machine translation works by distribution-matching embeddings on the corpora you want to translate between. If the corpora are large enough, their distributions can be estimated robustly, and if they have sufficient overlap in the topics they cover, it's likely that words with similar distribution have similar meanings.
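
As a very rough sketch of what that distribution matching looks like in code, here is an orthogonal Procrustes alignment on stand-in embeddings. Every array below is random placeholder data, and real unsupervised MT has no paired rows to align against, so it has to bootstrap the alignment adversarially or iteratively; this only shows the geometric core of the idea.

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder embeddings: rows are words/sign-groups, columns are dimensions.
    src = rng.normal(size=(100, 50))   # e.g. the undeciphered corpus
    tgt = rng.normal(size=(100, 50))   # e.g. a known comparison language

    # Find the rotation W minimising ||src @ W - tgt|| (Procrustes).
    u, _, vt = np.linalg.svd(src.T @ tgt)
    W = u @ vt

    # Nearest neighbours in the aligned space are candidate translations.
    sims = (src @ W) @ tgt.T
    candidates = sims.argmax(axis=1)
    print(candidates[:10])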

So if there were a large amount of undeciphered Linear A inscriptions on a guessable range of topics, unsupervised machine translation might be worth a try.

Unfortunately, there aren't that many Linear A inscriptions, and for those where the kind of content was known, the distribution matching has already been carried out by hand. E.g. from the article: "the word AB81-02, or KU-RO if transliterated using Linear B sound-values, is one of the few words whose meaning we do know: it appears at the end of lists next to the sum of all the listed numerals, and so clearly means ‘total’. But we still don’t actually know how to pronounce this word, or what part of speech it is, and we can’t identify it with any similar words in any known languages."
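
Just to illustrate how that by-hand match for KU-RO could be mimicked mechanically, here is a toy version; the tablets, sign-group names and numbers are all invented, not real inscriptions.

    from collections import Counter

    # On made-up "tablets" of (sign-group, numeral) pairs, find the sign-group
    # that consistently appears on the line whose numeral equals the sum of
    # the preceding numerals.
    tablets = [
        [("FIGS", 3), ("WINE", 5), ("KU-RO", 8)],
        [("OXEN", 2), ("SHEEP", 7), ("GRAIN", 1), ("KU-RO", 10)],
        [("WOOL", 4), ("OIL", 4), ("KU-RO", 8)],
    ]

    hits = Counter()
    for tablet in tablets:
        *entries, (last_word, last_num) = tablet
        if last_num == sum(n for _, n in entries):
            hits[last_word] += 1

    print(hits.most_common(1))  # [('KU-RO', 3)]: a good candidate for 'total'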


All human languages have some commonalities. I remember this headline from 2016, when Google had an AI model that, after training on some languages, was able to somewhat successfully translate between language pairs it had never seen before:

https://ai.googleblog.com/2016/11/zero-shot-translation-with...

Not sure what the state of research like this is now, or whether it's been applied to stuff like this. I hope so!

EDIT: although it looks like it had seen the languages, just not that particular pair translated between each other.


All the "unseen" pairs of languages had parallel texts with English, so Google's vaunted "interlingua" was most likely just natural English.

But you shouldn't be downvoted for repeating Google's claim I think. It should be Google that is shamed for peddling such unmitigated nonsense.


I'm not a linguist and haven't done much machine learning, so forgive my naïveté, but aren't there a couple of core features/structures all languages have? That might not be enough to learn anything (and might be what you're saying), but I wonder if there's a way to determine what a plausible grammar is and to try to identify the nouns. Once you have the nouns, you could try a huge substitution game on different texts in that language and see which guesses make the most sense for the most sentences.
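
Something like that substitution game could look like the toy sketch below: assume candidate syllable values per sign and score each mapping by how many resulting words appear in a wordlist of a hypothesised related language. All the signs, syllables, texts and wordlist entries are made up for illustration.

    from itertools import permutations

    signs = ["A", "B", "C"]
    syllables = ["ku", "ro", "pa"]
    texts = [["A", "B"], ["C", "A"], ["B", "C", "A"]]   # sign sequences
    wordlist = {"kuro", "paku", "roku"}                 # comparison lexicon

    def score(mapping):
        words = ("".join(mapping[s] for s in seq) for seq in texts)
        return sum(w in wordlist for w in words)

    best = max(permutations(syllables),
               key=lambda p: score(dict(zip(signs, p))))
    print(dict(zip(signs, best)))   # the mapping that explains the most words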


> aren't there a couple of core features/structures all languages have?

The answer is 'no', especially in the sense that there's a relatively finite set of possible grammars that a putative natural language could be compared against. In terms of basic parts of speech, I believe that every language does have something that you can describe as a noun, but that's more or less the only "universal" part of speech (there are some languages that essentially don't have verbs--you "do a look" rather than "see", e.g.).

The more serious problem, I think, is that the corpus of Linear A is simply too tiny to do any serious study, and I don't know how well the written corpus reflects problems like segmenting text into words or morphemes. In essence, the available evidence is so paltry that you could justify just about any grammatical hypothesis, I suspect.

If I'm understanding Chomskyan linguistics correctly (that's a really big if), there was originally thought to be an inherent "language organ" that strongly constrained grammar. But over time, as linguistics documented more languages, the set of things this subdiscipline treats as universal in grammar has essentially been reduced to 'merge', which is an abstract concept that I'm pretty sure I don't understand.


Recursive nesting still stands as universal. Pretty much every language has a form of "he said {she said { stuff }}". One guy claims to have found a language deep in the Amazon without this feature, but no one thinks he's credible.

Pretty much every language has verbs connecting subjects and objects. In English the order is S-V-O: treating addition as a verb, our sentences work kind of like X + Y. Other languages use something more like reverse Polish notation.

The original Chomsky idea was that there was an underlying brain structure reflected in language, and only a small number of tunable parameters defined the whole space. This hypothesis was meant to explain how humans learn language so quickly. The idea was that, much like horses that start galloping just after leaving the womb, maybe we're born already sort of knowing it.

The theory hasn't panned out so far. There's too much similarity between unrelated languages to pretend that some universal mechanism isn't behind them, but for nearly any particular linguistic feature you can usually find at least one obscure counterexample showing it isn't universal. The two features I've listed are the exceptions.


There are 6 orders of S, V, and O. All are attested, but the orders where O precedes S are vastly less common than those where S precedes O.

The counter-argument to Chomsky's Universal Grammar is that commonalities across language exist because they are all solving similar problems for which there are more or less optimal solutions. In essence, convergent evolution produces similarity. And one might argue that recursive structure isn't a feature but rather the absence of a feature: something to prevent recursion. If you have a rule that says you can make category X out of pieces which may also be of category X, then you have recursion. So using a noun as a modifier of another noun -- "lunch counter" -- produces recursion.


> I believe that every language does have something that you can describe as a noun, but that's more or less the only "universal" part of speech (there are some languages that essentially don't have verbs--you "do a look" rather than "see", e.g.).

R.M.W. Dixon argues convincingly that every language has at least nouns and verbs; less convincingly (IMO), he also argues for adjectives as a universal category. Some languages have only a very small set of verbs — Kalam has ~200, and Wutung has only 32 [0] — but they still have an identifiable category of verbs nonetheless. (Jingulu has been said to have 3 verbs, but only if you disqualify the large set of non-inflecting verbs.)

[0] https://openresearch-repository.anu.edu.au/handle/1885/10937... — a complete list of all 32 simple verbs may be found on page 293 (322 of the PDF)


> The more serious problem, I think, is that the corpus of Linear A is simply too tiny to do any serious study

This is probably a silly question, but if the corpus is so small, why are we convinced that it has any meaning at all?


"There was originally thought" meaning "Chomsky thought". But it was all obvious bollocks, contrary to elementary natural selection.

Grammars necessarily have to be compatible with brain organization inherited from our primate ancestors, who obviously could have had no "language organ" carried about waiting to find some sort of use by their future descendants. Brain structures all need to be immediately useful for surviving or reproducing.

One theory has been that language runs on a bit of brain hypertrophied as a sort of peacock's tail, not necessarily of any survival value, originally, but needed to impress a potential mate. It could have been used to carry a tune.

Others have suggested language originated between mothers and children, growing out of lullabies. The two are not incompatible.


> One theory has been that language runs on a bit of brain hypertrophied as a sort of peacock's tail

> Others have suggested language originated between mothers and children, growing out of lullabies.

Sources?

——-

Separately, is language unique to humans? Are there examples of convergent evolution (semblances of grammatical structure) that can give us more clues as to how language may have evolved in humans?


Grammatical language, e.g. with subclauses, does appear from the evidence so far to be unique to humans, although whalesong is complex enough that it might yet be found there.

I gather that, since the "universal grammatical structure" was promoted, languages have been discovered that are said to lack any such forms. "Said to", because very few people are reporting on them.

Sorry, for sources I would be doing the same DDGing you will.


I’m not an electrical engineer, but aren’t there a couple of core features that all electricity has, like ohms law? Surely we can figure out how an undocumented old computer functioned from a couple of electronic components, if we identify which parts are the resistors and which are the transistors.


There are some interesting insights they've made about the Indus Valley script here: https://youtu.be/a_-obTZO6pY


That’s more like statistical inference, applying insights from known languages to it. It’s not really ML. Also, it can give you some broad clues, but decipherment is still a massive leap.


By and large, ML is statistical inference albeit in less-than-rigorous form.


There are plenty of ML models that are trained unsupervised on text. What you would do next with your dead-language BERT, I don't know. But you could definitely make one.


One issue is that we don't have a lot of text, not even a megabyte of it (represented as Unicode characters). So you could get a language model, but how could you judge its output? Maybe it would be really good at generating more similar text, but that text probably isn't very representative of the things we would want to be able to read.


Are there any commonalities in the type of subject matter considered worthy of recording in written text?


Yes, but cultural understanding still trumps this, at least according to some experts. Case in point: François Desset has recently deciphered a 4,400-year-old Iranian writing system (Linear Elamite), and this is his take on it:

> FD: This was the jackpot – the same Elamite or Hatamtite text written on one artefact in cuneiform, and on the other in Linear Elamite. This is a classic knowledge-driven decipherment based on our capability to make connections between Cuneiform and Linear Elamite scripts inscriptions recording the same Hatamtite (or Elamite) language text.

> RK: An excellent point, especially in this day and age when many seek to replace knowledge with technology.

> FD: Certainly. I wish in this regard to emphasise that I have been asked a lot about computers, statistical data, etc. All hogwash! Knowledge, especially cultural knowledge and of the languages used at the time, some luck, and most important perseverance were essential.

https://www.thepostil.com/francois-desset-on-the-deciphermen...


I notice that something appears to have eaten a 'very' in the page title.


There's a maximum size cap on HN titles, which when combined with the mandate to not editorialize titles means that it's not uncommon to see words expressing vague magnitudes removed.



