Why are (almost) all these writing systems from Crete? Were people not writing things down on the mainland at that time, or have their writing systems all been deciphered?
Or is there some other explanation?
Not a historian, but the obvious explanation is that Crete was the centre of the Minoan civilisation, ‘the first advanced civilisation in Europe’ [0]. The other major scripts in use at the time were various forms of cuneiform in Mesopotamia and Anatolia (for Sumerian, Akkadian, Hittite, etc.) and Egyptian hieroglyphs, all of which have been deciphered. We have far more evidence for those scripts than we do for the various Aegean ones.
Makes sense. Another dimension to this is that the Minoan language does not appear to have had any resemblance to any form of Greek. The most ancient cultural artifacts from Mycenae seem to be of Minoan inspiration, while Crete itself seems to have acquired increasingly intrusive Mycenaean characteristics until Minoan culture was completely supplanted by the Mycenaean.
Asking as a total layperson: have any automated machine translation techniques been applied to undeciphered scripts such as the ones discussed in this article? If so, what was the outcome?
Just what would the models be trained on? Machine learning requires you to have a corpus of mappings between pieces of text in two languages, each of which has been established to be close in meaning, which in turn requires the decipherment problem to already be solved.
There are a series of problems in decipherment of Minoan scripts from what I understand:
1. Linear A & the Cretan hieroglyphic script are undeciphered in the sense that we do not know what sounds the symbols stand for
2. The Minoan language hypothesized to be written in these scripts is undeciphered in terms of vocabulary (given that we do not know the phonetics of the symbols) & by extension its semantics. At best a kind of crude syntax can be proposed for it, based on patterns of the word-symbols in texts of the hypothesized language written in those scripts.
> Just what would the models be trained on? Machine learning requires you to have a corpus of mappings between pieces of text in two languages, each of which has been established to be close in meaning, which in turn requires the decipherment problem to already be solved.
I was thinking of unsupervised machine translation specifically[1].
Unsupervised machine translation works by distribution-matching embeddings on the corpora you want to translate between. If the corpora are large enough, their distributions can be estimated robustly, and if they have sufficient overlap in the topics they cover, it's likely that words with similar distribution have similar meanings.
So if there were a large amount of undeciphered Linear A inscriptions on a guessable range of topics, unsupervised machine translation might be worth a try.
Unfortunately, there aren't that many Linear A inscriptions, and for those where the kind of content was known, the distribution matching has already been carried out by hand. E.g. from the article: "the word AB81-02, or KU-RO if transliterated using Linear B sound-values, is one of the few words whose meaning we do know: it appears at the end of lists next to the sum of all the listed numerals, and so clearly means ‘total’. But we still don’t actually know how to pronounce this word, or what part of speech it is, and we can’t identify it with any similar words in any known languages."
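To make the distribution-matching step concrete, here is a minimal sketch of the alignment that systems like MUSE perform (they bootstrap the initial mapping adversarially and then refine it with orthogonal Procrustes). The embeddings below are random placeholders, and the anchor pairing stands in for the handful of guessed correspondences like KU-RO ≈ 'total':

    import numpy as np

    def procrustes(X, Y):
        """Orthogonal map W minimizing ||X @ W - Y||_F,
        given row-aligned anchor embeddings X (source) and Y (target)."""
        U, _, Vt = np.linalg.svd(X.T @ Y)   # SVD of the cross-covariance
        return U @ Vt

    rng = np.random.default_rng(0)
    dim, n_anchors = 50, 5
    X = rng.normal(size=(n_anchors, dim))   # placeholder "Linear A" sign-group embeddings
    Y = rng.normal(size=(n_anchors, dim))   # placeholder embeddings of guessed glosses ("total", ...)

    W = procrustes(X, Y)
    # Nearest neighbours of X @ W in the target space would be candidate
    # translations; with vectors estimated from a corpus this small, the
    # estimates are too noisy for those neighbours to mean much.
    print(np.linalg.norm(X @ W - Y))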
All human languages have some commonalities. I remember this headline from 2016, when Google had an AI model that, trained on some languages, was able to somewhat successfully translate between language pairs it had never seen before:
I'm not a linguist and haven't done much machine learning, so forgive my naïveté, but aren't there a couple of core features/structures all languages have? That might not be enough to learn anything (or might be exactly what you're saying), but I wonder if there's a way to determine what a plausible grammar is and to try to identify the nouns. Once you have the nouns, you could try a huge substitution game on different texts in that language and see which guesses make the most sense across the most sentences.
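A toy version of that substitution game, just to make the idea concrete (the sign inventory, texts, and candidate lexicon below are entirely made up, and real sign inventories are far too large to brute-force like this):

    from itertools import permutations

    # Made-up "undeciphered" texts written with a tiny sign inventory
    signs = ["#", "@", "%", "&"]
    texts = ["#@%", "#@", "%&#"]

    # Candidate sound values and a lexicon of plausible words in a known language
    sounds = ["a", "k", "u", "r"]
    lexicon = {"kua", "ku", "ark"}

    def score(mapping):
        """Count how many decoded texts turn out to be words in the lexicon."""
        return sum("".join(mapping[s] for s in t) in lexicon for t in texts)

    # Try every sign-to-sound assignment and keep the one explaining the most texts
    best = max((dict(zip(signs, p)) for p in permutations(sounds)), key=score)
    print(best, score(best))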
> aren't there a couple of core features/structures all languages have?
The answer is 'no', especially in the sense of there being no relatively small, finite set of possible grammars that a putative natural language could be compared against. In terms of basic parts of speech, I believe that every language does have something that you can describe as a noun, but that's more or less the only "universal" part of speech (there are some languages that essentially don't have verbs--you "do a look" rather than "see", e.g.).
The more serious problem, I think, is that the corpus of Linear A is simply too tiny to do any serious study, and I don't know how well the written corpus reflects issues like segmenting text into words or morphemes. In essence, the available evidence is so paltry that you could justify just about any grammatical hypothesis, I suspect.
If I'm understanding Chomskyan linguistics correctly (that's a really big if), there was originally thought to be an inherent "language organ" that strongly controlled grammar. But over time, as linguistics documented more languages, the set of things held to be universal across grammars in this subdiscipline has essentially been reduced to 'merge', which is an abstract concept that I'm pretty sure I don't understand.
Recursive nesting still stands as universal. Pretty much every language has a form of "he said {she said { stuff }}". One guy claims to have found a language deep in the Amazon without this feature, but no one thinks he's credible.
Pretty much every language has verbs connecting subject and object. In English the order is S-V-O. Treating addition as a verb, our sentences work kind of like X + Y. Other languages use something more like reverse Polish notation.
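To make the arithmetic analogy concrete: SVO reads like infix notation, while SOV reads more like reverse Polish, with both operands given before the operator. A plain stack-based RPN evaluator, nothing language-specific:

    import operator

    OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

    def eval_rpn(expr):
        """Evaluate a reverse Polish expression, e.g. "3 4 +" == 3 + 4."""
        stack = []
        for tok in expr.split():
            if tok in OPS:
                b, a = stack.pop(), stack.pop()
                stack.append(OPS[tok](a, b))
            else:
                stack.append(float(tok))
        return stack.pop()

    print(eval_rpn("3 4 +"))   # 7.0 -- operands first, operator last, roughly the SOV pattern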
The original Chomsky idea was that there was an underlying brain structure reflected in language, and that only a small number of tunable parameters defined the whole space. This hypothesis was meant to explain how humans learn language so quickly. The idea was that, much like horses that start galloping just after leaving the womb, maybe we're born already sort of knowing it.
The theory hasn't panned out so far. There's too much similarity between unrelated languages to pretend that some universal mechanism isn't behind them, but for nearly any particular linguistic feature you can usually find at least one obscure counterexample showing it isn't universal. The two features I've listed are the exceptions.
There are 6 orders of S, V, and O. All are attested, but the orders where O precedes S are vastly less common than those where S precedes O.
The counter-argument to Chomsky's Universal Grammar is that commonalities across language exist because they are all solving similar problems for which there are more or less optimal solutions. In essence, convergent evolution produces similarity. And one might argue that recursive structure isn't a feature but rather the absence of a feature: something to prevent recursion. If you have a rule that says you can make category X out of pieces which may also be of category X, then you have recursion. So using a noun as a modifier of another noun -- "lunch counter" -- produces recursion.
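That "category X built out of pieces that may themselves be X" point is easy to see with a toy grammar (the rules and vocabulary below are made up for illustration): a noun phrase that may itself contain a noun phrase gives you unbounded nesting for free.

    import random

    # Toy grammar: NP -> N  |  NP N
    # Allowing an NP inside an NP is all it takes to get recursion
    # ("lunch counter", "lunch counter stool", "lunch counter stool leg", ...)
    NOUNS = ["lunch", "counter", "stool", "leg"]

    def noun_phrase(depth=0, max_depth=3):
        if depth >= max_depth or random.random() < 0.5:
            return random.choice(NOUNS)                                        # NP -> N
        return noun_phrase(depth + 1, max_depth) + " " + random.choice(NOUNS)  # NP -> NP N

    random.seed(1)
    for _ in range(3):
        print(noun_phrase())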
> I believe that every language does have something that you can describe as a noun, but that's more or less the only "universal" part of speech (there are some languages that essentially don't have verbs--you "do a look" rather than "see", e.g.).
R.M.W. Dixon argues convincingly that every language has at least nouns and verbs; less convincingly (IMO), he also argues for adjectives as a universal category. Some languages have only a very small set of verbs — Kalam has ~200, and Wutung has only 32 [0] — but they still have an identifiable category of verbs nonetheless. (Jingulu has been said to have 3 verbs, but only if you disqualify the large set of non-inflecting verbs.)
"There was originally thought" meaning "Chomsky thought". But it was all obvious bollocks, contrary to elementary natural selection.
Grammars necessarily have to be compatible with brain organization inherited from our primate ancestors, who obviously could have had no "language organ" carried about waiting to find some sort of use by their future descendants. Brain structures all need to be immediately useful for surviving or reproducing.
One theory has been that language runs on a bit of brain hypertrophied as a sort of peacock's tail, not necessarily of any survival value, originally, but needed to impress a potential mate. It could have been used to carry a tune.
Others have suggested language originated between mothers and children, growing out of lullabies. The two are not incompatible.
> One theory has been that language runs on a bit of brain hypertrophied as a sort of peacock's tail
> Others have suggested language originated between mothers and children, growing out of lullabies.
Sources?
——-
Separately, is language unique to humans? Are there examples of convergent evolution (semblances of grammatical structure) that can give us more clues as to how language may have evolved in humans?
Grammatical language, e.g. with subclauses, does appear from evidence thus far to be unique to humans, although whalesong is complex enough that it might yet be found there.
I gather that, since the "universal grammatical structure" was promoted, languages have been discovered that are said to lack any such forms. "Said to", because very few people are reporting on them.
Sorry, for sources I would be doing the same DDGing you will.
I’m not an electrical engineer, but aren’t there a couple of core features that all electricity has, like Ohm’s law? Surely we can figure out how an undocumented old computer functioned from a couple of electronic components, if we identify which parts are the resistors and which are the transistors.
That’s more like statistical inference, applying insights from known languages to it. It’s not really ML. Also, it can give you some broad clues, but decipherment is still a massive leap.
There are plenty of ML models that are trained unsupervised on text. What you would do next with your dead-language BERT, I don't know. But you could definitely make one.
One issue is that we don't have a lot of text, not even a megabyte of it (represented as Unicode characters). So you could get a language model, but how could you judge its output? Maybe it would be really good at generating more similar text, but that text probably isn't very representative of the things we would want to be able to read.
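To put it another way: about the most you can fit honestly on a corpus that small is something like a smoothed n-gram model over sign sequences, and the only score you get back is a likelihood/perplexity on held-out text, which says nothing about meaning. A minimal sketch (the sign sequences below are invented placeholders, not real transcriptions):

    from collections import Counter
    import math

    # Invented sign sequences standing in for transliterated inscriptions
    corpus = ["AB81 AB02 AB67", "AB81 AB02", "AB67 AB81 AB02"]

    unigrams, bigrams = Counter(), Counter()
    for line in corpus:
        toks = ["<s>"] + line.split() + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))

    def logprob(line, alpha=1.0):
        """Add-alpha smoothed bigram log-probability of one sequence."""
        toks = ["<s>"] + line.split() + ["</s>"]
        V = len(unigrams) + 1
        return sum(
            math.log((bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * V))
            for a, b in zip(toks, toks[1:])
        )

    # A higher log-probability just means "looks like the training text" --
    # it cannot tell you whether a proposed reading is right.
    print(logprob("AB81 AB02"))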
Yes, but cultural understanding still trumps this, at least according to some experts. Case in point: François Desset recently deciphered a 4,400-year-old Iranian writing system (Linear Elamite), and this is his take on it:
> FD: This was the jackpot – the same Elamite or Hatamtite text written on one artefact in cuneiform, and on the other in Linear Elamite. This is a classic knowledge-driven decipherment based on our capability to make connections between Cuneiform and Linear Elamite scripts inscriptions recording the same Hatamtite (or Elamite) language text.
> RK: An excellent point, especially in this day and age when many seek to replace knowledge with technology.
> FD: Certainly. I wish in this regard to emphasise that I have been asked a lot about computers, statistical data, etc. All hogwash! Knowledge, especially cultural knowledge and of the languages used at the time, some luck, and most important perseverance were essential.
There's a maximum size cap on HN titles, which, combined with the mandate not to editorialize titles, means it's not uncommon to see words expressing vague magnitudes removed.
I see it's on Kindle. I must get it; I can't find my paperback.
https://www.amazon.com/Decipherment-Linear-Canto-Classics-eb...