
The puzzles are interesting but I'm not sure how much they overlap with the field of linguistics in general. They seem better described as problems in computational linguistics.

In all the questions there is the implicit assumption that the grammar of a language is based on a set of rules that can be reproduced and reapplied. This is hardly the case with natural languages, which, rather than sets of rules, are more like vast sets of exceptions to very few rules. While tools such as statistical analysis have been successfully applied to analyzing languages, expecting linguistics to work like mathematics seems unnecessarily limiting.

To illustrate with an example, think of a similarly-constructed problem in English. Can you correctly deduce the missing words just by reproducing the patterns below?

ox - oxen / box - ?

mouse - mice / grouse - ?

dish - dishes / fish - ?
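For illustration only, here is a minimal Python sketch of what "just reproducing the pattern" amounts to for the three pairs above. The helper name `plural_by_analogy` is hypothetical (it is not from any library), and the approach is an assumption about how a naive solver might proceed: it copies whatever ending change it sees in the example pair onto the new word.

```python
import os


def plural_by_analogy(singular: str, plural: str, target: str) -> str:
    """Copy the ending change seen in (singular, plural) onto target."""
    # Length of the longest prefix shared by the example pair.
    shared = len(os.path.commonprefix([singular, plural]))
    old_ending, new_ending = singular[shared:], plural[shared:]
    if not target.endswith(old_ending):
        raise ValueError(f"{target!r} does not end in {old_ending!r}")
    # Strip the old ending (possibly empty) and attach the new one.
    stem = target[:len(target) - len(old_ending)]
    return stem + new_ending


for example in [("ox", "oxen", "box"),
                ("mouse", "mice", "grouse"),
                ("dish", "dishes", "fish")]:
    print(example[2], "->", plural_by_analogy(*example))
```

Run on the pairs above, this yields "boxen", "grice" and "fishes", whereas the usual dictionary plurals are "boxes", "grouse" and "fish".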

Separately, there's also the broader issue that languages are primarily what we speak, to which the writing conventions are secondary. In these examples, we're only looking at how things are written, which is yet another filter that gives an incomplete view.



I disagree. Field linguistics is all about deducing the underlying rules of a language from a language sample. Maybe those rules are very complex — perhaps even unpredictable, or irregular — but they can be described. The best reference grammars can be thousands of pages long in an attempt to describe all these rules (I always recommend [0] as a particularly good, open-access example). Based on the samples, the Linguistics Olympiad presents exactly the same sorts of problems, just in a reduced form.

> Separately, there's also the broader issue that languages are primarily what we speak, to which the writing conventions are secondary. In these examples, we're only looking at how things are written, which is yet another filter that gives an incomplete view.

This is wrong also. If you look carefully at the example questions from the Olympiad, you will see that those languages without a conventional orthography are written using the International Phonetic Alphabet [1], meaning that the texts are a direct transcription of spoken words, with little or no ‘filter’.

[0] https://langsci-press.org/catalog/book/295

[1] https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...


> Maybe those rules are very complex — perhaps even unpredictable, or irregular — but they can be described.

Sounds very much like "exceptions" to me then.

>> In these examples, we're only looking at how things are written, which is yet another filter that gives an incomplete view.

> This is wrong also.

Does the linked PDF include any embedded audio material? If not, how could what I wrote possibly be wrong?

> If you look carefully at the example questions from the Olympiad, you will see that those languages without a conventional orthography are written using the International Phonetic Alphabet

I looked carefully at the example questions before posting my original comment already, and I only saw the IPA being used in a single footnote in one place (page 11).

Have we been looking at the same source material?

https://ioling.org/booklets/samples.en.pdf


By and large, languages have rules, and exceptions are--well, the exception. Exceptions tend to occur more often in common words (perhaps because children don't learn the exceptions in rare words, unless they're drilled on them--which we do for English in school). One question though is where the boundary lies between rule and exception. In Spanish, various subsets of verbs undergo stem allomorphy. This can generally be described in terms of rules (you do have to know which words undergo those rules), but it's unclear where the rules stop being rules and start being exceptions.

Also, while there are often exceptions in morphology, exceptions are almost non-existent in syntax. Again, much depends on where (or whether) there is a distinction between rules and exceptions. In English, unlike some languages, adpositions go before the NP (hence their name, prepositions). But there is at least one English adposition that follows the NP, namely 'ago' (hence it is a postposition). Is this an exception? Depends on your notion of "rule".

Also BTW, the line between spoken and written language isn't really a question of writing, odd as that may seem. Transcribed speech is still an oral language, distinct in many ways from written speech (e.g. written speech, even in newly written languages, tends to be syntactically and often lexically more complex than oral speech in the same language).


A lot of Chemistry and Biology is full of exceptions (some might even say that exceptions are the only rule in chemistry) however people seem to have no qualms with trying to reason through those subjects.


> A lot of Chemistry and Biology is full of exceptions (some might even say that exceptions are the only rule in chemistry)

What I meant by "exceptions" in my previous comment: if grammar books have to be "thousands of pages long," then at some point they're no longer listing rules but exceptions to them. I wouldn't expect this statement to even be controversial, but since apparently it is, it's all the more worth clarifying.

Is the conjugation of the verb "to be" an exception, or is it covered by a very specific rule that only applies to that particular verb? I guess in the end it could be a matter of definition. But if this is supposed to be one of those "very complex — perhaps even unpredictable, or irregular [rules]," as the grandparent would apparently have it, then I have to question what his definition of an "exception" would be. Are there any exceptions to grammar rules at all?

As a matter of fact I agree with you. In fact, if you look at my original comment, I've been saying something similar all along. Philosophically, what we consider intrinsic rules of nature is often an approximation introduced to gain insight by reducing complexity. However, the discussion was in the context of linguistics, and beyond that I don't see any obvious parallel to the example from natural sciences you're bringing up.


"if grammar books have to be "thousands of pages long," then at some point they're no longer listing rules but exceptions to them." I've never seen a grammar book that was thousands of pages long, and I've looked at lots of them. The longest ones I know (for English, naturally) are still well under 1000 pages.

I have co-authored one (the Cubeo language) and edited or typeset lots of others, some of them running into the hundreds of pages. And the reason they're that long has nothing to do with exceptions, it has to do with verbal explanations for the reader, examples, excursions on semantics and pragmatics, citations, discussions of alternative analyses, and so forth. If you boiled it all down to rules--phrase structure rules, say, and morphological rules for languages that enjoy morphology, plus tables of exceptional paradigms (probably the only real exceptions), you'd probably end up with grammars on the order of 20 pages.

And fwiw, I don't think any linguist would consider the paradigm of English "to be" to be even remotely regular.


Thank you for your comments, it's been a pleasure to read all of them.

For the record, the quoted parts were taken from the post I was originally responding to: https://news.ycombinator.com/item?id=30023467

Also, I definitely agree (quoting from your other post) that:

> The line between spoken and written language isn't really a question of writing. Transcribed speech is still an oral language.

This is a very good point. Perhaps I shouldn't have written so categorically about this in my original comment. I only had in mind that due to the way some of the examples in the PDF are constructed, the focus seems to be more on the writing conventions than on the languages themselves.


> Is the conjugation of the verb "to be" an exception, or is it covered by a very specific rule that only applies to that particular verb? I guess in the end it could be a matter of definition. But if this is supposed to be one of those "very complex — perhaps even unpredictable, or irregular [rules]," as the grandparent would apparently have it, then I have to question what his definition of an "exception" would be. Are there any exceptions to grammar rules at all?

My point was more that exceptions are rules too. ‘The verb be is conjugated like X, go like Y, and other verbs like Z’ is still a rule, albeit a complicated one, and one which could alternately be rephrased in terms of ‘exceptions’, insofar as that would be at all useful. Truly irregular, unpredictable exceptions — such as are so common in chemistry — are in fact rather rare and restricted in human languages.


Have you learned a new language? It's basically an exercise in memorizing vocabulary and phrases and pattern recognition. Linguistics is the study of those patterns across different languages. Therefore, I don't think there is anything wrong with problems focused on pattern recognition. I also think you are overstating the extent to which languages break their own rules. The only reason that your list makes sense is because there is such a clear pattern for the vast majority of plural nouns.


> Linguistics is the study of those patterns across different languages.

Linguistics is the study of languages in a scientific manner. Pattern recognition can be one approach, and it's great when it works to provide insight but it is (or should be) a means to an end and one must also be aware of its limitations.

> The only reason that your list makes sense is because there is such a clear pattern for the vast majority of plural nouns.

This perhaps could be a somewhat valid point if we were only talking about English. However, examples from languages with more complicated conjugation or declension could readily provide much better illustration why any naively-reconstructed rules reproduced from just a couple of hand-picked examples should not be assumed to hold.

So, as I said, while I find those puzzles interesting, I just don't think there's much linguistic insight to it. It's just an exercise in deductive logic. Nothing wrong with it of course, and I concede that might have been the point all along, just call me surprised.

Similarly: "Bob is twice as old as Alice and was 4 when the first man landed on the moon. How old is Alice?" is not a problem in astronomy. It's just a fancy way of stating: "b == 2a && b - (CURRENT_YEAR - YEAR_OF_FIRST_MOON_LANDING) == 4, solve for a."
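Spelled out as a minimal sketch: substituting b = 2a into the second equation gives a = (4 + current_year - landing_year) / 2. The function name `alice_age` is made up for this example, and the default of 1969 is the year of the first crewed Moon landing (Apollo 11).

```python
def alice_age(current_year: int, landing_year: int = 1969) -> float:
    """Solve b == 2*a and b - (current_year - landing_year) == 4 for a."""
    # Bob was 4 at the landing, so his current age is 4 plus the years since.
    bob_age = 4 + (current_year - landing_year)
    # Bob is twice as old as Alice.
    return bob_age / 2


print(alice_age(2022))  # prints 28.5
```

Nothing in the computation depends on what happened in 1969; the landing only supplies a constant.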


"Linguistics is the study of those patterns across different languages." vs. "Linguistics is the study of languages in a scientific manner." I take the first answer to be a bit different: I don't think the emphasis is on "patterns", rather it's saying that linguistics is about looking for ways that all languages are similar (like the claim that they can all be described by a context free grammar, or that apart from full word reduplication, the morphology of all languages is finite state). The alternative--your answer--is valid, although linguists of the first sort (I'm thinking of many generative linguists) look down their noses at it.

"examples from languages with more complicated conjugation or declension could readily provide much better illustration why any naively-reconstructed rules reproduced from just a couple of hand-picked examples should not be assumed to hold": Agreed that you'll need at least one example of each conjugation or declension class. But since there aren't usually more than a few such productive classes, that's not too many examples. There are of course those languages that clearly violate this--there's an African language that seems to have bizarrely many pluralization classes, like hundreds IIRC.

FWIW, languages with agglutinating morphology (long sequences of prefixes and/or suffixes) tend to be more regular than fusional languages (languages where each word takes at most one prefix or suffix), at least from what I've seen.


> In all the questions there is the implicit assumption that the grammar of a language is based on a set of rules that can be reproduced and reapplied. This is hardly the case with natural languages, which, rather than sets of rules, are more like vast sets of exceptions to very few rules.

Even if the exceptions outnumber the rules, each rule applies in a much larger number of cases, so each case is more likely to follow the rule rather than being an exception.

And linguistics as a field is really about the regularity of languages. When exceptions are studied, it is to discover the underlying rules that e.g. cause exceptions to arise or to disappear.

As a competition, the IOL additionally has the constraint that it should be solvable in a reasonable timeframe, so rather than making people wade through mountains of incomplete and contradictory data, you get carefully selected problems that require less effort to solve.


> Even if the exceptions outnumber the rules, each rule applies in a much larger number of cases, so each case is more likely to follow the rule rather than being an exception.

Perhaps, if you're looking at a language in its entirety, without regard to usage frequency of particular utterances.

However, most of the exceptions tend to be concentrated in the most frequently-used portions of the language. So, while the statement is technically correct, extensive focus on the rules is not really practical.

Take a random verb in English. It's reasonable to assume the third-person singular form can be constructed by appending an "-s" to it. Similarly, the past-tense forms can be constructed with an "-ed." Most verbs are like that.

Yet this rule isn't of much help with the two most common verbs: "to be" (am/are/is/was/were, not bes/beed), and "to have" (has/had, not haves/haved), as well as dozens of others that also happen to be among the most frequently-used.

Thus, from any practical point of view (such as a language learner's), it's best not to expect any rules to hold in principle, at least not until one's awareness of the exceptions is sufficiently advanced.
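The regular rule and its exceptions described above can be sketched in a few lines of Python. This is only an illustration: the `IRREGULAR` table and the `inflect` function are hypothetical names, the table covers just the two verbs mentioned here, and spelling adjustments are deliberately ignored.

```python
# Hypothetical exception table: only the two verbs mentioned above.
IRREGULAR = {
    "be": {"third_singular": "is", "past": ["was", "were"]},
    "have": {"third_singular": "has", "past": ["had"]},
}


def inflect(verb: str) -> dict:
    """Return third-person singular and past forms for an English verb."""
    # Exceptions are checked first; the rule is only the fallback.
    if verb in IRREGULAR:
        return IRREGULAR[verb]
    # Regular rule as stated above: append "-s" and "-ed".
    # (Spelling changes such as "try" -> "tries" are not handled.)
    return {"third_singular": verb + "s", "past": [verb + "ed"]}


print(inflect("walk"))
print(inflect("be"))
```

Because the lookup has to come before the rule, a learner effectively needs to know the table before the rule can be trusted.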

> And linguistics as a field is really about the regularity of languages. When exceptions are studied, it is to discover the underlying rules that e.g. cause exceptions to arise or to disappear.

I think this is a very good point. I'm not criticizing the questions as they are, as I said I found them interesting. All I'm saying is that it's good to be aware of the limitations of such an approach. It's best to look at some of the problems as logical puzzles, only wrapped in references to some (more or less) obscure languages, and natural languages should not be expected to follow the principles of logic.


I think that's the wrong take. The more frequent utterances can be fossilized precisely because of their repeated use. The reality is that any creative or less frequent use of language (thus carrying more information) is carefully constructed in regular ways whose rules can be discovered.


Indeed, in mathematics/physics the analytically unsolvable problems also vastly outnumber the tractable ones, yet in a competition you always get questions with nice solutions.


Discovering these rules and their exceptions is what field linguists, and philologists before them like Champollion (who should be more widely known imo; he was actually doing science to decipher hieroglyphics, whereas a generation or two before, decipherments were largely alchemistic gobbledygook), as well as the Westerners and Indians who worked on the old languages of India, have been doing for well over a hundred years.

These field linguists (Imperialistic Europeans, those under the colonial yoke, and disinterested, merely inquisitive parties) produced grammars noting both the rules, which really do outnumber the exceptions in any given dialect at a given time (that's important), and the exceptions. This work has led to everything from a tighter grasp on colonial possessions, to the enhanced ability of the colonized to resist their colonizers, to the decipherment of forgotten, thousand-year-old languages and the recording of near-dead ones.

And regarding your last comment, in many cases the languages we're dealing with have no writing, so I do agree. A better Olympiad would have at least included an aural-only portion of the exam. I'm right there with you on this one.

Really, maybe we're agreeing more than disagreeing, because I also support your comment that "expecting linguistics to work like mathematics seems unnecessarily limiting". Mathematics doesn't change; no person steps in the same linguistic river twice. That was the basis of the linguistic program for most of the 2nd half of the 20th century (this was also computational linguistics before it took its rule-based-to-statistical turn), and it produced insights and tools for the field linguists, mainly to decipher morphosyntax. Yet, I'd say the BIG discoveries, like the decipherment of Maya, have come from that muddy, uncomfortable, dangerous field work... gathering evidence for regularities among the glyphs that could be painstakingly compared amongst themselves and with the spoken languages of the region today. Some rules have stayed very similar for a long time, and I invite you to look at the historical reconstructions of proto-languages to get a sense of not only the regularities of a given modern language, but also the regularities in the changes of languages over >1000 years.

That being said, (statistics-borne) Computational Linguistics is a wonderful (and a little scary) thing, and I'm very willing to change my mind. It certainly challenges the rule-based assumptions and just maybe we're headed towards another paradigm shift.


"in many cases the languages we're dealing with have no writing, so I do agree. A better Olympiad would have at least included an aural-only portion of the exam." Why? A previously unwritten spoken language can always be written with IPA, and giving the test orally would really be a test of people's recognition of sounds they may not have been exposed to even in their phonetics class.

Sign languages are different--there isn't really an IPA-like system for previously unwritten sign languages.


These are also selected problems, and they start with easy problems. Those are pulled from very regular parts of languages. If you have a part of a language with no pattern to it, then nobody is going to make it a problem. But it’s good practice, especially as the problems get harder, for finding regularity where it’s difficult to discern.


Keep in mind that English, as a lingua franca, is one of the most heavily exceptioned languages in the world.


Based on discussions with friends who have been medalists at International Olympiads more than once, there is no need for any real linguistic knowledge.

General knowledge of language groups may be helpful, but often the problems were for some language spoken by <1k people. And usually you wouldn't have heard of it at all, or wouldn't have any additional information about where it is spoken.


The puzzles look to me like some of the homework problems I had in my Syntax & Morphology class, so while they don't cover all of linguistics, they certainly cover an interesting subset.


> The puzzles are interesting but I'm not sure how much they overlap with the field of linguistics in general

Same thing with programming Olympiads, but it isn't necessarily a bad thing


Both "boxen" and "fishes" are very well attested in real English usage.


The noun "boxen" is only used as a joke ("Unix boxen"), which is precisely why I thought it would make a cromulent example.

And you're of course correct that "fishes" exists as a plural form too.


It's part of a strange sociolect, but that doesn't invalidate that it's well attested enough so that the meaning is obvious to any English speaker.


I'm not sure if it's really well-understood beyond specific social circles. You would have to substantiate this claim.

Even if it were, just because "the meaning of something is obvious," it doesn't follow that the words are standard or even well-formed.

Native speakers of any language, and speakers of English likely even more so, generally have the ability to parse through malformed utterances of all sorts and recover most, if not all of the meaning, aoccdrnig to a rscheearch at Cmabrigde Uinervtisy. [1]

Besides, the meaning of many things is obvious even without the ability to comprehend the language. If I kick you in the ass and you start shouting at me in a foreign language I don't understand, I posit that the meaning of it would still be fairly obvious. We can even verify this experimentally if you are so inclined. Count me in for the test, I'd be glad to contribute my part for the advancement of science.

Regardless, none of the above invalidates my point that you cannot successfully construct the standard plural "boxes" from "box" by following the pattern of "oxen" from "ox", and no amount of pedantry and nitpicking can change that.

However, even if you exclude two of the three examples I provided, then the remaining one still stands (and many more can be provided obviously). So I'm not really sure what it is exactly that you're trying to argue here. If it's just that my simple examples did not live up to your expectations, then I concede, and let's move on.

1. https://www.mrc-cbu.cam.ac.uk/people/matt.davis/cmabridge/


> Even if it were, just because "the meaning of something is obvious," it doesn't follow that the words are standard or even well-formed.

That's literally the linguistics definition of "standard or well-formed", so no.

> So I'm not really sure what is it exactly that you're trying to argue here.

Linguistics is a science, there are mathematical models that natural languages adhere to. Regardless of the social trappings around "standard"/acceptable speech, language really does follow mathematical laws. That's why these puzzles are meaningful and not just diversions for fun.


What's your point though? The OP was demonstrating that constructing plurals based on the given pattern does not hold generally. If you're stuck on boxen, then choose some others:

man -> men, pan -> ?


Great example, much better than anything I came up with.


In a way that proves the point as well though - not only are there inconsistent rules, there are also seemingly under-determined rules where you can use one of many options (but only sometimes!)


This person linguists!


You can't verb just any noun.


> They seem better described as problems in computational linguistics.

A linguist can deduce that, because that is written on the sample-problems document.



