I'm surprised languages aren't more of a focus in the LLM hype. They're like if Rosetta Stone ads were true. They translate at state-of-the-art levels, but you can also give and ask for context, and they're trained on native resources and culture. There has never been a jump in machine translation this big, this fast.
I'm applying this to Mandarin (link in profile), and while there might not be much publicity/hype, there's definitely been an avalanche of people doing the same thing.
Current problems to solve:
- even the "best" LLM (GPT-4) frequently generates explanations/grammar that are plain wrong. Even its "correct" output isn't quite native (not as bad as round-tripping a sentence through Google Translate, but it's just slightly off).
- LLMs from Chinese companies (Qwen/Baichuan/etc) are immensely better at producing natural Mandarin (but fall short in other respects, which is unsurprising because they're smaller). I haven't tried fine-tuning Llama 2 yet, but I've had good success fine-tuning Qwen.
- in my opinion, 90% of the market doesn't need open-ended conversation about random topics. They need structured content with a gradual progression and regular review. You can use LLMs to generate this (which is what I'm doing), but it's not like a random newbie student is going to be able to design this themselves.
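To make the "regular review" part concrete: one common way to schedule reviews is a spaced-repetition algorithm like SM-2 (the one behind Anki). This is just an illustrative sketch of that general idea, not the scheduling any particular app described here uses; the function and field names are made up.

```javascript
// Minimal SM-2-style review scheduler (illustrative, not from any app above).
// A card tracks its current interval (days), ease factor, and streak of
// successful reviews; `quality` is the user's self-graded recall, 0..5.
function nextReview(card, quality) {
  let { interval = 0, ease = 2.5, reps = 0 } = card;
  if (quality < 3) {
    // Failed recall: restart the card with a one-day interval.
    return { interval: 1, ease, reps: 0 };
  }
  // SM-2 ease adjustment, clamped at the standard floor of 1.3.
  ease = Math.max(
    1.3,
    ease + 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
  );
  // First success: 1 day; second: 6 days; then grow by the ease factor.
  const nextInterval =
    reps === 0 ? 1 : reps === 1 ? 6 : Math.round(interval * ease);
  return { interval: nextInterval, ease, reps: reps + 1 };
}
```

Each vocabulary item or grammar point gets reviewed just before it would be forgotten, which is the kind of structure a raw chat session with an LLM doesn't give you.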
Not saying any of these problems aren't solvable, just pointing out the work that still needs to be done.
For me, the most exciting prospect is automated grammar correction during spoken conversation. I've made things harder for myself because I wanted to keep everything on-device so users could be assured that if they purchase something, they'll have access to it forever[0]. The downside is that I can't (yet) practically deploy any of these cutting-edge LLMs at the edge, so I'm kind of handicapped in what I can do.
[0] subject to iOS/Android forced upgrades, which I have no control over. It's all cross-platform though, so I'll make a macOS/Linux/Windows version available at some point.
I'm not sure if it reaches the level of hype, but in at least one country where English is not the dominant language—Japan, where I live—using LLMs for language learning is frequently mentioned in the press and elsewhere. Some educators are starting to use them in classes, too.
I agree, they are perfect for language learning. Reusing another project, I put together a site as a weekend project, mainly to teach myself Dutch, but decided to release it publicly for free (invite only) to see how others interact with it: it's almost free to run and looks good in my portfolio. It's an audio chat app. It's not my idea; I've seen many of these, but I wanted to create one for my own needs.
It uses Chrome's speech-to-text (the Web Speech API) for input, sends the transcript to GPT-3.5, and then uses Google's text-to-speech API to generate an audio response along with the text. So, practically, I can talk to ChatGPT.
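The pipeline above can be sketched roughly like this. This is a hedged reconstruction from the description, not the site's actual code: the function names and system prompt are my own, and the OpenAI and Google request shapes are the publicly documented ones.

```javascript
// Sketch of the pipeline: browser speech-to-text (Web Speech API)
// -> GPT-3.5 chat completion -> Google Cloud Text-to-Speech playback.

// Pure helper: assemble the chat-completion request body.
function buildChatRequest(history, userText, targetLanguage) {
  return {
    model: "gpt-3.5-turbo",
    messages: [
      {
        role: "system",
        content: `You are a friendly conversation partner. Reply only in ${targetLanguage}.`,
      },
      ...history,
      { role: "user", content: userText },
    ],
  };
}

// Browser-only: capture one utterance with Chrome's speech recognition.
function listenOnce(lang, onTranscript) {
  const Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  const rec = new Recognition();
  rec.lang = lang; // e.g. "nl-NL" for Dutch practice
  rec.onresult = (e) => onTranscript(e.results[0][0].transcript);
  rec.start();
}

// Send the transcript to OpenAI's chat-completions endpoint.
async function chatReply(body, openaiKey) {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${openaiKey}`,
    },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// Synthesize the reply with Google Cloud TTS and play it in the browser.
async function speak(text, languageCode, googleKey) {
  const res = await fetch(
    `https://texttospeech.googleapis.com/v1/text:synthesize?key=${googleKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        input: { text },
        voice: { languageCode },
        audioConfig: { audioEncoding: "MP3" },
      }),
    }
  );
  const { audioContent } = await res.json(); // base64-encoded MP3
  new Audio("data:audio/mp3;base64," + audioContent).play();
}
```

Wired together, one turn of conversation is roughly: `listenOnce("nl-NL", async (text) => speak(await chatReply(buildChatRequest([], text, "Dutch"), OPENAI_KEY), "nl-NL", GOOGLE_KEY))`.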
I also want to add a dictionary module where I can generate and attach example sentences and images to words and phrases.
I'm sure there are complete teams working on apps like this as it's such a straightforward use case.
You can try it; it's all in Hungarian, but you can translate the page. https://convo.hu/
Invite code: CONVO
They are actually much better than the previous state of the art. For the couple dozen languages with enough representation in the training set, GPT-4 is by far the best translator you can get your hands on, even without the whole "give and ask for context" angle.