I've been doing a lot of voice searches recently. It's just easier and feels more natural. It's amazing how accurate Google voice recognition is. I'm a non-native English speaker and sometimes I'm not even sure if I pronounced something correctly, but it gets it!
I've used it to figure out the spelling of a word I couldn't understand.
There's a reporter on NPR who sounds like he always introduces himself as "han zhi lu wong". I said that to Google voice search, and it corrected it to "Hansi Lo Wang".
So Google voice recognition works even if you don't know what words you're saying!
Well, sure, that's precisely how speech transcription works. When you say the words out loud you aren't speaking in letters, you're speaking phonetic sequences. It's the recognizer's job to decode the input pronunciation into the best-matching word sequence, drawing on its list of known word spellings and their pronunciations.
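To make that concrete, here's a toy sketch of lexicon-based decoding. It is nowhere near what a real recognizer does (those combine acoustic and language models, and I'm not claiming this is Google's method); it just picks the known entry whose pronunciation is closest, by edit distance, to what was heard. The lexicon entries, the ARPAbet-style phoneme symbols, and the transcription of "han zhi lu wong" are all my own rough assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # drop a phoneme from a
                            curr[j - 1] + 1,      # add a phoneme from b
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Hypothetical toy lexicon: spelling -> assumed pronunciation.
LEXICON = {
    "Hansi Lo Wang": ["HH", "AA", "N", "S", "IY", "L", "OW", "W", "AA", "NG"],
    "handy low wing": ["HH", "AE", "N", "D", "IY", "L", "OW", "W", "IH", "NG"],
    "hands along": ["HH", "AE", "N", "D", "Z", "AH", "L", "AO", "NG"],
}

# Assumed phonemes for the slightly garbled "han zhi lu wong".
HEARD = ["HH", "AA", "N", "ZH", "IY", "L", "UW", "W", "AO", "NG"]


def decode(heard):
    """Return the lexicon spelling whose pronunciation is closest to `heard`."""
    return min(LEXICON, key=lambda word: edit_distance(heard, LEXICON[word]))


print(decode(HEARD))  # Hansi Lo Wang
```

Three of the ten phonemes in the input are off, but the right entry is still the nearest one, which is why the speaker doesn't have to know the spelling (or even say it perfectly) for the lookup to land.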
My point is, it's amazing that it's better at decoding phonetic sequences (which are presumably garbled by passing through me, a human being who doesn't understand them) than a human being who has evolved to use language and has over 30 years of fluent experience.
Every language that uses the Latin alphabet has its own rules for how pronunciation is derived from or encoded with the letters.
E.g., in your example, the letter pair ‘si’ is pronounced differently in Mandarin than it is in English. So it's not surprising that you couldn't write it down properly in pinyin, as you don't know how to transcribe Mandarin into pinyin. But Google does.
Another example - without knowing how French spelling works there is no way that as an English speaker you could work out how to correctly spell ‘peut’ (can) just from hearing it.
That's really cool, but I don't know if I'd describe that example as being better at decoding phonetic symbols. You might still be better at mapping phonetic symbols to words and names you know -- which I think is a separate skill from mapping them to new words or names -- it's just that you didn't know that name and the software did. And it might also be better at knowing lots of words, which is yet another skill/property.
If you both have learned the word, who is better at recognizing it? And who is better at learning a new word they don't know? These aren't very exacting questions because the comparison between human and machine knowing and learning hasn't been defined. Maybe the machine needs more examples and more contexts to learn a word as well as a human and correctly map to the word as often as a human, but it can also examine more examples and contexts than a human can per unit time and can do a lot of scaling in this regard using increased energy that a human cannot.
Except that his phonetic sequence probably wasn't entirely correct. We don't repeat what we hear, we repeat what we think we hear, after we put our native tongue's filter over it to try to make sense of it. So what he said to Google voice probably wasn't completely accurate, but it still knew what he was talking about.
Pretty sure Google taps into its gigantic web index to find sensible terms. I played with Google Translate by typing Dragon Ball Z names phonetically, and Google suggested the actual names at the top of the list. I can't help but think it cross-references queries, trends and such.
> Imagine being implicated for a crime because Google thought you said something, but you said something different.
That seems really unlikely to ever happen.
Note that by default Google stores the audio of your searches - including the few seconds before you said "ok google" - indefinitely: https://history.google.com/history/audio