Hacker News

I think the biggest problem with voice commands is that if the command doesn't work the first time, it would have been faster to simply input the command yourself. I'm not confident that "get me directions to <place>" will actually get me those directions. Because of that, I also use Siri exclusively to set alarms (which does work every time).


This is the #1 problem with anything based on voice recognition and natural language processing, IMO. It has to work 99% of the time to overcome this issue, but the nature of the technology seems to ensure that it will only asymptotically approach that threshold, and it's currently stuck at 75% reliability at best.

Plus, even if it did work, it's just a really inexact and inefficient way to do anything. You always end up trying to see through the abstraction layer (natural language in this case) to the much more well-defined hidden system underneath. It's like reverse engineering an expert system, as a UI paradigm.

Programming languages made to resemble natural language face a similar poison pill, AppleScript being a prime example.

I totally get the appeal of trying to make software understand people better instead of forcing people to understand software, but in practice it just ends up being one more abstraction layer users have to struggle to get past in order to unlock the functionality of the underlying system.


The funny thing is that voice recognition is no longer the problem. For years the reason you couldn't do voice command interfaces was that transcription didn't work, but today transcription works amazingly well. My commands are transcribed perfectly, even in my car when I'm doing 65 on the freeway with my phone stuck in a cupholder.

The problem now is Android randomly kills the "OK Google" background listening process, or it fails because it can't handle a handover between wifi and cell data, or it "can't open microphone" even though it just heard me say OK Google, or any one of many other Android problems rears its ugly head.

The reliability of voice transcription now is far better than the general reliability of Android on my Nexus 5X, and the Android team ought to feel pretty bad about that.


Exactly this. There is a bug where, if your phone is locked and in a position where it thinks it's in landscape, saying 'OK Google' makes the phone unlock in portrait, start listening, rotate to landscape (killing the app listening to your voice in the process), then rotate back to portrait and lock itself again.

It's also incredibly easy to fall off the blessed path where you can interact solely with voice - when setting a reminder, for example, if it doesn't hear 'Yes' when it's expecting to it'll just sit there and you need to touch the screen to continue - defeating the whole point of voice interaction.

Google's voice search has incredible voice recognition and text-to-speech and can tap into an amazing amount of information through the knowledge graph, but they don't seem to be capable of fixing the basic bugs and UX issues preventing all that technology from actually being usable.


> It's also incredibly easy to fall off the blessed path where you can interact solely with voice...

I had this happen with Google Maps the other day. I was driving into San Francisco with the turn-by-turn directions when it said something like, "There is a faster route available in two miles. It will save 5 minutes. Tap 'Accept' to take this route."

So I had to fiddle with my phone in a hurry, in traffic (after all, traffic was why there was a faster route coming up), hoping I wouldn't tap the wrong button or crash into anyone while looking for it.

Why couldn't I just have said "Yes! Please and thank you. Of course I want the faster route, why wouldn't I? Oh, sorry, I mean 'Accept'!"

The funny part was that the "faster route" was the way I usually go into that part of town anyway, but Maps had been sending me a different way because of congestion on the usually-faster route. Why did it even ask me to "accept" the faster route instead of just redirecting me the way it does automatically on most occasions?

Like the time I was heading south on 101 through Morgan Hill and Gilroy and Maps had me get off the freeway and take a side street for a couple of miles because traffic was stopped on the freeway due to a car fire. It didn't ask me to tap Accept then, it just gave me some very practical directions on the fly. That's how it should work.


I was recently trying to find why this happens. For me, sometimes the phone hears "Ok Google" but then just sits there as if there is no microphone, while being in portrait mode all the time, so, no screen rotation. Other times, as you said, being screen-locked makes it deaf. I have to manually unlock it to restart OK Google.

We're not suffering from AI that's too stupid; we're suffering from lazy app design and a lack of effort put into adding more variations to the conversation. Just put a team of scriptwriters on imagining thousands of replies; it's not that hard. People have been writing rudimentary chatbot scripts like this for decades.

I'd like to be able to install an app and suddenly have new commands available to me.


It's a touchscreen device with voice as an afterthought. Devices where voice is primary will be designed differently.

Car navigation systems would be a good place to start.


I have to disagree. Mostly you're right, but I have a very specific fail case that happens every time with Siri. I can't call my wife by name. I always have to call her "wife", because it thinks I want to change my name. Her name is in the address book, but it never matches against it.

I can also use it to dictate an email or search the web, and occasionally it fails (maybe 10% of the time?), but it almost never gets my wife's name.


Well, I wouldn't be surprised if Google's voice recognition is better than Apple's (which is powered by Nuance, right?). Also, recognition for search queries and voice actions tends to work better than completely unconstrained text transcription. Of course "perfect" is an exaggeration, but my point is that Google's voice query transcription has passed a threshold and it is no longer the least reliable part of the system.


Okay I'll bite.

What's your wife's name?


Something that starts with 'me'. As in: 'call miyuki' --> 'ok, I'll call you yuki'


Ding! Ding! Ding! You have found the fail case.

On a somewhat related note, I've noticed that people search on some websites fails with Vietnamese names, because surnames are interpreted as prepositions and thus dropped as stop words.
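A minimal sketch of that failure mode, assuming standard search-index preprocessing (lowercase, tokenize, drop stop words). The surnames "Do" (Đỗ) and "To" (Tô) are real; the stop-word list here is an illustrative subset of typical English ones:

```python
# Naive stop-word filtering silently discards Vietnamese surnames
# that collide with English function words.
STOP_WORDS = {"a", "an", "and", "do", "in", "of", "the", "to"}

def index_tokens(name: str) -> list[str]:
    # Lowercase, split on whitespace, and drop stop words -- the kind
    # of preprocessing many people-search features apply.
    return [t for t in name.lower().split() if t not in STOP_WORDS]

print(index_tokens("Jane Smith"))  # both tokens survive
print(index_tokens("Minh Do"))     # the surname is dropped entirely
```

A search for "Do" against such an index can never match, because the surname was never indexed in the first place.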


Cortana


More and more often I come back to my Moto G with the OK Google widget open because something on TV triggered it. Meanwhile, it rarely understands me when I actually want it to. Fun.


> It's like reverse engineering an expert system, as a UI paradigm.

The rate of adoption will skyrocket once we reach a certain threshold: people will see other people speaking to their phones and learn to form useful voice commands themselves. It has a big social factor.


I also use "Call my wife" and "Text my wife" or "Tell my wife that ...". They work every time. Music playing is disappointing, though: if I want it to open YouTube and play something, it can't.

I don't understand why they didn't at least add a few commands for YouTube. At least "play Band", "play Song by Band", "play Music-Genre", "play something like X", "I feel sad/happy/lonely/bored play some music".

If you want to understand the extent to which Siri is useful, you just have to look at the integration it has with the main apps. It's just Alarms, Calendar, Phone, SMS, Music, Weather and similar. It's not really groundbreaking AI, just a voiced command list to the main apps.

I am eagerly waiting for the moment when we can have more in-depth chats with our bots, but the current crop is dragging its feet even when it comes to simple commands sent to apps. Maybe they should ask app creators to add hooks for voice commands, to make Siri more useful.

For example, if I can digress a little, Google Now can't start playing the first video in the YouTube app on Android. If I am already in the YT app and say "OK Google, play the first track", it goes to web search instead and searches for "the first track" on the web! The same company made the chatbot, the app, and the OS, and they still don't work well together. At least let users add new functions to the bot if they can't be bothered to do it themselves.


>I don't understand why they didn't at least add a few commands for YouTube. At least "play Band", "play Song by Band", "play Music-Genre", "play something like X", "I feel sad/happy/lonely/bored play some music".

Because YouTube ain't a music player? [1] And besides they have their own music app.

[1] Seriously, even though some people use it as such, this is far from most. According to YouTube itself, the average user only spends 1 hour per month listening to music on YouTube -- consumption is mostly non-music videos.


> which does work every time

Except when it doesn't. I've had a number of times when Siri would fail to set reminders. I don't need to set alarms that often, but people seem to have problems with those too: https://discussions.apple.com/thread/7280346?start=0&tstart=...



