Voice assistants are basically just mainstream non-visual command-lines, and it's unsurprising to me that something that relies heavily on memorization and extremely specialized "skills" isn't quite taking off in the way it was imagined. A voice system that can do literally everything one can do with a keyboard and a mouse would be magical, but no system offers that.
Instead, it's a guessing game about syntax and semantics, and frequently a source of frustration. There are many failure points: it can "hear" you wrong, it can miss the wake word, it can hear correctly but interpret wrong, miss context clues, or simply be unable to process whatever the request is. In my experience, most normal people relegate voice commands to ultra-specific tasks, like timers, weather, and music, and that's that. Google and Alexa are relatively good at "trivia" questions, but Siri is a complete failure. All systems have edge cases that make them brittle.
I think there's potential here. Cortana was the most promising: an assistant that's integrated into the OS and can change any setting or perform anything on-screen would, again, be really awesome. We just don't have that. I think maybe OS-wide integration + GPT-4 (or later) might get closer to what we expect, but it's just not great right now. I really want to be able to say something as unstructured as "hey siri, create alarms every 5 minutes starting at 6am tomorrow" or "hey siri, when I get home every day, turn on all of the lights, change my focus to personal, and turn on the news". There /is/ power to be had, but nobody has really tapped it.
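To make concrete how much structure hides behind that first request: "alarms every 5 minutes starting at 6am tomorrow" has to bottom out in something like the loop below. This is just a sketch in Python with a hypothetical create_alarm() hook, and the cutoff is invented, since the utterance never says when to stop.

```python
from datetime import datetime, timedelta

def create_alarm(when: datetime) -> None:
    # Hypothetical hook into the OS alarm service; a stand-in for illustration.
    print(f"alarm set for {when:%Y-%m-%d %H:%M}")

# "starting at 6am tomorrow"
start = (datetime.now() + timedelta(days=1)).replace(
    hour=6, minute=0, second=0, microsecond=0)

# "every 5 minutes" -- the request never says when to stop, so an assistant
# would have to guess a cutoff; assume an hour's worth here.
for i in range(12):
    create_alarm(start + timedelta(minutes=5 * i))
```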
Natural language is a fundamentally wrong vehicle to convey information to a computer. It can be useful for some specific tasks: automated Q&A, simple interfaces to databases, stuff where I can't be properly f_ed to remember the syntax or the shortcut, like IDE commands.
But the idea that it can replace formal language is fundamentally and dangerously incorrect. I agree with Dijkstra's quip: we shouldn't regard formal language as a burden, but rather as a privilege.
A lisp compiler in a voice assistant would seem like an improvement in that the user could define objects and then express the actions to be performed in the same room. But these assistants seem to drop objects between commands, making them hard to program conversationally.
I guess a Lisp-like language would be ideal, and the pauses would act like parentheses.
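For what it's worth, the "pauses as parentheses" idea maps onto a very small reader. A rough Python sketch, assuming the recognizer emits explicit group markers (spoken "begin"/"end" here, but they could just as well be pause tokens) in place of parentheses:

```python
def parse(tokens):
    """Turn a flat token stream into nested lists, treating the markers
    'begin' and 'end' as spoken parentheses. In a real system these could
    just as well be pause tokens emitted by the recognizer."""
    stack = [[]]
    for tok in tokens:
        if tok == "begin":
            stack.append([])
        elif tok == "end":
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    return stack[0]

# "begin set lamp begin colour red end end"
print(parse("begin set lamp begin colour red end end".split()))
# -> [['set', 'lamp', ['colour', 'red']]]
```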
Not to take away from your point (I'd like the magic list too) but to some degree, this can be worked around using Shortcuts. If you use inputs, Siri will prompt for them, which is a bit slow, but you could even use a Dictate Text action and parse the result yourself if desired.
On the other side, humans have been fine using natural language to delegate commands to each other.
So maybe it's just that the subfield of natural language understanding is still too early to be really useful. Speech recognition itself has gotten really good but then understanding the context, the intent, etc, all that is natural language understanding, and that is often the problem.
Citation needed: there are a lot of disagreements and misunderstandings (some have cost lives) that could've been avoided if we didn't have 10 different ways to say the same vague thing that can be interpreted in 20 ways. You think the military uses a phonetic alphabet and specifically structured communications for fun? Or the way planes talk to ATC, for example. Where precision and unambiguity are crucial, natural language always gets ditched for something more formal.
This is actually an interesting point. In the Army, we used terms that limited ambiguity thereby increasing efficiency. Even if one eliminates the complexity of language, there's still a specification problem.
I only use voice assistants to set alarms. I cannot imagine voice as a primary input. Then again, many have opted out of owning desktops and laptops in favor of mobile phones. That also seems terribly inefficient.
>Then again, many have opted out of owning desktops and laptops in favor of mobile phones. That also seems terribly inefficient
A lot of people don't need computers in the general purpose sense. I admit my mind boggles a bit when co-workers tell me their kids don't want a computer to do their school papers because their phone is fine. But, then, I'm used to keyboards and what we think of as a "computer" and have been using one for decades--and grab one when I can for any remotely complex or input-heavy task.
> A lot of people don't need computers in the general purpose sense. I admit my mind boggles a bit when co-workers tell me their kids don't want a computer to do their school papers because their phone is fine.
I grew up in the 1980s, when handwritten papers were still the norm. I do see the advantages of using a word-processor for writing papers, but don't see why it would be a necessity (at least, until University).
It sounds ridiculous, but I'll admit that when you've got something like Dex that lets you dock the phone for usb and hdmi out and gives you close to a full desktop OS I'd imagine it really is enough for the casual user.
I certainly know colleagues in the industry who travel with just a tablet and external keyboard. No, they're not running IDEs etc., but they find it OK for emails, editing docs, taking notes, etc. Personally I'll spend the extra few pounds to also carry along a laptop. But I can imagine not needing/wanting a dedicated laptop when I travel at some point.
I'm usually carrying a tablet anyway though for entertainment/reading purposes. So it's usually a choice of tablet + laptop vs. tablet + keyboard. (I admittedly don't really have a weight optimized travel laptop these days either.)
I actually do wish there were good Mac or Chromebook choices for a travel 11" or so laptop but the market seems to have settled on a thin 13" as the floor and, admittedly, the weight/size difference isn't huge.
While I am mostly a Mac person, for travel I often prefer a tiny and cheap Lenovo Chromebook that does everything (a bit poorly): Linux containers for lightweight programming and writing, and consuming media like books, audiobooks, and streaming.
In response to a grandparent comment about weight for tablets: I prefer Apple’s folio old style of cases/keyboards because of weight. I have one for both my small and large iPad Pros. Whenever I travel, I usually just take one of my iPads if I don’t need a dev environment [1].
[1] but with GitHub Codespaces and Google Colab, development on an iPad is sort of OK.
I still don't see the point of tablets. It's just a smartphone with a larger screen, and practically all people already carry phones.
Might as well go for the laptop at that point given that it can actually do far more imo, unless you ditch the phone and go for one of those half phone half tablets I guess.
I'd rather watch movies, read, play certain games, etc. on my tablet than on a phone. (Obviously there are also specific use cases like digital art.) That said, I mostly use my tablet when traveling and it's a distant third in necessity compared to either a laptop or a phone--and only somewhat more useful than a smartwatch.
Watching movies on a tablet is terrible, though. All methods for propping the device up so you can watch the movie are inferior to the way a laptop screen props itself up via hinges and a base.
On a plane I'd rather use the tablet in my lap than have to put the tray table down. And in a hotel room I'm watching on the couch if there is one. (I do also have an attachment for my tablet that will let you prop it up on a table but I mostly don't use it because it adds weight.)
For reading, I'm probably bringing my Kindle along if I don't bring my tablet.
If you do not have one, buy a dock! I have an SP6 and an SP4, and having the dock makes it quite the device. Speakers, multiple external monitors, keyboard, mouse -- a full desktop setup. I can grab it and either stick a keyboard cover on or just use it as a reading device on the couch.
Back to work? Set it on the table, plug in one cable, and it's back to being a desktop and charging up again.
How old are you? Because larger screens become really nice as your eyes go bad. And I don't need the full size of a laptop for things I'd want to do on a tablet.
The obsession with being lighter definitely has diminishing returns. At some point another few ounces doesn't make any difference in a real, practical sense. I think we have just started to associate "lightness" == "better" despite there being no actual benefit past a certain threshold.
Right, at some point. But at the current point my tablet is too heavy to hold in hand for more than 20 secs perhaps. Phone is ok. Tablet is not (for me). I only use the tablet by placing it on a table or a stand. Then actually using a laptop is much better than a tablet.
The killer tech will be when we have a tablet that is as light as a phone.
Thanks for that. A lot of energy is currently sunk because of natural language, and I'd argue the gains from employing software (instead of human processes) for various tasks are in part due to scaling up the results of many confusing discussions in natural language about what a specific process actually comprises.
This is part of the reason Google search sucks more and more.
Around when Android appeared, and the first voice searches began, Google suddenly started to alias everything.
Search for 'Andy', 'Andrew' appears. Search for 'there', and 'they're' appears.
This has been taken further; now silly aliases such as debian → ubuntu exist, and since Google happily drops words from your search to find a match, precision becomes impossible.
But, that's the only way to make voice search remotely work, so...
I don't think this is to support voice search: Google generally knows whether a query was initiated by voice or typing. Instead, I think it's because most users find what they're looking for faster with it.
If you have terms you don't want interpreted broadly you can put them in quotes.
Google "helpfully" ignores the quotes sometimes too. They're not the hard and fast rule they used to be.
I preached the Gospel of Google when the competition was composed of web rings and Altavista, but Google in its infinite wisdom has abandoned the advanced user with changes of this nature.
I often find the voice assistant useful for operating the phone, such as opening a given setting, say making the display brighter. Trying to navigate the settings pages is very error-prone. There seems to be no universal standard as to where each setting should be found.
There is a widely accepted and straightforward view that humans have ideas, which are expressed in language, and that language being ambiguous is problematic: this I'm starting to have doubts about.
Maybe we don't have clear intentions in the first place; maybe languages are not just ambiguous, but are only meant to narrow the realm of valid interpretations down to a desired precision, rather than to form logically fully constrained statements. Maybe this is why intelligent entities are needed to "correctly" interpret natural language statements, because the act of interpretation is itself a decision and an action.
Just my thoughts, but I do think there is more to be said than "natural languages are ambiguous".
> On the other side, humans have been fine using natural language to delegate commands to each other.
Using language to instruct humans goes wrong all the time. Just a short while ago on British Bakeoff I saw 2 of the contestants make white chocolate feathering on their biscuits by making actual feathers out of white chocolate and placing them on their biscuits. And I'm sure that will confuse quite a few people reading this too. It certainly confuses image searches. Language is a fuzzy interface. Compare that to an interface like clicking a button that does the thing I want done.
How would you (easily) describe the concept of chocolate feathering to a computer without using natural language? (e.g. if you wanted the computer to generate an image, or search for an image of / recipe with chocolate feathering).
> On the other side, humans have been fine using natural language to delegate commands to each other.
And that's why all of aviation has moved to a tight phraseology, such that delegated commands are universally understood and their meaning is set in stone.
> humans have been fine using natural language to delegate commands to each other.
Not always resulting in unambiguous instructions:
"Lord Raglan wishes the cavalry to advance rapidly to the front, follow the enemy, and try to prevent the enemy carrying away the guns."
~Lord Raglan, Balaclava
"I wish him to take Cemetery Hill if practicable."
~Robert E. Lee, Gettysburg
> On the other side, humans have been fine using natural language to delegate commands to each other.
I think this is really a mischaracterization. Mostly, human communication is full of errors and problems.
What is true is that when it is important enough, humans have come up with ways to minimize communication errors and frameworks to deal with ambiguity - mostly these involve training and effort, though; it really doesn't come naturally.
> humans have been fine using natural language to delegate commands to each other.
Every time we try to minimize errors, we formalize a language. I don't even think people use natural language to issue commands often. Commanding people is often considered rude.
The problem is that it's not actually a conversation. To significantly improve it, you'd want to (see the sketch after this list):
- identify users by voice
- ask them clarifying questions
- remember the answers on a per-user basis
- understand "no, that was the wrong answer"
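A minimal sketch of that loop in Python, with invented speaker IDs and an ask() function standing in for the actual voice prompt:

```python
prefs: dict[tuple[str, str], str] = {}   # (speaker, slot) -> remembered answer
last: dict[str, tuple[str, str]] = {}    # speaker -> last (slot, value) used

def ask(speaker: str, question: str) -> str:
    # Stand-in for a real clarifying question asked over voice.
    return input(f"[to {speaker}] {question} ")

def resolve(speaker: str, slot: str) -> str:
    """Use a remembered per-speaker answer, or ask a clarifying question and remember it."""
    key = (speaker, slot)
    if key not in prefs:
        prefs[key] = ask(speaker, f"Which {slot} do you mean?")
    last[speaker] = (slot, prefs[key])
    return prefs[key]

def correct(speaker: str) -> None:
    """Handle "no, that was the wrong answer": forget the last answer and re-ask."""
    if speaker in last:
        slot, _ = last[speaker]
        prefs.pop((speaker, slot), None)
        resolve(speaker, slot)

# e.g. "play music" from Alice: resolve("alice", "music service") asks once,
# then reuses the answer on later requests; correct("alice") re-asks.
```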
If you're going to provide a formal interface to the computer, you also have to provide teaching in that formal interface, which is far more of a burden to the user than the cost of the device. And we've completely moved away from that model (not necessarily a good thing, but that's what the market has chosen).
Calling it a burden is an assumption that ignores and belittles the end user. Sure, there are people who won't want to train their personal ai.
But I imagine there are significantly more who would appreciate clarifying requests by a teachable assistant capable of interacting with the entire digital world on their behalf, efficiently and intelligently.
I think you're right. There are glimpses of this in the voice interfaces right now. For example, Alexa will distinguish between voices and preferentially take actions for me, saying "Play Music" plays Spotify, and for my kids, it plays Amazon music.
An example backing this is voice assistants that DO work, e.g. Talon voice. But these require defining a language, and then they are very accurate and powerful.
I don't see why a voice assistant for the masses couldn't "train its own users", for example by suggesting the language it does expect. But even then, most of the time people are talking in noisy environments, or talk too fast, or don't have an understanding of how the machine might work. Regardless, who cares. They ruin the audio environment of a home. They're good for setting timers while you're cooking, that's about it.
Car voice assistants do this, but they're still clunky and it takes them forever to list their options. Voice interfaces just like CLI suffer from extremely bad discoverability and presentation compared to GUIs and thus will always be limited to specialty applications. CLIs at least have a league of try-hards and hobby linux users to keep them alive.
Right - natural language works for people because we have minds that are communicating. A virtual assistant has a list of things it can do, and uses language as an interface to them. So the language just becomes obfuscation instead of allowing clarification.
I've said before, I would prefer a voice assistant that optimized for traversing its menu system in response to unambiguous noises (could be high- and low-pitched hums or whatever) that let me bypass the guessing game and use the menu it's hiding.
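Under the hood that could be dead simple. A toy sketch, with an invented menu tree and "low"/"high" tokens standing in for the two unambiguous noises:

```python
# Hypothetical menu tree: each node is either a command string (leaf) or a
# (low_branch, high_branch) pair.
MENU = (
    (("set a timer", "set an alarm"), ("play music", "play a podcast")),
    (("lights on", "lights off"), ("weather today", "weather tomorrow")),
)

def traverse(node, tones):
    """Walk the menu with unambiguous 'low'/'high' tones instead of guessing
    at phrasing. Returns the command reached, or the remaining subtree."""
    for tone in tones:
        node = node[0] if tone == "low" else node[1]
    return node

print(traverse(MENU, ["low", "high", "low"]))  # -> "play music"
```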
Otherwise, it works great :-) We love the hands-off usage mode because we cook a lot, so adding things to shopping lists or looking stuff up doesn't require cleaning hands in the middle of prep. Also the speakers are pretty darn good for the size and work well for music.
Doing complicated things is right out though. But the simple stuff works fine.
I'm just waiting for someone to finally release a voice assistant built around an actual language model, like GPT-3 or LaMDA.
It would be more error prone in a lot of ways, which is probably why nobody's done it yet, but it would also be a _lot_ more powerful, and fulfill the vision of conversational AI in a way the current rules-based assistants do not.
I think if powerful language models were easily accessible to normal people (in an inexpensive and completely unrestricted fashion, like with Stable Diffusion) we'd already see this happening in the open source world. Companies are going to be a lot more hesitant to try it though until they have a way to 100% prevent the models from making mistakes that could reflect poorly on the company, which is going to take _way_ longer to achieve.
Natural language conveys information to other people just fine. So the problem isn't that "Natural language is a fundamentally wrong vehicle to convey information to a computer". The problem is getting the computer to understand natural language to the same level as a human.
Dijkstra's full essay[1] is a bit more illuminating, but essentially it's about how, for example, developing a system of symbols and formal language around mathematics has allowed "school children [to] learn to do what in earlier days only genius could achieve".
I think his argument even generalizes to literacy in general. Remember that reading and writing skills don't develop naturally (as opposed to spoken language). They require a large educational investment, and used to be reserved for the wealthy and the privileged.
But how? Even if those interfaces were actually working, it's still extremely inconvenient to talk when you can click. You have to be somewhere where talking out loud doesn't disturb the people around you. That excludes most situations: open space offices, restaurants, coffee shops, public transport, cars with passengers, and most places in the home except maybe the bathroom.
And even if you're all alone in a silent place, giving instructions out loud takes more time than configuring a screen, and will always be error prone, because the feedback will always be ambiguous and imprecise.
Except maybe if the feedback is on a screen, but then if there's already a screen, why not use it.
I think the best use cases for voice assistants are when you don’t have free hands. I have two scenarios where I use voice assistants: setting a timer while cooking and changing the music while showering. Both could be done by other means as well, but they wouldn’t be more convenient.
Exactly. For instance, in the mornings Google Assistant has been really useful for when I say "OK Google, Good Morning". It then runs through and tells me:
* Current time, and weather forecast for the day
* Upcoming meetings today
* My current commute time to work, including traffic
* NPR news podcast
So during my routine of letting the dogs out, starting the coffee, etc. in the morning, I get the daily "essential" info.
> But how? Even if those interfaces were actually working, it's still extremely inconvenient to talk when you can click. You have to be somewhere where talking out loud doesn't disturb the people around you. That excludes most situations: open space offices, restaurants, coffee shops, public transport, cars with passengers, and most places in the home except maybe the bathroom.
I would separate out the two, actually. There's a "natural language control system for the entire OS" and then there's the actual voice part. Voice is often mostly useful for accessibility purposes -- hands full, running, driving, etc. However, the other side is that a text-based NL assistant would also be profoundly useful. On iOS, you can enable "Type-to-siri" and you can just type sentences and Siri will respond back in text.
If we make progress on NL-driven command-lines, we can actually make progress on voice-assistants, and vice versa. The catch is that the voice side still needs recognition work.
Well, you are not trying to operate heavy machinery with Amazon Echo - hopefully. Voice as a common interface - I agree with all of that, but to me the everyday utility of being able to add something to my shopping list or my TODO list without having to fire up an APP greatly increases my quality of life. That part is magical, but I don't expect a lot more from it.
I used to use Alexa for my shopping list. I guess over time I came to the conclusion that adding something to a steno pad or my whiteboard was even easier.
If the assistant AI was advanced enough for pleasant conversations to occur, it would be useful.
It would be trivial to use the interface on screen when appropriate, and a truly smart assistant should be able to follow the context and be aware of your preferences and mood.
This is not fundamentally impossible, we're simply not there yet.
> But how? Even if those interfaces were actually working, it's still extremely inconvenient to talk when you can click
Working from home changes that. I can see many more opportunities for a multimodal input interface. Examples:
1. My fingertips now are closer to the "reply" button below this text area than they are even to the touchpad. Touching "reply" is half a second, moving one hand to the touchpad, aiming the pointer at the button and clicking takes longer. With a mouse: much longer. Anyway, my screen is not a touchscreen. I'll click.
2. Or, with an assistant, I could have said "Click reply", provided that the assistant knows where the focus is and that it can read the form I'm typing in.
Your fingertips while typing are even closer to the Tab and Enter keys on your keyboard, which, if pressed in sequence, have the exact same effect. Much simpler and much faster than either of your options.
"hey siri, when I get home every day, turn on all of the lights, change my focus to personal, and turn on the news"
I think the problem with that is that even I, as a human, struggle to know for sure what you want.
You want to turn all the lights on in the house? Does that include the lamps in the bedroom? How about new lights that you add later? Or the ones in the garden? It's full of ambiguity. What device do you want to watch the news on? Or did you mean the radio? Do you want this to apply when you get back at 2am one night, meaning your family gets woken up when you turn on all the lights and start playing the news in their bedrooms?
I think that's probably why voice interfaces aren't likely to work well for anything beyond direct, specific, well-scoped requests: turn on the lights in the bedroom; turn off the heating at home; roll up the blinds; what's the weather like today; what's the remaining range on my car. They really struggle to deal with anything more complex – not so bad in theory, but really incredibly irritating when they make the wrong decision.
If you had some kind of 24-hour live-in assistant (a butler, maybe?), then they probably have the knowledge and intuition to make sensible decisions in response to fairly unstructured requests. But I think we're miles off getting a voice assistant to do it – not because they can't, necessarily, but because if they mess it up at all it's infuriating.
You can do some of this with shortcuts, and then use Siri to trigger the shortcut. But that involves thinking; the magic of Jeeves is that he knows what you want even before you do.
The problem is there are more different combinations I might want as a shortcut than I have time to program/remember. I can remember something like a dozen commonly used shortcuts. However, when 5 years from now I arrive home at 2am (for the first time in several decades, but it will probably happen at some point again in my life), will I remember the correct shortcut - and assuming I do, is it up to date with whatever changes have been made to my house?
What about the shortcut for when I need to leave at 3am for some reason? Then a different shortcut for when it isn't just me, but my whole family leaving at 3am. And still another for my son having to leave that early.
Jeeves can figure it out when I arrive at 2am so I don't need to program it.
You've reminded me of some aspects of these platforms that I like in a more general sense – like for example the way the Apple Watch will automatically ring the alarm on my phone if I forget to put my watch on, or if I get up before my alarm goes off the watch will notice and ask if I want to skip the alarm for the day. This stuff genuinely feels almost like magic sometimes – the risk is that when anything like this goes wrong it's awful.
I might be in the minority, but I also don't want to add things to my life that make my environment noisier or that require me or others living with me to speak more. As much of a Star Trek fan as I am, I never found "The Computer" to be appealing, and always thought of it more as an artistic device. It's a lot easier to communicate a character's intent / action if they are vocalizing it for performance. Even in scenes where they are "typing" something into the computer, they will inevitably be communicating to the captain or another character what they are doing.
In practical reality these interfaces feel, to me, as extremely inefficient. As someone who doesn't particularly like to speak, and prefers silent environments, these interfaces require more energy from me to use. Unless they are serving someone who has a physical impairment then I don't see what problems exist that these solve, but I can identify lots of problems that they introduce (not only noise but privacy / security vulnerabilities etc.)
Timers and reminders alone are enough to make them a pretty nice thing to have though.
I don't really want them to be all that much more powerful, because natural language can be imprecise, and... there's just not much that I want to automate in a home setting beyond some real simple timers for lights and stuff.
What if I had a bad day and didn't want to see depressing news? Or what if I came home and was talking on the phone when it turned the news on?
True automation as opposed to just telemetry and remote control can easily be annoying more than helpful.
I like the idea of automation... but I don't actually... automate anything aside from timers and reminders.
I think that's generally true though playing music is a little more freeform. (And, guess what? Voice assistants tend to be worse at that.)
The problem is that many, many billions of dollars have been sunk into making these devices about more than setting alarms and timers. There's actually been a lot of pretty amazing progress. But it's yet another one of those things where getting to 90% isn't good enough for anyone but techies who want to fiddle with their smarthome stuff or otherwise play with the technology.
They might have a sudden increase in usefulness when smarthome stuff is more common, although smart bulbs are a bit of a hassle in most switched outlets, because the switch is usually more convenient.
Maybe they'll add an app that lets you browse possible commands so it's more discoverable.
It's probably true that a well-integrated smarthome would benefit from voice control.
But I'd observe that I'm going up to my brother's tomorrow and he has all manner of timers and other WiFi-connected stuff and none of it has any sort of centralized control and that's pretty normal even for people who have a lot of that sort of thing.
And, yeah, the only smart light thing I have at home is one thing that doesn't have a controlling light switch and I used X10 for it for years before I got an Alexa.
If I were in this space I would just build voice assistants for very specific situations where you cannot type, like driving, cooking, doing some sport, etc. There is lots of potential, but the big players are kinda trying to build a generic tool for every situation, which is a super hard problem.
My Alexa asked me today if I wanted an Avatar theme. No I really do not, Alexa. I was reminded of the article a few days ago about how they can’t monetize this well and are somehow losing $10 billion. :)
Voice assistants have reached the Unhelpful Valley stage.
When they were a novelty, I recall the excitement of trying new commands and layering in context; after many failures I've been conditioned to now only attempt, and expect success with, generic queries.
To me what’s interesting is that MS smelled that it was a problem a while ago and pulled the plug before it ate a hole in their wallet but Amazon and Google keep plugging along ploughing money into a bottomless pit. Apple has a different play and looks like they are controlling their losses there quite well and may act as a slight loss leader for other products.
I can't fathom how they managed to spend so much on it, though. The product has been around for quite a while, as well, so it's not some initial ramp-up cost. $3B/quarter $10B/year? Wow.
Edit: Maybe things like this happen because there are various nerds who lead these products and are good at talking the businesspeople into funding it. Maybe this was only possible at the big tech growth stage while business wasn't that good at telling the value proposition. So end result, lots more engineers get paid which is great in my book :-)
> Instead, it's a guessing game about syntax and semantics, and frequently a source of frustration
My biggest frustration with Alexa is getting it to play the podcasts I want to listen to. Even popular podcasts with English names are hard to get just right for Alexa. The same goes for song titles and bands that are not popular, or that are in other languages.
Usually when I want to take a shower, I try to get the podcasts/music to play for 2 minutes, then sigh, give up and just say "Alexa play Britney Spears".
And discoverability. For a long drive I probably want to pick out some specific podcast episodes rather than play whatever. I'm just not a whatever background sound sort of person. The interfaces aren't really good enough to present me with some options with voice control only. So I end up mostly pre-populating a "Car" playlist.
>> A voice system that can do literally everything one can do with a keyboard and a mouse would be magical, but no system offers that.
And even then, a voice assistant is essentially a user interface, not a product or service.
It could be a service if you could reliably say "Alexa, plan my trip to customer X the week of the 30th and send me my itinerary". But for now they are an alternative to a phone UI.
The reality is that even a human personal assistant can rapidly devolve to being more of a hindrance than a help if they're not very good once you get beyond simple mechanical tasks. Even with all the knowledge about the world that most adults carry around in their heads. Yes, a poor human assistant can fall down in other ways such as forgetting to do something--but they have a lot of context.
This seems a really high bar for voice assistants aspiring to do much more than set alarms or turn the odd light etc. on or off.
These days few people have personal secretaries, but back when they were common they really were personal - once you got a personal secretary she (nearly always she; I feel like we should acknowledge the sexism even though it is irrelevant to my point) would follow you (nearly always male) as you moved from job to job and up the ladder. She went with you because once you had spent a few years training her in how you worked, a new secretary would greatly limit your effectiveness.
These days a computer can do a large part of what people relied on secretaries for, and faster, so only at the highest levels do you see them. There are still secretaries at the low levels, but not nearly as many, and they are not doing the same tasks.
That's pretty much it. We call them executive admins these days where they exist.
And, yeah, assistants shared with a bunch of other people--as with travel agents in general--aren't really all that useful. If I'm mostly just giving fairly mechanical instructions to execute, it's probably easier for me to go online and figure out the options myself.
A secretary made a lot more sense when you dictated memos for inter-office mail and retrieving information often involved making multiple phone calls.
>> This seems a really high bar for voice assistants aspiring to do much more than set alarms or turn the odd light etc. on or off.
That's kind of my point. A voice assistant is just a fancy UI until they reach the level of AGI, and I don't see the point in spending billions of dollars on them to be a simple UI as Amazon seems to be doing.
If that voice assistant were self hosted in the little device, I agree. But those simple interfaces are connected directly to a significantly larger machine that literally knows everything about you and half of everyone you know. It's not unheard of to expect it to be more useful than setting timers and playing music.
They "know" a bunch of discrete facts. They don't know that if you book me on a red-eye unnecessarily to save $100 I'll be hunting you down. Or any of a zillion other flexible preferences--some of which I'm not even very consistent about.
I don't know about you personally, but google definitely knows I've never booked a red-eye and that I haven't booked a layover since the early aughts. I'm fairly sure Google could easily figure out not only where I'd be interested in flying to in the next few months, but when and for how long, and at what price points I'd consider upgrading my flight.
I know they know this about me not only because of my Gmail account but also because I use Google flights to find the flights before I book them.
Unfortunately they're not using this data to help me. Rather they're using it to target advertising to me. But they definitely have the data and the machinery to be more useful to me with more than just a few facts
Maybe my travel is more complicated but I even not infrequently get annoyed with "past me" for various travel-related decisions. I avoid red-eyes but at some price point I won't--or maybe only if it's someone else's money. And maybe I don't have a choice based on my schedule or just what flights are available. Normally I won't do an unnecessary layover but maybe I will to fly my preferred airline.
It gets complicated in a hurry and for the cases where it is relatively simple (and when it gets into very complex international travel a voice interface is going to be completely useless), I can look up my options pretty quickly on a computer.
The potential would be there if they would focus on the assistant part, and take voice just as one means of interacting with the assistant, besides other means like clicking, typing, showing complex information on a screen, etc.
Voice alone sucks, it's just too limited to be useful on a grand scale. Similarly, command lines suck too. The shell in general has the same problems that voice assistants have, just that it has more value and had decades to mature into something actually useful. And today we have unix shells which reduce the problematic parts by many levels, and still receive constant improvements. This is missing for voice assistants, because unix shells are growing and improving in an open space, where everyone can add their own things. This is not happening in big tech.
I don't think this is actually reliably possible due to the fact that while grammar does tend to follow patterns sometimes, we're fundamentally dealing with an exponential number of ways to say things to a voice assistant.
In the spirit of the title of this post, someone else also has to say something.
If your argument is that this is a "non-visual command line" there's slim hope of the layperson learning a whole secret grammar without even a goddamn man page just to do their menial tasks.
*nix was optimized for low-bandwidth channels. That's why the command names and options are extremely terse and typically return trivial output on success. OTOH it was assumed that input would be reliable, so there's no confirmation required for potentially dangerous commands. A "*nix for voice" would need to address that, at the very least.
I’d sure be lost if I had to listen to the entirety of a manpage or dmesg output or /var/log/messages read out by voice. Some of those could take hours to read out. Nothing actually trivial about *nix command output. Just sometimes terse.
>Voice assistants are basically just mainstream non-visual command-lines, and it's unsurprising to me that something that relies heavily on memorization and extremely specialized "skills" isn't quite taking off in the way it was imagined.
This got me thinking. Voice recognition is basically a commodity now .. there are open source AI engines that can do it offline really well. So the recognition part is solved, you can just grab it from your distro's package manager. Now there's just the language part.
Thing is, I don't want to speak to my computer using English. Aside from the enormous practical problems in natural language processing you've outlined, I just find the idea creepy[1].
What I want is to unambiguously tell it to do arbitrary things. I.e. use it as an actual computer, not a toy that can do a few tricks. I.e. actually program it. In some kind of Turing complete shell language that is optimized for being spoken aloud. You would speak words into the open source voice recognizer, it writes those to stdout, then an interpreter reads from stdin and executes the instructions.
Is there any language like this? What should it look like?
And yeah that would take effort to learn to use it right, just like any other programming language; so be it. This would be a hobbyist thing.
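The plumbing, at least, is small. A toy sketch in Python, with an entirely invented spoken vocabulary, that reads recognized words line-by-line from stdin and shells out; anything dangerous would want a spoken confirmation step before running:

```python
import subprocess
import sys

# Invented vocabulary for illustration only: spoken verb -> command prefix.
VERBS = {
    "list": ["ls", "-l"],
    "where": ["pwd"],
    "disk": ["df", "-h"],
}

def run(utterance: str) -> None:
    words = utterance.strip().lower().split()
    if not words:
        return
    verb, args = words[0], words[1:]
    if verb not in VERBS:
        print(f"unknown verb: {verb}", file=sys.stderr)
        return
    # Dangerous verbs would want a spoken confirmation step here.
    subprocess.run(VERBS[verb] + args)

for line in sys.stdin:   # e.g. piped from a local speech recognizer
    run(line)
```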
If you're using an averaged American voice - maybe. But it's really not solved for everyone. Google assistant can't set the right timer for me 1/10 times. And that's before we get to heavy accent Scots and others.
> Voice recognition is basically a commodity now .. there are open source AI engines that can do it offline really well. So the recognition part is solved, you can just grab it from your distro's package manager.
This is potentially far from true, depending on how exactly you draw the line between "voice recognition" and "language". I've looked at quite a few transcription services, and they fail a lot of the time for most people - those who either have a non-native accent (even if very slight!) or those who do any amount of stammering or other vocal tics.
I find the ML transcription services, given 2 people speaking English with high quality sound and without heavy accents/a lot of jargon, to be adequate for having a skimmable record--such as for extracting quotations (and just go back to the recording to confirm the exact words if it's not obvious). But if I'm publishing a transcript I get a human transcription. Cleaning up the ML stuff takes way too much time and I wouldn't publish a transcript without cleaning it up.
I was in fact looking at some transcriptions of my recent meetings, and found one that captures how even small mistakes can make for completely not-understandable transcripts, unless they are manually cleaned up.
Manual transcription:
> So no: long story short, Slum is basically the way we can have an individual [, uhhh,] instance that carries all the licenses.
(Slum is a project name in this case)
Computer transcription (MS Teams):
> So no.
> A long story shorts. Love is basically the way we can have an individual.
> Voice recognition is basically a commodity now .. there are open source AI engines that can do it offline really well. So the recognition part is solved, you can just grab it from your distro's package manager.
I personally don't consider this a fully-solved problem. The best transcription system I've used is OpenAI Whisper, and it doesn't work in realtime. Maybe it's fine on small amounts but it's still not perfect. You really need error to be driven down dramatically. Zoom auto-captions are a joke in terms of how badly they work for me, and Live Text (beta) on macOS is equally dreadful. YouTube auto-captions suck. All of these use industry-leading APIs. If I'm speaking a voice command and one single word is wrong, usually the whole thing fails.
There's an entirely separate issue about things that are Proper Nouns that don't exist. For example, "Todoist" is often misunderstood by Siri. Thus, people started saying "Two doist (where doist rhymes with joist)" to fool it into understanding "Todoist". Media like anime with strange titles from other languages often flat out trolls these transcription systems. ("Hey Siri, remind me to watch Kimetsu no Yaiba tomorrow".)
That reminds me of the handwriting recognition approach [1] used in old Palm Pilot devices. Even though the shapes it expected you to draw resembled the corresponding letters, you would never draw them like that if you were writing on paper.
You knew that you were drawing something designed for a computer to recognise as unambiguously as possible, while being efficient to draw quickly and easy to learn for you. I feel like that's the kind of notion that voice interfaces should somehow expand upon.
To me the hardest problem is simply remembering what every light on my network is named. Did I call the light next to my desk “desk light” or did I call it “office light”? If I don’t get the name exactly right, I cannot control the light. Multiply that by every other light in the house and it becomes a lot to remember. I have probably 15 lights controlled by Alexa and I can only remember the name of like three of them. Thus most of the time it is just “Alexa turn on the lights” so it can turn everything on in a room.
If these voice assistants were smarter about “alternative” names for every device it might be easier to use. But as it stands, it’s kind of a pain because the way you phrase each request is so unforgiving…
Oh yeah, and god help you if your device name is similar to your room name. If your room is “office” (or did I name it “the office”?) and your light is “office light” Alexa is gonna have a bad time figuring the two apart.
I have no clue how to fix this…
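The closest workaround I can imagine (a sketch only, with an invented device registry; nothing like this is exposed by the current assistants) is to keep user-defined aliases per device and fuzzy-match whatever was spoken against all of them:

```python
import difflib

# Hypothetical device registry: canonical name -> user-defined aliases.
DEVICES = {
    "office light": ["desk light", "study light"],
    "kitchen light": ["counter light"],
    "living room lamp": ["couch lamp", "reading lamp"],
}

def resolve_device(spoken):
    """Map a spoken name to a canonical device via aliases and fuzzy matching."""
    names = {}
    for canonical, aliases in DEVICES.items():
        for name in [canonical, *aliases]:
            names[name] = canonical
    match = difflib.get_close_matches(spoken.lower(), names.keys(), n=1, cutoff=0.6)
    return names[match[0]] if match else None

print(resolve_device("the desk light"))  # -> "office light"
```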
PS: this is why I question steering wheel free self driving cars. How will we tell these things exactly where to go when we cannot even reliably tell our voice assistants exactly what light to turn on?
I think the biggest potential is with Microsoft Teams in business. It is ubiquitous in people's work lives, has access to data, and is integrated with everything. And adding Cortana to calls would be an easy step for people to understand and learn. People would say "cortana share my screen". People would learn phrases from each other.
But teams hasn't figured out how to send text in a coherent way.
It's used because companies can cheap out on buying a license for other communication applications, it is fundamentally worse than anything else in any other metric. If voice lets me respond to a message without hunting for the hidden reply because Teams shoves it below the bottom of the screen then it could be a win.
Considering how poor the UX is in Teams, I doubt it will.
> There /is/ power to-be-had, but nobody has really tapped it.
This kind of thing can't be built for modern mainstream operating systems because they generally prevent subjugation of the OS components and other programs, even if the user wants that, ostensibly for security reasons.
Unlike a human operator, an assistant "app" can only operate within the bounds of APIs defined by the OS vendors and third-party developers. Gone are the days of third-party software that extends the operating system in ways that the overlords couldn't (or wouldn't) dream of.
That's not entirely true. Accessibility APIs on macOS, for example, would let you control so many aspects of the OS from user land apps given that permissions are granted. But voice assistants are not up to the task.
I think you're identifying some of the right problems here. All voice assistants are based on turn-taking, and when the VoiceAI hits one of those failure points and just comes back with "I didn't get that" it leaves the user in a frustrating state trying to debug what's wrong.
I work at SoundHound where we've been worried about these issues. (I'm going to plug our recent work...) Our new approach is to do natural language understanding in real-time instead of at the utterance (turn) taking level. That way we can give the user constant feedback in real-time. In the case of a screen that means the user sees right away that they are understood, and if not, a better hint of what went wrong. For example a likely mistake is an ASR mistranscription for a word or two.
We still need to prove this is a better paradigm for VoiceAI in products that people can try for themselves, and are working towards that goal. I hope that voice interfaces that were clunky with turn-taking will finally be more naturally usable with real-time NLU.
I tried Amazon's Alexa, the top-end model with a display. Often it would taunt you about new/interesting things on the screen, but I could never get them to work. I had to memorize things to get even the basics working. Ended up unplugging it.
However, Google's Assistant in comparison worked great: no memorization, and very useful. Sure, time, weather, timers, and alarms worked great with a very flexible set of natural language queries, as did more complex things like the temperature tomorrow at 10pm, simple calculations, and unit conversions. But also things like IMDB-style queries about directors, actors, which movies someone was in, etc. generally worked well. It seemed to really understand things, not just "A web search returned ...". Even more complex things like the wheelbase of a 2004 WRX would return an answer, not a search result.
With all that said I'm looking for a non-cloud/on site solution, even if it requires more work, most recently noticed https://github.com/rhasspy/rhasspy
The big issue is that there's no clearly defined interface for users. What commands are possible? Nobody knows. So people default to the most obvious things like setting a timer. Is it possible to set up your own commands and build your own workflows? AFAIK, no. So the tech is essentially dead in the water until companies fundamentally rethink what they're trying to do with voice assistants.
Yup. At the risk of being glib I would say this is 90% of the issue. Or more like 'the big blocking issue' at the moment.
Voice can do way more than we know, but we have no idea what it does or how to use it.
Standardizing the interface and providing tutorials would possibly change things dramatically.
And this goes for the back-end protocols as well.
The tech is way, way ahead of the UI and integration.
Imagine getting the power of 'git' with no tutorial and not really an understanding of what it does? Good luck with that.
90% of us would be using it in the car to do a lot of things if we really knew how to do it:
You: "Siri: Command. Open. Mail. Prompt. Recipients starting with S"
Siri: "Sarah, Sue, Sundar"
You: "Stop. Command. Message. To: Sundar. Thanks for the note. Stop. Send without Review"
Some of this already exists, but it's product specific etc. there needs to be some kind of natural universal interface - or we have to wait until the AI is really, really that good.
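The nice thing about a keyword-delimited grammar like the example above (which is just my own invention, not a real Siri feature) is how trivially parseable it becomes. A rough sketch, treating each spoken stop as a segment boundary:

```python
def parse_utterance(utterance: str) -> list[str]:
    """Split a keyword-delimited utterance into unambiguous segments.
    A spoken "Stop" or a long pause would play the role of the period."""
    return [seg.strip().lower() for seg in utterance.split(".") if seg.strip()]

print(parse_utterance("Command. Open. Mail. Prompt. Recipients starting with S"))
# -> ['command', 'open', 'mail', 'prompt', 'recipients starting with s']
```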
Talon voice can do everything a keyboard and mouse offers, plus more (contextual awareness, higher level abstraction). Very powerful in combination with modal editing. I'm not affiliated, just a user.
Granted, this is for a specific user base and yes, not in coffee shops.
This timeline is such a mishmash of mediocrity. Voice assistants could have been a vibrant ecosystem of different personalities, like say buying a Darth Vader voice pack or having your computer sound like a snooty English butler..
There's a great little game series called Megaman Battle Network (Rockman.exe in Japan) which diverges from the mainline by showing an alternate universe where scientists focused on AI instead of robotics, resulting in a world where "Navis" are ubiquitous.
I wonder, what if our early software engineers focused on bringing natural voice control to CLIs, before perfecting GUIs first?
I think these assistants just need to give the user a way to edit interpretations.
A 'debug' area that lets you ask a command, see what was interpreted - and immediately edit or click "that's not what I wanted". But not an afterthought and not a cumbersome process like setting up an automation that is triggered by specific commands.
Imagine telling your voice assistant "You're wrong, as usual" and instead of it giving you the boilerplate "I'm sorry...", it actually offered a way to improve itself.
I would think that a good command-line is one that responds to me within milliseconds on a crapbox i386 machine, and I can COMMAND it what to do.
A good command-line is not a binary blob that cannot parse simple instructions correctly.
At the same time, siri seems to be getting slower and fatter every iteration so perhaps it is becoming more human ;)
As a native English speaker, that seems a profoundly odd request but that is what you asked for.
And you now have me wondering how open-ended calendar requests are actually implemented given that they can't literally have entries out to infinity. (I assume they go out some finite period and some background process periodically re-populates future entries.)
A recurrence rule is added to a start event, then an occurrence cache is either generated on the fly for periods of interest, or, yes, a rolling cache a year or two in the future is maintained and updated daily.
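A sketch of the "generate on the fly for the period of interest" half, in Python with a simplified fixed-interval rule (real calendars use RFC 5545 RRULEs, which are more expressive):

```python
from datetime import datetime, timedelta

def occurrences(start: datetime, interval: timedelta,
                window_start: datetime, window_end: datetime):
    """Expand a simple 'repeat every <interval>' rule, but only within the
    window of interest, so nothing infinite is ever stored."""
    # Jump straight to the first occurrence at or before the window.
    if window_start > start:
        skipped = (window_start - start) // interval
        start = start + skipped * interval
    current = start
    while current < window_end:
        if current >= window_start:
            yield current
        current += interval

# e.g. a weekly 9am event, asked about three years out:
rule_start = datetime(2022, 1, 3, 9, 0)
day = datetime(2025, 1, 6)
print(list(occurrences(rule_start, timedelta(weeks=1), day, day + timedelta(days=1))))
```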
Perhaps trivial, but actually seems like an interesting question given you have to potentially tradeoff RPCs for routine queries (and the number of database records) vs. being wrong for the random "Am I free on this day three years from now?" query. Of course, the answer may be that, in general, the differences don't really matter.
Another pitfall of most voice assistants is that they are really designed first with the corporation in mind rather than the user. Most are proxies for surveillance, advertising, or are just steering consumers back to a preferred set of walled-garden services.
Yeah, the whole idea has a lot of potential that seems like it should be within reach, but somehow it's 2022 and my phone still can't handle "hey Google, play my driving playlist on Spotify."
Your queries continue to be money-sinks -- even in your ideal case, you aren't buying anything! This query costs them money but earns them nothing. This is useless.
Me and voice assistants are like me on the ballroom dance floor. I loved to take the lessons and learn all sorts of moves and chain them all together and look impressive, but when I got onto the floor with a partner, I just wouldn't know what to do or where to start. I kept to the "basic" steps and maybe a timid little turn once in a while.
Maybe it's possible to learn a working vocabulary and know how to command a voice assistant. I know my way around several command lines, but I have no idea what to say to Hey Google.
It almost sounds like you are describing how it feels to learn a new language. And if that's the case, and people need to learn "voice assistant" to communicate with their device effectively, hasn't it utterly failed as a natural language processor?
Also I know this is true in other domains as well, obviously there is a common "google-ese" that people learn to narrow down their searches.