Which is exactly why I prefer formal query languages over NLP queries. In both cases (at least with most state-of-the-art NLP techniques), you have to learn certain patterns and ways to phrase a query so that the system will reliably understand it. With formal query languages, these patterns are well-defined, can be looked up, and will most likely not change significantly (so there is value in memorizing them). With NLP systems, the patterns are completely opaque: you have to learn them through trial and error, they may change at any time (e.g. because the model is retrained), and they are usually significantly less powerful.
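To make the contrast concrete, here's a toy illustration using SQLite from Python, purely as an example of a formal query language (the table and data are invented for this sketch): the SELECT grammar is documented and stable, so a pattern you memorize once keeps working.

```python
import sqlite3

# Toy example of a formal query language: SQL's grammar is documented,
# so the pattern "SELECT ... FROM ... WHERE ..." can be looked up once
# and relied on -- unlike the opaque phrasing rules of an NLP assistant.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lights(room TEXT, is_on INTEGER)")
con.executemany("INSERT INTO lights VALUES (?, ?)",
                [("kitchen", 1), ("hall", 0), ("bedroom", 1)])

# The formal equivalent of asking "which lights are on?"
rows = con.execute(
    "SELECT room FROM lights WHERE is_on = 1 ORDER BY room").fetchall()
print([r[0] for r in rows])  # ['bedroom', 'kitchen']
```

The same question put to a voice assistant has no documented grammar at all; you only find out what phrasings work by trying them.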
I sometimes feel that the trend to prefer NLP over formal query languages is comparable to the trend to prefer GUIs over consoles in the '80s and '90s.
Agreed; back in the day, when we played text adventures or interacted with MUDs/MOOs, those systems had English-like interaction languages, but their semantics were relatively clear -- you mostly had to follow the verb/preposition/object formula, and once you figured that out, you could manage the system fairly well without running into a lot of terrible corner cases.
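A sketch of how small that grammar was -- this is a hypothetical reconstruction with an invented vocabulary, not any particular game's parser, but it captures the verb/preposition/object formula:

```python
# Minimal sketch of a text-adventure-style verb/prep/object parser.
# Vocabulary is invented; real parsers (Infocom, MUDs) were richer,
# but the fixed formula is what made them learnable.
VERBS = {"put", "take", "look", "open"}
PREPS = {"in", "on", "under", "with"}

def parse(command: str):
    """Parse 'verb [object [prep object2]]' into a structured action."""
    words = command.lower().split()
    if not words or words[0] not in VERBS:
        return None  # "I don't understand that."
    verb, rest = words[0], words[1:]
    prep = next((w for w in rest if w in PREPS), None)
    if prep:
        i = rest.index(prep)
        return {"verb": verb, "object": " ".join(rest[:i]),
                "prep": prep, "object2": " ".join(rest[i + 1:])}
    return {"verb": verb, "object": " ".join(rest)}

print(parse("put lamp on table"))
# {'verb': 'put', 'object': 'lamp', 'prep': 'on', 'object2': 'table'}
```

When the parser rejects an input, the failure mode is at least predictable: you know the formula, so you know roughly why it failed.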
I'd rather have an assistant-type system with a fairly well-defined query language that exposes its capabilities and limitations directly than have to guess at the corner cases and failure points.
Disclaimer: I work @ Google on display assistant devices, but I don't work on the actual assistant interaction pieces.
> Which is exactly why I prefer formal query languages over NLP queries.
People like to reference Star Trek for stuff like NLP queries, but if you go back to TNG and pay close attention to the verbal queries to the computer, much of the time it isn't natural language. They seem to actually use some sort of formal query language that fits English a bit more closely, but it is still distinct from how the characters speak to each other.
> They seem to actually use some sort of formal query language that fits English a bit closer, but is still distinct from when the characters speak to each other.
"Computer, begin auto-destruct sequence, authorization Picard 4-7 Alpha Tango."
- Wake word. Command. Authorization stanza. (I bet the computer would prompt for authorization if missing.)
- Wake word (possibly superfluous). Identification stanza (probably superfluous for the usual crew, but I can see from an HCI perspective that you might want to make people provide it specifically for such a consequential protocol, and it may also be of merit if some random admiral usually halfway across the galaxy pops in to confirm). Command confirmation, authorization stanza.
To be fair, that's just how Picard speaks (e.g. "engage"). I haven't noticed anyone else saying "Tea. Earl Grey. Hot".
In any case I think this kind of speech is formulaic for the benefit of the audience, most of all, who are made aware through the formality that the speaker is addressing a machine. Additionally, we're watching navy men and women in space, so we expect them to speak to each other and to their computers in a formulaic manner ("Deck 5! Report!" etc, I can't think of good examples, brain's too tired).
Or perhaps the idea is that Trek AI is not really advanced enough to understand natural language, and that's what makes Data such a unique specimen.
Then again, there's the example of the Doctor in Voyager. I'm confused, I admit.
The Doctor in Voyager is, IIRC, an early prototype, and Voyager is both later than The Next Generation and set on a newer ship. And the Doctor, IIRC, benefited from upgrades during the show, having been more limited in scope initially.
In any case, Trek isn't super internally consistent, anyhow.
Maybe I'm misremembering, but I distinctly remember that basically everyone ever shown interacting with a replicator follows the generic->specific parameter hierarchy.
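The "Tea. Earl Grey. Hot." pattern can be read as exactly that fixed parameter order; here's a toy illustration (the category names are invented for this sketch, not anything canonical):

```python
# Toy model of the replicator's generic->specific parameter order:
# each successive token narrows the previous one. Category names
# ("item", "variant", "preparation") are invented for illustration.
def parse_order(command: str):
    levels = ["item", "variant", "preparation"]
    parts = [p.strip() for p in command.split(".") if p.strip()]
    return dict(zip(levels, parts))

print(parse_order("Tea. Earl Grey. Hot."))
# {'item': 'Tea', 'variant': 'Earl Grey', 'preparation': 'Hot'}
```

Note that the order matters: "Hot. Earl Grey. Tea." would bind the words to the wrong slots, which is what makes it a positional grammar rather than free-form speech.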
Well, to extend the GUI/console metaphor, it means that at some point soon, we'll all be using NLP because it's dramatically more user-friendly for the vast majority of people.
GUIs won over console workflows because GUIs have better discoverability and benefit from the "recall vs. recognize" difference: it's mentally much easier to recognize the option you want when presented with it than to recall the existence or naming of that option.
In those aspects of UX, voice interfaces have the same drawbacks as console apps when compared to a good GUI.
Also, they have to work within the "bandwidth bottleneck" of audio. Just imagine a phone system that tells you all the options you have ("Press 1 for something, press 2 for another thing..."): they are so annoying because they are slow and inherently linear, whereas a GUI can show the same options all at once, and you can read them much faster than listen to them.
So NLP as such is not dramatically more user-friendly unless it is at the "do what I mean" level, which likely requires full human-level artificial general intelligence; before that, it's just a voice equivalent of a console app, sharing all the problems of discoverability and of needing to remember what the system can do and how it should be invoked.
> Also, they have to work within the "bandwidth bottleneck" of audio - just imagine a phone system that tells you all the options you have, "Press 1 for something, Press 2 for another thing..." - they are so annoying because they are slow and inherently linear
They're even slower now because the brain trust decided adding voice control to the phone menu system was a great idea. So before, it said "For prescription refills, press 1." Now I have to wait for "For prescription refills, press 1, or say prescription refills." How on earth does that improve anything? I can just as easily press 1 as I can say a word, and when I press 1, there is a near 100% chance that the computer on the other end will understand my command.
Some phone menu voice systems are even worse. "Tell me what you want! <silence>" Then you say something, and it says "I didn't recognize that. Please tell me what you want!" Then it fails again and says "I didn't recognize that. For prescription refills, press 1, or say prescription refills..." Oh great, so there was a menu? Why did you waste my time earlier?
Voice is just a terrible, low-fidelity, low-bandwidth way of commanding a computer. You might as well have handwriting input while you're at it: You write what you want on a piece of paper, and hold it up to the camera and the computer tries to figure out what you wrote. Just as silly.
> I can just as easily press 1 as I can say a word, and when I press 1, there is a near 100% chance that the computer on the other end will understand my command.
So, I would rarely say something when I could push a button, but when using a smartphone on a call, it's not always easy or obvious how to push a button. Some people may have mobility issues making it hard to push a button, or be on speaker phone far away from the buttons. Or, maybe they haven't updated their telephone equipment in 50 years, and only have a rotary dial. Or, maybe on a terrible VoIP system that can't manage to get the tones through.
There's probably some way to clean up the script.
"(Please listen carefully, as our options have changed.) Please choose from the following options: Say prescription refills or press 1; say insurance denied or press 2; say referral to veterinary care or press 3"
I could get behind voice interfaces for more things if the command words were documented, clear, and consistent, and the damn things worked. Until then, buttons seem good to me.
Honestly, "Tell me what you want!" is better than a system that forces you to listen to all of the options since "representative" is what I want 99% of the time when I have exhausted all other options and decided to do battle with an automated phone system.
I understand your point, but I'm not sure that GUIs won out because they were dramatically more user-friendly. It certainly helped, but I think they won because they made multitasking possible. Multitasking from the user's perspective, that is: the ability to interact with more than one application at the same time. That was just not possible on a console, so even people who didn't need user-friendliness were able to do things they couldn't do before. I was young at the time, but that's how I remember it at least.
That's just not true, though. As with many console things, multitasking is totally possible, but its discoverability is terrible. Ctrl+Z and `jobs` is the entry level, with tmux being the end state reached via GNU screen. This lack of discoverability is the same problem voice assistants have, only more so: no `apropos` and no tab completion.
GUIs are for discovery, CLIs are for power via composability, voice/NLP assistants are for convenience.
Was it? Even if you discard stuff like tmux as already being a GUI, you can still send whatever is running at the moment to the background with Ctrl+Z and typing "bg" on any modern Unix system. "jobs" will then list all your processes, and "fg <ID>" will bring one to the foreground. I am sure this functionality predates most modern GUIs.
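For anyone who hasn't used that workflow, the lifecycle being described (start a job in the background, check on it, then wait on it) can be mirrored with Python's subprocess module -- a rough analogue, not the shell mechanism itself:

```python
import subprocess
import sys

# Rough analogue of shell job control using subprocess:
#   `sleep 1 &`   -> Popen: the "job" runs in the background
#   `jobs`        -> poll(): check whether it is still running
#   `fg` / `wait` -> wait(): block until it finishes
job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
print("running" if job.poll() is None else "done")  # typically "running"
job.wait()
print("exit code:", job.returncode)  # exit code: 0
```

The shell builds all of this (plus suspension via Ctrl+Z and job IDs) into the interactive session itself, which is why it predates and doesn't need a GUI.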
Aside from the usability POV, GUIs provided significantly more features to the user, such as visualization of information and data: images, audio, video, multimedia, 2D/3D video games. You could have more information on the screen and at your fingertips at the same time. You can load many of these things from the CLI, but it's not as convenient as within a GUI.
You may be thinking of DOS, which yes had almost no multitasking ability available.
However there were multiple timesharing operating systems that existed before the PC and GUIs, Unix being the most famous and still around.
Multitasking is quite possible on a Linux console, for example. It has five or more virtual consoles, each handling a different user, each able to be split via screen/tmux. Each shell can run jobs in the background as well.
From my observations, people have reduced their voice assistants to objects that sometimes tell them the weather or switch their lights on and off, and sometimes do something completely unrelated when activated.