Which is exactly why I prefer formal query languages over NLP queries. In both cases (at least with most state-of-the-art NLP techniques), you have to learn certain patterns and ways to phrase a query so that the system will reliably understand it. With formal query languages, these patterns are well-defined, can be looked up, and will most likely not change significantly (so there is value in memorizing them). With NLP systems, the patterns are completely opaque: you have to learn them through trial and error, they may change at any time (e.g. because the model is retrained), and they are usually significantly less powerful.
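To make the contrast concrete, here's a toy illustration using SQLite from Python, purely as an example of a formal query language (the table and data are invented for this sketch): the SELECT grammar is documented and stable, so a pattern you memorize once keeps working.

```python
import sqlite3

# Toy example of a formal query language: SQL's grammar is documented,
# so the pattern "SELECT ... FROM ... WHERE ..." can be looked up once
# and relied on -- unlike the opaque phrasing rules of an NLP assistant.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lights(room TEXT, is_on INTEGER)")
con.executemany("INSERT INTO lights VALUES (?, ?)",
                [("kitchen", 1), ("hall", 0), ("bedroom", 1)])

# The formal equivalent of asking "which lights are on?"
rows = con.execute(
    "SELECT room FROM lights WHERE is_on = 1 ORDER BY room").fetchall()
print([r[0] for r in rows])  # ['bedroom', 'kitchen']
```

The same question put to a voice assistant has no documented grammar at all; you only find out what phrasings work by trying them.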
I sometimes feel that the trend to prefer NLP over formal query languages is comparable to the trend to prefer GUIs over consoles in the '80s and '90s.
Agreed; back in the day, when we played text adventures or interacted with MUDs/MOOs, those systems had English-like interaction languages, but their semantics were relatively clear -- you mostly had to follow the verb/preposition/object formula, and once you figured that out, you could manage the system fairly well without running into a lot of terrible corner cases.
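A sketch of how small that grammar was -- this is a hypothetical reconstruction with an invented vocabulary, not any particular game's parser, but it captures the verb/preposition/object formula:

```python
# Minimal sketch of a text-adventure-style verb/prep/object parser.
# Vocabulary is invented; real parsers (Infocom, MUDs) were richer,
# but the fixed formula is what made them learnable.
VERBS = {"put", "take", "look", "open"}
PREPS = {"in", "on", "under", "with"}

def parse(command: str):
    """Parse 'verb [object [prep object2]]' into a structured action."""
    words = command.lower().split()
    if not words or words[0] not in VERBS:
        return None  # "I don't understand that."
    verb, rest = words[0], words[1:]
    prep = next((w for w in rest if w in PREPS), None)
    if prep:
        i = rest.index(prep)
        return {"verb": verb, "object": " ".join(rest[:i]),
                "prep": prep, "object2": " ".join(rest[i + 1:])}
    return {"verb": verb, "object": " ".join(rest)}

print(parse("put lamp on table"))
# {'verb': 'put', 'object': 'lamp', 'prep': 'on', 'object2': 'table'}
```

When the parser rejects an input, the failure mode is at least predictable: you know the formula, so you know roughly why it failed.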
I'd rather have an assistant-type system with a fairly well-defined query language that exposes its capabilities and limitations directly than have to guess at the corner cases and failure points.
Disclaimer: I work @ Google on display assistant devices, but I don't work on the actual assistant interaction pieces.
> Which is exactly why I prefer formal query languages over NLP queries.
People like to reference Star Trek for stuff like NLP queries, but if you go back to TNG and pay close attention to the verbal queries to the computer, much of the time it isn't natural language. They seem to actually use some sort of formal query language that fits English a bit more closely, but it is still distinct from how the characters speak to each other.
> They seem to actually use some sort of formal query language that fits English a bit closer, but is still distinct from when the characters speak to each other.
"Computer, begin auto-destruct sequence, authorization Picard 4-7 Alpha Tango."
- Wake word. Command. Authorization stanza. (I bet the computer would prompt for authorization if missing.)
- Wake word (possibly superfluous). Identification stanza (probably superfluous for the usual crew, but I can see from an HCI perspective that you might want to make people provide it specifically for such a consequential protocol, and it may also be of merit if some random admiral usually halfway across the galaxy pops in to confirm). Command confirmation, authorization stanza.
To be fair, that's just how Picard speaks (e.g. "engage"). I haven't noticed anyone else saying "Tea. Earl Grey. Hot".
In any case I think this kind of speech is formulaic for the benefit of the audience, most of all, who are made aware through the formality that the speaker is addressing a machine. Additionally, we're watching navy men and women in space, so we expect them to speak to each other and to their computers in a formulaic manner ("Deck 5! Report!" etc, I can't think of good examples, brain's too tired).
Or perhaps the idea is that Trek AI is not really advanced enough to understand natural language, and that's what makes Data such a unique specimen.
Then again, there's the example of the Doctor in Voyager. I'm confused, I admit.
The Doctor in Voyager is, IIRC, an early prototype, and Voyager is both later than The Next Generation and set on a newer ship. And the Doctor, IIRC, benefited from upgrades during the show, having been more limited in scope initially.
In any case, Trek isn't super internally consistent, anyhow.
Maybe I'm misremembering, but I distinctly remember that basically everyone ever shown interacting with a replicator follows the generic->specific parameter hierarchy.
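The "Tea. Earl Grey. Hot." pattern can be read as exactly that fixed parameter order; here's a toy illustration (the category names are invented for this sketch, not anything canonical):

```python
# Toy model of the replicator's generic->specific parameter order:
# each successive token narrows the previous one. Category names
# ("item", "variant", "preparation") are invented for illustration.
def parse_order(command: str):
    levels = ["item", "variant", "preparation"]
    parts = [p.strip() for p in command.split(".") if p.strip()]
    return dict(zip(levels, parts))

print(parse_order("Tea. Earl Grey. Hot."))
# {'item': 'Tea', 'variant': 'Earl Grey', 'preparation': 'Hot'}
```

Note that the order matters: "Hot. Earl Grey. Tea." would bind the words to the wrong slots, which is what makes it a positional grammar rather than free-form speech.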
Well, to extend the GUI/console metaphor, it means that at some point soon, we'll all be using NLP because it's dramatically more user-friendly for the vast majority of people.
GUIs won over console workflows because GUIs have better discoverability and benefit from the "recall vs. recognize" difference: it's mentally much easier to recognize the option you want when presented with it than to recall the existence or naming of that option.
In those aspects of UX, voice interfaces have the same drawbacks as console apps when compared to a good GUI.
Also, they have to work within the "bandwidth bottleneck" of audio. Just imagine a phone system that tells you all the options you have ("Press 1 for something, press 2 for another thing..."): they are so annoying because they are slow and inherently linear, whereas a GUI can show the same options all at once, and you can read them much faster than listen to them.
So NLP as such is not dramatically more user-friendly unless it is at the "do what I mean" level, which likely requires full human-level artificial general intelligence; before that, it's just a voice equivalent of a console app, sharing all the problems of discoverability and of needing to remember what the system can do and how it should be invoked.
> Also, they have to work within the "bandwidth bottleneck" of audio - just imagine a phone system that tells you all the options you have, "Press 1 for something, Press 2 for another thing..." - they are so annoying because they are slow and inherently linear
They're even slower now because the brain trust decided adding voice control to the phone menu system was a great idea. So before, it said "For prescription refills, press 1." Now I have to wait for "For prescription refills, press 1, or say prescription refills." How on earth does that improve anything? I can just as easily press 1 as I can say a word, and when I press 1, there is a near 100% chance that the computer on the other end will understand my command.
Some phone menu voice systems are even worse. "Tell me what you want! <silence>" Then you say something, and it says "I didn't recognize that. Please tell me what you want!" Then it fails again and says "I didn't recognize that. For prescription refills, press 1, or say prescription refills..." Oh great, so there was a menu? Why did you waste my time earlier?
Voice is just a terrible, low-fidelity, low-bandwidth way of commanding a computer. You might as well have handwriting input while you're at it: You write what you want on a piece of paper, and hold it up to the camera and the computer tries to figure out what you wrote. Just as silly.
> I can just as easily press 1 as I can say a word, and when I press 1, there is a near 100% chance that the computer on the other end will understand my command.
So, I would rarely say something when I could push a button, but when using a smartphone on a call, it's not always easy or obvious how to push a button. Some people may have mobility issues making it hard to push a button, or be on speaker phone far away from the buttons. Or, maybe they haven't updated their telephone equipment in 50 years, and only have a rotary dial. Or, maybe on a terrible VoIP system that can't manage to get the tones through.
There's probably some way to clean up the script.
"(Please listen carefully, as our options have changed.) Please choose from the following options: Say prescription refills or press 1; say insurance denied or press 2; say referral to veterinary care or press 3"
I could get behind voice interfaces for more things if the command words were documented, clear, and consistent, and the damn things worked. Until then, buttons seem good to me.
Honestly, "Tell me what you want!" is better than a system that forces you to listen to all of the options since "representative" is what I want 99% of the time when I have exhausted all other options and decided to do battle with an automated phone system.
I understand your point, but I'm not sure that GUIs won out because they were dramatically more user-friendly. It certainly helped, but I think they won because they made multitasking possible. Multitasking from the user's perspective, that is: the ability to interact with more than one application at the same time. That was just not possible on a console, so even people who didn't need user-friendliness were able to do things they couldn't do before. I was young at the time, but that's how I remember it at least.
That's just not true, though. As with many console things, multitasking is totally possible, but its discoverability is terrible. Ctrl+Z and `jobs` is the entry level, with tmux being the end state reached via GNU screen. This lack of discoverability is the same problem voice assistants have, only more so: no `apropos` and no tab completion.
GUIs are for discovery, CLIs are for power via composability, voice/NLP assistants are for convenience.
Was it? Even if you discard stuff like tmux as already being a GUI, you can still send whatever is running at the moment to the background with Ctrl+Z and typing "bg" on any modern Unix system. "jobs" will then list all your processes, and "fg <ID>" will bring one to the foreground. I am sure this functionality predates most modern GUIs.
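For anyone who hasn't used that workflow, the lifecycle being described (start a job in the background, check on it, then wait on it) can be mirrored with Python's subprocess module -- a rough analogue, not the shell mechanism itself:

```python
import subprocess
import sys

# Rough analogue of shell job control using subprocess:
#   `sleep 1 &`   -> Popen: the "job" runs in the background
#   `jobs`        -> poll(): check whether it is still running
#   `fg` / `wait` -> wait(): block until it finishes
job = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
print("running" if job.poll() is None else "done")  # typically "running"
job.wait()
print("exit code:", job.returncode)  # exit code: 0
```

The shell builds all of this (plus suspension via Ctrl+Z and job IDs) into the interactive session itself, which is why it predates and doesn't need a GUI.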
Aside from the usability POV, GUIs provided significantly more features to the user, such as visualization of information and data: images, audio, video, multimedia, 2D/3D video games. You could have more information on the screen and at your fingertips at the same time. You can load many of these things from the CLI, but it's not as convenient as within a GUI.
You may be thinking of DOS, which yes had almost no multitasking ability available.
However there were multiple timesharing operating systems that existed before the PC and GUIs, Unix being the most famous and still around.
Multitasking is quite possible on a Linux console, for example. It has five or more virtual consoles, each handling a different user, each able to be split via screen/tmux. Each shell can run jobs in the background as well.
From my observations, people have reduced their voice assistants to objects that sometimes tell them the weather or switch their lights on and off, and sometimes do something completely unrelated when activated.