
Impressive as it is, the paramount question remains: why do we even need "emotional" voices?

All that emotionality adds is the illusion of a friend - a friend that can't help you in any way in the real world and whose confidentiality is only as strong as the privacy policies & data security of the company running it - which often ultimately trends towards 0.

Smart Neutral Voice Assistants could be a great help, but none of that requires "emotionality" or trying to build a "human connection" with the user. Quite the contrary: the more emotional a voice, the easier it is to misuse for scams, for faking rapport, and in general for making you "addicted" so it can loop you into endless babble.



When OpenAI released voice mode originally, I got early access. I used it a __ton__. I must have been 99.9th percentile of usage at least.

Then they started updating it. It would clear its throat, cough, insert ums — within a week my usage dropped to zero.

To me, emotionality is an anti-feature in a voice assistant. I'm very well aware I'm talking to a robot. Trying to fool me otherwise just breaks immersion and personally takes away more from the experience than being able to have a conversation with a database provides.

I realize I'm not a typical customer, but I can't help but be flummoxed watching all of the voice agents go so hard on emotionality.


Emotions convey a ton of meaning in human communications, not necessarily an illusion of friendship. It's a huge side channel and there's a clear use case for an assistant to not sound lifeless and robotic. Scams, addictions, privacy loss and many other things deviating from the idealistic sci-fi portrayals will stay regardless of the tech if not treated on the cultural level (which is way harder to do and nobody likes doing it, preferring to shift the responsibility onto someone else).


Can't say I've missed emotions in Google Search or Excel. In chat from something designed to help you, there's a fairly narrow range of emotional cases that are relevant and useful:

- Confidence/confusion: if the bot thinks it misheard you, cannot understand you, or lacks confidence in its ability to respond reliably, that's a handy channel

- Danger/seriousness: an update about something genuinely serious, with major negative implications or costs

Most others are fairly annoying (would anyone want a bot to surface frustration, obsequiousness, or the overly agreeable / "bubbly" tone on display here?!)


You answered the question yourself - "faking rapport and in general make you 'addicted' to loop you in babble with it."

Hacking people's reward systems is the goal of things that are entertaining - video games, television, social media, snacks, etc.


Can already see this in the hordes of lonely dudes using the AI girlfriend apps on the app stores…can’t imagine how hooked people are gonna get when it actually sounds and talks like a real person. The chatbots now are so limited idk how anyone enjoys them.


The same reason why text LLMs show exaggerated emotions (enthusiasm about your questions, super-apologetic tone when you dislike the answer, etc).

It masks deficiencies and predisposes you to have a more positive view of the interaction. Think of the most realistic and immediate ways to monetize this tech. It's customer support. Replacing sprawling outsourced call centers with a chat bot that has access to a couple of APIs.

These bots often interact with people who are in some sort of distress. Missed flight, can't access bank account, internet not working. A "friendly" and "empathetic" chatbot will get higher marks.


Has it been tried the other way? I don't remember an iteration where they weren't obnoxiously over-endearing. After the initial novelty, it would be better to reduce the amount of fake information you have to read, and any attempt at pretending to be a human is completely fake information at this point.


You can always tell it to respond critically and it will. In fact, I've been doing this for quite a few queries after getting the bubbly endearing first pass, and it really strips the veil away (and often makes things more actionable)


Yes, there are many use cases where emotional voices are not needed, but that's not the point.

The core point is not having emotional voices, but training neural networks to emulate emotions (not just for voices). Humans are very emotional beings, and if you want to communicate with them effectively, you will need the emotional layer. Otherwise, you communicate only on the rational layer, which often does not convey the message correctly.

Think of humans as 20% rational and 80% emotional.

And I say that as a person who believed for a long time that I was 80% rational and just 20% emotional ;-)


But there is no message outside the rational layer when you're talking to a non-human. The only message is the amount of true information the LLM is able to output - the rest is randomness. It's fatiguing to have your human brain try to interpret emotions and social dynamics where they don't exist, the same way it's fatiguing to try and interpret meaning from a generated image.


I am sure that if you talk to a dog, it will probably take as much from your emotions as your words (to disprove your point about non-humans).

You look at it in binary categories, but instead, it is always some amount of information and some amount of randomness. An LLM can predict emotions similarly to words. Emotions and social dynamics from an LLM are as valid as the words it speaks. Most of the time, they are correct, but sometimes they are not.

The real difference is that LLMs can be trained to cope with emotions much better ;-)


Yes, fair enough about the dog - "non-human" was the wrong choice of words. But I don't agree that emotions and social dynamics from an LLM are valid. Emotions need real stakes behind them. They communicate the inner state of another being. If that inner state does not exist (maybe it could in an AGI, but I don't believe it could in an LLM), then I'd say the communication is utterly meaningless.


> communication is utterly meaningless.

Well, at least to some extent. I mean, changing the inner state of an AI (as they are being built today) certainly is, because it does not affect other beings. However, the interaction might change your inner state. Like looking at an AI-generated image and finding it beautiful or awful. Similarly, talking to Miles or Maya might let you feel certain emotions.

I think that part can be very meaningful, but I also agree that current AI is built to not carry its emotional state into the world outside of the direct interaction.


To accurately imitate human speech?

You could type something, and it could be read like a human.

There are plenty of other reasons, but they're equally as obvious. I don't understand what purpose you have in attempting to make this point.


Different things: you are describing voice narration or TTS use cases. My comment was about "emotional chatbots" that imitate having a genuine connection with their users.


The funny part is that no one would be arguing like they do in these forums if they were talking face-to-face, conveying things like "emotion".


This was the first thing I asked it. It was like, "Dark-patterns? Me? what?"


one thing: language learning


When I meet people in VR who are ESL, I can tell based on their accent and mannerisms that they learned English by playing video games with westerners or watched a lot of YouTube.

Do we really want to dilute the uniqueness of language by making everyone sound like they came out of a lab in California?


Why would that be? In ElevenLabs Reader I can already choose from a bunch of different accents, including southern English, Australian and so on.

The people behind this demo have already said they're publishing different languages and accents soon, along with open models you can run yourself.


>Do we really want to dilute the uniqueness of language

I can't speak to whether it's desirable or not, but this has been happening since the advent of radio, movies, and television, for over a century. So, are we worse off now, linguistically speaking, than then? Do we even notice missing accents if we never grew up with them?


good points.


Your post is the language learning equivalent of worrying that going to the gym will make you too bulky.


haha yeah it definitely comes off grand schemey and overly idealistic but it’s hard not to have emotional reactions to new applications in AI


Likewise, will you be learning how to speak formally or informally?

Getting that wrong in some languages e.g. Korean can be offensive.


Language learning also works fine without faked emotionality. It depends much more on authentic speech recognition (e.g. you want the model to notice if you mispronounce important words, not gloss over it and just keep babbling, as otherwise this will bite you in the ass in the real world) as well as the system's overall ability to generate a personal learning curriculum.



