I'm quite surprised to find this on HN; synthesizers like espeak and eloquence (IBM TTS) have fallen out of favor these days. I'm a blind person who uses espeak on all my devices except my MacBook, where unfortunately I can't install the speech synthesizer because it apparently only supports macOS 13 (installing the library itself works fine, though).
Most times I try to use modern "natural-sounding" voices, they take a while to initialize, and when you speed them up, at a certain point the words blur together into meaningless noise, while at the same rate eloquence and espeak handle it just fine. Well, for me at least.
I was thinking about this a few days back while trying out piper-tts [0]: supposedly "more advanced" synthesizers powered by AI use up more RAM, CPU, and disk space to deliver a voice that doesn't sound much better than something like RHVoice and gets things like inflection wrong. And that's the English voice. The voice for my language (Serbian) makes espeak sound human, and piper-tts labels it "medium" quality.
Funny story about synthesizers taking a while to initialize: there's a local IT company here that specializes in speech synthesis, and their voices take so long to load that they had to say "<company> Mary is initializing..." whenever you start your screen reader or the like. It was annoying, but in a fun way. Their newer Serbian voices also have this "feature" where they try to properly pronounce some English words they come across. They have another "feature" where they try to correctly pronounce words that were spelled without accent marks and such, and as with most of these kinds of "features", the two combine badly and hilariously. For example, if you asked them to pronounce "topic", they would pronounce it as "topich", which was fun while browsing forums and the like.
[0] https://github.com/rhasspy/piper