
The Mozilla Common Voice dataset is awesome - however, it's useful for the opposite purpose: speech-to-text. This is because it consists of a lot of different people, using a range of hardware, speaking similar phrases.

For good text-to-speech you need one person speaking many different phrases, but very consistently. Here's an example dataset from Thorsten, a German open-voice enthusiast: https://openslr.org/95/
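
If you want to sanity-check a dataset for either use case, a rough heuristic is to compare how many distinct speakers it has against how many distinct phrases. Here's a minimal sketch of that check, assuming a Common Voice-style TSV with 'client_id' and 'sentence' columns (the column names and file name are assumptions, adjust to your export):

  # Rough dataset profile: many speakers + repeated phrases suggests STT
  # material; one consistent speaker + many distinct phrases suggests TTS.
  import csv

  def dataset_profile(tsv_path):
      speakers, sentences, clips = set(), set(), 0
      with open(tsv_path, newline="", encoding="utf-8") as f:
          for row in csv.DictReader(f, delimiter="\t"):
              speakers.add(row["client_id"])    # assumed speaker-id column
              sentences.add(row["sentence"])    # assumed transcript column
              clips += 1
      return {"clips": clips,
              "unique_speakers": len(speakers),
              "unique_sentences": len(sentences)}

  print(dataset_profile("validated.tsv"))  # hypothetical file name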



Thanks for the explanation!



