Thanks for the comment and the suggestion!

Indeed, the TED dataset has a lot of variability in audio quality and other factors, which, as you mentioned, is difficult to capture with just 10 epochs of training. I did try a larger network (up to 11 downsampling layers), but, as expected, it was even more time-consuming to train. So I split the difference and went with a network similar to yours, but one that could be trained within four days (10 epochs).


