Hacker News | bshillingford's comments

Standard neural-network-based speech recognition pipelines (i.e. RNN + CTC) always use a language model. Unlike a seq2seq model (or any autoregressive model, or any structured-prediction output), a CTC model treats its per-timestep outputs as conditionally independent given the input, so the output distribution itself does not model dependencies between output labels. Hence, everyone uses an RNN LM or an n-gram LM, or both, when retrieving probable sequences from a CTC model (e.g. with beam search).
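
To make the conditional-independence point concrete, here is a minimal sketch in Python. It is not any particular toolkit's decoder: the frame probabilities, the toy whole-string "LM" table, and the weight `alpha` are all made up for illustration. Brute-force enumeration over paths stands in for beam search, and the log-linear combination of the two scores (often called shallow fusion) is one common way to fold an LM in.

```python
import itertools
import math
from collections import defaultdict

BLANK = "_"  # CTC blank symbol (the character chosen here is arbitrary)

def ctc_collapse(path):
    """Apply the CTC mapping: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != BLANK:
            out.append(s)
        prev = s
    return "".join(out)

def ctc_sequence_probs(frame_probs):
    """Sum path probabilities for each collapsed output.

    Each path's probability is just the product of per-frame probabilities,
    which is exactly the conditional-independence assumption. Enumeration is
    exponential in the number of frames, so this only works for toy inputs.
    """
    symbols = list(frame_probs[0])
    totals = defaultdict(float)
    for path in itertools.product(symbols, repeat=len(frame_probs)):
        p = math.prod(frame[s] for frame, s in zip(frame_probs, path))
        totals[ctc_collapse(path)] += p
    return dict(totals)

def best_with_lm(seq_probs, lm_logprob, alpha):
    """Pick the output maximizing log P_ctc + alpha * log P_lm."""
    return max(seq_probs,
               key=lambda s: math.log(seq_probs[s]) + alpha * lm_logprob(s))

# Hypothetical per-frame posteriors from an acoustic model (two frames).
frame_probs = [
    {"a": 0.6, "b": 0.3, BLANK: 0.1},
    {"a": 0.3, "b": 0.5, BLANK: 0.2},
]

# Hypothetical LM: a lookup table over whole strings with a small floor,
# standing in for a real n-gram or RNN language model.
LM_TABLE = {"ab": 0.6, "a": 0.2}

def toy_lm_logprob(s):
    return math.log(LM_TABLE.get(s, 0.05))

seq_probs = ctc_sequence_probs(frame_probs)
acoustic_best = best_with_lm(seq_probs, toy_lm_logprob, alpha=0.0)
fused_best = best_with_lm(seq_probs, toy_lm_logprob, alpha=1.0)
print(acoustic_best, fused_best)
```

With these made-up numbers the CTC scores alone prefer "a", while adding the LM term flips the choice to "ab". The CTC model has no term that could express "b tends to follow a"; that information can only come from the LM.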

