Hacker News

I got your fork working (also on a 3090), but I wasn't impressed with the latency or the quality of the recommended LLM.


Make sure you're using the nemotron-speech ASR model. I added Spanish support via the Canary models, but those have roughly 10x the latency: about 160 ms with nemotron-speech vs. 1.5 s with Canary.

For the LLM, I'm currently using Mistral-Small-3.2-24B-Instruct instead of Nemotron 3, and it works well for my use case.



