> if we want open, on-device voice recognition, we'll have to do the work and donate sample data.
We absolutely will not. The only reason people believe this is that they've forgotten how to do speaker-dependent recognition (SDR), which is more accurate and more secure anyway. We were doing SDR in the 80s with 1/1000 the CPU power and 1/1000 the memory.
SDR does require an initial training session, but once that's done, any modern computer or smartphone should be able to handle it locally, with no cloud backend at all.
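For anyone who's forgotten what that looked like: the classic approach was template matching with dynamic time warping (DTW) over MFCC frames. A minimal sketch in Python, assuming librosa for feature extraction; the enrollment paths, word list, and function names are made up for illustration:

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def mfcc(path, sr=16000, n_mfcc=13):
    """Load a recording and return its MFCC frame sequence, shape (T, n_mfcc)."""
    y, _ = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def dtw_distance(a, b):
    """Plain O(len(a) * len(b)) dynamic time warping with Euclidean frame cost."""
    T, U = len(a), len(b)
    D = np.full((T + 1, U + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[T, U] / (T + U)  # normalise by an upper bound on path length

# Enrollment: one recording per command word from *this* speaker
# (hypothetical file paths, purely for illustration).
templates = {w: mfcc(f"enroll/{w}.wav") for w in ("lights", "music", "stop")}

def recognize(path):
    """Return the enrolled word whose template is DTW-closest to the input."""
    query = mfcc(path)
    return min(templates, key=lambda w: dtw_distance(templates[w], query))
```

Per utterance that's a few hundred frames of 13-dimensional vectors, which is why 80s hardware could manage it and why a phone handles it trivially today.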
You say “forgotten” as if we had great tools that everyone simply forgot about. Having actually used those systems, I'm rather skeptical of that claim: they seemed to hit a functional plateau well below the level of modern systems.
Put another way, if this were off the shelf, why isn't anyone marketing it?
One reason may be that since it doesn't require a cloud, there's no personal data to mine. Try getting VC funding without a recurring revenue stream; it's probably possible, but it's much harder. Same story for IoT: cloudless home automation is trivial from a technical point of view (see the sketch below), but it's a non-starter VC-wise.
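To make "trivial" concrete, here's roughly all the code a cloud-free light switch needs, assuming a device that exposes a Shelly-style local HTTP API; the IP address and endpoint are illustrative:

```python
import urllib.request

# Everything stays on the LAN: no account, no cloud, no recurring revenue.
# 192.168.1.42 and /relay/0 are illustrative, Shelly-style values.
urllib.request.urlopen("http://192.168.1.42/relay/0?turn=on", timeout=2)
```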
This was a field with multiple products on the market. How much VC do you need to deliver benchmarks of shipping software?
Similarly, saying cloudless home automation is easy sounds like you're leaving out a lot of the experience other people gained about its challenges: getting consumer adoption when customers have to take on 24x7 server maintenance themselves, connectivity problems blocking popular features, and so on, all of which made that class of products less appealing to most customers.
Training a speaker-specific recogniser that improves over a generic one requires a lot more data nowadays. First, generic systems are far better now and trained on far more data. Second, speaker adaptation worked better for the Gaussian mixture models of the late nineties (I don't know about the eighties) than it does for neural networks.
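For concreteness, the adaptation that worked well in the GMM era was MAP adaptation of the component means (Reynolds-style GMM-UBM). A rough sketch, assuming a fitted scikit-learn GaussianMixture as the speaker-independent model; the relevance factor r=16 is a conventional choice, not anything from this thread:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, X: np.ndarray, r: float = 16.0):
    """Shift each Gaussian's mean toward the speaker's data, weighted by how
    much data that component actually saw (relevance factor r is conventional)."""
    post = ubm.predict_proba(X)        # (N, K) responsibilities per frame
    n_k = post.sum(axis=0)             # soft frame counts per component
    # Expected speaker mean per component (guard against empty components).
    ex_k = (post.T @ X) / np.maximum(n_k[:, None], 1e-8)
    alpha = (n_k / (n_k + r))[:, None] # per-component adaptation weight
    return alpha * ex_k + (1 - alpha) * ubm.means_
```

With a neural network there's no comparable closed-form per-speaker update; you're back to fine-tuning, which is part of why adaptation now wants much more data.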
Who's "we" in this context? Because just below you, HN has comments from willing donors.
My point being that while there may still be a market for SDR, there's a broader market for speaker-independent recognition (SIR), simply because people want the tech to just work rather than feel like they messed up the training whenever the device can't recognize them.
Using someone else's voice assistant is also a legitimate use case, especially if it's used to control music, lights, blinds, AC, car functionality ... that absolutely requires solid SIR.
I think this can be viewed as a marketing and UX problem, sort of. It reminds me of Nintendo's amiibo fighters in Super Smash Bros. for Wii U: people actually paid money to train their AI bots because of how Nintendo designed them. Not sure how many people, but a big enough segment of the market that Nintendo thought it a worthwhile investment anyway.