Low bit rate codecs with acceptable quality will be a boon for VoIP over satellite. Satellite Internet service is both bandwidth-limited and hideously expensive: Inmarsat Fleet Broadband for cargo ships, for instance, typically costs $6/MB. The idea would be to set up local phone networks that use one of the GSM codecs or mu-law (allowing the use of plain vanilla handsets), and then transcode the voice to a more efficient codec like this one on either side of the satellite link to the PSTN.
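To put the $6/MB figure in perspective, here is a rough cost-per-minute calculation. It is only a sketch, and it rests on several assumptions. It takes 1 MB = 10^6 bytes and counts only the codec payload in one direction, with no IP/UDP/RTP headers. It uses 64 kbit/s for mu-law (G.711). For the low-rate codec it uses about 2.8 kbit/s, a figure inferred from the hts1a byte counts further down the thread on the assumption that the clip is 8 kHz audio. The `cost_per_minute` helper is hypothetical.

```python
# Hypothetical helper: cost of the voice payload on a link billed per megabyte.
# Assumptions: 1 MB = 1,000,000 bytes; one direction only; IP/UDP/RTP packet
# headers are ignored (they add substantial overhead at low bit rates).

PRICE_PER_MB = 6.00  # USD, the Fleet Broadband figure quoted above


def cost_per_minute(bitrate_bps: float, price_per_mb: float = PRICE_PER_MB) -> float:
    """Return USD per minute of one-way voice payload at the given bit rate."""
    bytes_per_minute = bitrate_bps / 8 * 60
    return bytes_per_minute / 1_000_000 * price_per_mb


if __name__ == "__main__":
    # 64 kbit/s G.711 mu-law versus ~2.8 kbit/s (assumed low-rate codec figure).
    for name, bps in [("mu-law (G.711)", 64_000), ("low-rate codec", 2_800)]:
        print(f"{name:16s} {cost_per_minute(bps):6.3f} USD/min")
```

Under these assumptions mu-law works out to about $2.88 per minute and the low-rate codec to about $0.13 per minute. If both directions of the call are billed, both figures roughly double. Per-packet headers add a fixed cost, so in practice they narrow the gap considerably.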
Incidentally, David Rowe is one of the pioneers of the Mesh Potato.
Thanks for the Mesh Potato mention. I hadn't seen this project before, but I'm very interested in stuff like this. My question is: what would happen if an entire country got wired overnight? What kind of social, political and economic consequences would that have? I'm not thinking of the US, which already has decent infrastructure, even if we want it to be better/faster/more neutral. I'm thinking of African countries, a market largely underexposed to the internet. I suppose I can picture a startup trying to do this. Just think how many more Facebook users there could be...
I agree that VoIP does not currently work well over satellite. I also agree that satellite internet has a lot of room for improvement. However, I have to say it is a life saver if your only other option is dial-up. I also know that the satellite internet companies are upgrading their networks next year to increase bandwidth. So there is definitely hope. There is more information on this topic on my blog at mybluedish.com/blog.
The samples sound quite good, suspiciously good even.
I'll give it a try on speech samples from other languages and speakers (this often makes quite a difference).
I've always wondered if some human languages work better with speech codecs than others, ever since I saw someone having a cell phone conversation in an Asian language without appearing confused or asking for clarification (based on facial expression, body language, and conversation pacing). My experience speaking English on cell phones is one of constantly repeating myself.
I don't know much about voice encoding, but I'm really curious as to why all the example files are the same size (46.9 KB). Could someone explain why this is an advancement if the file sizes remain the same?
I suspect it has something to do with all of them using wav as the container, but would love to hear from someone more knowledgeable.
The example files are produced by encoding and then decoding the original. Because they are decoded back to raw 16-bit PCM, they all end up the same size as the uncompressed original. The encoded bitstream files are a lot smaller.
For example: hts1a
original: 48000 bytes
encoded: 1050 bytes
Edit: Note that this is only the size of the bitstream written to disk. I didn't look into the actual format.
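Here is a rough sketch of the arithmetic behind those numbers. It assumes the clips are 8 kHz, mono, 16-bit PCM, which is the format Codec2 works with. The thread itself only gives byte counts. The decoded size depends only on the clip's duration and not on the codec, which is why every sample comes out the same size. The helper names are hypothetical.

```python
# Assumption: raw 8 kHz, mono, 16-bit (2 bytes/sample) PCM.
SAMPLE_RATE_HZ = 8_000
BYTES_PER_SAMPLE = 2


def pcm_duration_s(pcm_bytes: int) -> float:
    """Duration in seconds of raw mono 16-bit PCM audio."""
    return pcm_bytes / BYTES_PER_SAMPLE / SAMPLE_RATE_HZ


def bitstream_rate_bps(encoded_bytes: int, duration_s: float) -> float:
    """Average bit rate of an encoded file over the clip's duration."""
    return encoded_bytes * 8 / duration_s


original_bytes = 48_000  # hts1a original; decoded output has the same size
encoded_bytes = 1_050    # hts1a bitstream written to disk

duration = pcm_duration_s(original_bytes)
rate = bitstream_rate_bps(encoded_bytes, duration)
ratio = original_bytes / encoded_bytes

print(f"duration {duration:.1f} s, bitstream {rate:.0f} bit/s, {ratio:.1f}x smaller")
```

Under that assumption, hts1a is 3 s of speech carried at about 2.8 kbit/s, roughly 46 times smaller than the PCM. 48,000 bytes is 46.875 KiB, which rounds to the 46.9 KB in the question. A WAV header adds only a few dozen bytes on top of that.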
I can hear the 50 Hz modulation caused by the 20 ms frames in the Codec2 samples. It's particularly problematic on the sibilants from the female sample. Codec2 seems to have a wider frequency response, but MELP is more intelligible.
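The 50 Hz figure follows directly from the frame length. A 20 ms frame means 50 frame boundaries per second, so any artifact that recurs at each boundary repeats at 50 Hz. The sketch below also estimates the bit budget per frame. It assumes the ~2.8 kbit/s average rate implied by the hts1a numbers above (1050 bytes over a 3 s clip, which itself assumes 8 kHz, 16-bit PCM). It is not taken from the codec's actual frame format.

```python
FRAME_MS = 20                    # frame length mentioned in the comment
frame_rate_hz = 1000 / FRAME_MS  # one frame boundary every 20 ms -> 50 Hz

# Assumed average bit rate, inferred from the hts1a example (not a spec value).
BITRATE_BPS = 2_800
bits_per_frame = BITRATE_BPS / frame_rate_hz  # bit budget per 20 ms frame
bytes_per_frame = bits_per_frame / 8

print(f"{frame_rate_hz:.0f} frames/s, {bits_per_frame:.0f} bits "
      f"({bytes_per_frame:.0f} bytes) per frame")
```

With only about 56 bits to describe each 20 ms of speech, the parameters can only be updated 50 times a second. That is consistent with the per-frame modulation being audible on fast-changing sounds like sibilants.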
That said, it sounds like a great version 0.1, and I look forward to hearing what comes next.
I'm still trying to understand the processing involved, but it would be interesting if one could use this to make usable VoIP calls over EDGE, though that may require an external DSP dongle.
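On a slow mobile data link like EDGE, packet headers can matter more than the codec rate. The sketch below estimates the on-the-wire rate with uncompressed IPv4 (20 B), UDP (8 B) and RTP (12 B) headers, which come to 40 bytes per packet. The 7-byte frame is an assumption derived from the hts1a example above: 1050 bytes spread over 150 frames of 20 ms, if the clip is 3 s at 8 kHz. The `on_wire_bps` helper is hypothetical, and link-layer overhead on a real EDGE bearer is ignored.

```python
# Per-packet header overhead, assuming IPv4 + UDP + RTP with no header compression.
HEADER_BYTES = 20 + 8 + 12


def on_wire_bps(payload_bytes_per_frame: int, frame_ms: int, frames_per_packet: int) -> float:
    """Average bit rate including per-packet IP/UDP/RTP headers."""
    packet_interval_ms = frame_ms * frames_per_packet
    packets_per_s = 1000 / packet_interval_ms
    packet_bytes = HEADER_BYTES + payload_bytes_per_frame * frames_per_packet
    return packets_per_s * packet_bytes * 8


if __name__ == "__main__":
    # Assumed 7-byte frames every 20 ms (~2.8 kbit/s of payload).
    for n in (1, 5, 10):
        audio_per_packet_ms = 20 * n  # audio buffered before a packet can be sent
        print(f"{n:2d} frame(s)/packet: {on_wire_bps(7, 20, n):7.0f} bit/s, "
              f"{audio_per_packet_ms} ms of audio per packet")
```

With one frame per packet, the headers alone cost 16 kbit/s, several times the ~2.8 kbit/s payload. Bundling 5 or 10 frames per packet brings the total down to about 6 or 4.4 kbit/s, at the price of 100 or 200 ms of extra buffering delay.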