I don't know much about voice encoding, but I'm really curious as to why all the example files are the same size (46.9 KB). Could someone explain why this is an advancement if the file sizes remain the same?
I suspect it has something to do with all of them using wav as the container, but would love to hear from someone more knowledgeable.
The example files are produced by encoding the original and then decoding it again. Stored as raw 16-bit PCM, the decoded files have the same uncompressed size as the original, whatever the codec did in between. The encoded bitstream files are much smaller.
For example: hts1a
original: 48000 bytes
encoded: 1050 bytes
Edit: Note that this is only the size of the bitstream written to disk. I didn't look into the actual format.
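To put rough numbers on that, here is a small Python sketch. The two byte counts are the ones quoted above for hts1a; the 8 kHz, mono, 16-bit format is my assumption and not something I verified, so treat the duration and bitrate as illustrative only. Note that 48000 bytes / 1024 is about 46.9 KB, which matches the size in the question.

```python
# Sizes reported above for the hts1a sample.
ORIGINAL_BYTES = 48000   # decoded / original raw PCM
ENCODED_BYTES = 1050     # bitstream written to disk

# ASSUMPTIONS (not confirmed in this thread): 8 kHz, mono, 16-bit samples.
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2


def compression_ratio(original_bytes, encoded_bytes):
    """How many times smaller the encoded bitstream is."""
    return original_bytes / encoded_bytes


def duration_seconds(pcm_bytes, sample_rate_hz, bytes_per_sample):
    """Length of a mono raw PCM clip in seconds."""
    return pcm_bytes / (sample_rate_hz * bytes_per_sample)


def bitrate_bps(encoded_bytes, seconds):
    """Average bitrate of the encoded bitstream in bits per second."""
    return encoded_bytes * 8 / seconds


ratio = compression_ratio(ORIGINAL_BYTES, ENCODED_BYTES)
seconds = duration_seconds(ORIGINAL_BYTES, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE)
rate = bitrate_bps(ENCODED_BYTES, seconds)

print(f"size in KB:        {ORIGINAL_BYTES / 1024:.1f}")
print(f"compression ratio: {ratio:.1f}x")
print(f"duration:          {seconds:.1f} s (assuming 8 kHz mono 16-bit)")
print(f"encoded bitrate:   {rate:.0f} bit/s (under the same assumption)")
```

Under those assumptions the clip is 3 seconds long and the bitstream works out to about 2800 bit/s, roughly 46 times smaller than the raw PCM. If the real sample rate is different, only the ratio still holds.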