
What you are saying is technologically correct, possible and prevalent but I couldn't find Amazon saying it anywhere on their page. Can you point me to the part where Amazon says nothing is ever stored/transmitted unless "Alexa" is spoken?


I noticed that your profile mentioned "recording engineer" so maybe some concrete numbers related to digital audio technology will put boundaries on plausible scenarios.

We assume either of 2 engineering designs:

(#1) the trigger word "Alexa" is detected within an embedded chip. The DSP (digital signal processing) intelligence for analyzing sound waveforms is inside the device. Therefore, only the words spoken after "Alexa" are sent to the cloud.

(#2) the trigger word "Alexa" (and/or other words) is detected remotely via cloud computers. There is no "smart" DSP chip within the Echo device. That means that the device must send a constant 24/7 stream of digital waveforms to the cloud.

If we continue on the #2 scenario, we can guesstimate what data transfer volumes would look like. To be conservative, we use 8kHz 8-bit audio as the parameters which is telephone quality. (Reliable voice recognition probably requires inputs with greater audio fidelity e.g. 16-bit 32kHz but we'll keep the 8kHz-8bit as a possible lower bound.)

Using 8kHz-8bit, the device would have to stream 691 megabytes a day, which leads to 20.7 gigabytes a month. Likewise on the back end, the Amazon infrastructure would have to scale up to constantly analyze millions of parallel 24/7 digital waveforms. The Amazon datacenters would be burning up terawatts of electricity to ignore the 99.99% of digital waveforms that are not the word "Alexa".
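
For anyone who wants to check that arithmetic, here is a minimal sketch. It assumes uncompressed mono PCM, decimal megabytes/gigabytes, and a 30-day month; the function name is just for illustration.

```python
# Back-of-the-envelope check of the uncompressed streaming estimate.
# Assumptions (not from any Amazon spec): mono PCM, no compression,
# decimal units (1 MB = 1e6 bytes), 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60


def pcm_bytes_per_day(sample_rate_hz, bits_per_sample, channels=1):
    """Bytes produced by streaming raw PCM audio for a full day."""
    bits = sample_rate_hz * bits_per_sample * channels * SECONDS_PER_DAY
    return bits // 8


daily = pcm_bytes_per_day(8_000, 8)
monthly = daily * 30
print(f"{daily / 1e6:.1f} MB/day, {monthly / 1e9:.1f} GB/month")
```

Plugging in the higher-fidelity 32kHz 16-bit case instead gives roughly 5.5 GB per day, so the 8kHz-8bit figure really is a lower bound for raw audio.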

So, are there any consumer devices out there surreptitiously uploading 691 megabytes of digital waveforms (or any data) every single day? Is it realistic that Amazon would engineer the product to work like this?

I have a router that has a fallback option to a cellular connection in case my cable is disrupted. I and others would hate to get a surprise bill from Verizon/AT&T for going over a 2GB/month transfer limit if the Amazon device was designed via scenario #2.

EDIT TO ADD scenario #3:

(#3) there is an unpublicized/secret list of words in addition to the documented "Alexa" within the embedded chip's "vocabulary". Such words might be "vacation" and "book" and depending on the subsequent words sent to the cloud, you'd see ads for suntan lotion or Stephen King novels on your next visit to amazon.com. The chip's vocabulary may also include listening for transient sounds like dog barks or sneezes. You'd then get ads for dog food and cold medicine. In this scenario, a constant digital waveform is not uploaded 24/7 but extra trigger keywords unknown to the consumer cause more data to be sent than he/she agreed to.


I'm glad we're now discussing our assumptions about what Echo can/does do.

You present a scenario that I certainly did not imply, namely that Echo must be performing voice recognition in the cloud. Also, you make it out as though that is the only conceivable alternative to on-chip voice recognition, from a privacy point of view.

Let me present another scenario to you - Echo keeps "listening" to all our conversations - on-chip of course - but creates additional metadata that is stored locally and uploaded to Amazon servers periodically.

What might this metadata be?

- Audio streams that were close enough to Echo's threshold for "Alexa", but not quite, thus got rejected (perhaps some of them were falsely rejected, so let's keep a copy to feed our algorithm).

- Data on how often Echo heard voices in the house, from which rooms and at which times. Perhaps Amazon would like to know when a household wakes up, when it likes to listen to music or when to order groceries. Why should Google Now have all the fun?

I could give many more scenarios for why Echo might want to retain some data from ambient conversations, so as to make itself more "useful". It needn't store the entire audio stream in these cases, but just metadata or logs.

Such a scenario falls outside your 1 vs. 2 design options; is plausible; useful; and fairly easy to program too. I'm sure there will be many others like that.

My point is - don't implicitly trust a closed-source device that is inside your house and always listening in all directions. If Amazon were so careful about the Echo user's privacy, wouldn't they have mentioned the word at least once in the entire page? So let's not rush to give them a free pass till we know they even want it, much less earn it.

P.S. My profile says I'm a "recovering" engineer, not a "recording" one :)


>Such a scenario falls outside your 1 vs. 2 design options; is plausible; useful; and fairly easy to program too. I'm sure there will be many others like that.

Yes, I went back and added scenario #3... apparently at the same time you typed your reply. I think my scenario #3 is similar in spirit to what you're warning people about.

>P.S. My profile says I'm a "recovering" engineer, not a "recording" one :)

I have several browser tabs on music recording and I definitely had a dyslexic moment there.


Any decent voice-optimized codec (CELP, CELT, Speex, hell even old GSM) can squeeze that into 1Kbyte/sec - actually even half of that, but let's retain some quality. Include silence detection and you probably have less than 60 minutes/day from the average household. And storage is cheap. Oh, and Amazon has lots. S3?
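
A rough sketch of what those numbers imply, taking the 1Kbyte/sec and 60 minutes/day figures above at face value. It assumes 1 Kbyte = 1000 bytes and a 30-day month, and the function name is made up for illustration.

```python
# Estimate upload volume for compressed speech with silence detection.
# Assumptions: 1 Kbyte = 1000 bytes, 30-day month; the rate and the
# active-minutes figure are the guesses from the comment, not measurements.
def compressed_bytes_per_day(bytes_per_second, active_minutes_per_day):
    """Bytes uploaded per day if audio is only sent while someone speaks."""
    return bytes_per_second * active_minutes_per_day * 60


with_silence_detection = compressed_bytes_per_day(1_000, 60)
always_on = compressed_bytes_per_day(1_000, 24 * 60)

print(f"{with_silence_detection / 1e6:.1f} MB/day with silence detection")
print(f"{always_on / 1e6:.1f} MB/day if streamed around the clock")
print(f"{with_silence_detection * 30 / 1e6:.0f} MB/month with silence detection")
```

Under those assumptions the upload is a few megabytes a day, which is small enough to be hard to spot on a home connection.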


This reminds me of the (just as insane) concerns that people had about Microsoft's Xbox One Kinect being likened to a 1984 telescreen. I crunched some numbers like you just did - back when the One came with a Kinect and had to be online to work, the numbers worked out to something like exabytes of data that would be getting streamed to Microsoft, every single day.

You think the ISPs are cheesed off at Netflix? You haven't seen anything yet. The screaming from a non-trivial portion of their customers suddenly uploading multiple gigabytes of data per day would be deafening.

Sarcasm aside, anyone who thinks that this is seriously some kind of government listening device needs to up their medication. The insane assumptions that have to be made for this to be plausible are:

* This is a listening device, live transmitting everything you say, when it would be more economical to listen for a codeword on chip. (Amazon is wasting money because they are not a corporate enterprise, and we all know how much companies love spending money they don't need to)

* That the data being transmitted is being stored for long term periods of time (Amazon is wasting money on storage when it makes more sense to just process commands)

* That literally nobody actually notices the data stream going to Amazon servers when not in active use. (Not bloody likely)

* That ISPs will not flip their collective shit at the data usage should this catch on (Hello? Netflix? And that's a company whose business is transmitting large quantities of hard to compress data.)

* That customers won't notice this data usage when their next bill comes in or when their shitty connections get saturated by the upstream

* That the sorry state of connectivity in the USA (especially with regard to upload/download asymmetry) doesn't render the entire exercise meaningless from a surveillance standpoint even if we ignore every other point above

* That once these supposedly never-noticed things are noticed, the outrage angle wouldn't be played up in the media

Fucking. Seriously?

If I were a high level NSA guy, and this was the plan that was brought before me? I'd fire the guy for rank incompetence.


You do realise that it doesn't need to be streaming 48kHz 24 bit audio back up, don't you? It could be something really low, like GSM which is 13.2 kbit/s. AMR is even lower! So to stream audio at the threshold where it is still intelligible, it doesn't need masses and masses of data as you presume.
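
To put a number on that, here is a small sketch converting a constant codec bit rate into daily and monthly volume. It assumes a continuous 24/7 stream at exactly the quoted 13.2 kbit/s with no silence suppression, decimal units, and a 30-day month; the helper name is illustrative only.

```python
# Convert a constant codec bit rate into upload volume.
# Assumptions: continuous 24/7 stream, no silence suppression,
# no protocol overhead, decimal units, 30-day month.
SECONDS_PER_DAY = 24 * 60 * 60


def stream_bytes_per_day(bits_per_second):
    """Bytes per day for a stream running at a constant bit rate."""
    return bits_per_second * SECONDS_PER_DAY // 8


gsm_daily = stream_bytes_per_day(13_200)
print(f"{gsm_daily / 1e6:.1f} MB/day, {gsm_daily * 30 / 1e9:.2f} GB/month")
```

That is about a fifth of the uncompressed 8kHz-8bit figure discussed upthread.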


They have advanced speech recognition but have never heard of compression? I would be surprised if the bandwidth consumed in plan #2 was even 1/3 of what you suggest especially in a non 24/7 sound environment like the typical home.


Given the state of the average American internet connection, is #2 even possible?


"prove this doesn't happen"



