
Just as amazing to me is that the algorithms could identify a song through the extremely limited bandwidth and spectrum of an early-2000s CDMA stream and a cheap Kyocera microphone.


Low bandwidth is perfectly suitable for low-frequency data (i.e. melody). You lose some of the high-frequency detail (i.e. timbre), but it's still very easy to recognize songs.

It’s the same as recognizing objects in a 256x256px image.

Try resampling a song from 44.1 kHz to 4 kHz and you'll still have no trouble recognizing it.


To put some numbers on it, a piano goes from A0 (27.5 Hz) to C8 (4186 Hz). Most vocalists and most instruments you are likely to hear will be somewhere in that range.

A 6-string guitar, for instance, goes from E2 (82 Hz) to E6 (1318 Hz) on a 24-fret electric. Classical guitars typically have 19 frets and go to B5 (988 Hz); acoustic guitars have 20 or 21 frets, so they top out somewhere in between.

Popular singers with high notes include Mariah Carey, who goes up to G7 (3136 Hz), Christina Aguilera, who reaches C#7 (2217 Hz), and Prince, who could hit B6 (1975 Hz) [1].
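
For anyone who wants to check those figures, here is a minimal sketch that converts note names to frequencies. It assumes 12-tone equal temperament with A4 = 440 Hz and scientific pitch notation; the helper name note_to_hz is made up for illustration.

```python
import re

# Semitone offsets within an octave, relative to C.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name: str, a4: float = 440.0) -> float:
    """Frequency of a note such as 'A0' or 'C#7' (assumes equal temperament)."""
    m = re.fullmatch(r"([A-G])(#|b)?(-?\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized note name: {name!r}")
    letter, accidental, octave = m.groups()
    semitone = SEMITONES[letter] + {"#": 1, "b": -1, None: 0}[accidental]
    midi = 12 * (int(octave) + 1) + semitone  # MIDI numbering, where A4 = 69
    return a4 * 2 ** ((midi - 69) / 12)

if __name__ == "__main__":
    for n in ["A0", "C8", "E2", "E6", "B5", "G7", "C#7", "B6"]:
        print(f"{n:>3}: {note_to_hz(n):7.1f} Hz")
```

The values it prints agree with the numbers quoted above to within 1 Hz.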

Plain old analog telephones and, I believe, early cell phones had a voice band of 300-3300 Hz.

They would have no trouble with most of the notes in the upper parts of the aforementioned ranges, except for the top 5 notes of a piano. As you note you'd change the timbre, but you'd still have the right notes.

Low notes might be a problem though. If you lose everything below 300 Hz, that would cut out most of the left hand on a large majority of piano parts. On guitar it would cut all but one of the notes that cannot be played on the first string.
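
A rough way to count this, assuming equal temperament at A4 = 440 Hz, a hard 300-3300 Hz cutoff, and a standard-tuned guitar with 24 frets. All names here are illustrative, and real phone filters roll off gradually rather than cutting off sharply.

```python
VOICE_BAND = (300.0, 3300.0)  # Hz, the voice band quoted above (treated as a hard cutoff)

def midi_to_hz(midi: int, a4: float = 440.0) -> float:
    """Equal-temperament frequency for a MIDI note number (A4 = 69)."""
    return a4 * 2 ** ((midi - 69) / 12)

def in_band(midi: int, band=VOICE_BAND) -> bool:
    lo, hi = band
    return lo <= midi_to_hz(midi) <= hi

# 88-key piano: A0 (MIDI 21) through C8 (MIDI 108).
PIANO = range(21, 109)
piano_below = [m for m in PIANO if midi_to_hz(m) < VOICE_BAND[0]]
piano_above = [m for m in PIANO if midi_to_hz(m) > VOICE_BAND[1]]

# Standard tuning E2 A2 D3 G3 B3 E4 as MIDI numbers; 24 frets assumed.
OPEN_STRINGS = [40, 45, 50, 55, 59, 64]
FRETS = 24
first_string = {64 + f for f in range(FRETS + 1)}
all_notes = {o + f for o in OPEN_STRINGS for f in range(FRETS + 1)}
not_on_first = sorted(all_notes - first_string)
guitar_survivors = [m for m in not_on_first if in_band(m)]

if __name__ == "__main__":
    print("piano keys below band:", len(piano_below))
    print("piano keys above band:", len(piano_above))
    print("guitar notes not on 1st string that survive:", guitar_survivors)
```

Under those assumptions, 42 piano keys have fundamentals below the band and 5 above it, and the only guitar note off the first string that survives is MIDI 63 (D#4, about 311 Hz).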

That would change the notes. You'd lose the fundamental of a lot of notes, leaving just the overtones, so it would look like the musicians played a higher note.
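
To illustrate, a small sketch assuming an ideal harmonic series (partials at integer multiples of the fundamental), a hard 300-3300 Hz cutoff, and a naive reading that takes the lowest remaining partial as "the note". The function names are hypothetical, and real instruments and pitch trackers are messier than this.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz: float, a4: float = 440.0) -> str:
    """Name of the equal-tempered note closest to freq_hz."""
    midi = round(69 + 12 * math.log2(freq_hz / a4))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

def lowest_surviving_partial(f0: float, band=(300.0, 3300.0), n_partials: int = 32):
    """Lowest harmonic of f0 that falls inside the band, or None if none does."""
    lo, hi = band
    partials = [f0 * k for k in range(1, n_partials + 1)]
    surviving = [p for p in partials if lo <= p <= hi]
    return surviving[0] if surviving else None

if __name__ == "__main__":
    for name, f0 in [("E2", 82.41), ("A2", 110.0), ("C4", 261.63), ("A4", 440.0)]:
        p = lowest_surviving_partial(f0)
        print(f"{name} ({f0:.1f} Hz) -> lowest partial left: {p:.1f} Hz ~ {nearest_note(p)}")
```

With these assumptions, a low E2 or A2 on the guitar is left with roughly 330 Hz as its lowest partial, which sits at E4, while notes whose fundamental is already above 300 Hz come through unchanged.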

My guess is that when they processed the song database to generate the hashes, they put the songs through a bandpass filter that was narrower than the frequency range of the most limited device they supported listening on. Then, when listening on any other device, they could filter the input down to that same band so those hashes would still work.

[1] https://www.concerthotels.com/worlds-greatest-vocal-ranges


The UK was one of the first countries to introduce GSM-EFR which used the ACELP codec at 12.2 kbit/s for phone calls. The quality was actually pretty good.

I don't really understand why phone call fidelity hasn't improved since then. Sometimes it seems like it's even worse!


Imagine that audio fidelity is crucial. You are designing a phone. Does it resemble a hand-sized rectangular piece of glass?

No? I guess the hypothesis that audio fidelity is crucial was wrong.


GSM, not CDMA.


CDMA on Verizon and Sprint in the USA and Bell and Telus in Canada, at the time.


> August 2002: Shazam launches as a text message service based in the UK.

AFAIR, we never had CDMA in the UK, so what Verizon et al. were using is irrelevant.


Yeah, but we're talking about the UK here... so GSM is correct.



