Shazam used to wow me, but then, as others mentioned in the replies, it's essentially matching the signature of the sound to the sounds in the database. If it's one of the songs in the database, it gets matched fairly quickly.
> it's essentially matching the signature of the sound to the sounds in the database.
You aren't giving it enough credit. The algorithm uses just a few seconds from any part of the song, and has to deal with phone audio quality and often background noise. I mean, you can be in a bar with all that jabber, hold up the phone, and it could pick out the song. The app on the phone does the preprocessing of the audio before it is sent to the server that does the matching ... using the comparatively miserable power of a 2001-era cell phone.
Oh, that wasn't my intention - Shazam was and is groundbreaking; they did it when no one else could. All I meant was that it seems more "doable, and I probably understand how it works" when compared to how Google Assistant recognizes songs from my humming.
What is a signature? How is a signature computed from a noisy audio stream, over a mall speaker? How is a signature computed from an arbitrary starting point?
IIRC, it uses a Fast Fourier Transform of the time delay between high notes in the song to generate a series of "hashes" that are stored in a db. Those hashes can be calculated locally on the phone, and then it's a simple db lookup to retrieve potential hits. When Shazam adds a song to the db, they compute a series of "hashes" across the whole track so it can be identified from any point in the tune.
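For a concrete picture of the "signature" step: as I read Wang's "An Industrial-Strength Audio Search Algorithm", the FFT is applied to short overlapping windows of audio to build a spectrogram, and only the strongest time-frequency points are kept; the time delays come in later, when those points are paired up. Below is a minimal stdlib-only Python sketch, assuming a mono list of samples. `frame_size`, `hop`, and keeping the top `per_frame` bins per frame are illustrative assumptions, not Shazam's parameters (the paper picks local maxima over a time-frequency neighborhood instead).

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def spectrogram(samples, frame_size=256, hop=128):
    """Short-time FFT: magnitude spectrum of each Hann-windowed frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size)
              for i in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        spectrum = fft(chunk)
        # Real input gives a mirrored spectrum, so keep the first half of the bins.
        frames.append([abs(c) for c in spectrum[:frame_size // 2]])
    return frames


def peaks(frames, per_frame=3):
    """Return the 'constellation': (frame_index, freq_bin) of the loudest bins.

    Simplification (assumption): the top `per_frame` bins in each frame,
    rather than local maxima over a time-frequency neighborhood.
    """
    points = []
    for t, mags in enumerate(frames):
        loudest = sorted(range(len(mags)), key=lambda f: mags[f], reverse=True)
        points.extend((t, f) for f in sorted(loudest[:per_frame]))
    return points
```

With 8 kHz audio and `frame_size=256`, each bin spans 31.25 Hz, so a pure 1 kHz tone shows up as bin 32 in every frame. Because every frame is processed the same way, a clip that starts anywhere in the song yields roughly the same peaks, just shifted in time, which is what makes lookup from an arbitrary starting point possible.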
Wow, that's fascinating! I just ended up down the rabbit hole reading Avery Wang's "An Industrial-Strength Audio Search Algorithm" (linked in this thread) - it's such a cool way of "fingerprinting" pieces of music data.
My original comment was from memory of reading a post about how it worked a few years ago. Looking at what you read, I think the gist of what I said is right, though it seems they use a different algorithm than FFT.
Totally agree, though. It opened my mind to a way of solving that problem that actually works. Shazam definitely looked like magic the first time I saw it work.
TL;DR (from skimming thru the paper) he figured that a song's spectrogram looks like a starry sky, so matching a song is like finding a constellation on the sky. How do you do it efficiently? By searching for simple features of your constellation, such as pairs or triples of bright stars - those can be pre-hashed to find matches instantaneously. Once a possible match is found, you compare the rest of the constellation. Nothing breathtaking, in other words. However, among all the men who talked, he was the one who both talked and did, and that's his achievement.
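The constellation idea maps onto surprisingly little code. This is a sketch, assuming peaks are given as `(time_frame, freq_bin)` tuples: each anchor peak is paired with a few later peaks and the pair is hashed as (f1, f2, Δt), following Wang's paper, while `fan_out`, `max_dt`, and the function names are hypothetical choices. Matching looks up every query hash and votes on the difference between song time and query time. A true match piles its votes onto one offset (the "compare the rest of the constellation" step), whereas chance collisions and background-noise peaks scatter across many offsets.

```python
from collections import Counter, defaultdict


def pair_hashes(points, fan_out=3, max_dt=10):
    """Pair each anchor peak with up to `fan_out` later peaks within `max_dt` frames.

    Each hash is (anchor_freq, target_freq, time_delta); the anchor's time is
    kept alongside it so matches can be aligned later.
    """
    points = sorted(points)
    hashes = []
    for i, (t1, f1) in enumerate(points):
        paired = 0
        for t2, f2 in points[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # same frame: no time delta to encode
            if dt > max_dt or paired >= fan_out:
                break
            hashes.append(((f1, f2, dt), t1))
            paired += 1
    return hashes


def build_index(songs):
    """songs maps song_id -> peak list; returns hash -> [(song_id, anchor_time)]."""
    index = defaultdict(list)
    for song_id, points in songs.items():
        for h, t in pair_hashes(points):
            index[h].append((song_id, t))
    return index


def best_match(index, query_points):
    """Vote on (song_id, song_time - query_time); return the top candidate or None."""
    votes = Counter()
    for h, t_query in pair_hashes(query_points):
        for song_id, t_song in index.get(h, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, offset), score = votes.most_common(1)[0]
    return song_id, offset, score
```

The winning offset also says where in the song the clip came from, which is why a few seconds from any part of the track can be enough, even with extra noise peaks mixed into the query.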
Brilliant stuff is easy to understand, a lot harder to come up with. I could do that! (With a little help from wikipedia, audio processing libraries, the answer sheet, and the knowledge that it's possible in the first place)
To me, this highlights how hashing is the closest thing programmers have to magic.
Create a compound signature. You don't just take one measurement; you take many measurements and then assess the probabilities. You may have people talking in a mall, but they will be in a narrow frequency band. Similarly, you can analyze the repeating elements. Keep iterating and adding stuff until f(signal) performs well.
> What blew my mind was when Google introduced 'hum and we'll recognize the song for you' in Google Assistant
Their announcement actually made me roll my eyes a bit, as Soundhound had that functionality nearly a decade before. I had both SH and Shazam installed on my old phone for these use cases - now Shazam is baked into Siri, so I don’t even have the app itself installed.
I haven’t tried humming with Shazam recently, but I don’t think it worked well back when I did have the actual app. It works very well for music though. I used it around five times, just this Wednesday night at a concert, and it got every track for me.
Soundhound is what had humming “support” explicitly in its product description, and it worked pretty well from what I remember. It’s been long enough though that I may only be remembering the times it worked.
If you prefer to access it via your iPhone's Control Center, you can configure it that way in the Control Center settings. It is called "Music Recognition" there.
Nice, I will have to check this out. Control Center is definitely a great little overlay, but I haven't reliably figured out how to add things to it. I will investigate further.
For general information, have a look here [0]. Also be aware that elements in Control Center may offer additional functionality, e.g. setting the brightness of the flashlight: instead of just tapping the flashlight button to switch it on, press and hold it to bring up a brightness slider. Just play around with the other Control Center elements to find out what is possible.
> Essentially matching the signature of the sound to the sounds in the database.
And DALL-E 2 is just doing fuzzy hashing of images with text keys.
Shazam continues to amaze me because it "just works", and it still feels more magical to me than most of the AI out there, since it directly solves a major problem I didn't even think was solvable: "what is this song!!?"
I enjoy salsa dancing, but I don't know any Spanish, so I use that built-in Google functionality to hum various songs all the time to figure out what they're called.
What blew my mind was when Google introduced 'hum and we'll recognize the song for you' in Google Assistant: https://www.google.com/amp/s/blog.google/products/search/hum...
It works so well even with my shitty humming - even my girlfriend can't recognize what the song is but Google can. It doesn't even have the same signature as the original audio file, just similar hums in a noisy environment and it still works. Black magic fuckery.
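Humming is a different problem: there is no shared spectrogram fingerprint, only a melody sung in the wrong key and at the wrong tempo. This is not necessarily how Google's feature works; it is the classic query-by-humming approach, sketched only to show why an imperfect hum can still match. It assumes a pitch track in Hz has already been extracted from the hum (pitch tracking is not shown), removes the key by subtracting the median pitch in semitones, and uses dynamic time warping (DTW) to absorb tempo differences. All names are hypothetical.

```python
import math


def to_semitones(freqs_hz):
    """Convert a pitch track in Hz to semitones relative to A4 (440 Hz)."""
    return [12 * math.log2(f / 440.0) for f in freqs_hz]


def normalize(semitones):
    """Subtract the median so a hum in a different key lines up with the original."""
    s = sorted(semitones)
    median = s[len(s) // 2]
    return [x - median for x in semitones]


def dtw_distance(a, b):
    """Classic dynamic time warping: tolerates humming faster or slower."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def rank_melodies(hum_hz, catalog):
    """catalog maps title -> pitch track in Hz; returns titles, best match first."""
    query = normalize(to_semitones(hum_hz))
    scores = {title: dtw_distance(query, normalize(to_semitones(track)))
              for title, track in catalog.items()}
    return sorted(scores, key=scores.get)
```

For example, humming the opening of "Twinkle, Twinkle" three semitones higher and twice as slowly still lands at a DTW distance of essentially zero from the stored melody, while an unrelated descending scale scores far worse.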