An early pioneer in CD recognition, Gracenote is probably best known today for its MusicID software, which can automatically and rapidly identify songs and return metadata such as artist name, track name, and album cover art. Running in hundreds of millions of car infotainment systems, sound systems, laptops, smartphones, and other devices throughout the world, MusicID resolves over 20 billion queries every month using a reference database of more than 200 million tracks. To identify music, MusicID uses audio fingerprints, which are compact and unique digital song identifiers. Even in the presence of static, noise, and other audio interference, fingerprints allow for fast and accurate music recognition.
Identifying Live Music
While MusicID and StreamFP are fast and accurate audio fingerprinting systems, they can only identify known recordings and will not work with alternate versions such as live recordings of a song. Even when the artist is known, a live performance typically exhibits audio variations such as changes in key (e.g., the artist can no longer sing as high as before), tempo (e.g., the band plays faster than usual), or instrumentation (e.g., an acoustic guitar replaces an electric one).
To solve this problem, the music team within the Applied Research group at Gracenote developed a new recognition system that not only compensates for audio interference but also handles audio variations such as the ones described above. Dubbed LiveID, this system was initially proposed for a scenario in which a user attending a known artist’s live performance wants to quickly identify a song using a smartphone.
In this case, the sample is compared against the artist’s existing recordings stored in a database, similar to how a traditional audio fingerprinting system works. Early tests on live queries extracted from live albums and smartphone videos showed that the system can achieve high accuracy, even in the presence of large tempo variations (e.g., up to 20% for Bonobo on queries from live albums) and key variations (e.g., up to 5 semitones for Foreigner). Poor results are typically due to considerable audio variations (e.g., Jefferson Airplane’s extensive improvisations) or heavy audio interference (e.g., very noisy smartphone-video queries for Suprême NTM). For more details about the system and its evaluation, I refer the reader to the following article:
Zafar Rafii, Bob Coover, and Jinyu Han, “An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques,” in 39th IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, May 4-9, 2014.
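To put the tempo and key figures quoted above in perspective, the short Python sketch below converts a key change in semitones into a frequency ratio and shows how a tempo change shortens an excerpt. It assumes 12-tone equal temperament and a uniform tempo change; the function names are illustrative only and are not part of LiveID.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio of a pitch shift, assuming 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def duration_after_tempo_change(seconds: float, tempo_change: float) -> float:
    """Duration of an excerpt after a relative tempo change.

    tempo_change is a fraction: +0.20 means the band plays 20% faster.
    """
    return seconds / (1.0 + tempo_change)


# A 5-semitone transposition multiplies every frequency by about 1.335,
# so a 440 Hz note moves to roughly 587 Hz.
ratio = semitones_to_ratio(5)
print(f"5 semitones -> ratio {ratio:.3f}, 440 Hz -> {440 * ratio:.1f} Hz")

# Played 20% faster, a passage that lasts 6 seconds on the studio
# recording lasts about 5 seconds live.
print(f"6 s at +20% tempo -> {duration_after_tempo_change(6.0, 0.20):.2f} s")
```

In other words, a fingerprint that relies on exact frequencies or exact timing would see a substantially different signal in both cases, which is why a conventional fingerprint tuned to known recordings struggles with live versions.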
New Developments in Recognition
While LiveID was originally developed to rapidly identify short, noisy song excerpts recorded on a smartphone at a known artist’s live performance, the system can also be used to identify full recordings of live or cover versions of songs from sources such as YouTube or SoundCloud. The ability to monitor these sources is particularly useful to artists and labels for rights management.
This “cover music identification system” takes a full recording (e.g., a cover downloaded from YouTube), splits it into successive segments of a given duration, and computes an audio fingerprint for each segment using the LiveID algorithm. It then compares these fingerprints to a given artist’s reference database to identify the song (or songs) being played. A post-processing step removes unlikely candidates (for example, isolated or inconsistent matches), resulting in more accurate identification.
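The segment-and-clean-up flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-segment labels stand in for the output of a fingerprint matcher (which is not shown), the "isolated match" rule is a simple minimum run length, and all function names and parameters are hypothetical rather than Gracenote's actual implementation.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def segment_starts(total_s: float, segment_s: float, hop_s: float) -> List[float]:
    """Start times of successive fixed-length segments that fit in the recording."""
    starts = []
    t = 0.0
    while t + segment_s <= total_s:
        starts.append(t)
        t += hop_s
    return starts


def drop_isolated(labels: List[Optional[str]], min_run: int = 2) -> List[Optional[str]]:
    """Discard matches that do not persist for at least min_run consecutive segments."""
    cleaned: List[Optional[str]] = []
    for label, run in groupby(labels):
        n = len(list(run))
        keep = label is not None and n >= min_run
        cleaned.extend([label if keep else None] * n)
    return cleaned


def summarize(labels: List[Optional[str]], starts: List[float],
              segment_s: float) -> List[Tuple[str, float, float]]:
    """Merge consecutive identical labels into (song, start, end) spans."""
    spans = []
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label is not None:
            spans.append((label, starts[i], starts[i + n - 1] + segment_s))
        i += n
    return spans


# Hypothetical matcher output for an 80-second recording cut into 10-second
# segments: one best-matching reference song (or None) per segment.
starts = segment_starts(80.0, 10.0, 10.0)
raw = ["Song A", "Song A", "Song A", "Song B", "Song C", "Song C", None, "Song C"]
spans = summarize(drop_isolated(raw), starts, 10.0)
print(spans)  # [('Song A', 0.0, 30.0), ('Song C', 40.0, 60.0)]
```

Here the single "Song B" segment and the trailing lone "Song C" segment are treated as isolated and dropped, leaving two consistent spans. A real system would likely use a more careful consistency check, for example one that also looks at match scores or time alignment.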
Tests on audio recordings extracted from YouTube videos showed that Gracenote’s system can accurately identify live and cover versions of songs by artists as diverse as Eminem, Katy Perry, Maroon 5, and Taylor Swift, even when the live recording is of poor quality (e.g., a cellphone recording) or when the cover differs noticeably from the original (e.g., an acoustic rendition). Since the system identifies a reference for every segment, it can also identify multiple references within the same recording (e.g., a full concert). Additionally, a separate cover music recognition system based on the same LiveID audio fingerprint was developed and tested, showing state-of-the-art results on a recent cover song dataset. For more on this topic, I refer the reader to this soon-to-be-published article:
Prem Seetharaman and Zafar Rafii, “Cover Song Identification with 2D Fourier Transform Sequences,” in 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, USA, March 5-9, 2017.
At Gracenote, we thrive on solving big challenges involving digital media by developing new technology- and data-based solutions. LiveID, which did not exist at this time last year, is just the latest example of an algorithm we’ve created that has immediate practical applications. If you have anything to say on this topic, sound off below. Otherwise, keep your eyes on this blog for more from our tech team.