Sound source separation is the process of isolating individual sounds from a mixture of several sounds. Each sound heard in the mixture is called a source.
For example, we might want to isolate a singer from the background music to make a karaoke version of a song or isolate the bass guitar from the rest of the band so a musician can learn the part.
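One way to state the problem precisely is the common additive (linear, instantaneous) mixing model. This is an assumption, not something every recording satisfies, because effects and reverberation can make real mixing more complex. Under it, the mixture signal x(t) is the sum of N source signals s_i(t). Separation then means recovering estimates ŝ_i(t) of each source from x(t) alone:

```latex
x(t) = \sum_{i=1}^{N} s_i(t), \qquad
\text{goal: find } \hat{s}_i(t) \approx s_i(t) \text{ for } i = 1, \dots, N \text{, given only } x(t).
```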
There are many reasons to study source separation. Within the field of Music Information Retrieval (MIR), music source separation has many demonstrated uses: in many scenarios, researchers have found that isolated sources are easier to process than mixtures of those sources. For example, sound source separation has been used to improve:
automatic music transcription
lyric and music alignment
musical instrument detection
lyric recognition
automatic singer identification
vocal activity detection
fundamental frequency estimation
understanding the predictions of black-box audio models
Source separation has also long been considered a worthwhile research problem in its own right. Many thousands of papers on it have been published over the past few decades, and more appear every year.
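Deep neural networks are currently the state-of-the-art source separation technology. In a nutshell, they are trained on a large amount of data consisting of mixtures and their isolated sources. For each training mixture, the network produces an estimate of a source. That estimate is compared against the known, ground-truth isolated source, and the measured difference (the loss) is used to update the network's weights so that its future estimates improve.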
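The training idea above can be sketched with a deliberately tiny toy model. This is an illustration under stated assumptions, not how production separators are built. The spectrograms are random stand-ins (hypothetical data, not real audio). The "network" is only one learnable value per frequency bin, squashed into a soft mask between 0 and 1 and multiplied with the mixture. Predicting a mask like this is a common design choice for separation networks. A real model would compute the mask with many layers and would typically be trained with a deep learning framework rather than hand-written gradients. The helper names `predict` and `mse` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: magnitude spectrograms shaped (frequency_bins, frames).
# Vocals are made louder in low bins and the accompaniment louder in high bins.
n_freq, n_frames = 64, 200
vocals = rng.random((n_freq, n_frames)) * np.linspace(1.0, 0.1, n_freq)[:, None]
accompaniment = rng.random((n_freq, n_frames)) * np.linspace(0.1, 1.0, n_freq)[:, None]
mixture = vocals + accompaniment  # network input; vocals are the ground-truth target


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(logits, mix):
    """Hypothetical 'network': one learnable logit per bin -> soft mask -> masked mixture."""
    mask = sigmoid(logits)[:, None]  # shape (n_freq, 1), values in (0, 1)
    return mask * mix


def mse(estimate, target):
    """Mean squared error between the network output and the ground-truth source."""
    return np.mean((estimate - target) ** 2)


logits = np.zeros(n_freq)  # every bin starts with a mask value of 0.5
learning_rate = 100.0
initial_loss = mse(predict(logits, mixture), vocals)

for _ in range(500):
    s = sigmoid(logits)
    estimate = s[:, None] * mixture
    # Chain rule: d(loss)/d(estimate) -> d(loss)/d(mask) -> d(loss)/d(logits).
    grad_estimate = 2.0 * (estimate - vocals) / estimate.size
    grad_mask = np.sum(grad_estimate * mixture, axis=1)
    grad_logits = grad_mask * s * (1.0 - s)
    logits -= learning_rate * grad_logits  # gradient-descent update

final_loss = mse(predict(logits, mixture), vocals)
learned_mask = sigmoid(logits)
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
print(f"mask at lowest bin: {learned_mask[0]:.2f}, at highest bin: {learned_mask[-1]:.2f}")
```

The toy vocals dominate the low bins and the toy accompaniment dominates the high bins. As a result, the learned mask moves toward 1 at low frequencies and toward 0 at high frequencies, and the loss against the ground-truth vocals goes down.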
The MUSDB18 dataset can be used to train sound source separation neural networks.
MUSDB18 is a dataset of 150 full-length music tracks of varying genres. For each track, it provides the mixture along with isolated stems for the drums, bass, vocals, and "other". As its name suggests, the "other" stem contains every source in the mix that is not drums, bass, or vocals.
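To make the stem layout concrete, here is a minimal sketch of how one MUSDB18-style track could be represented for training. It uses random noise instead of real audio (hypothetical data). It assumes stereo stems at 44.1 kHz and a mixture equal to the sum of the four stems. `make_toy_track` is a hypothetical helper. In practice, a dedicated loader such as the `musdb` Python package would supply the real audio. Its API is not used here.

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumed sample rate (Hz)
STEM_NAMES = ("drums", "bass", "vocals", "other")


def make_toy_track(seconds=1.0, seed=0):
    """Hypothetical stand-in for one MUSDB18 track: four stereo stems plus their mixture."""
    rng = np.random.default_rng(seed)
    n_samples = int(seconds * SAMPLE_RATE)
    # Random noise replaces real audio; each stem has shape (n_samples, 2 channels).
    stems = {name: 0.1 * rng.standard_normal((n_samples, 2)) for name in STEM_NAMES}
    mixture = sum(stems.values())  # assumption: the mixture is the sum of the stems
    return mixture, stems


mixture, stems = make_toy_track()

# "other" is whatever remains after removing drums, bass, and vocals from the mix.
residual = mixture - stems["drums"] - stems["bass"] - stems["vocals"]
print(np.allclose(residual, stems["other"]))  # equal up to floating-point rounding

# One (input, target) training pair for a vocal-separation network.
network_input, target = mixture, stems["vocals"]
print(network_input.shape, target.shape)
```

For a vocal separator, each track yields a (mixture, vocals) pair like this. Swapping the target for another stem trains a separator for that source instead.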
The goal of this article was to provide an introduction to sound source separation. We also discussed why source separation is important, deep neural networks as the current state-of-the-art approach, and MUSDB18, one of the datasets we can use to train a deep learning model for source separation.