Data Science Asked on June 28, 2021
I have been looking into speech separation for conversational audio. While searching for research on this topic, I came across the Asteroid framework, which was applied to the LibriMix dataset for separating speech in which two voices overlap each other (you can listen to such a sample here). But in conversational speech data (e.g. two people talking with each other), the voices do not overlap; instead, the speakers take turns (here is a sample).
While looking online for research on audio separation for conversational audio, I wasn't able to find any. Datasets like WSJ, LibriMix, etc. do exist, but they are not designed for conversational problems; they are oriented towards the overlapping-voices problem.
If there is any project, dataset or research around this that one knows of, it would be really helpful.
The task of segmenting a conversation in which multiple speakers take turns is called Speaker Diarization. This is different from Speech Separation, a special case of Source Separation, which refers to separating multiple overlapping/concurrent sources out of an audio stream.
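To make the distinction concrete, here is a toy sketch of the diarization idea ("who spoke when?"): frame the audio, extract a per-frame feature, and cluster frames by speaker. This is purely illustrative (the two "speakers" are modelled as sine tones at different pitches, and zero-crossing rate stands in for a real speaker embedding); actual systems such as those in wq2012/awesome-diarization use learned embeddings and proper clustering.

```python
# Toy diarization sketch: recover the turn structure of a signal
# in which two "speakers" (sine tones at different pitches, a
# hypothetical stand-in for real voices) take turns speaking.
import numpy as np

SR = 16000    # sample rate in Hz (assumed)
FRAME = 400   # 25 ms analysis frames

def make_turns(turns):
    """turns: list of (speaker_id, seconds).
    Speaker 0 -> 120 Hz tone, speaker 1 -> 300 Hz tone."""
    pitch = {0: 120.0, 1: 300.0}
    parts = []
    for spk, dur in turns:
        t = np.arange(int(dur * SR)) / SR
        parts.append(np.sin(2 * np.pi * pitch[spk] * t))
    return np.concatenate(parts)

def diarize(signal, frame=FRAME):
    """Assign each frame a 0/1 speaker label via zero-crossing rate."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    # zero-crossing rate: a crude pitch proxy, one value per frame
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    # 1-D two-cluster split: threshold at the midpoint of the extremes
    thresh = (zcr.min() + zcr.max()) / 2
    return (zcr > thresh).astype(int)

# Speaker 0 talks for 1 s, then speaker 1 for 1 s, then speaker 0 again.
labels = diarize(make_turns([(0, 1.0), (1, 1.0), (0, 0.5)]))
```

At 40 frames per second, the recovered `labels` array switches from one speaker label to the other at the turn boundaries, which is exactly the output a diarization system produces (speaker-attributed time segments), as opposed to a speech separation system, which would emit one audio stream per speaker.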
Several free datasets exist for this task; these and more can be found in the GitHub project wq2012/awesome-diarization.
Correct answer by Jon Nordby on June 28, 2021