Data Science Asked on June 28, 2021
I have been looking into speech separation for conversational audio. While searching for research on this topic, I came across the Asteroid framework, which was applied to the LibriMix dataset for separating speech in which two voices overlap each other (you can listen to such a sample here). But in conversational speech data (e.g. two people talking with each other), the voices do not overlap; instead, the speakers take turns (here is a sample).
While looking online for research on audio separation for conversational audio, I wasn't able to find any. Datasets like WSJ, LibriMix, etc. do exist, but they are not designed for conversational problems; they are oriented towards the overlapping-voices problem.
If there is any project, dataset or research around this that one knows of, it would be really helpful.
The task of segmenting a conversation in which multiple speakers take turns is called Speaker Diarization. This is different from Speech Separation, a special case of Source Separation, which refers to separating multiple overlapping/concurrent sources out of an audio stream.
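To make the distinction concrete, here is a toy sketch of the diarization idea ("who spoke when?"): frame the audio, extract a per-frame feature, and cluster frames by speaker. This is purely illustrative (the two "speakers" are modelled as sine tones at different pitches, and zero-crossing rate stands in for a real speaker embedding); actual systems such as those in wq2012/awesome-diarization use learned embeddings and proper clustering.

```python
# Toy diarization sketch: recover the turn structure of a signal
# in which two "speakers" (sine tones at different pitches, a
# hypothetical stand-in for real voices) take turns speaking.
import numpy as np

SR = 16000    # sample rate in Hz (assumed)
FRAME = 400   # 25 ms analysis frames

def make_turns(turns):
    """turns: list of (speaker_id, seconds).
    Speaker 0 -> 120 Hz tone, speaker 1 -> 300 Hz tone."""
    pitch = {0: 120.0, 1: 300.0}
    parts = []
    for spk, dur in turns:
        t = np.arange(int(dur * SR)) / SR
        parts.append(np.sin(2 * np.pi * pitch[spk] * t))
    return np.concatenate(parts)

def diarize(signal, frame=FRAME):
    """Assign each frame a 0/1 speaker label via zero-crossing rate."""
    n = len(signal) // frame
    frames = signal[: n * frame].reshape(n, frame)
    # zero-crossing rate: a crude pitch proxy, one value per frame
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    # 1-D two-cluster split: threshold at the midpoint of the extremes
    thresh = (zcr.min() + zcr.max()) / 2
    return (zcr > thresh).astype(int)

# Speaker 0 talks for 1 s, then speaker 1 for 1 s, then speaker 0 again.
labels = diarize(make_turns([(0, 1.0), (1, 1.0), (0, 0.5)]))
```

At 40 frames per second, the recovered `labels` array switches from one speaker label to the other at the turn boundaries, which is exactly the output a diarization system produces (speaker-attributed time segments), as opposed to a speech separation system, which would emit one audio stream per speaker.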
Several free datasets exist for this task; these and more can be found in the GitHub project wq2012/awesome-diarization.
Correct answer by Jon Nordby on June 28, 2021