Dimensionality reduction for visualization purposes - "Sound map"

Question

I'd like to know how recordings of many different sounds can be analyzed so that they can be visualized in two dimensions.
My idea would be to find two data features (e.g. using principal component analysis) that make every sound class (dog bark, baby cry, etc.) distinguishable from others.
I'm struggling to understand what parameters to focus on and which method to use.
Thanks for every comment.

jojek · Answer

For dimensionality reduction you need features to start with. You can, for example, extract MFCCs or other low-level features such as MPEG-7 descriptors, then visualise them using PCA. TBH, for this task you might be better off using t-SNE or UMAP to project this high-dimensional data while preserving local clusters.
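To make the PCA step concrete, here is a minimal sketch using only NumPy. It assumes you already have one fixed-length feature vector per recording (for instance, MFCCs averaged over time); the random `features` matrix, the three classes, and the helper name `pca_2d` are stand-ins for illustration, not real audio data or anyone's reference implementation.

```python
import numpy as np

def pca_2d(features):
    """Project rows of `features` (n_clips, n_features) onto the top two principal components."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)                   # center each feature
    std = X.std(axis=0)
    X = X / np.where(std > 0, std, 1.0)      # scale to unit variance, skipping constant features
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    coords = X @ vt[:2].T                    # 2-D coordinates, one row per clip
    explained = (s[:2] ** 2) / np.sum(s ** 2)  # fraction of variance kept by each axis
    return coords, explained

# Stand-in data (assumption): 3 hypothetical sound classes, 20 clips each, 13 features per clip.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 13))
features = np.vstack([c + rng.normal(size=(20, 13)) for c in centers])
labels = np.repeat(np.arange(3), 20)         # class index per clip, useful for colouring a scatter plot

coords, explained = pca_2d(features)
print(coords.shape)  # (60, 2)
```

Scatter-plotting `coords` coloured by `labels` gives the "sound map". If the classes overlap in the PCA view, the same feature matrix can be handed to a t-SNE or UMAP implementation instead; only the projection step changes.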
Lastly, have a look at the YAMNet or VGGish models, which are already suited to the sound event detection (SED) task. You can extract their embeddings and treat them as features for your visualisation.
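These models produce one embedding per short audio frame, so a recording yields a variable number of vectors. One common way to get a single point per recording is to average the frame embeddings; that pooling choice is an assumption here, not something prescribed above. The sketch below uses random arrays in place of real model output (1024 matches YAMNet's embedding size), and `clip_vector` is a hypothetical helper name.

```python
import numpy as np

def clip_vector(frame_embeddings):
    """Mean-pool a (n_frames, dim) array of frame embeddings into one (dim,) clip vector."""
    E = np.asarray(frame_embeddings, dtype=float)
    if E.ndim != 2 or E.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, dim) array")
    return E.mean(axis=0)

# Stand-in for model output (assumption): three clips with different numbers of frames.
rng = np.random.default_rng(1)
clips = [rng.normal(size=(n_frames, 1024)) for n_frames in (4, 9, 6)]

# One row per clip, ready for PCA, t-SNE, or UMAP.
X = np.stack([clip_vector(e) for e in clips])
print(X.shape)  # (3, 1024)
```

Mean pooling discards temporal order, which is usually acceptable for a class-level map; max pooling or concatenating mean and standard deviation are alternatives worth trying.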
