Which audio features best describe a piece of music?

Question

I'm working on the content-based filtering part of a recommender system for an audio streaming project.

I first used the k-means algorithm with one-hot-encoded music genres to group tracks into clusters.
However, to get more precise results, I want to feed the model audio features instead.

So my questions are:
 - Is my approach correct?
 - What are the most relevant audio features I can extract from an audio file?

Thanks for your answers.

Edit:
Right now, I'm extracting these features:

music tempo
zero-crossing rate
duration
spectral centroids
spectral roll-off
MFCC
spectral bandwidth
spectral contrast

I want to know to what degree these audio features are relevant for 'describing' an audio excerpt.
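
For illustration, here is a minimal sketch of what two of the features listed above measure: zero-crossing rate and spectral centroid. It uses only the Python standard library and a naive DFT on a single short frame, so it is far slower than a real audio library. The sample rate, frame length, and test tones are assumed values chosen for the example, not real music.

```python
import cmath
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(samples)
    weighted_sum = 0.0
    magnitude_sum = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        magnitude = abs(coeff)
        weighted_sum += (k * sample_rate / n) * magnitude
        magnitude_sum += magnitude
    return weighted_sum / magnitude_sum if magnitude_sum else 0.0


# Synthetic frames (assumed values for the example): two pure tones.
SAMPLE_RATE = 8000
FRAME = 256
low_tone = [math.sin(2 * math.pi * 250 * t / SAMPLE_RATE) for t in range(FRAME)]
high_tone = [math.sin(2 * math.pi * 2000 * t / SAMPLE_RATE) for t in range(FRAME)]

for name, tone in (("250 Hz", low_tone), ("2000 Hz", high_tone)):
    print(name, zero_crossing_rate(tone), spectral_centroid(tone, SAMPLE_RATE))
```

Both numbers rise with the frequency content of the frame, which is why they are usually read as rough indicators of "brightness" or noisiness rather than as descriptions of musical content.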

Astrian_72954 · Answer

Cepstral coefficients can be considered among the best features for characterizing a musical piece.

The best known are those based on the mel scale; since you are already extracting MFCCs, you are good to go. You should have mentioned which MFCCs you are extracting, though: from my (limited) experience, the first 15 are usually the most useful because they have positive values. You can also try GFCCs, which are intrinsically more robust.

They can then be used to obtain spectrograms and so on.

I would suggest not using kNN; prefer Random Forest. Moreover, audio signals require a lot of preprocessing: DCT and STFT are a must.
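
To make the STFT and DCT steps concrete, here is a rough sketch of how they chain together into cepstral coefficients: windowed frames, magnitude spectrum, logarithm, then a DCT. It is written in plain Python with a naive DFT, and it deliberately omits the mel filterbank, so the output is a plain cepstrum rather than true MFCCs. The frame size, hop size, and number of coefficients are assumptions picked for the example.

```python
import cmath
import math


def stft_magnitudes(samples, frame_size, hop_size):
    """Short-time Fourier transform: Hann-windowed frames -> magnitude spectra."""
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * i / frame_size) for i in range(frame_size)
    ]
    spectra = []
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        spectrum = []
        for k in range(frame_size // 2 + 1):  # non-negative frequencies only
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * i / frame_size)
                for i, x in enumerate(frame)
            )
            spectrum.append(abs(coeff))
        spectra.append(spectrum)
    return spectra


def dct_ii(values):
    """Unnormalised type-II discrete cosine transform."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstral_coefficients(samples, frame_size=64, hop_size=32, n_coeffs=13):
    """Per frame: log-magnitude spectrum followed by a DCT, truncated."""
    result = []
    for spectrum in stft_magnitudes(samples, frame_size, hop_size):
        # Small offset avoids log(0) on silent bins.
        log_spectrum = [math.log(m + 1e-10) for m in spectrum]
        result.append(dct_ii(log_spectrum)[:n_coeffs])
    return result


# Synthetic input (assumed values): 256 samples of a 440 Hz tone at 8 kHz.
signal = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
coeffs = cepstral_coefficients(signal)
print(len(coeffs), "frames x", len(coeffs[0]), "coefficients")
```

In practice you would let an audio library do this, since it adds the mel filterbank and uses an FFT; the sketch is only meant to show where each transform sits in the pipeline.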

jonnor · Answer

The features you have selected are a good starting point, but (with the exception of tempo) they are still quite "low level" compared to what might be most relevant for music recommendation systems.
The Essentia project provides feature extractors for music that cover low-level, medium-level and (since Jan 2020) high-level music descriptors. Their high-level descriptors include:

musical genre
ballroom music classification
moods: happy, sad, aggressive, relaxed, acoustic, electronic, party
western / non-western music
tonal / atonal
danceability
voice / instrumental
gender (male, female singer)
timbre (dark, bright)

The medium- and low-level descriptors cover everything you mention, and more.
This is packaged into a command-line tool that outputs JSON.
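
Since the output is JSON, one plausible way to feed it to a model is to flatten the nested document into a fixed-order numeric vector. The sketch below assumes only that the file is nested JSON with numeric leaves; the sample document and its field names are made up for illustration and are not Essentia's actual schema.

```python
import json


def flatten(node, prefix=""):
    """Recursively collect numeric leaves of nested JSON as {dotted.path: float}."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        flat[prefix.rstrip(".")] = float(node)
    return flat  # strings, booleans and nulls are skipped


# Made-up sample; the field names are hypothetical, not the tool's real output.
SAMPLE_JSON = """
{
  "rhythm": {"tempo": 120.5},
  "timbre": {"brightness": [0.1, 0.2]},
  "metadata": {"title": "demo"}
}
"""

features = flatten(json.loads(SAMPLE_JSON))
names = sorted(features)  # fixed column order across tracks
vector = [features[name] for name in names]
print(names)
print(vector)
```

Sorting the names gives every track the same column order, provided each track's JSON has the same set of numeric fields; categorical outputs such as genre labels would still need their own encoding.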
