TransWikia.com

What are the best audio features to describe a piece of music?

Data Science Asked by Sacha Perin on January 29, 2021

I’m working on the content-based filtering part of a recommender system for an audio streaming project.

I first used the k-means algorithm with one-hot-encoded music genres to group tracks into clusters.
However, to get more precise results, I want to replace the genre encoding with audio features as the model's input.
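A minimal sketch of that idea, assuming each track has already been reduced to a fixed-length numeric vector. The `tracks` values, the column choice, and the helper names `standardize` and `kmeans` are made up for illustration; a real project would more likely use an existing k-means implementation. Standardizing first matters because features such as tempo (tens to hundreds) and zero-crossing rate (well below 1) live on very different scales.

```python
import math
import random

def standardize(rows):
    """Scale each column to zero mean and unit variance so no feature dominates."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def kmeans(rows, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(rows, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(row, centroids[c]))
                  for row in rows]
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical per-track vectors:
# [tempo in BPM, mean zero-crossing rate, mean spectral centroid in Hz]
tracks = [
    [70.0, 0.03, 900.0],
    [75.0, 0.04, 1000.0],
    [72.0, 0.03, 950.0],
    [170.0, 0.12, 3000.0],
    [165.0, 0.11, 2900.0],
    [175.0, 0.13, 3100.0],
]
labels = kmeans(standardize(tracks), k=2)
```

Tracks that share a label land in the same cluster; which cluster gets label 0 or 1 depends on the random initialization.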

So my questions are:
– Is my approach correct?
– What are the most relevant audio features I can extract from an audio file?

Thanks for your answers.

Edit:
Right now, I’m extracting these features:

  • music tempo
  • zero-crossing rate
  • duration
  • spectral centroids
  • spectral roll-off
  • MFCC
  • spectral bandwidth
  • spectral contrast

I want to know to what degree these audio features are relevant for ‘describing’ an audio extract.
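To make a few of the listed features concrete, here is a rough sketch of what zero-crossing rate, spectral centroid, spectral bandwidth, and spectral roll-off measure on a single frame. It uses only the standard library and a naive DFT, so it is meant for illustration rather than speed; a library such as librosa would normally be used instead. The function names, the 0.85 roll-off fraction, and the synthetic 1 kHz test tone are assumptions made for this example.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (acceptable for short frames only)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def spectral_features(frame, sample_rate, rolloff_fraction=0.85):
    """Centroid, bandwidth, and roll-off (all in Hz) of one frame."""
    mags = magnitude_spectrum(frame)
    freqs = [k * sample_rate / len(frame) for k in range(len(mags))]
    total = sum(mags)
    if total == 0:  # silent frame: nothing meaningful to report
        return {"centroid": 0.0, "bandwidth": 0.0, "rolloff": 0.0}
    # Centroid: magnitude-weighted mean frequency ("brightness").
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total
    # Bandwidth: magnitude-weighted standard deviation around the centroid.
    bandwidth = math.sqrt(
        sum((f - centroid) ** 2 * m for f, m in zip(freqs, mags)) / total
    )
    # Roll-off: lowest frequency below which rolloff_fraction of the magnitude lies.
    cumulative = 0.0
    rolloff = freqs[-1]
    for f, m in zip(freqs, mags):
        cumulative += m
        if cumulative >= rolloff_fraction * total:
            rolloff = f
            break
    return {"centroid": centroid, "bandwidth": bandwidth, "rolloff": rolloff}

# Synthetic 1 kHz sine sampled at 8 kHz, standing in for one frame of real audio.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
features = spectral_features(frame, sample_rate)
zcr = zero_crossing_rate(frame)
```

For a pure tone, the centroid and roll-off both sit at the tone's frequency and the bandwidth is close to zero; real music spreads energy over many bins, which is what makes these values informative. Per-frame values are usually summarized per track (for example by mean and variance) before being fed to a model.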

2 Answers

Cepstral coefficients of any kind can be considered among the best features for characterizing a musical piece.

The best known are those computed on the mel scale (MFCCs); since you are already extracting MFCCs, you are good to go. You should have mentioned which MFCCs you are extracting, though: in my (limited) experience, the first 15 or so are usually the most useful. You can also try GFCCs (gammatone frequency cepstral coefficients), which are intrinsically more robust.

Note that MFCCs are derived from a (mel-scaled) spectrogram, which can itself be used as a feature representation.
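For reference, the usual route to MFCCs for one frame is: window the frame and take its short-time spectrum, pool the power into mel-spaced triangular filters, take the logarithm, then apply a DCT. The sketch below follows those steps with the standard library only. The filter count, coefficient count, unnormalized DCT-II, and all function names are assumptions chosen for illustration, and the values will not match any particular library's output exactly.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hann-window the frame, then take a naive DFT (one STFT column)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(frame)]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(windowed))) ** 2
        for k in range(n // 2 + 1)
    ]

def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (sample_rate / 2) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_out)
    ]

def mfcc_frame(frame, sample_rate, n_filters=20, n_coeffs=13):
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    # Small constant avoids log(0) when a filter captures no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(filt, power)) + 1e-10)
                    for filt in bank]
    return dct2(log_energies, n_coeffs)

# Synthetic 440 Hz tone standing in for one frame of real audio.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
coeffs = mfcc_frame(frame, sample_rate)
```

Running `mfcc_frame` over successive overlapping frames yields one coefficient vector per frame; stacking the log mel energies instead of the DCT output gives a mel spectrogram.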

I would suggest not using kNN and preferring a random forest instead. Moreover, audio signals require a lot of preprocessing: the STFT and the DCT are a must.

Answered by Astrian_72954 on January 29, 2021

The features you have selected are a good starting point, but are still (with the exception of tempo) quite "low level" compared to what might be most relevant for music recommendation systems.

The Essentia project provides feature extractors for music that cover low-level, medium-level, and (since Jan 2020) high-level music descriptors. Their high-level descriptors include:

  • musical genre
  • ballroom music classification
  • moods: happy, sad, aggressive, relaxed, acoustic, electronic, party
  • western / non-western music
  • tonal / atonal
  • danceability
  • voice / instrumental
  • gender (male, female singer)
  • timbre (dark, bright)

The medium- and low-level descriptors cover everything you mention, and more. All of this is packaged into a command-line tool that outputs JSON.
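Since the output is JSON, one simple way to turn it into model input is to flatten every numeric leaf into a named feature. The sketch below does that with the standard library. The sample document and its key names are hypothetical and only mimic a nested descriptor file; check the actual output of the extractor you run for the real structure.

```python
import json

def flatten_numeric(node, prefix=""):
    """Collect every numeric leaf of a nested JSON value under a dotted key."""
    flat = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten_numeric(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten_numeric(value, f"{prefix}{index}."))
    elif isinstance(node, (int, float)) and not isinstance(node, bool):
        flat[prefix.rstrip(".")] = float(node)
    # Strings, booleans, and nulls are skipped; categorical labels would
    # need their own encoding step.
    return flat

# Hypothetical descriptor file contents, for illustration only.
sample_json = """
{"rhythm": {"bpm": 120.5},
 "lowlevel": {"mfcc": {"mean": [-650.1, 110.2]}},
 "highlevel": {"danceability": {"value": "danceable", "probability": 0.9}}}
"""
features = flatten_numeric(json.loads(sample_json))
```

Each track then becomes a dictionary of named numbers, which can be aligned across tracks into the rows of a feature matrix.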

Answered by jonnor on January 29, 2021
