Signal Processing Asked by signaler on December 7, 2020
I am looking for a way to take a .wav file and extract features from it that could be used for further classification (the features could be MFCC statistics, an autoencoder representation, etc.). I know it is likely to be task dependent, but I am looking for some standard set of features that make sense. I am trying to categorize an audio signal (mostly music) into a set of categories such as “quiet music”, “thrilling music”, etc.
I was wondering if there is existing software (preferably for Linux) that takes such an audio signal (for example in .wav format) and outputs a feature fingerprint.
An audio embedding from a deep neural network is a good candidate for representing audio in tasks such as classification, similarity metrics, semantic search, etc.
OpenL3 is one good implementation. It is written in Python and provides a simple API. It also has a command-line tool for extracting the embeddings, which are saved as NumPy files.
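As a rough sketch of the Python route, the snippet below asks OpenL3 for frame-level embeddings of one file and averages them over time into a single fixed-length vector. It assumes the openl3 and soundfile packages are installed. The content_type="music" and embedding_size=512 settings, and the helper names mean_pool and embed_file, are illustrative choices rather than anything this answer prescribes.

```python
def mean_pool(frames):
    """Average frame-level vectors over time into one fixed-length vector."""
    rows = [list(row) for row in frames]
    if not rows:
        raise ValueError("no frames to pool")
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def embed_file(path):
    """Return one embedding vector for the audio file at `path`."""
    # Imported lazily so mean_pool stays usable without these packages.
    import openl3
    import soundfile as sf

    audio, sr = sf.read(path)
    # emb has shape (n_frames, embedding_size); ts holds the frame timestamps.
    emb, ts = openl3.get_audio_embedding(
        audio, sr, content_type="music", embedding_size=512
    )
    return mean_pool(emb)
```

Mean pooling is only one way to summarize the frames; keeping the per-frame embeddings, or adding a standard deviation per dimension, may suit some classifiers better.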
Answered by jonnor on December 7, 2020
A relatively easy way to do that would be to write MATLAB/GNU Octave/Python scripts for data extraction (the latter two being open-source). A major advantage is that a lot of the work can be done automatically, provided you script it. Depending on what you would like to do, a lot of features may already exist in the form of built-in functions.
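To give an idea of how small such a script can be, here is a minimal sketch that uses only the Python standard library. It reads a 16-bit PCM .wav file, computes RMS energy and zero-crossing rate per frame, and returns their mean and standard deviation as a four-element feature vector. The function names, the frame length of 1024 and the hop of 512 are assumptions made for illustration; a real setup would more likely use a library that provides MFCCs and similar features.

```python
import math
import statistics
import struct
import wave


def read_mono_16bit(path):
    """Read a 16-bit PCM .wav file; return (samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_ch = wf.getnchannels()
        sr = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Average the channels of each sample frame to get a mono signal.
    mono = [sum(ints[i:i + n_ch]) / n_ch / 32768.0
            for i in range(0, len(ints), n_ch)]
    return mono, sr


def frame_features(samples, frame_len=1024, hop=512):
    """Yield (RMS energy, zero-crossing rate) for each analysis frame."""
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        yield rms, crossings / (frame_len - 1)


def extract_features(path):
    """Return [mean RMS, std RMS, mean ZCR, std ZCR] for one file."""
    samples, _ = read_mono_16bit(path)
    frames = list(frame_features(samples))
    if not frames:
        raise ValueError("file is shorter than one analysis frame")
    rms, zcr = zip(*frames)
    return [statistics.mean(rms), statistics.pstdev(rms),
            statistics.mean(zcr), statistics.pstdev(zcr)]
```

Calling extract_features on every file in a folder gives one row per recording, which can be fed directly into a classifier.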
While GUI software for this application may exist as well, I think that writing your own scripts is worth your while because of repeatability and because you are in control of every step in the extraction chain. This makes it easy for you to slightly alter features, for example changing the standard Bark-band definition so that the two lowest bands are split into two bands each, which I've seen in a couple of audio classification publications.
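The Bark-band tweak can be sketched in a few lines of Python. The edge frequencies below are the commonly tabulated critical-band edges after Zwicker, with the lowest edge taken as 20 Hz (some tables start at 0 Hz). Splitting each band at its arithmetic midpoint is an assumption for illustration; the publications in question may place the split elsewhere, and split_lowest_bands is a hypothetical helper name.

```python
# Edges in Hz of the 24 critical (Bark) bands after Zwicker.
# The first edge is taken as 20 Hz here; some tables start at 0 Hz.
BARK_EDGES_HZ = [20, 100, 200, 300, 400, 510, 630, 770, 920, 1080,
                 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400,
                 5300, 6400, 7700, 9500, 12000, 15500]


def split_lowest_bands(edges, n_bands=2):
    """Split each of the n lowest bands in two at its midpoint (assumed rule)."""
    if n_bands > len(edges) - 1:
        raise ValueError("cannot split more bands than exist")
    out = []
    for i, lo in enumerate(edges[:-1]):
        out.append(lo)
        if i < n_bands:
            # Insert an extra edge halfway between this edge and the next.
            out.append((lo + edges[i + 1]) / 2)
    out.append(edges[-1])
    return out


custom_edges = split_lowest_bands(BARK_EDGES_HZ)
# The 24 standard bands become 26; the low edges are now 20, 60, 100, 150, 200 Hz.
```

The resulting edge list can then be used wherever the script sums spectral energy per band.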
Another advantage is that you can easily connect your extraction code to any further classification algorithms you write yourself.
Answered by Jonas Schwarz on December 7, 2020
The openSMILE audio feature extraction toolkit may be able to provide the functionality you desire, where the input is a .wav file and the output extracted audio features. See: http://audeering.com/technology/opensmile/
openSMILE provides a command line executable that is coupled with a configuration file that defines the features to be extracted. The executable has binaries for Linux and Windows (32 bit), and I've also built the executable from source for use on macOS. It provides its own objects that calculate features such as MFCCs, energy, pitch, but also "chroma" musical features (which may help with music classification). It comes with many prebuilt feature configurations that have been used for audio classification tasks, such as the Audio Visual Emotion Challenge (http://sspnet.eu/avec2017/).
Below is an example of how openSMILE can be used to extract features from a single .wav file with a chosen configuration. The output file, output.arff, is in ARFF, a format used by the Weka machine learning library.
SMILExtract -C config/emobase.conf -I input.wav -O output.arff
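To apply the same command to a whole folder of recordings, a small wrapper script is one option. The sketch below uses only the -C, -I and -O flags from the command above and assumes SMILExtract is on the PATH; the function names and the one-.arff-per-file layout are hypothetical choices, not part of openSMILE.

```python
import subprocess
from pathlib import Path


def smilextract_command(config, wav_path, out_path, exe="SMILExtract"):
    """Build the SMILExtract call shown above for one input file."""
    return [exe, "-C", str(config), "-I", str(wav_path), "-O", str(out_path)]


def extract_folder(wav_dir, out_dir, config="config/emobase.conf"):
    """Run SMILExtract on every .wav in wav_dir, writing one .arff per file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for wav in sorted(Path(wav_dir).glob("*.wav")):
        out = out_dir / (wav.stem + ".arff")
        # check=True raises CalledProcessError if SMILExtract exits non-zero.
        subprocess.run(smilextract_command(config, wav, out), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.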
Custom configurations can also be written in the openSMILE configuration language, with the help of the extensive PDF documentation included with openSMILE's download.
This has the desired functionality as described in the original question:
looking for a way to take a .wav file and extract features from it that could be used for further classification
Answered by English Student on December 7, 2020