Normalisation of features extracted from audio files

Data Science Asked on February 23, 2021

I am building CNN and SVM models that take MFCC features as input. Each MFCC matrix has shape (13, n): the 13 rows are coefficients and the n columns are time frames, so each row holds the values of one coefficient across all time frames. I am not sure how to normalise this matrix. Should it be row-wise (normalise a single coefficient over all the time frames) or column-wise (normalise all the coefficients within a single time frame)?

[
  [1.08,  8.97, 78.7, ........],
  [1.08,  8.97, 78.7, ........],
  [1.08,  8.97, 78.7, ........],
   .
   .
   .
  [19.8,  7.65, 76.5, ........]
]
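
To make the two options concrete, here is a minimal NumPy sketch. It assumes "normalise" means standardising to zero mean and unit variance, which is only one possible reading; the random matrix, the `eps` guard, and the variable names are illustrative stand-ins, not real MFCC data.

```python
import numpy as np

# Hypothetical stand-in for one MFCC matrix: 13 coefficients x 5 time frames.
rng = np.random.default_rng(0)
mfcc = rng.normal(loc=10.0, scale=5.0, size=(13, 5))

eps = 1e-8  # assumed guard against division by zero for constant rows/columns

# Row-wise: standardise each coefficient across all time frames (axis=1).
row_mean = mfcc.mean(axis=1, keepdims=True)
row_std = mfcc.std(axis=1, keepdims=True)
row_norm = (mfcc - row_mean) / (row_std + eps)

# Column-wise: standardise the 13 coefficients within each time frame (axis=0).
col_mean = mfcc.mean(axis=0, keepdims=True)
col_std = mfcc.std(axis=0, keepdims=True)
col_norm = (mfcc - col_mean) / (col_std + eps)
```

Both results keep the (13, n) shape; they differ only in which axis the statistics are computed over.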

Currently I am using Normalize from sklearn, but I am not sure if it's the right thing to do.
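
For reference, a small sketch of what that call does, assuming "Normalize" refers to `sklearn.preprocessing.normalize` (or the equivalent `Normalizer` class). By default it rescales each row to unit L2 norm and does not subtract the mean, so on a (13, n) matrix it acts per coefficient. The 2 x 2 matrix is a toy stand-in, not real MFCC values.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Toy 2 x 2 stand-in for an MFCC matrix (rows = coefficients, columns = frames).
mfcc = np.array([[3.0, 4.0],
                 [6.0, 8.0]])

# Default behaviour: norm="l2", axis=1, so each ROW is scaled to unit length.
rows_unit = normalize(mfcc)

# axis=0 scales each COLUMN (time frame) to unit length instead.
cols_unit = normalize(mfcc, axis=0)

print(np.linalg.norm(rows_unit, axis=1))  # L2 norm of each row after scaling
print(np.linalg.norm(cols_unit, axis=0))  # L2 norm of each column after scaling
```

Note that this is unit-norm scaling, which is a different operation from zero-mean, unit-variance standardisation.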

Tags: audio-recognition, feature-engineering, normalization, scikit-learn
