Data Science Asked on March 22, 2021
I am planning to classify audio files according to which of two different sentences is spoken in them. I don't want to use speech-to-text, because the on-prem speech-to-text models are not good and I don't want to move to the cloud. So I planned to use the RAVDESS dataset, which is primarily meant for emotion recognition. Two sentences are spoken in RAVDESS: 01 = "Kids are talking by the door" and 02 = "Dogs are sitting by the door".
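RAVDESS stores its metadata in each filename as seven hyphen-separated two-digit fields (modality, vocal channel, emotion, intensity, statement, repetition, actor). The sentence code is therefore the fifth field, and it can be read by splitting the name rather than by taking a fixed character slice. This is a minimal sketch in Python. The helper names `parse_ravdess_name` and `statement_label` are my own and do not come from any library:

```python
from pathlib import Path

# Field order in a RAVDESS filename such as "03-01-06-01-02-01-12.wav"
RAVDESS_FIELDS = ("modality", "vocal_channel", "emotion", "intensity",
                  "statement", "repetition", "actor")


def parse_ravdess_name(filename: str) -> dict:
    """Split a RAVDESS filename (or path) into its seven integer-coded fields."""
    parts = Path(filename).stem.split("-")
    if len(parts) != len(RAVDESS_FIELDS):
        raise ValueError(f"not a RAVDESS filename: {filename!r}")
    return dict(zip(RAVDESS_FIELDS, (int(p) for p in parts)))


def statement_label(filename: str) -> int:
    """Return 0 for statement 01 ("Kids are talking...") and 1 for 02 ("Dogs are sitting...")."""
    return parse_ravdess_name(filename)["statement"] - 1


if __name__ == "__main__":
    name = "03-01-06-01-02-01-12.wav"
    print(parse_ravdess_name(name))
    print(statement_label(name))  # 1
```

For a bare filename this gives the same result as `int(file[13:14]) - 1`. It also keeps working if a directory prefix is included.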
My approach is to assign labels 0 and 1 to the recordings of these two sentences, extract MFCC features, and then train a binary classifier. However, I can't get accuracy above 70%. Can someone tell me what the reason might be? I suspect one of the following:
# Generate data for statement_type
import os
import time

import librosa
import numpy as np

# path = '/content/drive/My Drive/Ravdess/'
path = '/content/RAVDESS-emotions-speech-audio-only/Audio_Speech_Actors_01-24'
lst = []
start_time = time.time()
for subdir, dirs, files in os.walk(path):
    for file in files:
        # print(file)
        try:
            # Load the audio with librosa, compute 40 MFCCs per frame and average
            # them over time, then store the MFCC vector and the label together
            X, sample_rate = librosa.load(os.path.join(subdir, file), res_type='kaiser_fast')
            mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
            # Subtract 1 so the labels start at 0; otherwise the model would also
            # have to predict a class 0 that never occurs.
            # file = int(file[7:8]) - 1
            # file = int(file[13:14]) - 1
            # file[13:14] is the second digit of the statement field
            # (e.g. '03-01-06-01-02-01-12.wav' -> statement 02); one-hot encode it
            label = np.array([1, 0] if int(file[13:14]) - 1 == 0 else [0, 1])
            # print(label)
            arr = mfccs, label
            lst.append(arr)
        # If the file is not valid, skip it
        except ValueError:
            print("error at : ", file)
            continue
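The training code below uses `x_traincnn`, `y_train`, `x_testcnn` and `y_test`, but the question doesn't show how `lst` becomes those arrays. Here is one possible version as a numpy-only sketch. The function name `split_features`, the 80/20 stratified split and the per-coefficient standardization are my assumptions and are not part of the original pipeline:

```python
import numpy as np


def split_features(lst, test_fraction=0.2, seed=42):
    """Turn [(mfcc_vector, one_hot_label), ...] into train/test arrays.

    Each class is split separately (stratified) so both sets keep the
    original 0/1 balance. Features are standardized with training-set
    statistics (an assumption, not part of the original pipeline).
    """
    X = np.array([features for features, _ in lst], dtype=np.float32)  # (n, 40)
    y = np.array([label for _, label in lst], dtype=np.float32)        # (n, 2)
    classes = y.argmax(axis=1)

    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(classes):
        idx = rng.permutation(np.flatnonzero(classes == c))
        n_test = int(round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    train_idx = rng.permutation(np.array(train_idx, dtype=np.int64))
    test_idx = rng.permutation(np.array(test_idx, dtype=np.int64))

    # Per-coefficient standardization using training statistics only
    mean = X[train_idx].mean(axis=0)
    std = X[train_idx].std(axis=0) + 1e-8
    X = (X - mean) / std

    # Add a trailing channel axis: (n, 40) -> (n, 40, 1) to match Input(shape=(40, 1))
    x_train = X[train_idx][..., np.newaxis]
    x_test = X[test_idx][..., np.newaxis]
    return x_train, y[train_idx], x_test, y[test_idx]
```

Usage: `x_traincnn, y_train, x_testcnn, y_test = split_features(lst)`.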
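# Model and training
from tensorflow import keras
from tensorflow.keras.layers import Activation, Dense, Dropout, Flatten, Input, LSTM
from tensorflow.keras.models import Sequential

# x_traincnn / x_testcnn: MFCC arrays of shape (n_samples, 40, 1)
# y_train / y_test: one-hot labels of shape (n_samples, 2)
model = Sequential()
model.add(Input(shape=(40, 1)))
model.add(LSTM(512, activation="relu", return_sequences=True))
model.add(Dropout(0.3))
model.add(Flatten())
model.add(Dense(2))
model.add(Activation('sigmoid'))  # two independent sigmoids; softmax is the usual pairing for one-hot labels
opt = keras.optimizers.Adam(learning_rate=0.0001)  # 'lr' is a deprecated alias of 'learning_rate'
model.summary()
model.compile(loss=keras.losses.BinaryCrossentropy(), optimizer=opt, metrics=['accuracy'])
trainhistory = model.fit(x_traincnn, y_train, batch_size=16, epochs=300, validation_data=(x_testcnn, y_test))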
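In the data step, `np.mean(librosa.feature.mfcc(...).T, axis=0)` averages the MFCC matrix (40 coefficients × number of frames) over time. Each clip therefore becomes 40 numbers, and with `Input(shape=(40, 1))` the LSTM steps through coefficient indices rather than through time. The numpy-only sketch below uses a random matrix as a stand-in for real MFCCs. It shows that the averaged vector does not change when the frames are reversed. It also shows one possible alternative that keeps the frame order by padding or truncating to a fixed length. The helper names and `max_frames=200` are hypothetical choices, not tuned for RAVDESS:

```python
import numpy as np


def mfcc_mean(mfcc):
    """What the data step computes: one time-averaged value per coefficient."""
    return np.mean(mfcc.T, axis=0)  # (n_mfcc,)


def mfcc_sequence(mfcc, max_frames=200):
    """Keep the time axis: (n_mfcc, n_frames) -> (max_frames, n_mfcc).

    Shorter clips are zero-padded at the end, longer ones truncated.
    max_frames=200 is an arbitrary assumption, not tuned for RAVDESS.
    """
    frames = mfcc.T  # one row per frame
    out = np.zeros((max_frames, frames.shape[1]), dtype=np.float32)
    n = min(max_frames, frames.shape[0])
    out[:n] = frames[:n]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(40, 150))   # stand-in for librosa.feature.mfcc output
    reversed_in_time = mfcc[:, ::-1]    # same frames, opposite order

    # The averaged vector cannot tell the two apart...
    print(np.allclose(mfcc_mean(mfcc), mfcc_mean(reversed_in_time)))  # True
    # ...but the frame sequence can
    print(np.allclose(mfcc_sequence(mfcc), mfcc_sequence(reversed_in_time)))  # False
    print(mfcc_sequence(mfcc).shape)  # (200, 40)
```

With this representation the model input would become `Input(shape=(200, 40))`. I have not checked whether it changes the accuracy.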