Data Science · Asked by 3r1c on February 25, 2021
At the moment I have this piece of code, which cuts a spectrogram into fixed-length tensors:
import numpy as np
import speechpy
import torch
import torchaudio

def chunks(l, n):
    """Yield successive n-sized chunks along the time axis (dim 2) of l."""
    for i in range(0, l.size(2), n):
        if i + n <= l.size(2):  # drop the trailing partial chunk
            yield l.narrow(2, i, n)
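For example, on a dummy (channel, n_mels, time) tensor (shapes chosen arbitrarily for illustration), the generator behaves like this:

x = torch.randn(1, 128, 950)             # dummy spectrogram with 950 time frames
pieces = list(chunks(x, 300))
print([tuple(p.shape) for p in pieces])  # three (1, 128, 300) chunks; the 50-frame tail is dropped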
The following piece of code then builds the chunked feature tensors from a dataframe of audio file paths:
# df, downsample_rate, max_total_context, X, y, _min and _max
# are assumed to be defined earlier.
for index, row in df.iterrows():
    # Load the audio and resample it to downsample_rate
    wave_form, sample_rate = torchaudio.load(row["path"], normalization=True)
    downsample_resample = torchaudio.transforms.Resample(
        sample_rate, downsample_rate, resampling_method='sinc_interpolation')
    wav = downsample_resample(wave_form)

    # Log-mel spectrogram, then sliding-window CMVN (mean/variance normalization)
    mel = torchaudio.transforms.MelSpectrogram(downsample_rate)(wav)
    mellog = np.log(mel + 1e-9)
    X_sample = speechpy.processing.cmvnw(mellog.squeeze(), win_size=301,
                                         variance_normalization=True)
    X_sample = torch.tensor(X_sample).unsqueeze(0)

    # Track the global min/max for later feature scaling
    _min = min(np.amin(X_sample.numpy()), _min)
    _max = max(np.amax(X_sample.numpy()), _max)

    # Keep only chunks of exactly max_total_context frames
    for chunked_X_sample in chunks(X_sample, max_total_context):
        print(chunked_X_sample.shape[2])
        if chunked_X_sample.shape[2] == max_total_context:  # safety check
            X.append(chunked_X_sample)
            y.append(row["y"])
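For context, a rough sketch of how _min and _max could be used afterwards for global min-max scaling (this part is an assumption, since the snippet above does not show where they are consumed):

# Stack the equally-sized chunks and min-max scale them to [0, 1]
X = torch.stack(X)                       # (num_chunks, 1, n_mels, max_total_context)
X = (X - _min) / (_max - _min + 1e-9)    # small epsilon guards against a zero range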
My question: is this a common way to create features for deep learning?
Do you have any suggestions for optimizing this code?
Furthermore, I am not sure whether it is right to split the mel spectrograms, rather than splitting the audio itself at an earlier stage.
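For comparison, splitting the waveform first would look roughly like this (a sketch, not code I have tested; chunk_samples is a hypothetical chunk length in samples):

# Chunk the raw waveform first, then compute one mel spectrogram per chunk
chunk_samples = 3 * downsample_rate  # e.g. 3-second windows (assumption)
mel_transform = torchaudio.transforms.MelSpectrogram(downsample_rate)

for start in range(0, wav.size(1) - chunk_samples + 1, chunk_samples):
    wav_chunk = wav.narrow(1, start, chunk_samples)
    mel_chunk = torch.log(mel_transform(wav_chunk) + 1e-9)
    # every mel_chunk has an identical time dimension by construction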