How can I convert audio to a series of notes using pitch detection?

Signal Processing Asked by nagasgura on August 14, 2020

I have a good pitch detection system set up, and I would like to return a series of notes given an array of audio samples.

My current approach is as follows: I slide a moving window across the audio signal and calculate the pitch of each window. Afterwards, I segment the audio into notes by detecting the silent regions (i.e. where the pitch detector returns null). I then simply take the average pitch of each note region.
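For reference, the approach described above can be sketched roughly as follows. This is only an illustration: the function name `segment_by_silence` is hypothetical, and it assumes the pitch detector yields one value per window, with `None` standing in for null.

```python
from statistics import mean


def segment_by_silence(pitches):
    """Group consecutive non-None pitch estimates and average each group.

    `pitches` holds one estimate per analysis window (Hz), or None where
    the pitch detector found nothing.
    """
    notes, current = [], []
    for p in pitches:
        if p is None:
            # A null estimate closes the note region in progress, if any.
            if current:
                notes.append(mean(current))
                current = []
        else:
            current.append(p)
    if current:  # flush a note that runs to the end of the signal
        notes.append(mean(current))
    return notes
```

With a gap between notes, `segment_by_silence([None, 440.0, 442.0, None, 330.0])` separates the two regions. Without a gap, `segment_by_silence([440.0, 440.0, 330.0, 330.0])` merges both notes into a single averaged value, which is the failure described next.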

Unfortunately, this has not given me very good results. The pitch detector itself seems pretty accurate, but the notes are not segmented well. It only really works when I leave a long pause between notes when I record the audio. I would like some way to detect a change of note without relying on a long silent region.

Any ideas would be greatly appreciated!

3 Answers

Instead of looking for notes decaying into silence, you might try looking for attack sounds, marked by a high amplitude or a large change in amplitude (delta amplitude), and associate each attack with the estimated pitch that follows it, provided that pitch is detected soon enough afterwards.
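A minimal sketch of this idea, assuming NumPy, non-overlapping frames, and a pitch stream aligned to the same frames. The function names, the `ratio` and `floor` thresholds, and `max_delay` are illustrative assumptions, not tuned values.

```python
import numpy as np


def detect_onsets(samples, frame_len=512, ratio=2.0, floor=1e-3):
    """Return frame indices where RMS amplitude jumps sharply.

    A frame counts as an attack when its RMS exceeds `floor` and is more
    than `ratio` times the RMS of the previous frame.
    """
    samples = np.asarray(samples, dtype=float)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    onsets = []
    for i in range(1, n_frames):
        # Clamp the previous RMS to `floor` so silence does not make
        # every tiny sound look like an infinite jump.
        if rms[i] > floor and rms[i] > ratio * max(rms[i - 1], floor):
            onsets.append(i)
    return onsets


def notes_from_onsets(onsets, pitches, max_delay=3):
    """Pair each onset with the first pitch detected within `max_delay` frames."""
    notes = []
    for onset in onsets:
        for p in pitches[onset:onset + max_delay + 1]:
            if p is not None:
                notes.append((onset, p))
                break  # onsets with no nearby pitch are dropped
    return notes
```

Because the comparison is relative to the previous frame, a re-attack is caught even when the preceding note has not decayed to silence, as long as the amplitude rises by more than `ratio`.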

You could have your pitch detector return not only the estimated pitch but also a probability or reliability factor for the pitch being a particular note in a given musical temperament (versus noise, some other note, something halfway between notes, etc.). Then look in your pitch detection stream for the points where the probability values of two adjacent detected notes cross over.
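One way to read the crossover idea, as a sketch only: suppose a hypothetical detector returns, for every window, one probability per candidate note. Two notes' probabilities have crossed over wherever the most probable note changes between consecutive windows. The array layout and the name `note_changes` are assumptions for illustration.

```python
import numpy as np


def note_changes(note_probs):
    """Find windows where the most probable note changes.

    `note_probs` has shape (n_windows, n_notes); each row holds the
    detector's probability for each candidate note in that window.
    Returns the per-window best note index and the window indices at
    which that best note differs from the previous window.
    """
    note_probs = np.asarray(note_probs, dtype=float)
    best = np.argmax(note_probs, axis=1)
    # A crossover happened between window i-1 and i if the winner changed.
    changes = np.flatnonzero(np.diff(best) != 0) + 1
    return best, changes
```

For example, with rows `[0.9, 0.1]`, `[0.7, 0.3]`, `[0.4, 0.6]`, `[0.2, 0.8]`, the best note switches from index 0 to index 1 at window 2. In practice some smoothing or a minimum note duration would probably be needed so that a single noisy window does not register as a change.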

If you are training an ML model, you might look at the values in its inference output vector of "note" bin weights and compare their ratios.

Answered by hotpaw2 on August 14, 2020

A simple approach would be to use a separate mechanism for detecting silence, one with a faster response time. Return null if the silence detector detects silence; otherwise return the output of your pitch detector. If you are happy with your pitch detector, this has the benefit of not requiring any modification to it.
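As a rough sketch, the separate silence detector could be a simple RMS threshold wrapped around the existing detector. Here `detect_pitch` stands for whatever pitch function you already have, and the `silence_rms` threshold is an assumed value that would need tuning for your recordings.

```python
import numpy as np


def gated_pitch(window, detect_pitch, silence_rms=0.01):
    """Return None for quiet windows, otherwise defer to the pitch detector.

    `detect_pitch` is the existing, unmodified pitch function;
    `silence_rms` is the RMS level below which a window counts as silent.
    """
    window = np.asarray(window, dtype=float)
    rms = np.sqrt(np.mean(window ** 2))
    if rms < silence_rms:
        return None
    return detect_pitch(window)
```

Running the silence check on a shorter window than the one used for pitch estimation is one way to get the faster response time.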

Answered by Dan Szabo on August 14, 2020

This should work using a threshold on the bin-wise difference of the magnitude spectrum.

  • Calculate the magnitude (absolute value) of the FFT for the current window and normalize it.
  • Take the bin-wise difference with the normalized FFT magnitude of the previous window.
  • Sum the absolute values of the bin-wise differences and compare the sum to a threshold.

Explanation: A change of note means a change in the energy distribution over the FFT bins, so the sum of the bin-wise differences should be high if a change of note has occurred. The normalisation suppresses false detections of note changes caused by changes in volume.
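The steps above might look something like this in NumPy. Some details are my own assumptions rather than part of the recipe: a Hann window is applied before the FFT to reduce leakage, each magnitude spectrum is normalized to sum to 1, the differences are summed as absolute values (with sum-to-1 normalization the signed differences would cancel to zero), and the default threshold is a placeholder to tune.

```python
import numpy as np


def normalized_magnitude(window):
    """Magnitude spectrum of one window, scaled so its bins sum to 1."""
    window = np.asarray(window, dtype=float)
    mag = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    total = mag.sum()
    return mag / total if total > 0 else mag


def spectral_change(prev_window, cur_window):
    """Sum of absolute bin-wise differences between two normalized spectra.

    The result lies between 0 (same energy distribution) and 2 (no
    overlap at all). Both windows must have the same length.
    """
    diff = normalized_magnitude(cur_window) - normalized_magnitude(prev_window)
    return float(np.sum(np.abs(diff)))


def is_note_change(prev_window, cur_window, threshold=0.5):
    """Flag a note change when the spectral change exceeds `threshold`."""
    return spectral_change(prev_window, cur_window) > threshold
```

Thanks to the normalization, the same note played more quietly gives a value near 0, while two different notes push the value towards 2.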

Answered by Max on August 14, 2020
