Signal Processing Asked by Kevin Sullivan on December 4, 2021
Recently, I have been studying IIR and FIR filters, and trying to create a parametric equalizer using a microcontroller. More specifically, I use the ADC on it to sample the audio at about 44 kHz, and in between grabbing samples, I use the time to process the signal and pass it out on the DAC.
What confuses me, though, is that this processing seems fairly intensive: with eight 2-pole IIR bandpass filters, it often takes up most of the time between samples.
Programs like Spotify, or devices like most Galaxy phones, have similar equalizers, but they run on non-dedicated hardware, and I am really confused about how they pull this off. Does anybody know whether they are basically doing the same thing, just faster, or is there some secret sauce that I am missing?
I have never tested either of those equalizers, so maybe they are just really crappy, but at least by ear they don't sound horrible.
This depends a lot on how you implement it.
Let's run a simple example: a well-coded biquad on a decent ARM core should take about 15 cycles. For 8 biquads running at 44.1 kHz on a stereo signal (two channels), that's 15 × 8 × 44,100 × 2 ≈ 10.6M cycles/second. That's about 1% of a single core at 1 GHz, or roughly 0.25% of a whole quad-core chip. On my Snapdragon 835 it would be less than 0.03%.
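The load arithmetic above can be written out as a tiny C helper, which makes it easy to plug in your own numbers. This is a sketch: the 15 cycles per biquad is the estimate quoted above rather than a measurement, and the function names are hypothetical.

```c
/* Cycles per second needed to run a bank of biquads in real time.
 * All inputs are estimates; measure cycles_per_biquad on your own target. */
double biquad_load_cycles_per_sec(double cycles_per_biquad,
                                  int n_biquads,
                                  double fs_hz,
                                  int n_channels)
{
    return cycles_per_biquad * n_biquads * fs_hz * n_channels;
}

/* Fraction (0..1) of one core running at core_hz that this load occupies. */
double core_fraction(double load_cycles_per_sec, double core_hz)
{
    return load_cycles_per_sec / core_hz;
}
```

With `biquad_load_cycles_per_sec(15, 8, 44100, 2)` this gives 10,584,000 cycles/s, and `core_fraction` of that against `1e9` is about 0.0106, i.e. roughly 1% of a 1 GHz core.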
Answered by Hilmar on December 4, 2021
I don’t think you’re alone, but essentially this is simply a problem of optimization. Let’s say you have a processor with an 88 MHz clock. That’s 2k clocks per sample at 44 kHz. If we take ‘most’ to mean 50% of the clocks, that leaves 1k clocks per sample for filtering. Running 8 filters leaves 125 clocks per filter. That’s a decent amount of time, but data needs to get moved around and math needs to get done. So now we need to figure out how to minimize the number of clocks per filter.
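For reference, here is roughly what each of those 8 filters has to do per sample: a minimal C sketch of a Direct Form I biquad. It assumes float coefficients normalized so that a0 = 1 and assumes the bands are applied in cascade; `Biquad`, `biquad_step` and `cascade_step` are hypothetical names. Every coefficient and state value is loaded from memory and written back on every call, which is the kind of overhead worth squeezing out when the budget is about 125 clocks per filter.

```c
#include <stddef.h>

/* One second-order IIR section ("biquad"), Direct Form I.
 * Coefficients are assumed normalized so that a0 == 1:
 *   y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2] */
typedef struct {
    float b0, b1, b2, a1, a2;   /* coefficients */
    float x1, x2, y1, y2;       /* past inputs and outputs (state) */
} Biquad;

/* Process one sample: 5 multiplies and 4 add/subtracts, plus state shuffling. */
float biquad_step(Biquad *f, float x)
{
    float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
            - f->a1 * f->y1 - f->a2 * f->y2;
    f->x2 = f->x1;  f->x1 = x;   /* shift input history */
    f->y2 = f->y1;  f->y1 = y;   /* shift output history */
    return y;
}

/* Run one sample through n cascaded sections (one per EQ band). */
float cascade_step(Biquad *bands, size_t n, float x)
{
    for (size_t i = 0; i < n; i++)
        x = biquad_step(&bands[i], x);
    return x;
}
```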
For one, you can process multiple samples at a time, at the cost of some latency, since a block of samples has to be buffered first. This reduces the time spent moving data between memory and the registers. Things like loading coefficients can be drastically reduced this way, and there are likely gains to be made with the input and output streams as well.
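A block-based version of the same Direct Form I biquad might look like the sketch below. It assumes the input samples have already been collected into a buffer (for example by a DMA transfer from the ADC, which is an assumption about the hardware), so the filter adds `n` samples of latency. Coefficients and state are copied into locals once per block, which makes it easier for the compiler to keep them in registers across the loop instead of reloading them for every sample. `BlockBiquad` and `biquad_process_block` are hypothetical names.

```c
#include <stddef.h>

/* Direct Form I biquad state and coefficients (a0 assumed == 1). */
typedef struct {
    float b0, b1, b2, a1, a2;
    float x1, x2, y1, y2;
} BlockBiquad;

/* Filter n samples from in[] into out[] (in == out is allowed). */
void biquad_process_block(BlockBiquad *f, const float *in, float *out, size_t n)
{
    /* Load coefficients and state once per block, not once per sample. */
    const float b0 = f->b0, b1 = f->b1, b2 = f->b2, a1 = f->a1, a2 = f->a2;
    float x1 = f->x1, x2 = f->x2, y1 = f->y1, y2 = f->y2;

    for (size_t i = 0; i < n; i++) {
        const float x = in[i];
        const float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        out[i] = y;
    }

    /* Store the state back once, so the next block continues seamlessly. */
    f->x1 = x1; f->x2 = x2; f->y1 = y1; f->y2 = y2;
}
```

Because the state is saved at the end of each block, splitting a signal into several blocks gives the same output as filtering it in one go.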
For two, you can leverage the capabilities of the processor. If it doesn’t have a floating-point unit, use fixed-point math. If it has multiply-accumulate instructions, make sure they’re used. Basically, check that the compiler’s assembly output doesn’t have a bunch of crazy stuff going on.
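As an illustration of the fixed-point route, here is a sketch of a Direct Form I biquad with 16-bit samples and coefficients and a 64-bit accumulator. The formats are assumptions chosen for the example: samples in Q15, coefficients in Q2.14 (range [-2, 2), because biquad coefficients such as a1 often exceed 1 in magnitude), and a0 normalized to 1. Each `acc +=` / `acc -=` line is one multiply-accumulate; whether the compiler actually emits MAC instructions for them is exactly what the assembly check is for. `BiquadQ14` and `biquad_q14_step` are hypothetical names.

```c
#include <stdint.h>

#define COEF_SHIFT 14   /* number of fractional bits in the Q2.14 coefficients */

typedef struct {
    int16_t b0, b1, b2, a1, a2;   /* Q2.14 coefficients */
    int16_t x1, x2, y1, y2;       /* Q15 state */
} BiquadQ14;

/* Clamp a wide intermediate result to the int16_t range. */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

int16_t biquad_q14_step(BiquadQ14 *f, int16_t x)
{
    /* Sum of five 16x16-bit products; int64_t so the sum cannot overflow. */
    int64_t acc = (int64_t)f->b0 * x;
    acc += (int64_t)f->b1 * f->x1;
    acc += (int64_t)f->b2 * f->x2;
    acc -= (int64_t)f->a1 * f->y1;
    acc -= (int64_t)f->a2 * f->y2;

    /* Round, then drop the 14 fractional coefficient bits.
     * Right-shifting a negative value is implementation-defined in C,
     * but it is an arithmetic shift on mainstream compilers. */
    acc += (int64_t)1 << (COEF_SHIFT - 1);
    int16_t y = sat16(acc >> COEF_SHIFT);

    f->x2 = f->x1; f->x1 = x;
    f->y2 = f->y1; f->y1 = y;
    return y;
}
```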
Lastly, you can use alternative filter topologies: direct form I or II, or their transposed versions. Maybe look at different structures like lattice, parallel, or state-variable. A lot of people more clever than us have spent a lot of time on this stuff, and there’s some interesting literature on the subject. However, I’d generally recommend starting with the first two ideas; different topologies can have limitations that are tricky to understand.
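As one example of a different topology, here is a sketch of a transposed Direct Form II biquad. It implements the same transfer function as a Direct Form I section (coefficients normalized so that a0 = 1), but keeps only two state variables instead of four, so there is less data to shuffle per sample. It illustrates the trade-offs mentioned above: a commonly cited one is that TDF-II suits floating point well, while in fixed point its internal states may need extra headroom, which is one reason Direct Form I is often preferred there. Float arithmetic is assumed, and `BiquadTDF2` and `biquad_tdf2_step` are hypothetical names.

```c
/* Transposed Direct Form II biquad, a0 assumed == 1. */
typedef struct {
    float b0, b1, b2, a1, a2;   /* coefficients */
    float s1, s2;               /* internal state (two delays) */
} BiquadTDF2;

float biquad_tdf2_step(BiquadTDF2 *f, float x)
{
    float y = f->b0 * x + f->s1;              /* output uses the first delay */
    f->s1 = f->b1 * x - f->a1 * y + f->s2;    /* update delays from x and y */
    f->s2 = f->b2 * x - f->a2 * y;
    return y;
}
```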
Answered by Dan Szabo on December 4, 2021