Computer Science: Asked by Wout12345 on December 8, 2020
I’m extending a framework for lossy compression of multidimensional floating-point data. At some point in the pipeline, sequences of symbols from a non-uniform distribution are losslessly compressed using entropy coding. The existing software uses an implementation of arithmetic coding based on this post. As an extension, I implemented rANS (range asymmetric numeral systems) as a second option.
However, analyzing the results, rANS consistently needs around 1-2% more bits to encode the same data, and this pattern holds across many symbol sequences of different lengths. Yet, as far as I understand, both coders should come extremely close to the entropy limit. Indeed, looking at empirical results, the arithmetic coder seems to have only a constant overhead of around 32 bits (the size of its normalized state) over the entropy limit.
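For clarity, the overhead I quote is measured against the zeroth-order entropy of each sequence, along the lines of the sketch below (the symbol sequence and the encoded size are placeholders, not my actual data):

```python
import math
from collections import Counter

def entropy_limit_bits(symbols):
    """Zeroth-order (empirical) entropy limit, in bits, for a symbol sequence."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

# Placeholder example: compare an encoder's output size against the
# entropy limit of the sequence it encoded.
symbols = [0, 0, 1, 0, 2, 0, 1, 3] * 1000
encoded_bits = 14500  # hypothetical size reported by one of the coders
limit = entropy_limit_bits(symbols)
print(f"limit: {limit:.1f} bits, overhead: {encoded_bits - limit:.1f} bits "
      f"({100 * (encoded_bits / limit - 1):.2f}%)")
```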
So, my question is: Is there a known phenomenon that consistently makes certain variants of rANS slightly less efficient than arithmetic coding?
Some more information that might be useful:
If this is caused by a basic error in my implementation, I apologize and will take this question elsewhere (which is also why I didn’t post the relevant code here).
Update: I have now also made b a variable (the ratio between the upper and lower bounds of the valid state interval, previously hard-coded to 2). Strangely enough, increasing b from 2 to higher powers of 2 mostly removes the overhead; with b = 256 I now typically get ~0.25% overhead. However, this doesn’t really make much sense to me either, since I thought increasing b improved speed (by allowing the encoder to work with bytes or groups of bytes) at the cost of efficiency.
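To make concrete what I mean by b, here is a rough sketch of the kind of streaming rANS coder I’m describing, with the renormalization base b as a parameter. This is a simplified illustration with made-up names and a toy frequency table, not my actual code; freqs/cumfreqs are the symbol frequencies and their cumulative sums (adding up to M), and L = 2^16 is an arbitrary lower bound for the state.

```python
def rans_encode(symbols, freqs, cumfreqs, M, b=2, L=1 << 16):
    """Streaming rANS encoder; the state x stays in [L, b*L), L a multiple of M."""
    assert L % M == 0
    x, stream = L, []
    for s in reversed(symbols):            # encode in reverse so decoding runs forward
        f, c = freqs[s], cumfreqs[s]
        x_max = (L // M) * b * f
        while x >= x_max:                  # renormalize: emit base-b digits until x fits
            stream.append(x % b)
            x //= b
        x = (x // f) * M + c + (x % f)     # core rANS encoding step
    return x, stream

def rans_decode(x, stream, n, freqs, cumfreqs, M, b=2, L=1 << 16):
    """Decode n symbols, consuming the digit stream last-in-first-out."""
    out = []
    for _ in range(n):
        slot = x % M
        # find the symbol whose cumulative-frequency range contains slot
        s = next(t for t in freqs if cumfreqs[t] <= slot < cumfreqs[t] + freqs[t])
        x = freqs[s] * (x // M) + slot - cumfreqs[s]   # inverse of the encoding step
        while x < L:                       # renormalize by reading digits back
            x = x * b + stream.pop()
        out.append(s)
    return out

# Toy usage: three symbols with frequencies summing to M = 8.
freqs = {"a": 4, "b": 3, "c": 1}
cumfreqs = {"a": 0, "b": 4, "c": 7}
x, stream = rans_encode("abacabaa", freqs, cumfreqs, M=8, b=256)
print("".join(rans_decode(x, stream, 8, freqs, cumfreqs, M=8, b=256)))  # abacabaa
```

With b = 2 each renormalization step emits a single bit, while with b = 256 it emits whole bytes; this is where I expected a speed/efficiency trade-off rather than an efficiency gain.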