Maximum lossless compression ratio for floating point time series

Question

I want to compress an array of time series floating point data as much as possible.
Currently the only algorithm I've found for this is XOR compression, which works well but doesn't compress the data enough. Which algorithm gives the highest compression ratio for relatively smooth, double-precision floating-point time series data?
I'm optimizing for compression ratio only, so the runtime complexity of the algorithm doesn't matter (within reason).
EDIT: The compression needs to be lossless.

Daniel Shapero · Answer

There is a very battle-tested library for this called fpzip, which has both lossless and lossy compression.
The authors describe their approach in a paper (a copy without a paywall is also available).
If you look at Table 1 in their paper, they get compression ratios on the order of 100 for some simulation outputs, but as low as ~1.3 on others.
Clearly the results depend very strongly on the nature of the data you're compressing.
For some fields, like the velocity in a fluid mechanics simulation, nearby values tend to have similar magnitudes, which makes it possible to compress the exponents of those nearby floating-point values.
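To see why similar magnitudes help, here is a minimal sketch (not fpzip's actual algorithm, just an illustration using the Python standard library) that inspects the IEEE 754 bit patterns of a made-up smooth series. The series `smooth` and the helper names are assumptions introduced for the example. It checks how many distinct exponent fields occur and how many leading bits neighbouring values share when XORed together.

```python
import math
import struct

def double_to_bits(x: float) -> int:
    """Reinterpret a Python float as its 64-bit IEEE 754 bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def exponent_field(x: float) -> int:
    """Return the 11-bit biased exponent of a double."""
    return (double_to_bits(x) >> 52) & 0x7FF

def leading_zero_bits(a: float, b: float) -> int:
    """Count leading zero bits in the XOR of two doubles' bit patterns."""
    xor = double_to_bits(a) ^ double_to_bits(b)
    return 64 - xor.bit_length()

# Hypothetical smooth series: a slowly varying signal that stays within [2, 4),
# so every sample has the same sign and the same exponent.
smooth = [3.0 + 0.5 * math.sin(0.01 * i) for i in range(1000)]

exponents = {exponent_field(v) for v in smooth}
min_shared = min(leading_zero_bits(a, b) for a, b in zip(smooth, smooth[1:]))

print("distinct exponent fields:", len(exponents))
print("fewest leading bits shared by neighbours:", min_shared)
```

When the sign and exponent bits are identical across neighbours, a compressor only has to encode the differing low-order mantissa bits, which is where most of the savings on smooth data come from. Data whose magnitude jumps around between samples shares far fewer leading bits and compresses correspondingly worse.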
I haven't used fpzip personally, but since your time series is relatively smooth I think you'd get good results.
