Ask Ubuntu Asked by Luis Alvarado on January 16, 2021
What compression tools are available in Ubuntu that can benefit from a multi-core CPU?
There are two main tools: lbzip2 and pbzip2. They're essentially different implementations of bzip2 compressors. I've compared them (the output is a tidied-up version, but you should be able to run the commands):
cd /dev/shm # we do all of this in RAM!
dd if=/dev/urandom of=bigfile bs=1024 count=102400
$ lbzip2 -zk bigfile
Time: 0m3.596s
Size: 105335428
$ pbzip2 -zk bigfile
Time: 0m5.738s
Size: 105324606
lbzip2 appears to be the winner on random data. It's slightly less compressed but much quicker. YMMV.
Correct answer by Oli on January 16, 2021
Relevant Arch Wiki entry: https://wiki.archlinux.org/index.php/Makepkg#Utilizing_multiple_cores_on_compression
# lzma compression
xz --threads=0

# drop-in parallel gzip replacement
# the -p/--processes flag can be used to employ fewer cores
pigz

# drop-in parallel bzip2 replacement
# the -p# flag can be used to employ fewer cores
# (note: no space between -p and the number of cores)
pbzip2

# modern zstd compression,
# used to build Arch packages by default since around 2020
zstd --threads=0
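These parallel compressors also combine nicely with tar via its -I/--use-compress-program option. A minimal sketch, assuming GNU tar (archive and directory names are placeholders):

# compress a directory on all cores with zstd through tar
tar -I 'zstd --threads=0' -cf archive.tar.zst some_directory/

# same idea with pigz
tar -I pigz -cf archive.tar.gz some_directory/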
Answered by murlakatamenka on January 16, 2021
It is not really an answer, but I think it is relevant enough to share my benchmarks comparing the speed of gzip and pigz on real hardware in a real-life scenario. Since pigz is the multithreaded evolution, it is what I have personally chosen to use from now on.
Metadata:
CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz (4c/8t)
Disk: NVMe SSD
OS: Xubuntu 17.10 (artful)
gzip version: 1.6
pigz version: 2.4
gzip, quick:
time gzip -1kN ./db_dump.sql
real 1m22,271s
user 1m17,738s
sys 0m3,330s

gzip, best:
time gzip -9kN ./db_dump.sql
real 10m6,709s
user 10m2,710s
sys 0m3,828s

pigz, quick:
time pigz -1kMN ./db_dump.sql
real 0m26,610s
user 1m55,389s
sys 0m6,175s

pigz, best (no zopfli):
time pigz -9kMN ./db_dump.sql
real 1m54,383s
user 14m30,435s
sys 0m5,562s

pigz + zopfli algorithm:
time pigz -11kMN ./db_dump.sql
real 171m33,501s
user 1321m36,144s
sys 0m29,780s
As a bottom line, I would not recommend the zopfli algorithm, since the compression took a tremendous amount of time for a not-that-significant amount of disk space saved.
Resulting file sizes: (shown in an image in the original post, not reproduced here)
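For completeness: pigz output is ordinary gzip, so the archives can be decompressed with either tool interchangeably. A minimal sketch (the file name assumes the benchmark above):

# either of the following:
gzip -dk db_dump.sql.gz   # stock gzip
unpigz -k db_dump.sql.gz  # pigz's decompression alias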
Answered by helvete on January 16, 2021
Zstandard has supported multi-threading since v1.2.0. It is a very fast compressor and decompressor intended to replace gzip, and at its highest levels it can also compress as efficiently as (if not better than) LZMA2/XZ.
You have to use v1.2.0 or a later release, or compile the latest version from source, to get these benefits. Luckily it doesn't pull in a lot of dependencies.
There was also a third-party pzstd in v1.1.0 of zstd.
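A minimal usage sketch, assuming zstd v1.2.0 or later (the file name is a placeholder):

# -T0 auto-detects the core count; higher levels (e.g. -19) trade speed for ratio
zstd -T0 -19 bigfile -o bigfile.zst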
Answered by LiveWireBT on January 16, 2021
XZ Utils has supported multi-threaded compression since v5.2.0; it was originally mistakenly documented as offering multi-threaded decompression.
For example: tar -cf - source | xz --threads=0 > destination.tar.xz
Answered by donbradken on January 16, 2021
lzop may also be a viable option, although it's single-threaded.
It uses the very fast Lempel-Ziv-Oberhumer (LZO) compression algorithm, which in my observation is 5-6 times faster than gzip.
Note: Although it's not multi-threaded yet, it will probably outperform pigz on 1-4 core systems. That's why I decided to post this even though it doesn't directly answer your question. Try it; it may solve your CPU bottleneck problem while using only one CPU and compressing a little worse. I often found it to be a better solution than, e.g., pigz.
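A minimal sketch (the file name is a placeholder; note that lzop keeps the input file by default):

lzop bigfile          # produces bigfile.lzo
lzop -d bigfile.lzo   # decompress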
Answered by ce4 on January 16, 2021
The LZMA2 compressor of p7zip uses both cores on my system.
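For example, a minimal sketch (archive and directory names are placeholders; the -mmt switch controls multithreading):

# add files to a 7z archive using LZMA2 with multithreading enabled
7z a -m0=lzma2 -mmt=on archive.7z some_directory/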
Answered by David Foerster on January 16, 2021
In addition to the nice summary above (thanks, Luis), these days folks might also want to consider PIXZ, which according to its README (source: https://github.com/vasi/pixz -- I haven't verified the claims myself) has some advantages over PXZ.
[Compared to PIXZ, PXZ has these advantages and disadvantages:]
* Simpler code
* Uses OpenMP instead of pthreads
* Uses streams instead of blocks, not indexable
* Uses temp files and doesn't combine them until the whole file is compressed, high disk/memory usage
In other words, PIXZ is supposedly more memory and disk efficient, and has an optional indexing feature that speeds up decompression of individual components of compressed tar files.
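A minimal sketch based on the README's examples (names are placeholders; I haven't verified these either):

# create an indexed, parallel-compressed tarball
tar -c some_directory | pixz > archive.tpxz

# extract a single file quickly, thanks to the index
pixz -x some_directory/file.txt < archive.tpxz | tar x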
Answered by nturner on January 16, 2021
Well, the keyword was parallel. After looking for compression tools that were also parallel, I found the following (a short usage sketch for each appears after the list):
PXZ - Parallel XZ is a compression utility that takes advantage of running LZMA compression of different parts of an input file on multiple cores and processors simultaneously. Its primary goal is to utilize all resources to speed up compression time with minimal possible influence on compression ratio.
sudo apt-get install pxz
PLZIP - Lzip is a lossless data compressor based on the LZMA algorithm, with very safe integrity checking and a user interface similar to the one of gzip or bzip2. Lzip decompresses almost as fast as gzip and compresses better than bzip2, which makes it well suited for software distribution and data archiving.
Plzip is a massively parallel (multi-threaded) version of lzip using the lzip file format; the files produced by plzip are fully compatible with lzip.
Plzip is intended for faster compression/decompression of big files on multiprocessor machines, which makes it especially well suited for distribution of big software files and large-scale data archiving. On files big enough, plzip can use hundreds of processors.
sudo apt-get install plzip
PIGZ - pigz, which stands for Parallel Implementation of GZip, is a fully functional replacement for gzip that takes advantage of multiple processors and multiple cores when compressing data.
sudo apt-get install pigz
PBZIP2 - pbzip2 is a parallel implementation of the bzip2 block-sorting file compressor that uses pthreads and achieves near-linear speedup on SMP machines. The output of this version is fully compatible with bzip2 v1.0.2 (i.e. anything compressed with pbzip2 can be decompressed with bzip2).
sudo apt-get install pbzip2
LRZIP - A multithreaded compression program that can achieve very high compression ratios and speed when used with large files. It uses the combined compression algorithms of zpaq and lzma for maximum compression, lzo for maximum speed, and the long-range redundancy reduction of rzip. It is designed to scale with increases in RAM size, improving compression further. A choice of either size or speed optimizations allows for either better compression than even lzma can provide, or better speed than gzip, but with bzip2-sized compression levels.
sudo apt-get install lrzip
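For reference, a minimal usage sketch for each tool (thread counts and file names are placeholders; check each man page for the exact flags):

pxz -T4 bigfile         # xz format, 4 threads
plzip -n 4 bigfile      # lzip format, 4 threads
pigz -p 8 bigfile       # gzip format, 8 threads
pbzip2 -p4 bigfile      # bzip2 format, 4 threads (no space after -p)
lrzip -p 4 bigfile      # rzip + lzma, 4 threads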
A small Compression Benchmark (Using the test Oli created):
ORIGINAL FILE SIZE - 100 MB
PBZIP2 - 101 MB (1% Bigger)
PXZ - 101 MB (1% Bigger)
PLZIP - 102 MB (1% Bigger)
LRZIP - 101 MB (1% Bigger)
PIGZ - 101 MB (1% Bigger)
A small Compression Benchmark (Using a Text file):
ORIGINAL FILE SIZE - 70 KB Text File
PBZIP2 - 16.1 KB (23%)
PXZ - 15.4 KB (22%)
PLZIP - 15.5 KB (22.1%)
LRZIP - 15.3 KB (21.8%)
PIGZ - 17.4 KB (24.8%)
Answered by Luis Alvarado on January 16, 2021