    Compressors galore: pbzip2, lbzip2, plzip, xz, and lrzip tested on a FASTQ file

    28th March 2015

    About 2 years ago I reviewed some parallel (and non-parallel) compression utilities, settling at that time on pbzip2: it scales almost linearly with the number of CPUs/cores, stores compressed data in relatively small 900k blocks, is fast, and has a good compression ratio. pbzip2 was (and still is) a very good choice.

    Yesterday I got somewhat distracted, and thus found lbzip2 -

    an independent, multi-threaded implementation of bzip2. It is commonly the fastest SMP (and uniprocessor) bzip2 compressor and decompressor

    - as it says in the Debian package description. Is it really “commonly the fastest” one? How does it compare to pbzip2? Should I use lbzip2 instead of pbzip2?

    This minor distraction grew into a full-scale web search and comparison, adding plzip (a parallel version of lzip), xz, and lrzip to the mix. After reading thousands of characters, I put all of these to a simple test: compressing a FASTQ file of about 2 gigabytes with default options.
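    A test of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the exact procedure used for the results below: the helper names `benchmark` and `compare` are made up for this sketch, and it assumes each tool reads from stdin and writes to stdout when given `-c`. lrzip is left out because its command line follows different conventions.

```python
import os
import shutil
import subprocess
import time


def benchmark(cmd, input_path):
    """Feed input_path to cmd on stdin; return (wall seconds, output bytes)."""
    start = time.perf_counter()
    with open(input_path, "rb") as src:
        proc = subprocess.Popen(cmd, stdin=src, stdout=subprocess.PIPE)
        size = 0
        # Count the output in 1 MiB chunks so nothing large is held in memory.
        for chunk in iter(lambda: proc.stdout.read(1 << 20), b""):
            size += len(chunk)
        proc.stdout.close()
        if proc.wait() != 0:
            raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}")
    return time.perf_counter() - start, size


def compare(input_path, tools=("pbzip2", "lbzip2", "plzip", "xz")):
    """Run each installed tool with default options (assumption: `-c` = stdout).

    Returns {tool: (seconds, compressed_size / original_size)}.
    """
    original = os.path.getsize(input_path)
    results = {}
    for tool in tools:
        if shutil.which(tool) is None:
            continue  # tool not installed; skip it rather than fail
        seconds, size = benchmark([tool, "-c"], input_path)
        results[tool] = (seconds, size / original)
    return results
```

    Calling `compare("reads.fastq")` (a placeholder file name) would return wall-clock time and compressed-to-original size ratio for every tool found on the PATH. Wall-clock time includes reading the input from disk, so a warm file cache matters for a fair comparison.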

    All the external links and benchmarks, as well as my own mini-benchmark results, are provided below.

    The conclusion is that, of all the tested compressors, lbzip2 is indeed the best one (for my practical use). It is only slightly better than the trusty pbzip2, which takes second place. All the other compressors performed so poorly that they do not get any place in my practical rating…

    So, let us first ask the wisdom (or foolishness) of the internet: which is faster/better, lbzip2 or pbzip2?
