NOTE: This is not meant to be taken seriously for real-world applications (see the comments section).
I decided to compare different compression programs, so I downloaded the source code of linux-2.6.25 and tarred the directory into linux-2.6.25.tar. Then I ran 7-zip, bzip2, gzip, lpaq8, paq8o6, rzip, sbc, and sr3a on the tar file. I timed each program with the time command, even when it has a timer of its own.
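For anyone who wants to repeat the timing without the shell's time command, here is a minimal sketch in Python. It assumes the Time lines below are wall-clock figures; `time_command` and its `stdout_path` parameter are names made up for this illustration, and the command in the demo is a harmless stand-in rather than a real compressor.

```python
import subprocess
import sys
import time


def time_command(argv, stdout_path=None):
    """Run argv to completion and return (exit_code, wall_clock_seconds).

    stdout_path is a hypothetical convenience for tools that write to
    stdout (like `gzip -9c`); without it, stdout is discarded.
    """
    out = open(stdout_path, "wb") if stdout_path else subprocess.DEVNULL
    try:
        start = time.perf_counter()
        completed = subprocess.run(argv, stdout=out, stderr=subprocess.DEVNULL)
        elapsed = time.perf_counter() - start
    finally:
        if stdout_path:
            out.close()
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. To time a real run,
    # pass something like ["bzip2", "-9k", "linux-2.6.25.tar"] instead.
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit={code} wall={seconds:.3f}s")
```

This only measures elapsed time; it does not separate user and system CPU time the way the shell's time command does.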
original file
The original filesize of the tar was: 284651520 bytes (271.5 MiB)
gzip
gzip -9c linux-2.6.25.tar > linux-2.6.25.tar.gz
Filesize: 61524085 bytes (58.7 MiB) (21.61%)
Time: 31.334s
gzip sucks. Please don’t use it.
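The Filesize lines in this post give each result in bytes, in MiB, and as a percentage of the original tar. A minimal sketch of that arithmetic, where `describe_size` is just an illustrative helper name:

```python
ORIGINAL_BYTES = 284651520  # size of linux-2.6.25.tar, from the post


def describe_size(compressed_bytes, original_bytes=ORIGINAL_BYTES):
    """Format a size like the Filesize lines: bytes (MiB) (percent of original)."""
    mib = compressed_bytes / 2**20  # 1 MiB = 1048576 bytes
    percent = 100 * compressed_bytes / original_bytes
    return f"{compressed_bytes} bytes ({mib:.1f} MiB) ({percent:.2f}%)"


print(describe_size(61524085))  # the gzip -9 result above
```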
sr3a
sr3a c linux-2.6.25.tar linux-2.6.25.tar.sr3
SR3 file compressor (C) 2007, Matt Mahoney
Licensed under GPL, http://www.gnu.org/copyleft/gpl.html
Modified by Nania Francesco Antonio (Italy)
linux-2.6.25.tar: 284651520 -> 51872928 in 30.53 sec.
Filesize: 51872928 bytes (49.5 MiB) (18.22%)
Time: 30.094s
Wow, faster than gzip by 1 second and compresses better as well.
bzip2
bzip2 -9k linux-2.6.25.tar
Filesize: 48564607 bytes (46.3 MiB) (17.06%)
Time: 55.911s
Average compressor.
rzip
rzip -9kvvv linux-2.6.25.tar
hashsize = 8388608. bits = 23. 64MB
Starting sweep for mask 1
Starting sweep for mask 3
Starting sweep for mask 7
Starting sweep for mask 15
Starting sweep for mask 31
5592404 total hashes
225878 in primary bucket (4.039%)
matches=999642 match_bytes=104996777
literals=902262 literal_bytes=179654743
true_tag_positives=33727524 false_tag_positives=48478942
inserts=19225657 match 0.584
linux-2.6.25.tar - compression ratio 6.071
Filesize: 46890792 bytes (44.7 MiB) (16.47%)
Time: 58.752s
Don’t use bzip2. Use rzip. It compresses better than bzip2 (and is slightly slower).
sbc
sbc c -m3 -b63 linux-2.6.25.tar.sbc linux-2.6.25.tar
-------------------------------------------------------------------------------
<>
-------------------------------------------------------------------------------
Creating archive: "linux-2.6.25.tar.sbc"...
Searching files...
Archive encryption: none
Sorting files...Done (time: 0.0 seconds)...
Compressing, method: advanced, blocks: 32.0 MB/analysis+name, mem.: 226.1 MB.
Compressing...
linux-2.6.25.tar [blk 0000, 0.00 bpB, 0.0%]
linux-2.6.25.tar [blk 0001, 1.74 bpB, 1.6%]
linux-2.6.25.tar [blk 0002, 1.76 bpB, 1.7%]
linux-2.6.25.tar [blk 0003, 1.78 bpB, 1.8%]
linux-2.6.25.tar [blk 0004, 1.78 bpB, 3.6%]
linux-2.6.25.tar [blk 0005, 1.18 bpB, 15.4%]
linux-2.6.25.tar [blk 0006, 1.19 bpB, 27.2%]
linux-2.6.25.tar [blk 0007, 1.18 bpB, 39.0%]
linux-2.6.25.tar [blk 0008, 1.20 bpB, 50.7%]
linux-2.6.25.tar [blk 0009, 1.20 bpB, 62.5%]
linux-2.6.25.tar [blk 0010, 1.20 bpB, 63.1%]
linux-2.6.25.tar [blk 0011, 1.19 bpB, 74.9%]
linux-2.6.25.tar [blk 0012, 1.16 bpB, 86.7%]
linux-2.6.25.tar [blk 0013, 1.17 bpB, 94.5%]
linux-2.6.25.tar [blk 0014, 1.16 bpB, 99.4%]
linux-2.6.25.tar [blk 0015, 1.16 bpB, 100.0%]
* Successfully compressed 284,651,520 into 41,322,279 (14.5%) bytes.
* Compressor: 1.161 bpB, 1571.84 kB/s, 176.85 seconds.
Filesize: 41322279 bytes (39.4 MiB) (14.52%)
Time: 2m56.239s
Pretty good result, similar to 7-zip’s.
7za (7-zip)
7za a -mx=9 linux-2.6.25.tar.7z linux-2.6.25.tar
7-Zip (A) 4.57 Copyright (c) 1999-2007 Igor Pavlov 2007-12-06
p7zip Version 4.57 (locale=en_AU.UTF-8,Utf16=on,HugeFiles=on,4 CPUs)
Scanning
Creating archive linux-2.6.25.tar.7z
Compressing linux-2.6.25.tar
Everything is Ok
Filesize: 39999865 bytes (38.1 MiB) (14.05%)
Time: 2m55.804s
Best general purpose compressor with a very good compression ratio.
lpaq8
lpaq8 7 linux-2.6.25.tar linux-2.6.25.tar.lpaq8
284651520 -> 29938803 in 947.320 sec. using 390 MB memory
Filesize: 29938803 bytes (28.6 MiB) (10.52%)
Time: 15m45.911s
Compresses a good 10 MB better than 7-Zip, but takes around 5 times longer.
paq8o6_sse (paq8o6)
paq8o6_sse -7 linux-2.6.25.tar
Creating archive linux-2.6.25.tar.paq8o6 with 1 file(s)...
linux-2.6.25.tar 284651520 -> PGM 225x289 25253916
284651520 -> 25253916
Time 500.56 sec, used 873327891 bytes of memory
Filesize: 25253916 bytes (24.1 MiB) (8.87%)
Time: 365m57.112s
Holy s***. 8.87%. Compresses 4 MB better than lpaq8, but takes 23 times longer (6 whole hours).
Conclusion
Use 7-zip. If you want faster compression, use rzip. And if you’ve got a supercomputer, use lpaq8. If you’ve got 23 supercomputers and can hack paq to use SMP, use paq. Please don’t use gzip or bzip2.
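The "times longer" and "MB better" figures above are simple ratios and differences of the numbers in each section. A rough sketch that recomputes them from the results in this post; the table is copied from the Filesize and Time lines, and the helper names `slowdown` and `mib_saved` are made up for the illustration:

```python
# (compressed bytes, wall-clock seconds), copied from the results in the post.
RESULTS = {
    "gzip": (61524085, 31.334),
    "sr3a": (51872928, 30.094),
    "bzip2": (48564607, 55.911),
    "rzip": (46890792, 58.752),
    "sbc": (41322279, 2 * 60 + 56.239),
    "7za": (39999865, 2 * 60 + 55.804),
    "lpaq8": (29938803, 15 * 60 + 45.911),
    "paq8o6": (25253916, 365 * 60 + 57.112),
}


def slowdown(slower, faster):
    """How many times longer `slower` took than `faster`."""
    return RESULTS[slower][1] / RESULTS[faster][1]


def mib_saved(better, worse):
    """How many MiB smaller the output of `better` is than that of `worse`."""
    return (RESULTS[worse][0] - RESULTS[better][0]) / 2**20


print(f"lpaq8 vs 7za: {slowdown('lpaq8', '7za'):.1f}x slower, "
      f"{mib_saved('lpaq8', '7za'):.1f} MiB smaller")
print(f"paq8o6 vs lpaq8: {slowdown('paq8o6', 'lpaq8'):.1f}x slower, "
      f"{mib_saved('paq8o6', 'lpaq8'):.1f} MiB smaller")
```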

April 27, 2008 at 6:00 pm
It still makes a lot of sense to use gzip and bzip2 because they are widely accepted standards and well established programs, debugged, tested and supported by tar, rpm, dpkg, etc. Telling people “please don’t use” is just silly.
To quote from rzip help “note that rzip cannot operate on stdin/stdout” which means that it cannot do the jobs that bzip2 does. Again, telling people not to use bzip2 is just plain dumb when your suggested replacement is lacking fundamental features.
Also, you tested one single file as if this was the last word in compression testing. Other files will give different results. bzip2 will always beat rzip in both speed and compression ratio when compressing PGM image data if the number of colours is low (e.g. scanned documents). Whereas rzip gives better compression for source code because files often have common headers.
rzip sometimes compresses worse at “-9” than it does at “-6”, as well as being slower, thus to compress to best effect you need to wrap it in a script to test which option gives the best result (as above, scanned PGM documents with reduced colours will demonstrate this).
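The "try each option and keep the smallest" wrapper can be sketched roughly as follows. This is only an illustration under a loud assumption: Python has no rzip binding that I know of, so the standard library's bz2 stands in as the compressor, and `best_level` is a hypothetical helper. For rzip itself the same loop would run the command once per level and compare the resulting file sizes.

```python
import bz2


def best_level(data, compress=bz2.compress, levels=range(1, 10)):
    """Compress data at every level and return (level, packed) for the smallest output.

    bz2.compress is a stand-in here, not rzip; any callable taking
    (data, level) and returning bytes would work.
    """
    best = None
    for level in levels:
        packed = compress(data, level)
        # Keep the first level that achieves the smallest size.
        if best is None or len(packed) < len(best[1]):
            best = (level, packed)
    return best


if __name__ == "__main__":
    sample = b"#include <linux/kernel.h>\n" * 5000
    level, packed = best_level(sample)
    print(f"smallest output at level {level}: {len(packed)} bytes")
```

The cost is obvious: every candidate level is a full compression pass, so this only makes sense when compression time matters much less than the final size.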
You also completely ignore the time to decompress. gzip has a very small and simple decompressor which makes it great for bootstraps and embedded systems. rzip and bzip2 give approximately equal compression performance on executables, but while bzip2 is slower to compress, rzip is slower to decompress. An executable might get compressed once, but decompressed many times over so the speed of decompression is a bit more important. 7-zip beats nearly everything in the combination of good compression ratio and fast decompression but 7-zip is difficult to script with because it works a bit like tar (but missing the full features of tar) and a bit like a compressor.
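To make the decompression point concrete, here is a small sketch that times both directions and then adds up the cost of compressing once and decompressing many times. It uses Python's standard library codecs (zlib for DEFLATE as used by gzip, bz2 for bzip2, lzma for the LZMA family that 7-zip uses) as stand-ins for the command-line tools, so the numbers are not comparable to the results in the post; `measure` and `total_cost` are illustrative names.

```python
import bz2
import lzma
import time
import zlib

# Standard library stand-ins, not the command-line programs tested in the post.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def measure(data):
    """Return a list of (name, packed_size, compress_seconds, decompress_seconds)."""
    rows = []
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != data:
            raise ValueError(f"{name} round trip failed")
        rows.append((name, len(packed), t1 - t0, t2 - t1))
    return rows


def total_cost(compress_s, decompress_s, times_decompressed):
    """Total seconds spent if a file is compressed once and decompressed N times."""
    return compress_s + decompress_s * times_decompressed


if __name__ == "__main__":
    sample = b"int main(void) { return 0; }\n" * 20000
    for name, size, c_s, d_s in measure(sample):
        print(f"{name}: {size} bytes, compress {c_s:.4f}s, decompress {d_s:.4f}s, "
              f"cost over 1000 decompressions {total_cost(c_s, d_s, 1000):.2f}s")
```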
April 27, 2008 at 9:00 pm
What I meant was for personal use (backups, etc). And yes, everything you say is true. However, I did not mean for this to be “the last word in compression testing”. This was only meant to be a small test of a few compressors, and the real point in doing this was to compare PAQ to the other compressors (PAQ compresses way too slowly for any real work). I was trying to emphasize how slow PAQ is. I threw in the other compressors at the end.
I also do not have the time to do rigorous testing. Again, this is not a “serious” test.
January 1, 2009 at 9:31 pm
Interesting… I didn’t even know that some of these options existed. I will take some of the interesting parts of what was said into account. Even a really dog-slow method might just save my bacon at some point.
Kind of makes me think, even with external storage becoming so cheap.
Maybe you could include rar in the comparison.