LZ4 vs. GZIP

Compression becomes more important as we add content to the web every single day. The average size of our websites keeps increasing rapidly, so we need to look for ways to minimize the waiting time for our users. The nature of our content determines the most appropriate compression algorithm. For images, for example, PNG or JPEG will work, but choosing between them requires knowing how many colors the image contains and whether a small loss of detail is tolerable. WebP is another option, but we need to be sure it is well supported and just as easy to use.

First, I decided to compare both algorithms with something that is familiar and in wide use on the web today: jQuery. So I took the most recent development and production versions and supplied them as input to both algorithms. For GZIP, I used the gzip package in Python, which defaults to compression level 9 (best compression, slowest speed), so I had to make sure LZ4 was set to a comparable level. For LZ4, I used the command-line tool for Windows provided by Yann Collet, the creator of the LZ4 algorithm.
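
Roughly, the comparison looks like this in Python (a minimal sketch: the lz4 bindings stand in here for the command-line tool, and the file name is only a placeholder):

    import gzip
    import lz4.frame  # pip install lz4; used in place of the command-line tool

    # Read one of the jQuery files (placeholder name).
    with open("jquery.js", "rb") as f:
        data = f.read()

    # GZIP at level 9, the Python default mentioned above.
    gz = gzip.compress(data, compresslevel=9)

    # LZ4 at its maximum (high-compression) level, as the closest equivalent.
    lz = lz4.frame.compress(data, compression_level=lz4.frame.COMPRESSIONLEVEL_MAX)

    print("original: %d bytes" % len(data))
    print("gzip -9:  %d bytes" % len(gz))
    print("lz4 hc:   %d bytes" % len(lz))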

Compression of the jQuery development version

With the uncompressed development version, only small differences were noticeable, but GZIP offered better compression.

Compression of the jQuery production version

With the minified production version, the differences were somewhat more pronounced, and GZIP was again better.

Compression of a big file in natural language

Again we see only small differences, so in a case like this one, where the original file is big, compression and decompression speed matters much more than the output size. If LZ4 decompresses much faster than GZIP, users will see the content much sooner, and to them LZ4 will appear to perform better.
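
A quick way to check this on your own files is to time both decompressions; a minimal sketch, again assuming the lz4 Python bindings and a placeholder file name:

    import gzip
    import time
    import lz4.frame  # pip install lz4

    # Load a large file (placeholder name) and compress it both ways.
    with open("big_natural_language_file.txt", "rb") as f:
        data = f.read()
    gz = gzip.compress(data, compresslevel=9)
    lz = lz4.frame.compress(data)

    # Time decompression of each payload.
    start = time.perf_counter()
    gzip.decompress(gz)
    gzip_seconds = time.perf_counter() - start

    start = time.perf_counter()
    lz4.frame.decompress(lz)
    lz4_seconds = time.perf_counter() - start

    print("gzip: %.3f s, lz4: %.3f s" % (gzip_seconds, lz4_seconds))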

Where else do we have a lot of content? Databases. So I took a simple SQL file to see whether there is any difference this time.

Compression of an SQL file

It seems that both algorithms perform a bit better in this case, bringing the compressed versions down to almost 25% of the original size. A bigger original file would have made this test more representative, though.

Compression of a large file with permutations of a set of characters

Finally, I decided to give as input my own file, containing all possible permutations of a set of characters, separated by a delimiter. We can see that in this case the differences in output size are no longer small. Something else I noticed is that on my single-core processor, GZIP compressed this file in 21 seconds, whereas LZ4 needed 66 seconds; on a multi-core system LZ4 might have performed much better. Decompression, on the other hand, was a different story: GZIP took around 4 seconds, while LZ4 finished in less than a second, which is very fast for a 112MB file. Applications that have to deal with very large datasets could certainly benefit from this.

So the decision of which algorithm to use may really come down to connection speed. Users with fast Internet connections are less likely to notice that a large file took a couple of seconds more to download, and they could then decompress it almost instantly with LZ4 (if browsers supported it). On a slower connection, however, the time required to download the larger file could quickly overshadow the gains from faster decompression, and GZIP would still be the more appropriate choice.
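
A file like this can be generated in a few lines of Python; the character set and the delimiter below are just examples, not the exact ones used in the test:

    from itertools import permutations

    # Example set of distinct characters; the output size grows
    # factorially with the size of this set.
    chars = "abcdefghij"
    with open("permutations.txt", "w") as out:
        for p in permutations(chars):
            out.write("".join(p) + "\n")  # newline as an example delimiter

With ten characters this already yields 10! = 3,628,800 lines, and such highly repetitive data tends to magnify the differences between compressors.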

bit.ly/1g6K86Y