
§ And I thought my implementation of Deflate was bad

When I added PNG support to 1.7.0, I wrote my own routines to handle the Deflate compression algorithm. Deflate is also the compression algorithm underlying the well-known 'zip' format, and it is surprisingly complex, combining a sliding-window matching algorithm with three layers of Huffman trees. Writing a fast sliding-window compressor was quite an educational experience.
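The sliding-window side of such a compressor is usually built around a hash-chain match finder. A minimal sketch in Python (purely illustrative, not VirtualDub's actual code; Deflate's real match finder also inserts every position inside a match into the chains, which is skipped here for brevity):

```python
def find_matches(data, window=32768, min_len=3, max_len=258):
    """Greedy LZ77 tokenizer with a hash-chain match finder (Deflate-style limits)."""
    head = {}      # 3-byte prefix -> most recent position seen
    prev = {}      # position -> previous position with the same prefix
    tokens = []    # literals (int) or (distance, length) pairs
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        if i + min_len <= n:
            key = data[i:i + min_len]
            pos = head.get(key)
            # Walk the chain of earlier positions inside the window.
            while pos is not None and i - pos <= window:
                length = 0
                while (length < max_len and i + length < n
                       and data[pos + length] == data[i + length]):
                    length += 1
                if length > best_len:
                    best_len, best_dist = length, i - pos
                pos = prev.get(pos)
            prev[i] = head.get(key)
            head[key] = i
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len   # simplification: skipped positions are not hashed
        else:
            tokens.append(data[i])
            i += 1
    return tokens
```

Decoding is just the reverse walk: literals are copied through, and each (distance, length) pair copies bytes from the already-produced output, with overlapping copies allowed.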

Someone posted an interesting bug in the Visual Studio public bug database about the new zip file support enlarging files, so I decided to try .NET Framework's Deflate compressor:

using System;
using System.IO;
using System.IO.Compression;

namespace VCZipTest {
    class Program {
        static void Main(string[] args) {
            using (FileStream src = new FileStream(args[0], FileMode.Open)) {
                using (FileStream dst = new FileStream(args[1], FileMode.Create)) {
                    using (GZipStream def = new GZipStream(dst, CompressionMode.Compress, true)) {
                        byte[] buf = new byte[4096];
                        for (;;) {
                            int act = src.Read(buf, 0, 4096);
                            if (act <= 0)
                                break;
                            def.Write(buf, 0, act);
                        }
                    }

                    System.Console.WriteLine("{0} ({1} bytes) -> {2} ({3} bytes)",
                        args[0], src.Length, args[1], dst.Length);
                }
            }
        }
    }
}

D:\proj\win\VCZipTest\bin\Debug>vcziptest f:\shuffle.mp3 f:\test.bin.gz
f:\shuffle.mp3 (1439439 bytes) -> f:\test.bin.gz (2114778 bytes)

Yes, the .NET Framework implementation actually enlarged the file by 47%, and yes, gunzip was able to "decompress" the 2.1MB file correctly. I ran the file through VirtualDub's zip decompressor and found that the literal/length Huffman tree in the stream was pretty badly misbalanced, with most of the literal codes allocated 14 bits. Deflate should never enlarge a stream by more than a tiny fraction since it allows individual blocks to be stored.
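The stored-block escape hatch is easy to verify against zlib itself. A quick sketch (Python here for convenience; its zlib module wraps the same library gzip uses):

```python
import os
import zlib

data = os.urandom(1 << 20)   # 1 MiB of incompressible random bytes
packed = zlib.compress(data, 9)

# A conforming Deflate implementation falls back to stored blocks
# (a 5-byte header per block of up to 64 KiB of raw data), so even on
# random input the expansion is a tiny fixed overhead, nowhere near 47%.
overhead = len(packed) - len(data)
assert overhead < len(data) * 0.001
assert zlib.decompress(packed) == data
```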

In contrast, even "gzip -1" was able to compress the file from 1,439,439 bytes to 1,416,488 bytes, and gzip -9 got it down to 1,406,973 bytes.

Comments

Comments posted:


As per the MSDN docs:

"The compression functionality in DeflateStream and GZipStream is exposed as a stream. Data is read in on a byte-by-byte basis, so it is not possible to perform multiple passes to determine the best method for compressing entire files or large blocks of data. The DeflateStream and GZipStream classes are best used on uncompressed sources of data. If the source data is already compressed, using these classes may actually increase the size of the stream."

AndyC - 05 12 06 - 17:33


You are correct. However, a correctly functioning implementation of Deflate should never produce this amount of enlargement, even if it doesn't implement the stored block optimization, because the Huffman trees will only code what is used, and if there aren't any matches it simply won't allocate bits to those codes. In practice, just about any compressed file you grab will have a little bit of redundancy that Deflate can take advantage of to come out slightly ahead. Even if somehow there were a case where there were no matches and a completely balanced distribution, you'd still be able to build a literal/length tree consisting of 254 literals at 8 bits, one literal at 9 bits, and the end code at 9 bits, and be under 101% of original size. Based on the code statistics I was seeing in the decompressor, I'm pretty sure there's a bug somewhere in GZipStream's Huffman tree generator.
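The arithmetic behind that worst case is easy to check; using the tree described above (254 literals at 8 bits, one literal plus the end-of-block code at 9 bits) over a hypothetical flat byte distribution:

```python
# Worst case: no matches at all, perfectly flat byte distribution.
# Code lengths: 254 literals at 8 bits, 1 literal at 9, end-of-block at 9.
# Kraft sum: 254/256 + 2/512 < 1, so this is a valid prefix code.
n = 1_000_000                          # uncompressed size in bytes
avg_bits = (254 * 8 + 1 * 9) / 255     # ~8.004 bits per literal on average
payload_bits = n * avg_bits + 9        # data plus one end-of-block code
ratio = payload_bits / (n * 8)
assert 1.0 < ratio < 1.01              # under 101% even before block headers
```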

As for multiple passes, you don't need them, and zlib doesn't do them either. The way that a Deflate compressor generally works is that it builds a buffer of LZ tokens up to some threshold, builds Huffman trees based on the token statistics, encodes the tokens using the trees, and then flushes out the result. Repeat, rinse, lather until the source stream ends. During the process of building the Huffman trees it is simple to compute the number of bits that will result, and switch to either static trees or a stored block if it is a loss. In other words, you can't guarantee that there isn't a better set of blocks, but you CAN guarantee that you're never worse than the stored case. You do have to buffer the uncompressed portion of the data for that block in order to be able to write the stored block, but you need to maintain some buffer anyway in order to do the sliding window compression.
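That per-block decision reduces to a three-way cost comparison once the token statistics are in hand. A sketch with hypothetical helper names (the real logic lives in zlib's `_tr_flush_block`, where the bit counts come from the tree builder):

```python
def choose_block_type(raw_len, dyn_tree_bits, dyn_data_bits, static_data_bits):
    """Pick the cheapest Deflate block type for one buffered block.

    raw_len          - uncompressed bytes covered by this block
    dyn_tree_bits    - cost of transmitting the dynamic Huffman trees
    dyn_data_bits    - cost of the tokens coded with those trees
    static_data_bits - cost of the tokens coded with the fixed RFC 1951 trees
    """
    costs = {
        "stored": (5 + raw_len) * 8,               # 5-byte stored-block header
        "static": static_data_bits,
        "dynamic": dyn_tree_bits + dyn_data_bits,  # trees must be paid for too
    }
    return min(costs, key=costs.get)
```

This is exactly the guarantee described above: the stored case puts a hard ceiling on the output size no matter how badly the trees fit the data.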

Incidentally, the GZipStream implementation is also poorer on uncompressed data. I concatenated VirtualDub's entire source code into a 5.3MB text file, and ran it through both the above program and gzip on default settings. GZipStream compressed it to 1.3MB, while gzip got it to 1.1MB and also ran noticeably faster.

Phaeron - 05 12 06 - 23:39


The power of GNU software... But you could have a look at 7-zip's implementation of the Zip format (and their own 7z format is also quite impressive).

Mitch - 06 12 06 - 04:58


As I understand, VirtualDub has its own routines for things such as JPEG and PNG handling, instead of using already available ones. Is there a reason for this that we might know?

About the matter at hand - did you check if the trees that they are using are dynamic? Perhaps they always use the same tree, tuned for certain kind of data. How often is there a block? Perhaps they flush very often to avoid decoder latency, so you can use it for interactive stuff (a la SSH) without flushing manually.

Rich - 07 12 06 - 20:14


I wrote mine for a couple of reasons, one being to avoid dependencies, and another being to learn. I suspect the first is the main reason Microsoft didn't use zlib either.

GZipStream/DeflateStream always uses a single compressed block in "dynamic" tree mode. I checked the trees again, and guess what -- you're right! They're always the same trees, and in fact, they've been hardcoded for text, with control values getting 14 bits and spaces getting 6. This definitely explains why it performs so poorly on binary data, and why it's suboptimal for text as well, as it has lots of leaves for codes that aren't usually used, just in case they appear. Perhaps I missed it, but I didn't see this documented. zlib and gzip, of course, calculate tailored Huffman trees, and thus perform better.

So, moral of the story is, if you want to compress binary data, definitely find another facility. Heck, even P/Invoke to LZOpen() would probably be better.

Phaeron - 08 12 06 - 00:43
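For comparison with those hardcoded trees, the *fixed* trees that RFC 1951 (section 3.2.6) defines for static blocks show what a sane general-purpose allocation looks like; nothing gets anywhere near 14 bits. A small sketch (Python, purely illustrative):

```python
def fixed_litlen_lengths():
    """Fixed literal/length code lengths from RFC 1951, section 3.2.6."""
    lengths = []
    for sym in range(288):
        if sym <= 143:
            lengths.append(8)   # literals 0-143: 8 bits
        elif sym <= 255:
            lengths.append(9)   # literals 144-255: 9 bits
        elif sym <= 279:
            lengths.append(7)   # end-of-block and short lengths: 7 bits
        else:
            lengths.append(8)   # remaining length codes: 8 bits
    return lengths

lens = fixed_litlen_lengths()
# A complete prefix code satisfies the Kraft equality exactly.
assert sum(2.0 ** -l for l in lens) == 1.0
```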


After reading your article (which is outstanding), I'm adding binary files to my output file uncompressed, without difficulties, using a MemoryStream instead of a DeflateStream. For now I note the file extension (e.g. ".zip") to get the functionality I want. The header stores both CompressedSize and UncompressedSize, which I set equal for these uncompressed files, so that when decompressing, the bytes are pulled without going through DeflateStream. Huffman trees... great article.

John Askew - 22 01 07 - 15:29
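Storing already-compressed files raw like this maps directly onto the Zip container's compression method 0 ("stored"), where CompressedSize equals UncompressedSize by definition. For example, with Python's zipfile module:

```python
import io
import os
import zipfile

payload = os.urandom(4096)   # stand-in for an already-compressed file (JPEG, MP3, ...)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    # Method 0 (stored): no Deflate pass, sizes in the header are equal.
    z.writestr("photo.jpg", payload, compress_type=zipfile.ZIP_STORED)

with zipfile.ZipFile(buf) as z:
    info = z.getinfo("photo.jpg")
    assert info.compress_size == info.file_size
    assert z.read("photo.jpg") == payload
```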


I was finding the zip system was inflating my zip files to a ridiculous degree when they contained JPG files. I really only wanted to use it as a convenient container, but a container that contains another 40% in packing peanuts is not convenient. Ended up using SharpZipLib instead, which actually managed to make the JPG files smaller, and had a super-convenient FastZip class that did it all in one line.

Vista's handling of zip files is hideously slow, too. Not sure if that's a UI issue, but whatever it is, it's depressing. They've had years to get it right . . .

Laurence Parry (link) - 29 01 07 - 10:51


ps: this problem has been reported to Microsoft: http://connect.microsoft.com/VisualStudi..

Cheeso (link) - 13 02 09 - 14:30


I have just released a new version of ZipStorer with Deflate support at: (removed)

Jaime Olivares - 30 07 09 - 15:13


I removed your link because as far as I can tell, your code only handles the Zip file format and delegates to DeflateStream to do the actual compression, which means you don't actually do anything about the problem described here. Am I incorrect?

Phaeron - 30 07 09 - 16:47


You get much better results if you push the whole stream through the compression stream at once instead of feeding it small blocks yourself.

Take a look here:

http://bytes.com/topic/net/answers/67948..

your mother - 28 09 09 - 02:11


If by better you mean "not abysmally bad," then yes. There's still no excuse for the PDF file growing by 20%. I'd also counter that the compression implementation should not be so highly sensitive to input block sizes; zlib certainly isn't.

Phaeron - 28 09 09 - 15:19
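zlib's insensitivity to write granularity is easy to demonstrate: feeding its streaming interface one byte at a time produces output identical to a single one-shot call, because no flushes occur mid-stream. A quick sketch using Python's zlib bindings (which wrap the same library):

```python
import zlib

data = b"The quick brown fox jumps over the lazy dog. " * 2000

# One-shot compression.
whole = zlib.compress(data, 6)

# Byte-at-a-time through the streaming interface, no intermediate flushes.
co = zlib.compressobj(6)
parts = [co.compress(data[i:i + 1]) for i in range(len(data))]
parts.append(co.flush())
chunked = b"".join(parts)

assert chunked == whole   # identical output regardless of write sizes
```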


"This definitely explains why it performs so poorly on binary data."

The problem that I ran into while compressing binary data is that when wrapping a BinaryWriter around the DeflateStream, the BinaryWriter's buffer is intrinsically small. So every write compresses to the stream instead of buffering a larger chunk of data. However, putting a BufferedStream with a decently sized buffer around the BinaryWriter/DeflateStream achieves better compression ratios.

I know this doesn't address the issue originally noted, but I thought someone might benefit from that knowledge.

Anon - 23 11 09 - 08:20
