Huffman

This is a Haskell module to do Huffman coding. It is intended to be fairly generic and a good base for experimentation.

Included are file compression and decompression utilities that exercises the codebase. Performance is slower than but comparable to bzip2. On my laptop (Pentium M 1.6GHz with 1GB RAM) bzip2 takes 0.9 seconds to compress the text of Moby Dick, producing a 382KB file from a 1243KB original. My hencode utility takes 3.8 seconds to compress this file, producing a 713KB compressed file (including a 0.5KB decompression table). Decompression with hdecode takes 2.4 seconds, vs 0.4 seconds for bzip2.

[I fixed a bad bug in the tree building—I'd been appending to the wrong end of a queue. If you have the old version, you'll want to re-download.]

The plan for this program was to enhance the Huffman code module such that it can be used as a basis for an implementation of inflate/deflate compression (RFC 1951), which is essentially Huffman coding plus LZ77 compression. This, in turn, can easily be wrapped into a ZLIB implementation (RFC1950), and will facilitate an implementation of the PNG image compression standard (RFC 2083 describes an earlier version).

However, it turned out that somebody has already implemented inflate/deflate in pure Haskell on hackage, which was cool. Work on the PNG implementation is on hold, but still intended to complete.

A good practical reference for Huffman coding tricks is Schindler's webpage. Adaptive Huffman coding also looks interesting, and I may tackle it at some point.