Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
A simple example of Huffman coding on a string (nerdaholyc.com)
45 points by romil on May 20, 2012 | hide | past | favorite | 5 comments


It's nice when things are explained well for novices, and I think this article does it very well. However, I found this one to be a bit incomplete near the end:

The input string : beep boop beer!

The input string in binary : 0110 0010 0110 0101 0110 0101 0111 0000 0010 0000 0110 0010 0110 1111 0110 1111 0111 0000 0010 0000 0110 0010 0110 0101 0110 0101 0111 0010 0010 0001

The encoded string : 0011 1110 1011 0001 0010 1010 1100 1111 1000 1001

This makes the encoding look much more effective on small strings than it really is. Namely, just like you need an ASCII table to convert the first byte (0110 0010) from the input string to a "b" (but virtually every computer has an ASCII table - or a font - somewhere!), you need the huffman table, stored in some efficient way, to be able to interpret the compressed string. Storing (and reading) a huffman table size-efficiently isn't perfectly trivial either, and no matter what it would blow up the "compressed" string to be bigger than the original.

In other words, IMHO this article only explained half of what Huffman coding is about.


Also worth studying is adaptive Huffman coding. See:

http://en.wikipedia.org/wiki/Adaptive_Huffman_coding

Using adaptive coding, you either begin with no tree or you begin with a tree in a known state and change the tree with each character processed. This allows for one-pass encoding of a set of data.


If anyone is interested in a basic Python implementation of Huffman coding for pedagogical purposes, I put one up on GitHub: https://github.com/ajtulloch/Utilities/blob/master/HuffmanCo....


hi from Bucharest :)


In the next installment of this series we will show you how to format your output with printf.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: