Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
181 views
in Technique[技术] by (71.8m points)

c++ - How to decode huffman code quickly?

I have implementated a simple compressor using pure huffman code under Windows.But I do not know much about how to decode the compressed file quickly,my bad algorithm is:

Enumerate all the huffman code in the code table then compare it with the bits in the compressed file.It turns out horrible result:decompressing 3MB file would need 6 hours.

Could you provide a much more efficient algorithm?Should I use Hash or something?

Update: I have implementated the decoder with state table,based on my friend Lin's advice.I think this method should be better than travesal huffman tree,3MB within 6s.

thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

One way to optimise the binary-tree approach is to use a lookup table. You arrange the table so that you can look up a particular encoded bit-pattern directly, allowing for the maximum possible bit-width of any code.

Since most codes don't use the full maximum width, they are included at multiple locations in the table - one location for each combination of the unused bits. The table indicates how many bits to discard from the input as well as the decoded output.

If the longest code is too long, so the table is impractical, a compromise is to use a tree of smaller fixed-width-subscript lookups. For example, you can use a 256-item table to handle a byte. If the input code is more than 8 bits, the table entry indicates that decoding is incomplete and directs you to a table that handles the next up-to 8 bits. Larger tables trade memory for speed - 256 items is probably too small.

I believe this general approach is called "prefix tables", and is what BobMcGees quoted code is doing. A likely difference is that some compression algorithms require the prefix table to be updated during decompression - this is not needed for simple Huffman. IIRC, I first saw it in a book about bitmapped graphics file formats which included GIF, some time before the patent panic.

It should be easy to precalculate either a full lookup table, a hashtable equivalent, or a tree-of-small-tables from a binary tree model. The binary tree is still the key representation (mental model) of how the code works - this lookup table is just an optimised way to implement it.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...