There are a ton of compression algorithms out there. What you need here is a lossless compression algorithm: one that compresses data so it can be decompressed back into exactly the original input. The opposite is a lossy compression algorithm, which may discard some of the data to save space. PNG images use lossless compression, while JPEG images can and often do use lossy compression.
Some of the most widely known lossless compression algorithms include RLE, LZ77, and Huffman coding.
ZIP archives use DEFLATE, a combination of Huffman coding and LZ77, to give fast compression and decompression times and reasonably good compression ratios.
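If you just want that kind of compression without writing anything yourself, here is a minimal sketch using Python's zlib module, which implements DEFLATE (the LZ77 + Huffman combination above). Note that it produces a raw zlib stream, not a .zip archive:

```python
import zlib

data = b"aaaaaaaabbbbbcccdd" * 100       # repetitive input compresses well
compressed = zlib.compress(data, 9)      # DEFLATE: LZ77 + Huffman, max effort
restored = zlib.decompress(compressed)

assert restored == data                  # lossless: exact round trip
print(len(data), "->", len(compressed), "bytes")
```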
LZ77 is pretty much a generalized form of RLE: instead of only collapsing runs of a single repeated byte, it can refer back to any recently seen sequence of bytes, so it will often yield much better results.
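To make that comparison concrete, here is a minimal RLE sketch in Python (the names rle_encode and rle_decode are just my own toy example, not any particular library's API). RLE only collapses runs of one repeated character, which is exactly the idea LZ77 generalizes:

```python
from itertools import groupby

def rle_encode(data: str) -> list[tuple[int, str]]:
    """Collapse each run of identical characters into a (count, char) pair."""
    return [(len(list(group)), char) for char, group in groupby(data)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand the (count, char) pairs back into the original string."""
    return "".join(char * count for count, char in pairs)

text = "aaaaaaaabbbbbcccdd"
encoded = rle_encode(text)
print(encoded)                      # [(8, 'a'), (5, 'b'), (3, 'c'), (2, 'd')]
assert rle_decode(encoded) == text  # lossless: the round trip is exact
```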
Huffman coding lets the most frequent bytes be represented by the fewest bits.
Imagine a text file that looked like this:
aaaaaaaabbbbbcccdd
A typical implementation of Huffman coding would produce a map along these lines:
Bits Character
0 a
10 b
110 c
111 d
So the file would be compressed to this:
00000000 10101010 10110110 11011111 10000000
                                     ^^^^^^^
Padding bits required
18 bytes go down to 5. Of course, the table must be included in the file. This algorithm works better with more data :P
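If you want to see where a table like that comes from, here is a minimal Huffman sketch in Python (huffman_codes is my own toy code built on heapq, not a standard function). The exact 0/1 choices can differ between implementations, but the code lengths, and therefore the compressed size, come out the same:

```python
import heapq
from collections import Counter

def huffman_codes(data: str) -> dict[str, str]:
    """Build a Huffman code table: frequent characters get shorter codes."""
    freq = Counter(data)
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # single character or a (left, right) pair of subtrees.
    heap = [(count, i, char) for i, (char, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # merge the two least frequent
        f2, _, right = heapq.heappop(heap)    # subtrees into one node
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1

    codes: dict[str, str] = {}
    def walk(tree, prefix: str) -> None:
        if isinstance(tree, str):             # leaf: a single character
            codes[tree] = prefix or "0"
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

text = "aaaaaaaabbbbbcccdd"
codes = huffman_codes(text)
bits = "".join(codes[ch] for ch in text)
print(codes)   # same code lengths as the table above; 0/1 choices may differ
print(len(text), "bytes ->", len(bits), "bits",
      "(fits in", (len(bits) + 7) // 8, "bytes plus padding)")
```

A real encoder would also have to store the table (or the character frequencies) alongside the bits, which is the overhead mentioned above.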
Alex Allain has a nice article on the Huffman Compression Algorithm in case the Wiki doesn't suffice.
Feel free to ask for more information. This topic is pretty darn wide.