May 15, 2005 6:28 PM
Arithmetic coding takes a stream of symbols and replaces it with a floating point number. It can be combined with statistical modeling to provide reasonable data compression.

For more details regarding Arithmetic Coding, visit Mark Nelson's website where his original article published in Dr. Dobbs Journal is still preserved intact.

I've written an arithmetic coding library in C++ that uses a statistical model. It's compression ratio (42% for a C source file) is currently way lower than that provided by gzip (a whopping 73%) but the version using a higher order (3) provides better performance (75.4%). gzip running with the best compression option (gzip -9) gives a ratio of 73.5% on the same file though it takes far lesser time to run. Here's the output of a sample session:

%./anicmpr sample.c
Compressing sample.c to test.out.
SimpleCompress Running 
Building Model (SimpleModel in compression mode)... 
Building Totals... Done.
Writing Model to File... Done.
Initializing Arithmetic Coder... Done. 
Coding........................... Done.
Input Bytes: 166909
Output Bytes: 96973
Compression Ratio: 41.9007

Uncompressing test.out to test1.out.
SimpleUncompress Running
Building Model (SimpleModel in decompression mode)... 
Finished creating scaledcounts...
Building Totals... Done.
Initializing Arithmetic Decoder.... Done.
Decoding.................................................. Done.
Input Bytes: 96973
Output Bytes: 166909
Compression Ratio: 41.9007

The problem with programming in C++ is that there is always this temptation to revert to good old C. But then, the advantages of C++ are obvious. I can plug in any model into the compressor during the creation of the object at runtime. Namespaces make it easy to group objects together and exceptions aid in improved error handling though I suspect it makes it harder to track program flow. Though a compression library might not be the best scenario for Object Oriented Programming, I got to learn C++ which was one of the objectives in doing this project besides writing a compression library that does not use Huffman Coding1. I had made rough sketches of class diagrams and sequence diagrams beforehand and they have proved invaluable during the project.

This post marks the anniversary of this website. Exactly one year ago, I posted the first entry, Assembly in Linux.

CategoryProgramming Comment(s)


[1] Since almost everyone has done that at some point of time.

Copyright © 2004-2011 Anirudh Sasikumar. All rights reserved.
Last Updated: May 15, 2005 8:50 PM