Conservation of Bits

Copyright ©2003 by Paul Niquette. All rights reserved.

With the astounding acceleration of speeds in computing and communications year by year, with the vanishing costs of memory in all forms, why would anybody be interested in the Conservation of Bits?
Well, it seems there is no such thing as too fast or too economical.  Besides, Conservation of Bits, better known as "data compression," is itself responsible for much of the progress in digital technologies.
Indeed, every day we enjoy benefits of super speed and small space afforded by compressed data files in the form of...
.zip documents attached to our e-mail messages,
.gif images decorating our screens, and
.mp3 music coming through our speakers.
For the Conservation of Bits, each particular data compression algorithm exploits the statistical properties of...
  • typography in text (ETAOIN SHRDLU),
  • pixels in pictures, and
  • periodicities in sound waves.
Here's what data compression does.  It assigns a variable number of bits to each "symbol" to be transmitted -- shorter codes for the more frequent symbols and longer codes for less frequent symbols.  Makes sense, doesn't it.

Take, for example, the symbols T, A, C, and G representing the four nucleotides in DNA.  The Martian Microbe puzzle applied a simple, fixed length coding scheme for transmitting long sequences of those symbols, which is depicted here in the form of a coding tree.
 
 

That 'dibit' code is reprised in the table below alongside the frequencies of nucleotides observed in life on earth.

Nucleotide
Dibit Code
Frequencies
Guanine
Cytosine
Adonine
Thymine
11
10
01
00
35.7%
28.6%
21.4%
14.3%

 

Can you think of a variable-length code that will conserve bits?

GO TO SOLUTION PAGE