By Charles Miller

In a recent column I used a word that failed a spell-check, and one of my proofreaders called it to my attention. The word was “petabyte,” and it came as a surprise to me that it was not in the word processor’s dictionary. Apparently at some point in the past I had added the word to my own word processor’s custom dictionary, but the proofreader had not yet done so. Many English words with a perfectly good etymological pedigree still do not appear in the dictionary.

The background on the word petabyte is that we use prefix multipliers to describe the sizes of the hard disks used in our computers. In physics, kilo, mega, giga, tera, peta, exa, zetta, and yotta are the power-of-ten prefixes used to denote multiples of a unit, each one a thousand times larger than the one before it. When combined with bit, byte, gram, or meter, they form words such as kilobit, gigabyte, kilogram, and kilometer.

A kilobyte is a thousand bytes. In some computing contexts it is counted as 1,024 bytes, but I would prefer not to get that technical for this explanation. A megabyte is a thousand kilobytes. A gigabyte is a million kilobytes. A terabyte is a billion kilobytes, and a petabyte is a trillion kilobytes. I could go on, but it might be more understandable to look at some real-world examples, all of which are estimates:
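For readers who like to check the arithmetic, here is a minimal Python sketch of the multipliers just described. The function names are my own invention for illustration, and the 1,024-based column simply shows how far the two counting conventions drift apart as the prefixes grow.

```python
# The prefixes in order; each step is 1,000 times the one before it.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def decimal_bytes(prefix):
    """Bytes in one <prefix>byte, counting in steps of 1,000."""
    return 1000 ** (PREFIXES.index(prefix) + 1)

def binary_bytes(prefix):
    """Bytes in one <prefix>byte, counting in steps of 1,024."""
    return 1024 ** (PREFIXES.index(prefix) + 1)

def kilobytes_in(prefix):
    """How many (1,000-byte) kilobytes make up one <prefix>byte."""
    return decimal_bytes(prefix) // decimal_bytes("kilo")

if __name__ == "__main__":
    for name in PREFIXES[:5]:
        print(f"1 {name}byte = {kilobytes_in(name):,} kilobytes, "
              f"or {decimal_bytes(name):,} bytes "
              f"({binary_bytes(name):,} if counted in steps of 1,024)")
```

Running it confirms the figures above: a petabyte comes out to a trillion kilobytes.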

A typewritten page is usually about two kilobytes of data. A short novel is one megabyte. The complete works of Shakespeare are only five megabytes. A pickup truck filled with books is one gigabyte. If 50,000 trees are felled, made into paper and printed, that would total only one terabyte. Most academic research libraries are two terabytes. The entire print collections of the U.S. Library of Congress are estimated at 10 terabytes.

The printed word requires little storage space compared to audio and video media. A very low-resolution photograph might be 100 kilobytes, while a high-resolution image might run to several megabytes. One music album is about 500 megabytes, or half a gigabyte. A collection of all the works of Mozart and Beethoven is 30 gigabytes. A collection of a thousand feature-length movies is only a few terabytes.

Beyond this point is where we really get into some wildly unverifiable estimates. Since 2000, researchers at the University of California, Berkeley, have continued trying to estimate each year how much information exists. They guess that all of the academic research libraries in the United States amount to 2 petabytes of data. All of the printed material in the world totals 200 petabytes. A petabyte is a thousand terabytes.
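To put the estimates so far on a single scale, the short Python sketch below turns a raw byte count into the largest prefix that fits. It assumes the 1,000-based convention used in this column, the function name is hypothetical, and the sample figures are the same rough estimates quoted above.

```python
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes"]

def human_readable(num_bytes):
    """Express a byte count using the largest 1,000-based unit that fits."""
    value = float(num_bytes)
    index = 0
    # Divide by 1,000 until the number is under 1,000 or we run out of units.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"

if __name__ == "__main__":
    estimates = {
        "A typewritten page": 2 * 10**3,
        "Complete works of Shakespeare": 5 * 10**6,
        "Library of Congress print collections": 10 * 10**12,
        "All printed material in the world": 200 * 10**15,
    }
    for label, size in estimates.items():
        print(f"{label}: {human_readable(size)}")
```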

An exabyte is a thousand petabytes. It has been estimated that if all words ever spoken by human beings were transcribed, that would be five exabytes, or what the NSA is rumored to have in their Utah data center. It seems I have now run out of space for examples of zettabyte and yottabyte, so let us just agree that those two are huge almost beyond imagination.

