Coding Horror

programming and human factors

Gigabyte: Decimal vs. Binary

Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte.  

When you buy a "500 Gigabyte" hard drive, the vendor defines it using the decimal powers of ten definition of the "Giga" prefix.

500 * 109 bytes = 500,000,000,000 = 500 Gigabytes

But the operating system determines the size of the drive using the computer's binary powers of two definition of the "Giga" prefix:

465 * 230 bytes = 499,289,948,160 = 465 Gigabytes

If you're wondering where 35 Gigabytes of your 500 Gigabyte drive just disappeared to, you're not alone. It's an old trick perpetuated by hard drive makers-- they intentionally use the official SI definitions of the Giga prefix so they can inflate the the sizes of their hard drives, at least on paper. This was always an annoyance, but now it's much more difficult to ignore, as it results in large discrepancies with today's enormous hard drives. When is a Terabyte hard drive not a Terabyte? When it's 931 GB.

As Ned Batchelder notes, the hard drive manufacturers are technically conforming to the letter of the SI prefix definitions. It's us computer science types who are abusing the official prefix designations:

Year Approved Official Definition Informal Meaning Difference Prefix Derived From
giga GB 1960 109230 7% Greek root for giant
tera TB 1960 1012 240 10% Greek root for monster
peta PB 1975 1015 250 13% Greek root for five, "penta"
exa EB 1975 1018 260 15% Greek root for six, "hexa"
zetta ZB 1991 1021 270 18% Latin root for seven, "septum", p dropped, first letter changed to S to avoid confusion with other SI symbols
yotta YB 1991 1024 280 21% Greek root for eight, "octo", c dropped, y added to avoid having symbol of zero-like letter O

As the size of the prefix grows, so does the gap between the official and informal meaning of the prefix. And yes, there are larger official SI prefixes beyond these, just in case someone needs more than 1000 yottabytes. Ned noted that one of the SI proposals is for the prefix "luma", representing 1063.

Speaking of impossibly large numbers, if you're like most people reading this article, then you probably arrived here through Google. Google is a tragically but forever misspelled version of Googol:

A googol is 10100, i.e. a 1 followed by 100 zeros. In official SI prefix terms, a googol is approximately a yotta squared, squared. Even larger is the googolplex, which is equal to 10 to the power of a googol (10googol); this number is about the same size as the number of possible games of chess. Even larger numbers have been defined, such as Skewes' number, Graham's number, and the Moser, which I won't even try to describe.

But I digress. When we use gigabyte to mean 230, that's an inaccurate and informal usage. Instead, we're supposed to be using the more accurate and disambiguated IEC prefixes. They were introduced in 1998 and formalized with IEEE 1541 in 2000.

kibibyte KiB 210
mebibyte MiB 220
gibibyte GiB 230
tebibyte TiB 240
pebibyte PiB 250
exbibyte EiB 260
zebibyte ZiB 270
yobibyte YiB 280

You occasionally see these more correct prefixes used in software, but adoption has been slow at best. There are several problems:

  1. They sound ridiculous. I hear the metric system used more often in the United States than I hear the words "kibibyte" or "mebibyte" uttered by anyone with a straight face. Which is to say, never.

  2. Hard drive manufacturers won't use them. Drive manufacturers don't care about being correct. What they do care about is consumers buying their drives because they have the largest possible number plastered on the front of the box. If a big lawsuit wasn't enough to get them to mend their ways, I seriously doubt that the recommendation of an international standards body is going to sway them.

  3. Tradition rules. It's hard to give up on the rich binary history of kilobytes, megabytes, and gigabytes, particularly when the alternatives are so questionable.

It's good to keep in mind the discrepancy between the decimal and binary meanings of the SI prefixes. The difference can bite you if you're not careful. But I think we're stuck with contextual, dual-use meanings of the SI prefixes for the forseeable future. Or perhaps we're all overthinking this, as Alan Green notes:

Whenever I try to discuss [this] with my friends, they say, "Yotta getta life".

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Exchange and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: http://twitter.com/codinghorror