Gigabyte: Decimal vs. Binary
Everyone who has ever purchased a hard drive finds out the hard way that there are two ways to define a gigabyte.
When you buy a "500 Gigabyte" hard drive, the vendor defines it using the decimal powers of ten definition of the "Giga" prefix.
500 * 10^9 bytes = 500,000,000,000 bytes = 500 Gigabytes
But the operating system determines the size of the drive using the computer's binary powers of two definition of the "Giga" prefix:
465 * 2^30 bytes = 499,289,948,160 bytes = 465 Gigabytes
If you're wondering where 35 Gigabytes of your 500 Gigabyte drive just disappeared to, you're not alone. It's an old trick perpetuated by hard drive makers: they intentionally use the official SI definitions of the Giga prefix so they can inflate the sizes of their hard drives, at least on paper. This was always an annoyance, but now it's much more difficult to ignore, as it results in large discrepancies with today's enormous hard drives. When is a Terabyte hard drive not a Terabyte? When it's 931 GB.
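The arithmetic above is easy to check for yourself. Here is a minimal Python sketch; the function name `advertised_to_binary_gb` and the constants are my own illustrative choices, not anything a vendor or operating system actually exposes.

```python
GIGA_DECIMAL = 10**9  # SI "giga": what the label on the box means
GIGA_BINARY = 2**30   # binary "giga": what the operating system reports


def advertised_to_binary_gb(advertised_gb: float) -> float:
    """Convert decimal (vendor) gigabytes into binary (OS) gigabytes."""
    total_bytes = advertised_gb * GIGA_DECIMAL
    return total_bytes / GIGA_BINARY


# A "500 Gigabyte" drive and a "1 Terabyte" (1000 decimal GB) drive.
for label, size_gb in [("500 GB", 500), ("1 TB", 1000)]:
    print(f"{label} drive -> {advertised_to_binary_gb(size_gb):.1f} binary GB")
```

Truncating the results to whole numbers gives the 465 and 931 figures quoted above.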
As Ned Batchelder notes, the hard drive manufacturers are technically conforming to the letter of the SI prefix definitions. It's us computer science types who are abusing the official prefix designations:
Prefix | Symbol | Year Approved | Official Definition | Informal Meaning | Difference | Prefix Derived From |
giga | GB | 1960 | 10^9 | 2^30 | 7% | Greek root for giant |
tera | TB | 1960 | 10^12 | 2^40 | 10% | Greek root for monster |
peta | PB | 1975 | 10^15 | 2^50 | 13% | Greek root for five, "penta" |
exa | EB | 1975 | 10^18 | 2^60 | 15% | Greek root for six, "hexa" |
zetta | ZB | 1991 | 10^21 | 2^70 | 18% | Latin root for seven, "septem", p dropped, first letter changed from S to Z to avoid confusion with other SI symbols |
yotta | YB | 1991 | 10^24 | 2^80 | 21% | Greek root for eight, "octo", c dropped, y added to avoid having symbol of zero-like letter O |
As the size of the prefix grows, so does the gap between the official and informal meaning of the prefix. And yes, there are larger official SI prefixes beyond these, just in case someone needs more than 1000 yottabytes. Ned noted that one of the SI proposals is for the prefix "luma", representing 10^63.
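To see the gap widen, you can compute how far each binary value overshoots its decimal counterpart. This is a rough sketch: `binary_excess_percent` is a hypothetical helper name, and it assumes the "Difference" column means the percentage by which the power of two exceeds the power of ten.

```python
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def binary_excess_percent(n: int) -> float:
    """Percent by which 2^(10n) exceeds 10^(3n); n=3 is giga, n=8 is yotta."""
    return (2 ** (10 * n) / 10 ** (3 * n) - 1) * 100


for n, name in enumerate(PREFIXES, start=3):
    excess = binary_excess_percent(n)
    print(f"{name:<5} 10^{3 * n} vs 2^{10 * n}: {excess:.1f}% larger")
```

Rounded to whole percentages, the results line up with the table. Each step up the prefix ladder multiplies the ratio by another 1.024, which is why the gap keeps growing.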
Speaking of impossibly large numbers, if you're like most people reading this article, then you probably arrived here through Google. Google is a tragically but forever misspelled version of Googol:
A googol is 10^100, i.e. a 1 followed by 100 zeros. In official SI prefix terms, a googol is approximately a yotta squared, squared. Even larger is the googolplex, which is equal to 10 to the power of a googol (10^googol); this number is about the same size as the number of possible games of chess. Even larger numbers have been defined, such as Skewes' number, Graham's number, and the Moser, which I won't even try to describe.
But I digress. When we use gigabyte to mean 2^30 bytes, that's an inaccurate and informal usage. Instead, we're supposed to be using the more accurate and disambiguated IEC prefixes. They were introduced in 1998 and formalized with IEEE 1541 in 2002.
kibibyte | KiB | 2^10 |
mebibyte | MiB | 2^20 |
gibibyte | GiB | 2^30 |
tebibyte | TiB | 2^40 |
pebibyte | PiB | 2^50 |
exbibyte | EiB | 2^60 |
zebibyte | ZiB | 2^70 |
yobibyte | YiB | 2^80 |
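If you do want to display sizes with the IEC symbols from the table, the logic is short. Here is one way it might look in Python; `format_iec` is a name I made up for illustration, and the one-decimal rounding is an arbitrary choice.

```python
IEC_UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB"]


def format_iec(num_bytes: int) -> str:
    """Render a byte count using the largest IEC unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits, stopping at the largest known unit.
    while value >= 1024 and index < len(IEC_UNITS) - 1:
        value /= 1024
        index += 1
    return f"{value:.1f} {IEC_UNITS[index]}"


print(format_iec(500 * 10**9))  # a "500 Gigabyte" drive
print(format_iec(10**12))       # a "1 Terabyte" drive
```

Labeled this way, the "500 Gigabyte" drive shows up as 465.7 GiB, which is at least unambiguous, even if nobody can say it with a straight face.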
You occasionally see these more correct prefixes used in software, but adoption has been slow at best. There are several problems:
- They sound ridiculous. I hear the metric system used more often in the United States than I hear the words "kibibyte" or "mebibyte" uttered by anyone with a straight face. Which is to say, never.
- Hard drive manufacturers won't use them. Drive manufacturers don't care about being correct. What they do care about is consumers buying their drives because they have the largest possible number plastered on the front of the box. If a big lawsuit wasn't enough to get them to mend their ways, I seriously doubt that the recommendation of an international standards body is going to sway them.
- Tradition rules. It's hard to give up on the rich binary history of kilobytes, megabytes, and gigabytes, particularly when the alternatives are so questionable.
It's good to keep in mind the discrepancy between the decimal and binary meanings of the SI prefixes. The difference can bite you if you're not careful. But I think we're stuck with contextual, dual-use meanings of the SI prefixes for the foreseeable future. Or perhaps we're all overthinking this, as Alan Green notes:
Whenever I try to discuss [this] with my friends, they say, "Yotta getta life".