Coding Horror

programming and human factors

Primary Keys: IDs versus GUIDs

Long-time readers of this blog know that I have an inordinate fondness for GUIDs. Each globally unique ID is like a beautiful snowflake: every one a unique item waiting to be born.

Perhaps that's why I read with great interest recent accounts of people switching their database tables from traditional integer primary keys ...

ID  Value
--  -----
1  Apple
2  Orange
3  Pear
4  Mango

.. to GUID keys.

ID                                    Value
------------------------------------  -----
C87FC84A-EE47-47EE-842C-29E969AC5131  Apple
2A734AE4-E0EF-4D77-9F84-51A8365AC5A0  Orange
70E2E8DE-500E-4630-B3CB-166131D35C21  Pear
15ED815C-921C-4011-8667-7158982951EA  Mango

I know what you're thinking. Using sixteen bytes instead of four bytes for a primary key? Have you lost your mind? Those additional 12 bytes do come at a cost. But that cost may not be as great as you think:

Using a GUID as a row identity value feels more natural-- and certainly more truly unique-- than a 32-bit integer. Database guru Joe Celko seems to agree. GUID primary keys are a natural fit for many development scenarios, such as replication, or when you need to generate primary keys outside the database. But it's still a question of balancing the tradeoffs between traditional 4-byte integer IDs and 16-byte GUIDs:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway
GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

I'm not proposing that every database switch to GUID primary keys, but I do think it's important to know the option is out there. If you're still on the fence, what should I choose for my primary key? has excellent advice and a solid analysis of the tradeoffs.

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Exchange and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: http://twitter.com/codinghorror