Mastering GUIDs with Occam's Razor
Do you remember the scene from the movie Full Metal Jacket where the marines recite the USMC creed?
It's a little-known fact, but programmers have a similar creed:
This is my GUID. There are many like it but this one is mine. My GUID is my best friend. It is my life. I must master it as I must master my life. Without me, my GUID is useless. Without my GUID I am useless.
In fact, GUIDs are so near and dear to our hearts that we recently had a spirited discussion about them at work. Let's say you had a string and needed to determine whether it was a valid GUID. The easy way is a .Parse() style Try-Catch code block:
    Guid g;
    try
    {
        g = new Guid("x");
    }
    catch
    {
    }
This is the correct answer.. most of the time. But you know programmers. They never met an edge condition they didn't enjoy discussing ad nauseam. And I was one of the first to chime in:
This is definitely a good way to validate a data type. However, just be aware of the exception performance penalty. Throwing exceptions on failure to cast is expensive, so if this is something that
- will be invalid often
- appears in a loop
- occurs with high frequency
then you'd want to go with a non-exception based check. However, most of the time none of these things are true, so the performance is irrelevant.
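For what a non-exception based check can look like: .NET Framework 4 and later include Guid.TryParse, which reports failure through its return value instead of throwing. The sketch below assumes that API is available; the GuidCheck class and IsGuid method names are just illustrative.

```csharp
using System;

public static class GuidCheck
{
    // Returns true when the input parses as a GUID. Malformed or null
    // input yields false rather than an exception.
    // Assumes Guid.TryParse is available (.NET Framework 4 or later).
    public static bool IsGuid(string candidate)
    {
        Guid parsed;
        return Guid.TryParse(candidate, out parsed);
    }

    public static void Main()
    {
        Console.WriteLine(IsGuid(Guid.NewGuid().ToString())); // True
        Console.WriteLine(IsGuid("x"));                        // False
    }
}
```

On a framework version without TryParse, the try-catch version remains the simple choice unless one of the three conditions above applies.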
Then someone suggested trying a regular expression. Oh great, now we have two problems:
    Regex r = new Regex(
        "^((?-i:0x)?[A-Fa-f0-9]{32}|" +
        "[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}|" +
        "{[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}})$");
It's valid, but I couldn't resist tweaking this regex for simplicity's sake. The official GUID spec only defines one format for GUID strings, the familiar 8-4-4-4-12 format:
    Regex r = new Regex(
        @"^(\{|\()?[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}(\}|\))?$");
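To see what an 8-4-4-4-12 pattern with optional braces or parentheses accepts, here is a small sketch. The GuidPattern class name and the sample GUID are made up for illustration, and the delimiters are written escaped. The leading and trailing delimiters are matched independently, so a mismatched pair also gets through.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuidPattern
{
    // 8-4-4-4-12 hex digits, optionally wrapped in braces or parentheses.
    public static readonly Regex Pattern = new Regex(
        @"^(\{|\()?[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}(\}|\))?$");

    public static void Main()
    {
        string bare = "12345678-90ab-cdef-1234-567890abcdef";

        Console.WriteLine(Pattern.IsMatch(bare));                  // True
        Console.WriteLine(Pattern.IsMatch("{" + bare + "}"));      // True
        Console.WriteLine(Pattern.IsMatch("(" + bare + ")"));      // True
        // The opening and closing delimiters are not required to pair up.
        Console.WriteLine(Pattern.IsMatch("{" + bare + ")"));      // True
        // Without hyphens the string no longer fits the 8-4-4-4-12 shape.
        Console.WriteLine(Pattern.IsMatch(bare.Replace("-", ""))); // False
    }
}
```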
This is my post, so I'll skip the part where others poked holes in my regex. Just when we thought it was over, a fellow developer whipped out a code snippet that benchmarks how long it takes to validate GUIDs via each method:
    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main(string[] args)
        {
            Guid g = Guid.NewGuid();
            string s = g.ToString();

            // Time 10,000 validations via the Guid constructor.
            DateTime before = DateTime.Now;
            for (int i = 0; i < 10000; i++)
            {
                bool retVal = IsGuid(s);
            }
            Console.WriteLine(DateTime.Now.Subtract(before));

            // Time 10,000 validations via the regex.
            before = DateTime.Now;
            for (int i = 0; i < 10000; i++)
            {
                bool retVal = IsGuid2(s);
            }
            Console.WriteLine(DateTime.Now.Subtract(before));

            Console.ReadLine();
        }

        public static bool IsGuid(string guidString)
        {
            try
            {
                Guid guid = new Guid(guidString);
                return true;
            }
            catch
            {
                return false;
            }
        }

        public static bool IsGuid2(string guidString)
        {
            // Note: the regex is constructed on every call.
            Regex r = new Regex(
                @"^(\{|\()?[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}(\}|\))?$");
            Match m = r.Match(guidString);
            return m.Success;
        }
    }
According to this, constructor validation is 3 to 4 times faster than the regex.. or is it? I immediately noticed a few problems that made this a rather questionable benchmark. And, as before, I couldn't resist investigating:
If I increase the iterations to 100,000:

    Constructor  00.1874856
    Regex        00.7968138

You typically wouldn't want to create a new regex inside the loop, because it's too expensive. If I move the regex creation outside the loop:

    Constructor  00.2031094
    Regex        00.5780806

If I set RegexOptions.Compiled on the regex:

    Constructor  00.1874856
    Regex        00.3437236

If I run the above with CTRL+F5 (sans debugger):

    Constructor  00.1718673
    Regex        00.1874916
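Putting those adjustments together, a tighter version of the benchmark might look like the sketch below. It builds the regex once with RegexOptions.Compiled and swaps DateTime.Now for System.Diagnostics.Stopwatch, which is the framework's higher-resolution timer. That swap is my substitution, not part of the original snippet, and the GuidBenchmark class and its method names are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class GuidBenchmark
{
    // Built once and compiled, rather than once per call.
    private static readonly Regex GuidRegex = new Regex(
        @"^(\{|\()?[A-Fa-f0-9]{8}-([A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}(\}|\))?$",
        RegexOptions.Compiled);

    public static bool IsGuidByConstructor(string s)
    {
        try
        {
            Guid guid = new Guid(s);
            return true;
        }
        catch
        {
            return false;
        }
    }

    public static bool IsGuidByRegex(string s)
    {
        return GuidRegex.IsMatch(s);
    }

    // Times the given number of calls to check(input).
    public static TimeSpan Time(Func<string, bool> check, string input, int iterations)
    {
        check(input); // warm-up call so one-time setup costs are not timed

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            check(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        string s = Guid.NewGuid().ToString();
        const int iterations = 100000;

        Console.WriteLine("Constructor: " + Time(IsGuidByConstructor, s, iterations));
        Console.WriteLine("Regex:       " + Time(IsGuidByRegex, s, iterations));
    }
}
```

Run it outside the debugger, as with CTRL+F5 above. Note that this only measures the valid-GUID path; timing an invalid string would exercise the exception cost discussed earlier.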
It was definitely a fun discussion. I certainly learned a few things about GUIDs I didn't know. Heck, discussions like this are why I joined a software development company in the first place. But it's also a pointless discussion.
Performance was a complete non-issue in this particular scenario. That's why we should always program with Occam's Razor in mind:
Given two similar code paths, choose the simpler one.
Edge conditions and fancy techniques are interesting, but they're not necessarily a worthwhile use of time. Sometimes the simple and stupid solution is all you need.