To Compile or Not To Compile

I am currently in the middle of a way-overdue refactoring of MhtBuilder, which uses regular expressions extensively. I noticed that I had sort of mindlessly added RegexOptions.Compiled all over the place. It says “compiled” so it must be faster, right? Well, like so many other things, that depends:

In [the case of RegexOptions.Compiled], we first do the work to parse into opcodes. Then we also do more work to turn those opcodes into actual IL using Reflection.Emit. As you can imagine, this mode trades increased startup time for quicker runtime: in practice, compilation takes about an order of magnitude longer to start up, but yields 30% better runtime performance. There are even more costs for compilation that should be mentioned, however. Emitting IL with Reflection.Emit loads a lot of code and uses a lot of memory, and that’s not memory that you’ll ever get back. In addition, in v1.0 and v1.1, we couldn’t ever free the IL we generated, meaning you leaked memory by using this mode. We’ve fixed that problem in Whidbey. But the bottom line is that you should only use this mode for a finite set of expressions which you know will be used repeatedly.

In other words, this is something you don’t want to do casually, as I was. And 30% faster isn’t a very compelling performance gain to balance against those serious tradeoffs. Unless you’re in a giant loop, or processing humongous strings, it’s almost never worth it. The MSDN documentation also has this interesting tidbit:

To improve performance, the regular expression engine caches all regular expressions in memory. This avoids the need to reparse an expression into high-level byte code each time it is used.

The second time you build your non-compiled Regex, no additional interpreting overhead is incurred. And you get that for free. Even though it sounds faster and all, you probably don’t want to use RegexOptions.Compiled. But what about Regex.CompileToAssembly?
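Here is a minimal sketch of leaning on reuse instead of RegexOptions.Compiled. The pattern, the class name, and the input string are hypothetical, chosen only for illustration. One caveat worth checking against your framework version: as far as I know, from .NET 2.0 onward the built-in cache applies only to the static Regex methods, so holding on to a single instance is the safer way to guarantee the pattern is parsed once.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexReuse
{
    // Hypothetical pattern, for illustration only.
    const string Pattern = @"\bhttps?://\S+";

    // Built once as an interpreted regex (no RegexOptions.Compiled)
    // and reused for every call, so the pattern is parsed a single time.
    static readonly Regex UrlRegex = new Regex(Pattern, RegexOptions.IgnoreCase);

    static void Main()
    {
        string input = "Visit my website at http://www.example.com today";

        // Instance reuse: no re-parsing on later calls.
        Console.WriteLine(UrlRegex.IsMatch(input));

        // Static call: the engine looks the pattern up in its internal cache.
        Console.WriteLine(Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase));
    }
}
```

Either form avoids the Reflection.Emit cost entirely; the static field just makes the reuse explicit.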

This avoids the pitfalls associated with dynamic compilation by turning your regular expressions into a compiled DLL. There aren’t many articles describing how to do this, but Kent Tegels dug up a few Regex articles with sample code showing how to take advantage of Regex.CompileToAssembly.
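For a rough idea of what a call to Regex.CompileToAssembly looks like, here is a small sketch. The pattern, the type name UrlRegex, and the namespace MyCompany.Regexes are all made up for the example. Note that this API belongs to the .NET Framework; on .NET Core and .NET 5+ it throws PlatformNotSupportedException.

```csharp
using System;
using System.Reflection;
using System.Text.RegularExpressions;

static class BuildRegexAssembly
{
    static void Main()
    {
        // Describe one regex to bake into the assembly.
        // All names here are hypothetical.
        var info = new RegexCompilationInfo(
            @"\bhttps?://\S+",        // pattern
            RegexOptions.IgnoreCase,  // options fixed at build time
            "UrlRegex",               // generated class name
            "MyCompany.Regexes",      // generated namespace
            true);                    // make the class public

        // Writes MyCompany.Regexes.dll to the current directory.
        var assemblyName = new AssemblyName("MyCompany.Regexes");
        Regex.CompileToAssembly(new[] { info }, assemblyName);
    }
}
```

An application that references the generated DLL would then use the class like any other Regex, for example `new MyCompany.Regexes.UrlRegex().IsMatch(input)`.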

It seems ideal – all the advantages of compilation with none of the disadvantages – but it adds one disadvantage of its own: your regular expressions are now written in stone. You can’t change them at runtime, and you have to know what you’re going to do entirely up front. This might be a worthwhile tradeoff at the end of a large project that uses regular expressions extensively, but still... only 30% faster? I’d want some actual benchmark numbers from my application before I could justify the loss of flexibility and the additional file dependency.
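Getting those numbers doesn't take much. The sketch below times construction plus first use separately from a loop of matches, once interpreted and once with RegexOptions.Compiled. The pattern, input, and iteration count are placeholders; substitute the expressions and data your application really uses, since the results depend heavily on both.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexBenchmark
{
    static void Time(string label, RegexOptions options,
                     string pattern, string input, int iterations)
    {
        // Startup cost: construction plus the first match.
        var startup = Stopwatch.StartNew();
        var regex = new Regex(pattern, options);
        regex.IsMatch(input);
        startup.Stop();

        // Steady-state cost: repeated matching with the same instance.
        var run = Stopwatch.StartNew();
        int hits = 0;
        for (int i = 0; i < iterations; i++)
        {
            if (regex.IsMatch(input)) hits++;
        }
        run.Stop();

        Console.WriteLine("{0,-12} startup: {1:F3} ms, {2} matches ({3} hits): {4:F3} ms",
            label, startup.Elapsed.TotalMilliseconds, iterations, hits,
            run.Elapsed.TotalMilliseconds);
    }

    static void Main()
    {
        // Placeholder pattern and input; use your own.
        const string pattern = @"\b\w+@\w+\.\w+\b";
        const string input = "Contact someone@example.com for details";
        const int iterations = 100000;

        Time("Interpreted", RegexOptions.None, pattern, input, iterations);
        Time("Compiled", RegexOptions.Compiled, pattern, input, iterations);
    }
}
```

Run it in a release build and more than once; a single pass is easily skewed by JIT warm-up and whatever else the machine is doing.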
