

Parsing Html The Cthulhu Way

Among programmers of any experience, it is generally regarded as A Bad Idea™ to attempt to parse HTML with regular expressions. How bad of an idea? It apparently drove one Stack Overflow user to the brink of madness: You can't parse [X]HTML with regex. Because HTML can't

By Jeff Atwood ·


The Problem With URLs

URLs are simple things. Or so you'd think. Let's say you wanted to detect a URL in a block of text and convert it into a bona fide hyperlink. No problem, right? Visit my website at, it's awesome! To locate

By Jeff Atwood ·
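
The task in the excerpt above, finding a URL in running text and turning it into a hyperlink, can be sketched in a few lines. This is only an illustration under stated assumptions, not the approach from the post: the pattern, the `linkify` name, and the `example.com` address are all hypothetical, and the pattern is deliberately naive.

```python
import html
import re

# Hypothetical, deliberately naive pattern: "http://" or "https://"
# followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def linkify(text):
    """Wrap every naive URL match in an HTML anchor tag."""
    def to_anchor(match):
        # Escape the matched text so it is safe inside an attribute value.
        url = html.escape(match.group(0), quote=True)
        return f'<a href="{url}">{url}</a>'

    return URL_PATTERN.sub(to_anchor, text)


if __name__ == "__main__":
    # example.com is a placeholder address, not one taken from the post.
    print(linkify("Visit my website at http://www.example.com, it's awesome!"))
```

With this pattern the comma after the address is swallowed into the link, because `\S+` stops only at whitespace. Trailing punctuation is one reason URLs are less simple than they look.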


The Visual Studio IDE and Regular Expressions

The Visual Studio IDE supports searching and replacing with regular expressions, right? Sure it does. It’s right there in grey and black in the find and replace dialog. Just tick the “use Regular expressions” checkbox and we’re off to the races. However, you’re in for an unpleasant

By Jeff Atwood ·


Regex Performance

I was intrigued by a recent comment from a Microsoft Hotmail developer on the pitfalls they’ve run into while upgrading Hotmail to .NET 2.0: Regular Expressions can be very expensive. Certain (unintended and intended) strings may cause RegExes to exhibit exponential behavior. We’ve taken several hotfixes for

By Jeff Atwood ·
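
The "exponential behavior" mentioned in the excerpt above is usually associated with backtracking regex engines and nested quantifiers. Here is a minimal sketch in Python rather than .NET; the pattern and the `time_failed_match` helper are hypothetical illustrations, not the expressions Hotmail ran into, and the timings depend on the machine and the engine version.

```python
import re
import time

# Nested quantifiers: a classic shape that forces a backtracking engine
# to try many ways of splitting the same run of characters.
PATHOLOGICAL = re.compile(r"^(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match n 'a' characters plus a 'b'."""
    subject = "a" * n + "b"
    start = time.perf_counter()
    result = PATHOLOGICAL.match(subject)
    elapsed = time.perf_counter() - start
    # The trailing 'b' guarantees that the overall match fails.
    assert result is None
    return elapsed


if __name__ == "__main__":
    # Each extra pair of characters should cost noticeably more time.
    for n in (14, 16, 18, 20):
        print(n, f"{time_failed_match(n):.4f}s")
```

Keeping `n` small matters here: in a backtracking engine the work grows roughly geometrically with the input length, so a few more characters can turn milliseconds into minutes.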


I Heart Cheatsheets

I’m a huge fan of Beagle Brothers style cheat sheets, because nothing promotes the illusion of mastery like a densely packed chart of obscure reference information: Just throw some of those babies up on your walls and people will know that they’re clearly dealing with a coding genius!

By Jeff Atwood ·


Excluding Matches With Regular Expressions

Here's an interesting regex problem: > I seem to have stumbled upon a puzzle that evidently is not new, but for which no (simple) solution has yet been found. I am trying to find a way to exclude an entire

By Jeff Atwood ·


If You Like Regular Expressions So Much, Why Don’t You Marry Them?

All right... I will! I’m continually amazed how useful regular expressions are in my daily coding. I’m still working on the MhtBuilder refactoring, and I needed a function to convert all URLs in a page of HTML from relative to absolute: <summary> converts all relative url

By Jeff Atwood ·


To Compile or Not To Compile

I am currently in the middle of a way-overdue refactoring of MhtBuilder, which uses regular expressions extensively. I noticed that I had sort of mindlessly added RegexOptions.Compiled all over the place. It says “compiled” so it must be faster, right? Well, like so many other things, that depends: In

By Jeff Atwood ·


Regex use vs. Regex abuse

I’m a huge fan of regular expressions; they’re the Swiss army knife of web-era development tools. I’m always finding new places to use them in my code. Although other developers I work with may be uncomfortable with regular expressions at first, I eventually convert them to the

By Jeff Atwood ·


RegexBuddy and Friends

Jan Goyvaerts released a new version of RegexBuddy today. I’ve talked about this tool before – it’s easily the best Regex tool available. Some feature highlights for this version are a built-in GREP tool, visual regular expression debugging support, and full Unicode support. The GREP tool is an unexpected bonus;

By Jeff Atwood ·


Java vs. .NET RegEx performance

I was intrigued when I saw a cryptic reference to “the lackluster RegEx performance in .NET 1.1” on Don Park’s blog. Don referred me to this page, which displays some really crazy benchmark results from a Java regex test class – calling C#’s regex support “20 times slower

By Jeff Atwood ·


My Buddy, Regex

I generally don’t subscribe to the UNIX religion, but there is one area where I am an unabashed convert: regular expressions. Yeah, the syntax is a little scary, but for processing strings, nothing is more effective. The RegEx is the power drill of the programmer’s toolkit: not appropriate

By Jeff Atwood ·