Excluding Matches With Regular Expressions

Here's an interesting regex problem:

I seem to have stumbled upon a puzzle that evidently is not new, but for which no (simple) solution has yet been found. I am trying to find a way to exclude an entire word from a regular expression search. The regular expression should find and return everything EXCEPT the text string in the search expression.

For example, if the word fox was what I wanted to exclude, and the searched text was:

The quick brown fox jumped over the lazy dog.

... and I used a regular expression of [^"fox"] (which I know is incorrect) (why this doesn't work I don't understand; it would make life SO much easier), then the returned search results would be:

The quick brown jumped over the lazy dog.

Regular expressions are great at matching. It's easy to formulate a regex using what you want to match. Stating a regex in terms of what you don't want to match is a bit harder.

One easy way to exclude text from a match is negative lookbehind:

w+b(?<!bfox)

But not all regex flavors support negative lookbehind. And those that do typically have severe restrictions on the lookbehind, eg, it must be a simple fixed-length expression. To avoid incompatibility, we can restate our solution using negative lookahead:

(?!foxb)bw+

You can test this regex in the cool online JavaScript Regex evaluator. Unfortunately, JavaScript doesn't support negative lookbehind, so if you want to test that one, I recommend RegexBuddy. It's not free, but it's the best regex tool out there by far-- and it keeps getting better with every incremental release.

Related posts

Parsing Html The Cthulhu Way

Among programmers of any experience, it is generally regarded as A Bad Ideatm to attempt to parse HTML with regular expressions. How bad of an idea? It apparently drove one Stack Overflow user to the brink of madness: You can't parse [X]HTML with regex. Because HTML can&

By Jeff Atwood ·
Comments

The Problem With URLs

URLs are simple things. Or so you'd think. Let's say you wanted to detect an URL in a block of text and convert it into a bona fide hyperlink. No problem, right? Visit my website at http://www.example.com, it's awesome! To locate

By Jeff Atwood ·
Comments

The Visual Studio IDE and Regular Expressions

The Visual Studio IDE supports searching and replacing with regular expressions, right? Sure it does. It’s right there in grey and black in the find and replace dialog. Just tick the “use Regular expressions” checkbox and we’re off to the races. However, you’re in for an unpleasant

By Jeff Atwood ·
Comments

Regex Performance

I was intrigued by a recent comment from a Microsoft Hotmail developer on the pitfalls they’ve run into while upgrading Hotmail to .NET 2.0: Regular Expressions can be very expensive. Certain (unintended and intended) strings may cause RegExes to exhibit exponential behavior. We’ve taken several hotfixes for

By Jeff Atwood ·
Comments

Recent Posts

Let's Talk About The American Dream

Let's Talk About The American Dream

A few months ago I wrote about what it means to stay gold — to hold on to the best parts of ourselves, our communities, and the American Dream itself. But staying gold isn’t passive. It takes work. It takes action. It takes hard conversations that ask us to confront

By Jeff Atwood ·
Comments
Stay Gold, America

Stay Gold, America

We are at an unprecedented point in American history, and I'm concerned we may lose sight of the American Dream.

By Jeff Atwood ·
Comments
The Great Filter Comes For Us All

The Great Filter Comes For Us All

With a 13 billion year head start on evolution, why haven’t any other forms of life in the universe contacted us by now? (Arrival is a fantastic movie. Watch it, but don’t stop there – read the Story of Your Life novella it was based on for so much

By Jeff Atwood ·
Comments