My Buddy, Regex
I generally don't subscribe to the UNIX religion, but there is one area where I am an unabashed convert: regular expressions. Yeah, the syntax is a little scary, but for processing strings, nothing is more effective. The RegEx is the power drill of the programmer's toolkit: not appropriate for every job, but the go-to tool for a lot of common jobs. And what could be more common than the humble string, particularly in this day and age of HTML, XML, SOAP, and other plain text formats? Most modern development languages have full regular expression support, and regexes even show up in the IDE for things like search and replace.
Over the last four years I've experimented with a number of commercial, freeware, and even homegrown RegEx tools. In the .NET era, I started with Expresso, and I recently found out about Regulator, which is hands down the most impressive free RegEx tool I've encountered to date. But that was before I met my new best friend, RegexBuddy.
I belatedly realized, after creating this screenshot, that I may have accidentally picked a complicated, "run away screaming" example. Great for me as an intermediate regex user, but not so great for introducing people to the miracle of RegEx. So let me apologize by way of explanation: this regex matches any valid HTML 4.0 tag. It also exploits a very powerful feature called named captures; see the ?<element> and ?<attr> groups highlighted in that tannish-brown? In .NET you can refer to those captured groups by name, with a very simple, logical syntax:
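```vbnet
' Requires Imports System.Text.RegularExpressions
' reg is the tag-matching Regex above; strHTML holds the HTML text to scan
Dim mc As MatchCollection = reg.Matches(strHTML)
For Each m As Match In mc
    Console.WriteLine(m.Groups("element").Value)
    Console.WriteLine(m.Groups("attr").Value)
Next
```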
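To see named captures working end to end, here is a minimal, self-contained VB.NET sketch. The pattern in it is a hypothetical, much-simplified stand-in, not the HTML 4.0 regex from the screenshot: it only recognizes start tags whose attribute values are double-quoted. The names NamedCaptureDemo, TagRegex, and DescribeTags are made up for illustration. It also shows one .NET detail worth knowing: when a named group repeats, as attr does here, Groups("attr").Value holds only the last capture, while Groups("attr").Captures holds every one of them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NamedCaptureDemo
    ' Hypothetical, simplified tag pattern: an element name followed by zero or more
    ' name="value" attributes (double-quoted values only). Not a full HTML parser.
    Private ReadOnly TagRegex As New Regex("<(?<element>\w+)(?:\s+(?<attr>\w+)\s*=\s*""[^""]*"")*\s*/?>")

    ' Returns one entry per matched tag, formatted as element(attr1,attr2,...).
    Function DescribeTags(html As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In TagRegex.Matches(html)
            Dim names As New List(Of String)
            ' Captures holds every repetition of the "attr" group, not just the last one.
            For Each c As Capture In m.Groups("attr").Captures
                names.Add(c.Value)
            Next
            results.Add(m.Groups("element").Value & "(" & String.Join(",", names.ToArray()) & ")")
        Next
        Return results
    End Function

    Sub Main()
        Dim html As String = "<a href=""index.htm"" title=""Home"">Home</a><br/>"
        For Each line As String In DescribeTags(html)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

Run as written, this should print a(href,title) and then br(). The closing </a> is skipped because the pattern requires a word character immediately after the opening angle bracket.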
The killer feature that sets RegexBuddy apart is super-fast, real-time highlighting of every match as you type the regular expression. That has always been my complaint about composing regexes: it's difficult to tell what your regex will actually match until you "run" it and browse through the results. With RegexBuddy, you don't have to: just type and watch. No running required. But that's not the only great feature: the plain-text decomposition of the regex and the pre-built regex library are also best of breed. Needless to say, it's highly recommended, and it's currently my preferred tool. It's not free, but TANSTAAFL (there ain't no such thing as a free lunch).
Once you come to grips with the basics of regular expressions, you'll want a handy cheat sheet of the syntax. The best one I've found is VisiBone's JavaScript foldout. There's also an online version. All the VisiBone stuff is super cool, and brings back warm memories of those incredible Beagle Brothers posters I had for the Apple //. However, the information density does get a little ridiculous on the VisiBone cards, so I'd go with the foldouts or the wall charts, unless you enjoy squinting a lot. If you just can't get enough, and you want to learn about the thrilling history of regular expressions and how they work under the hood (try to envision me stifling a yawn at this point), there's also the O'Reilly book.
You may not even need to know the syntax if you can drop prebuilt regexes into your code. Why build what you can steal? Several sites host growing repositories of prebuilt regular expressions:
- http://www.3leaf.com/resources/articles/regex.aspx
- http://www.regular-expressions.info/
- http://www.regexlib.com/ (available as a web service!)