If You Like Regular Expressions So Much, Why Don't You Marry Them?

All right... I will!

I'm continually amazed how useful regular expressions are in my daily coding. I'm still working on the MhtBuilder refactoring, and I needed a function to convert all URLs in a page of HTML from relative to absolute:

''' <summary>
''' converts all relative url references
'''    href="myfolder/mypage.htm"
''' into absolute url references
'''    href="http://mywebsite/myfolder/mypage.htm"
''' </summary>
Private Function ConvertRelativeToAbsoluteRefs(ByVal html As String) As String
    Dim r As Regex
    Dim urlPattern As String = _
        "(?<attrib>\shref|\ssrc|\sbackground)\s*?=\s*?" & _
        "(?<delim1>[""']{0,2})(?!#|http|ftp|mailto|javascript)" & _
        "/(?<url>[^""'>]+)(?<delim2>[""']{0,2})"
    Dim cssPattern As String = _
        "@import\s+?(url)*['""(]{1,2}" & _
        "(?!http)\s*/(?<url>[^""')]+)['"")]{1,2}"

    '-- href="/anything" to href="http://www.web.com/anything"
    r = New Regex(urlPattern, _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlRoot & "/${url}${delim2}")

    '-- href="anything" to href="http://www.web.com/folder/anything"
    '-- (same pattern with the leading "/" stripped out)
    r = New Regex(urlPattern.Replace("/", ""), _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlFolder & "/${url}${delim2}")

    '-- @import(/anything) to @import url(http://www.web.com/anything)
    r = New Regex(cssPattern, _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    html = r.Replace(html, "@import url(" & _HtmlFile.UrlRoot & "/${url})")

    '-- @import(anything) to @import url(http://www.web.com/folder/anything)
    r = New Regex(cssPattern.Replace("/", ""), _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)
    html = r.Replace(html, "@import url(" & _HtmlFile.UrlFolder & "/${url})")

    Return html
End Function

Each regex is run twice because I have to resolve relative URLs starting with a forward slash to the webroot first, and only then resolve all remaining relative URLs to the current web folder.
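To see why the order matters, here is a minimal sketch of the two passes on a tiny input. It is not the function above: the patterns are deliberately simplified (double-quoted href and src values only), and the UrlRoot and UrlFolder constants are hypothetical stand-ins for _HtmlFile.UrlRoot and _HtmlFile.UrlFolder.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TwoPassSketch
    ' Hypothetical stand-ins for _HtmlFile.UrlRoot and _HtmlFile.UrlFolder.
    Private Const UrlRoot As String = "http://www.example.com"
    Private Const UrlFolder As String = "http://www.example.com/folder"

    Sub Main()
        Dim html As String = "<a href=""/a.htm"">A</a> <img src=""b.gif"">"

        ' Simplified patterns (an assumption for illustration):
        ' only double-quoted href/src values are handled.
        Dim rooted As New Regex( _
            "(?<attrib>\s(?:href|src))=""(?!#|http)/(?<url>[^""]+)""", _
            RegexOptions.IgnoreCase)
        Dim remaining As New Regex( _
            "(?<attrib>\s(?:href|src))=""(?!#|http)(?<url>[^""]+)""", _
            RegexOptions.IgnoreCase)

        ' Pass 1: URLs starting with "/" resolve against the web root.
        html = rooted.Replace(html, "${attrib}=""" & UrlRoot & "/${url}""")

        ' Pass 2: whatever is still relative resolves against the current folder.
        ' URLs rewritten in pass 1 now start with "http", so the lookahead skips them.
        html = remaining.Replace(html, "${attrib}=""" & UrlFolder & "/${url}""")

        Console.WriteLine(html)
    End Sub
End Module
```

If the passes ran in the opposite order, the second pattern would also match "/a.htm" and resolve it against the folder instead of the root.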

Someone on the BCL team recently recommended pretty-printing regular expressions, e.g., using whitespace to make regexes more readable with RegexOptions.IgnorePatternWhitespace. I agree completely. We do this all the time with SQL. I can think of a half-dozen tools that will take a block of SQL and pretty-format it, but I am not aware of any regex tools that offer this functionality. I guess I'll email the author of RegexBuddy and see what he has to say.
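As a rough sketch of what that could look like, here is the root-relative URL pattern from above laid out one piece per line with RegexOptions.IgnorePatternWhitespace. The module name, the sample HTML, and the www.example.com root are my own placeholders. One gotcha: in this mode an unescaped # starts a comment, so the literal # in the lookahead has to be written as \#.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PrettyRegexSketch
    Sub Main()
        ' One piece of the pattern per line, each with a trailing # comment.
        ' Whitespace outside character classes is ignored in this mode.
        Dim urlPattern As String = String.Join(Environment.NewLine, New String() { _
            "(?<attrib>\shref|\ssrc|\sbackground)  # attribute name, with leading whitespace", _
            "\s*?=\s*?                             # equals sign, optional whitespace", _
            "(?<delim1>[""']{0,2})                 # opening quote(s), if any", _
            "(?!\#|http|ftp|mailto|javascript)     # skip anchors and absolute URLs", _
            "/(?<url>[^""'>]+)                     # the root-relative URL itself", _
            "(?<delim2>[""']{0,2})                 # closing quote(s), if any"})

        Dim r As New Regex(urlPattern, _
            RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)

        ' Placeholder input and root URL, for illustration only.
        Dim html As String = "<a href=""/folder/page.htm"">link</a>"
        Console.WriteLine(r.Replace(html, _
            "${attrib}=${delim1}http://www.example.com/${url}${delim2}"))
    End Sub
End Module
```

The pattern matches the same text as the single-line version; only the layout changes, at the cost of having to escape any literal # or space.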

And here's an interesting bit of trivia: did you know that the ASP.NET page parser uses regular expressions?
