If You Like Regular Expressions So Much, Why Don’t You Marry Them?
All right... I will!
I’m continually amazed at how useful regular expressions are in my daily coding. I’m still working on the MhtBuilder refactoring, and I needed a function to convert all URLs in a page of HTML from relative to absolute:
    ''' <summary>
    ''' converts all relative url references
    '''    href="myfolder/mypage.htm"
    ''' into absolute url references
    '''    href="http://mywebsite/myfolder/mypage.htm"
    ''' </summary>
    Private Function ConvertRelativeToAbsoluteRefs(ByVal html As String) As String
        Dim r As Regex
        Dim urlPattern As String = _
            "(?<attrib>\shref|\ssrc|\sbackground)\s*?=\s*?" & _
            "(?<delim1>[""']{0,2})(?!#|http|ftp|mailto|javascript)" & _
            "/(?<url>[^""'>]+)(?<delim2>[""']{0,2})"
        Dim cssPattern As String = _
            "@import\s+?(url)*['""(]{1,2}" & _
            "(?!http)\s*/(?<url>[^""')]+)['"")]{1,2}"

        '-- href="/anything" to href="http://www.web.com/anything"
        r = New Regex(urlPattern, _
            RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlRoot & "/${url}${delim2}")

        '-- href="anything" to href="http://www.web.com/folder/anything"
        r = New Regex(urlPattern.Replace("/", ""), _
            RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlFolder & "/${url}${delim2}")

        '-- @import(/anything) to @import url(http://www.web.com/anything)
        r = New Regex(cssPattern, _
            RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        html = r.Replace(html, "@import url(" & _HtmlFile.UrlRoot & "/${url})")

        '-- @import(anything) to @import url(http://www.web.com/folder/anything)
        r = New Regex(cssPattern.Replace("/", ""), _
            RegexOptions.IgnoreCase Or RegexOptions.Multiline)
        html = r.Replace(html, "@import url(" & _HtmlFile.UrlFolder & "/${url})")

        Return html
    End Function
Each Regex is repeated because I have to resolve relative URLs starting with forward slashes to the webroot first – and then all remaining relative URLs to the current web folder.
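To make the ordering concrete, here is a minimal sketch of the two URL passes run against a tiny HTML fragment. The urlRoot and urlFolder values are hypothetical stand-ins for _HtmlFile.UrlRoot and _HtmlFile.UrlFolder, the sample HTML is made up, and the quote character classes assume only single and double quotes are used as delimiters.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TwoPassDemo
    Sub Main()
        ' Hypothetical stand-ins for _HtmlFile.UrlRoot and _HtmlFile.UrlFolder.
        Dim urlRoot As String = "http://www.example.com"
        Dim urlFolder As String = "http://www.example.com/folder"

        ' Matches only URLs that start with a forward slash.
        Dim urlPattern As String = _
            "(?<attrib>\shref|\ssrc|\sbackground)\s*?=\s*?" & _
            "(?<delim1>[""']{0,2})(?!#|http|ftp|mailto|javascript)" & _
            "/(?<url>[^""'>]+)(?<delim2>[""']{0,2})"

        ' One root-relative URL and one folder-relative URL.
        Dim html As String = _
            "<a href=""/top.htm"">top</a> <img src=""pic.gif"">"

        ' Pass 1: URLs with a leading slash resolve against the web root.
        html = Regex.Replace(html, urlPattern, _
            "${attrib}=${delim1}" & urlRoot & "/${url}${delim2}", _
            RegexOptions.IgnoreCase)
        Console.WriteLine(html)
        ' <a href="http://www.example.com/top.htm">top</a> <img src="pic.gif">

        ' Pass 2: the same pattern minus the slash picks up what is left.
        ' The href rewritten in pass 1 now starts with http, so the
        ' negative lookahead skips it.
        html = Regex.Replace(html, urlPattern.Replace("/", ""), _
            "${attrib}=${delim1}" & urlFolder & "/${url}${delim2}", _
            RegexOptions.IgnoreCase)
        Console.WriteLine(html)
        ' <a href="http://www.example.com/top.htm">top</a> <img src="http://www.example.com/folder/pic.gif">
    End Sub
End Module
```

If the passes ran in the opposite order, the slash-less pattern would also match /top.htm and glue it onto the folder URL, giving http://www.example.com/folder//top.htm instead of a root-relative result.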
Someone on the BCL team recently recommended pretty-printing regular expressions, i.e., using whitespace to make regexes more readable with RegexOptions.IgnorePatternWhitespace. I agree completely. We do this all the time with SQL. I can think of a half-dozen tools that will take a block of SQL and pretty-format it, but I am not aware of any regex tools that offer this functionality. I guess I’ll email the author of RegexBuddy and see what he has to say.
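As a rough illustration of what that could look like, here is the root-relative URL pattern from above laid out one piece per line with RegexOptions.IgnorePatternWhitespace. This is only a sketch: the line breaks and comments are my own, http://www.example.com is a placeholder for _HtmlFile.UrlRoot, and the sample input is invented. One thing to watch for: in this mode an unescaped # starts a comment, so the literal # inside the lookahead has to be written as \#.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PrettyPrintedRegexDemo
    ' With IgnorePatternWhitespace, unescaped whitespace in the pattern is
    ' ignored and everything from an unescaped # to the end of the line is
    ' a comment. The pieces are joined with line feeds so each comment
    ' ends where its line ends.
    Private ReadOnly UrlPattern As String = String.Join(vbLf, New String() { _
        "(?<attrib>\shref|\ssrc|\sbackground)  # attribute name, with leading whitespace", _
        "\s*?=\s*?                             # equals sign, optional whitespace", _
        "(?<delim1>[""']{0,2})                 # opening quote, if any", _
        "(?!\#|http|ftp|mailto|javascript)     # skip anchors and absolute URLs", _
        "/(?<url>[^""'>]+)                     # root-relative path", _
        "(?<delim2>[""']{0,2})                 # closing quote, if any"})

    Sub Main()
        Dim r As New Regex(UrlPattern, _
            RegexOptions.IgnoreCase Or RegexOptions.IgnorePatternWhitespace)

        Dim html As String = "<a href=""/docs/index.htm"">docs</a>"

        ' http://www.example.com is a placeholder for _HtmlFile.UrlRoot.
        Console.WriteLine(r.Replace(html, _
            "${attrib}=${delim1}http://www.example.com/${url}${delim2}"))
        ' <a href="http://www.example.com/docs/index.htm">docs</a>
    End Sub
End Module
```

Note that the urlPattern.Replace("/", "") trick from the original function would still work on this layout, as long as none of the inline comments contain a forward slash.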
And here’s an interesting bit of trivia: did you know that the ASP.NET page parser uses regular expressions?