I'm continually amazed how useful regular expressions are in my daily coding. I'm still working on the MhtBuilder refactoring, and I needed a function to convert all URLs in a page of HTML from relative to absolute:
''' <summary> ''' converts all relative url references ''' href="myfolder/mypage.htm" ''' into absolute url references ''' href="http://mywebsite/myfolder/mypage.htm" ''' </summary> Private Function ConvertRelativeToAbsoluteRefs(ByVal html As String) As String Dim r As Regex Dim urlPattern As String = _ "(?<attrib>shref|ssrc|sbackground)s*?=s*?" & _ "(?<delim1>[""']{0,2})(?!#|http|ftp|mailto|javascript)" & _ "/(?<url>[^""'>]+)(?<delim2>[""']{0,2})" Dim cssPattern As String = _ "@imports+?(url)*['""(]{1,2}" & _ "(?!http)s*/(?<url>[^""')]+)['"")]{1,2}" '-- href="/anything" to href="http://www.web.com/anything" r = New Regex(urlPattern, _ RegexOptions.IgnoreCase Or RegexOptions.Multiline) html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlRoot & "/${url}${delim2}") '-- href="anything" to href="http://www.web.com/folder/anything" r = New Regex(urlPattern.Replace("/", ""), _ RegexOptions.IgnoreCase Or RegexOptions.Multiline) html = r.Replace(html, "${attrib}=${delim1}" & _HtmlFile.UrlFolder & "/${url}${delim2}") '-- @import(/anything) to @import url(http://www.web.com/anything) r = New Regex(cssPattern, _ RegexOptions.IgnoreCase Or RegexOptions.Multiline) html = r.Replace(html, "@import url(" & _HtmlFile.UrlRoot & "/${url})") '-- @import(anything) to @import url(http://www.web.com/folder/anything) r = New Regex(cssPattern.Replace("/", ""), _ RegexOptions.IgnoreCase Or RegexOptions.Multiline) html = r.Replace(html, "@import url(" & _HtmlFile.UrlFolder & "/${url})") Return html End Function
Each regex is repeated because I have to resolve relative URLs starting with forward slashes to the webroot first--and then all remaining relative URLs to the current web folder.
One of the BCL team recently recommended pretty-printing regular expressions, eg, using whitespace to make regexes more readable with RegexOptions.IgnorePatternWhitespace. I agree completely. We do this all the time with SQL. I can think of a half-dozen tools that will block of SQL and pretty format it-- but I am not aware of any regex tools that offer this functionality. I guess I'll email the author of Regexbuddy and see what he has to say.
And here's an interesting bit of trivia: did you know that the ASP.NET page parser uses regular expressions?