Coding Horror

programming and human factors

Building Mht Files from URLs revisited

I finally finished updating my Convert any URL to a MHTML archive using native .NET code CodeProject article. It's based on RFC 2557, MIME Encapsulation of Aggregate Documents, such as HTML (MHTML), which defines the multipart MIME format behind MHTML web archives. You may also know it as that crazy File, Save As, "Web Archive, Single File" menu option in Internet Explorer. It's basically a way to package an entire web page as a (mostly) functional single file that can be emailed, stored in a database, or what have you. Lots of interesting possibilities, including quick and dirty offline functionality for ASP.NET websites using loopback HTTP requests.
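
To make the "single file" idea concrete, here is a minimal sketch in Python (not the VB.NET code from the article) of what an MHTML archive looks like structurally: a multipart/related MIME message whose parts each carry a Content-Location header naming the URL they came from. The function name build_mht and its parameters are made up for illustration, and the sketch assumes the page and its resources have already been downloaded.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(page_url, html, resources):
    """Package a page and its resources as one multipart/related message.

    resources is a list of (url, content_type, data_bytes) tuples.
    This is a hypothetical helper, not part of Mht.Builder.
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Archived copy of " + page_url

    # The root HTML document goes first, tagged with its original URL.
    root = MIMEText(html, "html", "utf-8")
    root["Content-Location"] = page_url
    archive.attach(root)

    # Each image, stylesheet, or script becomes its own base64 part.
    for url, content_type, data in resources:
        maintype, subtype = content_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_string()
```

A viewer that understands MHTML resolves each reference in the HTML against the Content-Location headers instead of going back to the network, which is why the result can be emailed or stored as a single blob.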

This was a truly painful total rewrite, but it offers tons of new functionality:

  • Completely rewritten!
  • Autodetection of content encoding (e.g., international web pages), tested against multi-language websites
  • Now correctly decompresses both types of HTTP compression (gzip and deflate)
  • Supports completely in-memory operation for server-side use, or on-disk storage for client use
  • Now works on web pages with frames and iframes, using recursive retrieval
  • HTTP authentication and HTTP Proxy support
  • Allows configuration of browser ID string to retrieve browser-specific content
  • Basic cookie support (needs enhancement and testing)
  • Much improved regular expressions used for parsing HTTP
  • Extensive use of VB.NET 2005-style XML comments throughout
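
Two of the items above, HTTP decompression and encoding autodetection, are easy to get subtly wrong, so here is a rough Python sketch of one way to handle them. It is not how Mht.Builder does it; the function names, the 2 KB sniffing window, and the iso-8859-1 fallback are all assumptions chosen for illustration.

```python
import gzip
import re
import zlib


def decompress_body(body, content_encoding):
    """Undo the Content-Encoding applied to an HTTP response body."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" is supposed to mean a zlib-wrapped stream...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send a raw deflate stream instead.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body


_CHARSET_RE = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def detect_charset(content_type_header, body, default="iso-8859-1"):
    """Guess the charset from the HTTP header, then from a meta tag."""
    match = _CHARSET_RE.search(content_type_header or "")
    if match:
        return match.group(1).lower()
    # Fall back to sniffing the start of the (already decompressed) body.
    head = body[:2048].decode("ascii", errors="ignore")
    match = _CHARSET_RE.search(head)
    if match:
        return match.group(1).lower()
    return default
```

The order matters: decompress first, then detect the charset, then decode the bytes to text. Sniffing a meta tag out of a still-compressed body will never find anything.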

If you're interested, you can download the VS.NET 2003 solution from my blog until the CodeProject site gets updated. Here's a screenshot of the demo app packaged with the Mht.Builder class:

[Screenshot of the Mht.Builder demo app]

Written by Jeff Atwood
