Building Mht Files from URLs revisited

I finally finished updating my CodeProject article, Convert any URL to a MHTML archive using native .NET code. It’s based on RFC standard 2557, aka Multipart MIME Message (MHTML web archive). You may also know it as that crazy File, Save As, “Web Archive, Single File” menu option in Internet Explorer. It’s basically a way to package an entire web page as a (mostly) functional single file that can be emailed, stored in a database, or what have you. Lots of interesting possibilities, including quick and dirty offline functionality for ASP.NET websites using loopback HTTP requests.

This was a truly painful total rewrite, but it offers tons of new functionality:

  • Completely rewritten!
  • Autodetection of content encoding (e.g., international web pages), tested against multi-language websites
  • Now correctly decompresses both types of HTTP compression
  • Supports completely in-memory operation for server-side use, or on-disk storage for client use
  • Now works on web pages with frames and iframes, using recursive retrieval
  • HTTP authentication and HTTP Proxy support
  • Allows configuration of browser ID string to retrieve browser-specific content
  • Basic cookie support (needs enhancement and testing)
  • Much improved regular expressions used for parsing HTTP
  • Extensive use of VB.NET 2005 style XML comments throughout

If you’re interested, you can download the VS.NET 2003 solution from my blog until the CodeProject site gets updated. Here’s a screenshot of the demo app packaged with the Mht.Builder class:

screenshot of Mht.Builder demo app

Related posts

Why Ruby?

I've been a Microsoft developer for decades now. I weaned myself on various flavors of home computer Microsoft Basic, and I got my first paid programming gigs in Microsoft FoxPro, Microsoft Access, and Microsoft Visual Basic. I have seen the future of programming, my friends, and it is

By Jeff Atwood ·
Comments

Donating $5,000 to .NET Open Source

Way back in June of last year, I promised to donate a portion of my advertising revenue back to the community: I will be donating a significant percentage of my ad revenue back to the programming community. The programming community is the reason I started this blog in the first

By Jeff Atwood ·
Comments

Do Not Buy This Book

A few friends and I just wrote a book together: The ASP.NET 2.0 Anthology: 101 Essential Tips, Tricks & Hacks. I met K. Scott Allen, Jon Galloway, and Phil Haack through their excellent blogs. That online friendship carried over into real life. We always thought it'd

By Jeff Atwood ·
Comments

Defining Open Source

As I mentioned two weeks ago, my plan is to contribute $10,000 to the .NET open source ecosystem. $5,000 from me, and a matching donation of $5,000 from Microsoft. There's only two ground rules so far: 1. The project must be written in .NET managed

By Jeff Atwood ·
Comments

Recent Posts

Stay Gold, America

Stay Gold, America

We are at an unprecedented point in American history, and I'm concerned we may lose sight of the American Dream.

By Jeff Atwood ·
Comments
The Great Filter Comes For Us All

The Great Filter Comes For Us All

With a 13 billion year head start on evolution, why haven’t any other forms of life in the universe contacted us by now? (Arrival is a fantastic movie. Watch it, but don’t stop there – read the Story of Your Life novella it was based on for so much

By Jeff Atwood ·
Comments
I Fight For The Users

I Fight For The Users

If you haven’t been able to keep up with my blistering pace of one blog post per year, I don’t blame you. There’s a lot going on right now. It’s a busy time. But let’s pause and take a moment to celebrate that Elon Musk

By Jeff Atwood ·
Comments
The 2030 Self-Driving Car Bet

The 2030 Self-Driving Car Bet

It’s my honor to announce that John Carmack and I have initiated a friendly bet of $10,000* to the 501(c)(3) charity of the winner’s choice: By January 1st, 2030, completely autonomous self-driving cars meeting SAE J3016 level 5 will be commercially available for passenger use

By Jeff Atwood ·
Comments