This code eventually made it into one of my CodeProject articles. An eagle-eyed CodeProject reader noted that, while my code worked for gzip compression, it failed miserably for websites that use deflate compression. This is a case of "be careful what you ask for":
```vbnet
Dim wc As New Net.WebClient
'-- google will not gzip the content if the User-Agent header is missing!
wc.Headers.Add("User-Agent", strHttpUserAgent)
wc.Headers.Add("Accept-Encoding", "gzip,deflate")
'-- download the target URL into a byte array
Dim b() As Byte = wc.DownloadData(strUrl)
```
99% of the time, you'll get a gzipped array of bytes back from that request. For whatever reason, deflate compression is extremely rare on the open internet. The same reader also helpfully provided a URL that uses deflate: Redline Networks. So that was my test case. Although SharpZipLib supports deflate compression, I had difficulty getting it to work using the provided inflater stream class. And since it's such a rare case, I couldn't find any working code samples.
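Part of what makes deflate tricky is that the HTTP "deflate" token is ambiguous in practice. The HTTP/1.1 spec (RFC 2616) defines it as zlib-format data (RFC 1950): a two-byte header, then the raw deflate stream (RFC 1951), then an Adler-32 checksum. Many servers instead send the raw deflate stream with no zlib wrapper at all. That is why the Decompress function later in this post passes True to SharpZipLib's Inflater constructor, which tells it to expect no zlib header. The sketch below is an assumption on my part, not part of the forum fix: a heuristic that checks the first two bytes for a valid zlib header, so a decompressor could choose between the two forms. The name LooksLikeZlibHeader is hypothetical.

```vbnet
Imports System

Module ZlibSniff
    ''' <summary>
    ''' Heuristic (hypothetical helper): returns True if the first two bytes
    ''' look like an RFC 1950 zlib header, False if the data is probably
    ''' raw RFC 1951 deflate. A raw stream can occasionally pass by chance.
    ''' </summary>
    Public Function LooksLikeZlibHeader(ByVal b() As Byte) As Boolean
        If b Is Nothing OrElse b.Length < 2 Then Return False
        Dim cmf As Integer = b(0)
        Dim flg As Integer = b(1)
        '-- low nibble of CMF is the compression method; 8 means deflate
        If (cmf And &HF) <> 8 Then Return False
        '-- high nibble is log2(window size) - 8, which must be 7 or less
        If (cmf >> 4) > 7 Then Return False
        '-- header check bits: CMF * 256 + FLG must be a multiple of 31
        Return (cmf * 256 + flg) Mod 31 = 0
    End Function

    Sub Main()
        '-- &H78 &H9C is the common zlib header for default compression
        Console.WriteLine(LooksLikeZlibHeader(New Byte() {&H78, &H9C}))  ' prints True
        '-- &HCB &H48 are the first bytes of "hello" as raw deflate
        Console.WriteLine(LooksLikeZlibHeader(New Byte() {&HCB, &H48}))  ' prints False
    End Sub
End Module
```

If the check passes, you could construct the inflater with New Zip.Compression.Inflater(False) to consume the zlib header and checksum; otherwise keep Inflater(True) for raw deflate.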
In desperation (my OCD prohibits me from letting that last 1% case go), I turned to the only relevant Google result I could find, which happens to be on the SharpZipLib community forum. Jfreilly provided an answer within a day! Problem solved. He also maintains a very nice SharpZip Library FAQ. Kudos to you, sir.
```vbnet
Imports System.IO
Imports ICSharpCode.SharpZipLib

''' <summary>
''' decompresses a compressed array of bytes
''' via the specified HTTP compression type
''' </summary>
Private Function Decompress(ByVal b() As Byte, _
    ByVal CompressionType As HttpContentEncoding) As Byte()
    Dim s As Stream
    Select Case CompressionType
        Case HttpContentEncoding.Deflate
            '-- True = no zlib header/footer expected (raw deflate data)
            s = New Zip.Compression.Streams.InflaterInputStream( _
                New MemoryStream(b), _
                New Zip.Compression.Inflater(True))
        Case HttpContentEncoding.Gzip
            s = New GZip.GZipInputStream(New MemoryStream(b))
        Case Else
            Return b
    End Select

    Dim ms As New MemoryStream
    Const intChunkSize As Integer = 2048
    Dim intSizeRead As Integer
    '-- VB array bounds are inclusive, so this allocates exactly intChunkSize bytes
    Dim unzipBytes(intChunkSize - 1) As Byte
    Try
        '-- read decompressed chunks until the stream is exhausted
        While True
            intSizeRead = s.Read(unzipBytes, 0, intChunkSize)
            If intSizeRead > 0 Then
                ms.Write(unzipBytes, 0, intSizeRead)
            Else
                Exit While
            End If
        End While
    Finally
        '-- close the decompression stream even if Read throws on bad data
        s.Close()
    End Try
    Return ms.ToArray
End Function
```
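The Decompress function takes a CompressionType, but the snippet doesn't show where that value comes from or how the HttpContentEncoding enum is defined. A natural source is the Content-Encoding response header, which tells you which of the requested encodings the server actually used. The sketch below assumes that approach; the enum members and the ParseContentEncoding name are hypothetical stand-ins, not the original definitions.

```vbnet
Imports System

Module EncodingLookup
    '-- hypothetical definition: the original enum is not shown in the post
    Public Enum HttpContentEncoding
        None
        Gzip
        Deflate
        Other
    End Enum

    ''' <summary>
    ''' maps a Content-Encoding header value to HttpContentEncoding
    ''' (hypothetical helper; header values are case-insensitive)
    ''' </summary>
    Public Function ParseContentEncoding(ByVal headerValue As String) As HttpContentEncoding
        '-- a missing header means the body was sent uncompressed
        If String.IsNullOrEmpty(headerValue) Then Return HttpContentEncoding.None
        Select Case headerValue.Trim().ToLowerInvariant()
            Case "gzip", "x-gzip"
                Return HttpContentEncoding.Gzip
            Case "deflate"
                Return HttpContentEncoding.Deflate
            Case Else
                '-- e.g. "compress", which Decompress passes through unchanged
                Return HttpContentEncoding.Other
        End Select
    End Function

    Sub Main()
        '-- ToString() forces the enum name; without it VB may print the number
        Console.WriteLine(ParseContentEncoding("gzip").ToString())      ' prints Gzip
        Console.WriteLine(ParseContentEncoding("Deflate").ToString())   ' prints Deflate
        Console.WriteLine(ParseContentEncoding("compress").ToString())  ' prints Other
    End Sub
End Module
```

After DownloadData returns, something like Decompress(b, ParseContentEncoding(wc.ResponseHeaders("Content-Encoding"))) would tie the two pieces together; the WebHeaderCollection indexer returns Nothing when the header is absent, which maps to None.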
There is also a mysterious third kind of HTTP compression: compress, the LZW format produced by the Unix compress program. OK, it's not all that mysterious, but nobody seems to use it. What's up with that?