Coding Horror

programming and human factors

Microsoft LogParser

Ask yourself this question: what if everything could be queried with SQL? Microsoft's LogParser does just that. It lets you slice and dice a variety of log file types using a common SQL-like syntax. It's an incredibly powerful concept, and the LogParser implementation doesn't disappoint. This architecture diagram from the LogParser documentation explains it better than I could:

logparser_architecture.gif

The excellent forensic IIS log exploration with LogParser article is a good starting point for sample LogParser IIS log queries. Note that I am summarizing just the SQL clauses; I typically output to the console, so the actual, complete commandline would be

logparser "(sql clause)" -rtp:-1

Top 10 items retrieved:

SELECT TOP 10 cs-uri-stem as Url, COUNT(cs-uri-stem) AS Hits
FROM ex*.log
GROUP BY cs-uri-stem
ORDER BY Hits DESC

Top 10 slowest items:

SELECT TOP 10 cs-uri-stem AS Url, MIN(time-taken) as [Min],
AVG(time-taken) AS [Avg], max(time-taken) AS [Max],
count(time-taken) AS Hits
FROM ex*.log
WHERE time-taken < 120000
GROUP BY Url
ORDER BY [Avg] DESC

All Unique Urls retrieved:

SELECT DISTINCT TO_LOWERCASE(cs-uri-stem) AS Url, Count(*) AS Hits
FROM ex*.log
WHERE sc-status=200
GROUP BY Url
ORDER BY Url

HTTP errors per hour:

SELECT date, QUANTIZE(time, 3600) AS Hour,
sc-status AS Status, COUNT(*) AS Errors
FROM ex*.log
WHERE (sc-status >= 400)
GROUP BY date, hour, sc-status
HAVING (Errors > 25)
ORDER BY Errors DESC

HTTP errors ordered by Url and Status:

SELECT cs-uri-stem AS Url, sc-status AS Status, COUNT(*) AS Errors
FROM ex*.log
WHERE (sc-status >= 400)
GROUP BY Url, Status
ORDER BY Errors DESC

Win32 error codes by total and page:

SELECT cs-uri-stem AS Url,
WIN32_ERROR_DESCRIPTION(sc-win32-status) AS Error, Count(*) AS Total
FROM ex*.log
WHERE (sc-win32-status > 0)
GROUP BY Url, Error
ORDER BY Total DESC

HTTP methods (GET, POST, etc) used per Url:

SELECT cs-uri-stem AS Url, cs-method AS Method,
Count(*) AS Total
FROM ex*.log
WHERE (sc-status < 400 or sc-status >= 500)
GROUP BY Url, Method
ORDER BY Url, Method

Bytes sent from the server:

SELECT cs-uri-stem AS Url, Count(*) AS Hits,
AVG(sc-bytes) AS Avg, Max(sc-bytes) AS Max,
Min(sc-bytes) AS Min, Sum(sc-bytes) AS TotalBytes
FROM ex*.log
GROUP BY cs-uri-stem
HAVING (Hits > 100) ORDER BY [Avg] DESC

Bytes sent from the client:

SELECT cs-uri-stem AS Url, Count(*) AS Hits,
AVG(cs-bytes) AS Avg, Max(cs-bytes) AS Max,
Min(cs-bytes) AS Min, Sum(cs-bytes) AS TotalBytes
FROM ex*.log
GROUP BY Url
HAVING (Hits > 100)
ORDER BY [Avg] DESC

There's an entire book about LogParser, and Mike Gunderloy even started an unofficial LogParser fansite.

Here are a few other articles I found that touch on different aspects of LogParser:

Although LogParser is 96.44% awesome, there are a few things that I didn't like about it:

  1. I really, really need a standard deviation function. Min, Max, and Avg are nice but totally inadequate for determining how variable something is.
  2. The graphing output is cool-- but it's also a MS Office dependency. If you try to graph something on a machine without Office installed, you'll get an error.
  3. The automatic detection of column types in CSV files isn't always reliable. This meant I couldn't graph some numeric values in my PerfMon dumps because LogParser decided they were strings. I couldn't find any way to force a column to be detected as a certain type, either.

Of course, the idea of SQL being used to query a bunch of stuff isn't exactly a new one; Microsoft's WQL (WMI Query Language) is similar but more annoying and less powerful. And you'll get tons of hits if you logically extend this concept to querying HTML, too. Just try searching Google for Web Query Language.

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Exchange and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: http://twitter.com/codinghorror