Coding Horror

programming and human factors

The Computer Performance Shell Game

The performance of any computer is akin to a shell game.

shell game

The computer performance shell game, also known as "find the bottleneck", is always played between these four resources:

  • CPU
  • Disk
  • Network
  • Memory

At any given moment, your computer is waiting for some operation to complete on one of these resources. But which one: CPU, memory, disk, or network? If you're interested in performance, the absolute first thing you have to do is determine which of these bottlenecks is currently impeding performance – and eliminate it. At which point the bottleneck often shifts to some other part of the system, far too rapidly for your eye to see. Just like a real shell game.

So the art of performance monitoring is, first and foremost, getting your computer to tell you what's going on in each of these areas – so you can make your best guess where the pea is right now.
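
If you want a rough programmatic equivalent of that first glance, here's a minimal sketch that samples the same four resources with the cross-platform psutil library -- my choice for illustration, not anything the Microsoft tool actually uses:

import psutil

# One coarse snapshot of the four shell-game resources.
cpu = psutil.cpu_percent(interval=1.0)     # % of CPU busy over one second
mem = psutil.virtual_memory()              # physical memory usage
disk = psutil.disk_io_counters()           # cumulative disk reads/writes since boot
net = psutil.net_io_counters()             # cumulative network traffic since boot

print(f"CPU:     {cpu:.0f}% busy")
print(f"Memory:  {mem.percent:.0f}% used")
print(f"Disk:    {disk.read_bytes / 2**20:.0f} MB read, {disk.write_bytes / 2**20:.0f} MB written")
print(f"Network: {net.bytes_recv / 2**20:.0f} MB received, {net.bytes_sent / 2**20:.0f} MB sent")

To find the current bottleneck you'd sample this in a loop and watch the deltas; a single snapshot only tells you what has accumulated so far.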

My previous performance drug of choice was Task Manager, or its vastly more sophisticated bigger brother, Process Explorer. But now that I've discovered the Reliability and Performance Monitor, I can't stop watching it. It is my crystal meth. While the previous tools were solid enough, they both had one glaring flaw. They only showed CPU load and memory usage. Those are frequently performance bottlenecks, to be sure, but they're only part of the story.

The Reliability and Performance Monitor, while continuing the fine Microsoft product tradition of absolutely freaking horrible names, is new to Windows Vista and Windows Server 2008. And it rocks.

Reliability and Performance Monitor, resource overview

Right off the bat you get a nice summary of what's going on in your computer performance shell game, with an overview graph and high water marks for CPU, Disk, Network, and Memory, along with scaled numbers. Eyeball this one key set of graphs and you can usually get a pretty good idea which part of your computer is working overtime.

There are also collapsible detail sections for each graph. On these detail sections, bear in mind the numbers are all live, and the default sort orders tend to bring the most active things to the top. And they stay at the top until they're no longer using that resource, at which point they disappear. The detail sections are a quick way to drill down into each resource and see what programs and processes are monopolizing it at any given time.

The CPU detail section gives you a moving average of CPU usage, which is much saner than Task Manager's ever-shifting numbers. Admittedly, this section isn't radically different from taskman – and it's functionally identical to the Unix top command. But the moving average alone is surprisingly helpful in avoiding obsessing over rapid peaks and valleys.

Reliability and Performance Monitor, CPU detail
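
A moving average, for the record, is nothing exotic: it's just the last few samples averaged together, which is exactly why it damps those spikes. A toy sketch (my own illustration, not how the Microsoft tool computes it):

from collections import deque

def moving_average(samples, window=10):
    # Average each reading with the previous (window - 1) readings,
    # so a one-second spike can't dominate the display.
    recent = deque(maxlen=window)
    for s in samples:
        recent.append(s)
        yield sum(recent) / len(recent)

raw = [5, 4, 6, 100, 7, 5, 6, 4]   # instantaneous CPU readings, in percent
print([round(v, 1) for v in moving_average(raw, window=4)])
# the momentary 100% spike only drags the smoothed value up to about 30%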

The Disk detail section shows which processes are reading and writing to disk, for what filenames/paths, and how long it's taking to service those requests – in real time. I generally alternate between read and write sort order here, although sometimes response time can be informative as well.

Reliability and Performance Monitor, disk detail

The Network detail section shows which processes are sending the most data over the network right now. On a public website, this gives you an at-a-glance breakdown of which IP addresses are hitting you the hardest. In fact, while checking this, I just laid down another IP ban for some random IP that was scraping the heck out of our site.

Reliability and Performance Monitor, network detail

The Memory detail section shows the five most essential metrics for memory usage in real time. Hard Faults are, of course, forced reads from disk into memory – something you want to keep a close eye on. And Working Set is the best general indicator of how much memory a process is actively using to do its thing.

Reliability and Performance Monitor, memory detail
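
For comparison, here's roughly how you could pull per-process working set numbers yourself, again using psutil as a stand-in; psutil reports the figure as rss, which as far as I know corresponds to the working set on Windows:

import psutil

# The top five memory consumers by working set.
procs = []
for p in psutil.process_iter(["pid", "name", "memory_info"]):
    mem = p.info["memory_info"]
    if mem is None:
        continue                   # process exited, or access was denied
    procs.append((mem.rss, p.info["pid"], p.info["name"]))

for rss, pid, name in sorted(procs, reverse=True)[:5]:
    print(f"{name or '?':<30} pid {pid:<8} working set ~{rss / 2**20:,.0f} MB")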

The computer performance shell game is nothing new; it is as old as computing itself. And it is a deeply satisfying game for those of us who love this stuff.

I thought I knew how to play it, until I discovered the Reliability and Performance Monitor. Now that I have a utility like this to let me suss out exactly where that performance pea is, I realize how much I was missing.

Now, on to three-card monte. Watch my hands closely!

HTML Validation: Does It Matter?

The web is, to put it charitably, a rather forgiving place. You can feed web browsers almost any sort of HTML markup or JavaScript code and they'll gamely try to make sense of what you've provided, and render it the best they can. In comparison, most programming languages are almost cruelly unforgiving. If there's a single character out of place, your program probably won't compile, much less run. This makes the HTML + JavaScript environment a rather unique -- and often frustrating -- software development platform.

But it doesn't have to be this way. You can validate your HTML markup against the official W3C Validator. Playing with the validator underscores how deeply that forgiveness-by-default policy has permeated the greater web. Dennis Forbes recently ran a number of websites through the validator, including this one, with predictably bad results:

FAIL - http://www.reddit.com - 36 errors as XHTML 1.0 Transitional. EDIT: Rechecked Reddit, and now it's a PASS
FAIL - http://www.slashdot.org - 167 errors as HTML 4.01 Strict
FAIL - http://www.digg.com - 32 errors as XHTML 1.0 Transitional
FAIL - http://www.cnn.com - 40 errors as HTML 4.01 Transitional (inferred as no doctype was specified)
FAIL - http://www.microsoft.com - 193 errors as XHTML 1.0 Transitional
FAIL - http://www.google.com - 58 errors as HTML 4.01 Transitional
FAIL - http://www.flickr.com - 34 errors as HTML 4.01 Transitional
FAIL - http://ca.yahoo.com - 276 errors as HTML 4.01 Strict
FAIL - http://www.sourceforge.net - 65 errors as XHTML 1.0 Transitional
FAIL - http://www.joelonsoftware.com - 33 errors as XHTML 1.0 Strict
FAIL - http://www.stackoverflow.com - 58 errors as HTML 4.01 Strict
FAIL - http://www.dzone.com - 165 errors as XHTML 1.0 Transitional
FAIL - http://www.codinghorror.com/blog/ - 51 errors as HTML 4.01 Transitional
PASS - http://www.w3c.org - no errors as XHTML 1.0 Strict
PASS - http://www.linux.com - no errors as XHTML 1.0 Strict
PASS - http://www.wordpress.com - no errors as XHTML 1.0 Transitional
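
If you want to run the same gauntlet against your own pages, the W3C's checker can be driven programmatically. Here's a rough sketch using Python and the requests library; it assumes the current Nu HTML Checker endpoint and its JSON output, so treat the details as illustrative rather than gospel:

import requests

def count_validation_errors(url):
    # Ask the W3C checker to fetch and validate a page, then count reported errors.
    resp = requests.get(
        "https://validator.w3.org/nu/",
        params={"doc": url, "out": "json"},
        headers={"User-Agent": "validation-sketch/0.1"},
        timeout=30,
    )
    resp.raise_for_status()
    messages = resp.json().get("messages", [])
    return sum(1 for m in messages if m.get("type") == "error")

for site in ("http://www.stackoverflow.com", "http://www.codinghorror.com/blog/"):
    errors = count_validation_errors(site)
    print(f"{'PASS' if errors == 0 else 'FAIL'} - {site} - {errors} errors")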

In short, we live in a malformed world. So much so that you begin to question whether validation matters at all. If you see this logo on a site, what does it mean to you? How will it affect your experience on that website? As a developer? As a user?

W3C validation button

We just went through the exercise of validating Stack Overflow's HTML. I almost immediately ruled out the idea of validating as XHTML, because I vehemently agree with James Bennett:

The short and sweet reason is simply this: XHTML offers no compelling advantage -- to me -- over HTML, but even if it did it would also offer increased complexity and uncertainty that make it unappealing to me.

The whole HTML validation exercise is questionable, but validating as XHTML is flat-out masochism. Only recommended for those who enjoy pain. Or programmers. I can't always tell the difference.

Anyway, we validated as the much saner HTML 4.01 strict, and even then I'm not sure it was worth the time we spent. So many of these validation rules feel arbitrary and meaningless. And, what's worse, some of them are actively harmful. For example, this is not allowed in HTML strict:

<a href="http://www.example.com/" target="_blank">foo</a>

That's right, target, a perfectly harmless attribute for links that you want to open in a different browser tab/window, is somehow verboten in HTML 4.01 strict. There's an officially supported workaround, but it's only implemented by Opera, so in effect .. there is no workaround.

In order to comply with the HTML 4.01 strict validator, you need to remove that target attribute and replace it with JavaScript that does the same thing. So, immediately I began to wonder: Is anybody validating our JavaScript? What about our CSS? Is anyone validating the DOM manipulations that JavaScript performs on our HTML? Who validates the validator? Why can't I stop thinking about zebras?

Does it really matter if we render stuff this way..

<td width="80">
<br/>

.. versus this way?

<td style="width:80px">
<br>

I mean, who makes up these rules? And for what reason?

I couldn't help feeling that validating as HTML 4.01 strict, at least in our case, was a giant exercise in to-may-to versus to-mah-to, punctuated by aggravating changes that we were forced to make for no practical benefit. (Also, if you have a ton of user-generated content like we do, you can pretty much throw any fantasies of 100% perfect validation right out the window.)

That said, validation does have its charms. There were a few things that the validation process exposed in our HTML markup that were clearly wrong -- an orphaned tag here, and a few inconsistencies in the way we applied tags there. Mark Pilgrim makes the case for validation:

I am not claiming that your page, once validated, will automatically render flawlessly in every browser; it may not. I am also not claiming that there aren't talented designers who can create old-style "Tag Soup" pages that do work flawlessly in every browser; there certainly are. But the validator is an automated tool that can highlight small but important errors that are difficult to track down by hand. If you create valid markup most of the time, you can take advantage of this automation to catch your occasional mistakes. But if your markup is nowhere near valid, you'll be flying blind when something goes wrong. The validator will spit out dozens or even hundreds of errors on your page, and finding the one that is actually causing your problem will be like finding a needle in a haystack.

There is some truth to this. Learning the rules of the validator, even if you don't agree with them, teaches you what the official definition of "valid" is. It grounds your markup in reality. It's sort of like passing your source code through an ultra-strict lint validation program, or setting your compiler to the strictest possible warning level. Knowing the rules and boundaries helps you define what you're doing, and gives you legitimate ammunition for agreeing or disagreeing. You can make an informed choice, instead of a random "I just do this and it works" one.

After jumping through the HTML validation hoops ourselves, here's my advice:

  1. Validate your HTML. Know what it means to have valid HTML markup. Understand the tooling. More information is always better than less information. Why fly blind?
  2. Nobody cares if your HTML is valid. Except you. If you want to. Don't think for a second that producing perfectly valid HTML is more important than running your website, delivering features that delight your users, or getting the job done.

But the question remains: does HTML Validation really matter? Yes. No. Maybe. It depends. I'll tell you the same thing my father told me: take my advice and do as you please.

Procrastination and the Bikeshed Effect

The book Producing Open Source Software: How to Run a Successful Free Software Project is a fantastic reference for anyone involved in a software project – whether you're running the show or not.

In addition to the dead-tree edition, the book is, appropriately enough, available for free at the official website. The entire book is great, and worth a thorough read, even if you think open source is communism.

My favorite chapter is the one on communication. While important on any software project, communication is especially vital on open source projects, which are "bewilderingly diverse both in audiences and communication mechanisms." One particular pitfall of open source projects is that, if you don't manage the project carefully, you tend to attract developers who are more interested in discussion than in writing code. It's a subtle but pernicious problem – communication gone wrong.

Although discussion can meander in any topic, the probability of meandering goes up as the technical difficulty of the topic goes down. After all, the greater the technical difficulty, the fewer participants can really follow what's going on. Those who can are likely to be the most experienced developers, who have already taken part in such discussions thousands of times before, and know what sort of behavior is likely to lead to a consensus everyone can live with.

Thus, consensus is hardest to achieve in technical questions that are simple to understand and easy to have an opinion about, and in "soft" topics such as organization, publicity, funding, etc. People can participate in those arguments forever, because there are no qualifications necessary for doing so, no clear ways to decide (even afterward) if a decision was right or wrong, and because simply outwaiting other discussants is sometimes a successful tactic.

The principle that the amount of discussion is inversely proportional to the complexity of the topic has been around for a long time, and is known informally as the Bikeshed Effect.

We've struggled with this on Stack Overflow, too. The broad soft questions tend to get much more interest and attention than the narrow, technical coding questions that we originally intended the site for. We've made adjustments, but it's an unavoidable aspect of group dynamics. Who are we kidding? It's fun to discuss what color the bikeshed should be painted. Everyone has an opinion about their favorite color scheme.

shed painting

What many people don't realize is that the bikeshed effect is, in fact, a form of procrastination. And it can suck in highly technical developers, along with everyone else.

The psychologists handed out questionnaires to a group of students and asked them to respond by e-mail within three weeks. All the questions had to do with rather mundane tasks like opening a bank account and keeping a diary, but different students were given different instructions for answering the questions. Some thought and wrote about what each activity implied about personal traits: what kind of person has a bank account, for example. Others wrote simply about the nuts and bolts of doing each activity: speaking to a bank officer, filling out forms, making an initial deposit, and so forth. The idea was to get some students thinking abstractly and others concretely. Then the psychologists waited. And in some cases, waited and waited. They recorded all the response times to see if there was a difference between the two groups, and indeed there was a significant difference.

The findings, reported in Psychological Science, a journal of the Association for Psychological Science, were very clear. Even though all of the students were being paid upon completion, those who thought about the questions abstractly were much more likely to procrastinate – and in fact some never got around to the assignment at all. By contrast, those who were focused on the how, when and where of doing the task e-mailed their responses much sooner, suggesting that they hopped right on the assignment rather than delaying it.

This is one reason why I'm so down on architecture astronauts. I find that the amount of discussion on a software feature is inversely proportional to its value. Sure, have some initial discussion to figure out your direction, but the sooner you can get away from airy abstractions, and down to the nuts and bolts of building the damn thing, the better off you – and your project – will be.

Put another way, what's the hardest thing you have to do every day? Deciding what to ignore, so you can stop procrastinating and get stuff done. The next time you feel yourself getting drawn into a protracted bikeshed discussion, consider: shouldn't you be building something instead?

The Promise and Peril of Jumbo Frames

We sit at the intersection of two trends:

  1. Most home networking gear, including routers, has safely transitioned to gigabit ethernet.
  2. The generation, storage, and transmission of large high definition video files is becoming commonplace.

If that sounds like you, or someone you know, there's one tweak you should know about that can potentially improve your local network throughput quite a bit -- enabling Jumbo Frames.

The typical UDP packet looks something like this:

udp packet diagram

But the default size of that data payload was established years ago. In the context of gigabit ethernet and the amount of data we transfer today, it does seem a bit.. anemic.

The original 1,518-byte MTU for Ethernet was chosen because of the high error rates and low speed of communications. If a corrupted packet is sent, only 1,518 bytes must be re-sent to correct the error. However, each frame requires that the network hardware and software process it. If the frame size is increased, the same amount of data can be transferred with less effort. This reduces CPU utilization (mostly due to interrupt reduction) and increases throughput by allowing the system to concentrate on the data in the frames, instead of the frames around the data.
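
To put some rough numbers on "less effort", here's a back-of-the-envelope sketch comparing the standard 1500-byte MTU to a 9000-byte jumbo MTU, using common minimum header sizes (18 bytes of Ethernet framing plus 20 bytes each of IP and TCP; real traffic with VLAN tags or TCP options will differ):

ETHERNET = 18   # Ethernet header plus frame check sequence
IP_TCP = 40     # 20-byte IP header + 20-byte TCP header, carried inside the MTU

def frames_and_overhead(total_bytes, mtu):
    payload_per_frame = mtu - IP_TCP
    frames = -(-total_bytes // payload_per_frame)   # ceiling division
    return frames, frames * (ETHERNET + IP_TCP)

transfer = 1 * 2**30                                # a 1 GB file copy
for mtu in (1500, 9000):
    frames, overhead = frames_and_overhead(transfer, mtu)
    print(f"MTU {mtu}: {frames:,} frames, ~{overhead / 2**20:.1f} MB of headers")

Roughly six times fewer frames for the same gigabyte means roughly six times fewer interrupts, which is where most of the CPU savings come from.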

I use my beloved energy efficient home theater PC as an always-on media server, and I'm constantly transferring gigabytes of video, music, and photos to it. Let's try enabling jumbo frames for my little network.

The first thing you'll need to do is update your network hardware drivers to the latest versions. I learned this the hard way, but if you want to play with advanced networking features like Jumbo Frames, you need the latest and greatest network hardware drivers. What was included with the OS is unlikely to cut it. Check on the network chipset manufacturer's website.

Once you've got those drivers up to date, look for the Jumbo Frames setting in the advanced properties of the network card. Here's what it looks like on two different ethernet chipsets:

gigabit jumbo marvell yukon advanced settings   gigabit jumbo realtek advanced settings

That's my computer and the HTPC, respectively. I was a little disturbed to notice that the two drivers don't offer the same data payload sizes: the Realtek calls it "Jumbo Frame" with 2KB - 9KB settings in 1KB increments, while the Marvell calls it "Jumbo Packet" with settings of 4088 or 9014 bytes. I know that technically, for jumbo frames to work, all the networking devices on the subnet have to agree on the data payload size. I couldn't quite tell what to do, so I set them as you see above.

(I didn't change anything on my router / switch, which at the moment is the D-Link DGL-4500; note that most gigabit switches support jumbo frames, but you should always verify with the manufacturer's website to be sure.)
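
Once everything claims to be configured, it's worth verifying that a jumbo-sized packet actually survives the trip without fragmenting. The usual sanity check is a don't-fragment ping; this sketch shells out to the Windows ping command, where -f sets the don't-fragment bit and -l sets the payload size (8972 is 9000 minus 28 bytes of IP and ICMP headers -- adjust for whatever frame size your gear agreed on):

import subprocess

def jumbo_ping(host, mtu=9000):
    payload = mtu - 28               # subtract 20-byte IP + 8-byte ICMP headers
    result = subprocess.run(
        ["ping", "-f", "-l", str(payload), "-n", "2", host],
        capture_output=True, text=True,
    )
    ok = result.returncode == 0 and "fragmented" not in result.stdout.lower()
    print(f"{host}: jumbo frames {'look OK' if ok else 'are NOT making it through'}")

jumbo_ping("192.168.1.10")           # hypothetical address of the HTPC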

I then ran a few tests to see if there was any difference. I started with a simple file copy.

Default network settings

gigabit jumbo frames disabled file copy results

Jumbo Frames enabled

gigabit jumbo frames enabled file copy results

My file copy went from 47.6 MB/sec to 60.0 MB/sec. Not too shabby! But this is a very ad hoc sort of testing. Let's see what the PassMark Network Benchmark has to say.

Default network settings

gigabit jumbo frames disabled, throughput graph

Jumbo Frames enabled

gigabit jumbo frames enabled, throughput graph

This confirms what I saw with the file copy. With jumbo frames enabled, we go from 390,638 kilobits/sec to 477,927 kilobits/sec average. A solid 20% improvement.

Now, jumbo frames aren't a silver bullet. There's a reason jumbo frames are never enabled by default: some networking equipment can't deal with the non-standard frame sizes. Like all deviations from default settings, it is absolutely possible to make your networking worse by enabling jumbo frames, so proceed with caution. This SmallNetBuilder article outlines some of the pitfalls:

1) For a large frame to be transmitted intact from end to end, every component on the path must support that frame size.

The switch(es), router(s), and NIC(s) from one end to the other must all support the same size of jumbo frame transmission for a successful jumbo frame communication session.

2) Switches that don't support jumbo frames will drop jumbo frames.

In the event that both ends agree to jumbo frame transmission, there still needs to be end-to-end support for jumbo frames, meaning all the switches and routers must be jumbo frame enabled. At Layer 2, not all gigabit switches support jumbo frames. Those that do will forward the jumbo frames. Those that don't will drop the frames.

3) For a jumbo packet to pass through a router, both the ingress and egress interfaces must support the larger packet size. Otherwise, the packets will be dropped or fragmented.

If the size of the data payload can't be negotiated (this is known as PMTUD, path MTU discovery) due to firewalls, the data will be dropped with no warning, or "blackholed". And if the MTU isn't supported, the data will have to be fragmented to a supported size and retransmitted, reducing throughput.

In addition to these issues, large packets can also hurt latency for gaming and voice-over-IP applications. Bigger isn't always better.

Still, if you regularly transfer large files, jumbo frames are definitely worth looking into. My tests showed a solid 20% gain in throughput, and for the type of activity on my little network, I can't think of any downside.

File Compression in the Multi-Core Era

I've been playing around a bit with file compression again, as we generate some very large backup files daily on Stack Overflow.

We're using the latest 64-bit version of 7zip (4.64) on our database server. I'm not a big fan of more than dual core on the desktop, but it's a no-brainer for servers. The more CPU cores the merrier! This server has two quad-core CPUs, a total of 8 cores, and I was a little disheartened to discover that neither RAR nor 7zip seemed to make much use of more than 2 of them.

Still, even if it does only use 2 cores to compress, the 7zip algorithm is amazingly effective, and has evolved over the last few years to be respectably fast. I used to recommend RAR over Zip, but given the increased efficiency of 7zip and the fact that it's free and RAR isn't, it's the logical choice now.
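
Incidentally, none of this requires the GUI; the same jobs can be scripted against the 7z command-line executable. A hedged sketch -- the -mx (compression level) and -mmt (thread count) switches are the ones I believe matter here, but check 7z's own help output rather than taking my word for it:

import subprocess

def compress_backup(source, archive, level=5, threads=8):
    # 7z switches: "a" adds to an archive, -t7z selects the 7z format,
    # -mx sets the compression level (1..9), -mmt sets the thread count.
    cmd = ["7z", "a", "-t7z", f"-mx={level}", f"-mmt={threads}", archive, source]
    subprocess.run(cmd, check=True)

compress_backup("stackoverflow.bak", "stackoverflow.7z", level=5, threads=8)   # hypothetical paths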

Here are some quick tests I performed compressing a single 4.76 GB database backup file. This was run on a server with dual quad-core 2.5 GHz Xeon E5420 CPUs.

Format   Level     Time      Speed         Compressed size
7zip     fastest   5 min     14 MB/sec     973 MB
7zip     fast      7 min     11 MB/sec     926 MB
7zip     normal    34 min    2.5 MB/sec    752 MB
7zip     maximum   41 min    2.0 MB/sec    714 MB
7zip     ultra     48 min    1.7 MB/sec    698 MB

For those of you now wondering "wow, if 7zip does this well on maximum and ultra, imagine how it'd do on ultra-plus" -- don't count on it. There's a reason most compression programs default to certain settings as "normal". Beyond that sweet spot, results fall off a cliff: you get absurdly tiny increases in compression ratio in exchange for order-of-magnitude increases in compression time.

Now watch what happens when I switch 7zip to use the bzip2 compression algorithm:

7zip with bzip2 selected

We'll compress that same 4.76 GB file, on the same machine:

Format   Level     Time      Speed         Compressed size
bzip2    fastest   2 min     36 MB/sec     1092 MB
bzip2    fast      2.5 min   29 MB/sec     1011 MB
bzip2    normal    3.5 min   22 MB/sec     989 MB
bzip2    maximum   7 min     12 MB/sec     987 MB
bzip2    ultra     21 min    4 MB/sec      986 MB

Why is bzip2 able to work so much faster than 7zip? Simple:

7zip algorithm CPU usage

7zip multithreaded cpu usage

bzip2 algorithm CPU usage

bzip2 multithreaded CPU usage

Bzip2 uses more than 2 CPU cores to parallelize its work. I'm not sure what the limit is, but the drop-down selector in the 7zip GUI allows up to 16 when the bzip2 algorithm is chosen. I used 8 for the above tests, since that's how many CPU cores we have on the server.
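
The reason bzip2 parallelizes so naturally is that it compresses its input as a series of independent blocks, so different blocks can be handed to different cores and the results simply concatenated. Here's a toy illustration of that idea using Python's bz2 module and a process pool; it mimics what tools like pbzip2 do, and is not a byte-for-byte match for 7zip's bzip2 mode:

import bz2
from multiprocessing import Pool

CHUNK = 8 * 2**20                    # 8 MB chunks, in the spirit of bzip2's blocks

def read_chunks(path):
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            yield chunk

def parallel_bzip2(src, dst, workers=8):
    # Compress chunks on separate cores; the bzip2 format tolerates concatenated
    # streams, which is the trick block-parallel compressors rely on.
    # (A real tool would bound how many chunks are in flight at once.)
    with Pool(processes=workers) as pool, open(dst, "wb") as out:
        for compressed in pool.imap(bz2.compress, read_chunks(src)):
            out.write(compressed)    # imap preserves chunk order

if __name__ == "__main__":
    parallel_bzip2("backup.bak", "backup.bak.bz2", workers=8)   # hypothetical paths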

Unfortunately, bzip2's increased speed is sort of moot at high compression levels. The difference between normal, maximum, and ultra compression is a meaningless 0.06 percent of the original file size. It scales beautifully in time, but hardly at all in space. That's a shame, because that's exactly where you'd like to spend the speed increase of parallelization. Eking out a percent of size improvement could still make sense, depending on the circumstances:

total time = compression time + n * (compressed file size / network speed + decompression time)

For instance, if you compress a file to send it over a network once, n equals one and compression time will have a big influence. If you want to post a file to be downloaded many times, n is large, so long compression times will weigh less in the final decision. Finally, slow networks will do best with a slow but efficient algorithm, while for fast networks a speedy, possibly less efficient algorithm is needed.
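
Plugging the measured results into that formula makes the trade-off concrete. A quick sketch, using the compression times and sizes from the tables above and a hypothetical 100 megabit (12.5 MB/sec) network, with decompression time ignored for simplicity:

def total_time(compress_secs, compressed_mb, n, net_mb_per_sec=12.5):
    # total time = compression time + n * (compressed size / network speed)
    return compress_secs + n * (compressed_mb / net_mb_per_sec)

candidates = {                       # measured above: (compression time in seconds, size in MB)
    "bzip2 fastest": (2 * 60, 1092),
    "7zip fastest":  (5 * 60, 973),
    "7zip ultra":    (48 * 60, 698),
}

for n in (1, 100):
    print(f"--- file transferred {n} time(s) ---")
    for name, (secs, size_mb) in candidates.items():
        print(f"{name:<14} {total_time(secs, size_mb, n) / 60:7.1f} minutes total")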

On the other hand, the ability to compress a 5 GB source file to a fifth of its size in two minutes flat is pretty darn impressive. Still, I can't help wondering how fast the 7zip algorithm would be if it were rewritten and parallelized to take advantage of more than 2 CPU cores, too.
