Coding Horror

programming and human factors

I'd Consider That Harmful, Too

One of the seminal papers in computer science is Edsger Dijkstra's 1968 paper GOTO Considered Harmful.

For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce. More recently I discovered why the use of the go to statement has such disastrous effects, and I became convinced that the go to statement should be abolished from all "higher level" programming languages (i.e. everything except, perhaps, plain machine code).

The abuse of GOTO is, thankfully, a long forgotten memory in today's modern programming languages. Of course, it's only a minor hazard compared to the COMEFROM statement, but I'm glad to have both of those largely behind us.

So then I typed GOTO 500 -- and here I am!

GOTO isn't all bad, though. It still has some relevance to today's code. Along with many other programmers, I always recommend using guard clauses to avoid arrow code, and I also recommend exiting early from a loop as soon as you find the value you're looking for. What is an early Return, or an early Exit For other than a tightly scoped GOTO?

foreach my $try (@options) {
    next unless exists $hash{$try};
    do_something($try);
    goto SUCCESS;
}
log_failure();
SUCCESS: ...
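
The guard clause version is the same idea in function form. Here's a minimal sketch, this time in Python-- the function and field names are made up, but the shape is what matters:

# Guard clauses: each early return is, in effect, a tightly scoped GOTO to
# the end of the function. The names here are invented for illustration.
def dispatch(order):
    return "shipped %d item(s)" % len(order["items"])

def ship_order(order):
    if order is None:
        return "no order"            # bail out early instead of nesting
    if not order.get("items"):
        return "nothing to ship"
    if not order.get("paid"):
        return "awaiting payment"
    # the happy path stays flat instead of arrow-shaped
    return dispatch(order)

print(ship_order({"items": ["book"], "paid": True}))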

The publication of such an influential paper in this particular format led to an almost immediate snowclone effect, as documented on Wikipedia:

Frank Rubin published a criticism of Dijkstra's letter in the March 1987 CACM where it appeared as 'GOTO Considered Harmful' Considered Harmful. The May 1987 CACM printed further replies, both for and against, as '"GOTO Considered Harmful" Considered Harmful' Considered Harmful?. Dijkstra's own response to this controversy was titled "On a somewhat disappointing correspondence".

That's easily one of the funniest things I've ever read in Wikipedia. Who says computer scientists don't have a sense of humor? But I digress. Most software developers are probably familiar, at least in passing, with Dijkstra's GOTO Considered Harmful. But here's what they might not know about it:

  1. The paper was originally titled "A Case Against the Goto Statement"; the editor of the CACM at the time, Niklaus Wirth, changed the title to the more inflammatory version we know today.
  2. In order to speed up its publication, the paper was converted into a "Letter to the Editor".

In other words, Wirth poked and prodded the content until it became incendiary, to maximize its impact. The phrase "considered harmful" was used quite intentionally, as documented on the always excellent Language Log:

However, "X considered harmful" was already a well-established journalistic cliche in 1968 -- which is why Wirth chose it. The illustration below shows the headline of a letter to the New York Times published August 12, 1949: "Rent Control Controversy / Enacting Now of Hasty Legislation Considered Harmful".

Rent Control Controversy / Enacting Now of Hasty Legislation Considered Harmful

I'm sure it's not the earliest example of this phrase used in a headline or title, either -- I chose it only as a convenient illustration of usage a couple of decades before the date of Dijkstra's paper.

Note that this example is also in the title of a slightly cranky letter to the editor - it's probably not an accident that the first example that came to hand of "considered harmful" in a pre-Dijkstra title was of this type.

So when you emulate the "considered harmful" style that Wirth and Dijkstra brought into computer science in 1968, keep that history in mind. You're emulating a slightly cranky letter to the editor. It's frighteningly common-- there are now 28,800 web pages with the exact phrase "considered harmful" in the title.

This leads, perhaps inevitably, to Eric Meyer's "Considered Harmful" Essays Considered Harmful. He points out that choosing this style of dialogue is ultimately counterproductive:

There are three primary ways in which "Considered Harmful" essays cause harm.
  1. The writing of a "considered harmful" essay often serves to inflame whatever debate is in progress, and thus makes it that much harder for a solution to be found through any means. Those who support the view that the essay attacks are more likely to dig in and defend their views by any means necessary, and are less receptive to reasoned debate. By pushing the opposing views further apart, it becomes more likely that the essay will cause a permanent break between opposing views rather than contribute to a resolution of the debate.
  2. "Considered harmful" essays are most harmful to their own causes. The publication of a "considered harmful" essay has a strong tendency to alienate neutral parties, thus weakening support for the point of view the essay puts forth. A sufficiently dogmatic "considered harmful" essay can end a debate in favor of the viewpoint the essay considers harmful.
  3. They've become boring cliches. Nobody really wants to read "considered harmful" essays any more, because we've seen them a thousand times before and didn't really learn anything from them, since we were too busy being annoyed to really listen to the arguments presented.

If you have a point to make, by all means, write a great persuasive essay. If you want to maximize the effectiveness of your criticisms, however, you'll leave "considered harmful" out of your writing. The "considered harmful" technique may have worked for Wirth and Dijkstra, but unless you're planning to become a world famous computer scientist like those guys, I'd suggest leaving it back in 1968 where it belongs.


Hardware Assisted Brute Force Attacks: Still For Dummies

Evidently hardware-assisted brute force password cracking has arrived:

A technique for cracking computer passwords using inexpensive off-the-shelf computer graphics hardware is causing a stir in the computer security community.

Elcomsoft, a software company based in Moscow, Russia, has filed a US patent for the technique. It takes advantage of the "massively parallel processing" capabilities of a graphics processing unit (GPU) - the processor normally used to produce realistic graphics for video games.

Using an $800 graphics card from nVidia called the GeForce 8800 Ultra, Elcomsoft increased the speed of its password cracking by a factor of 25, according to the company's CEO, Vladimir Katalov. The toughest passwords, including those used to log in to a Windows Vista computer, would normally take months of continuous computer processing time to crack using a computer's central processing unit (CPU). By harnessing a $150 GPU - less powerful than the nVidia 8800 card - Elcomsoft says they can be cracked in just three to five days. Less complex passwords can be retrieved in minutes, rather than hours or days.

GPUs, with their massive built-in parallelism, were designed for exactly this kind of work. I'm encouraged that we're finally able to harness all that video silicon to do useful things beyond rendering Doom at 60 frames per second with anti-aliasing and anisotropic filtering.

There's a bit more detail on the Elcomsoft approach, including actual numbers, in their one-page PDF.

Using the "brute force" technique of recovering passwords, it was possible, though time-consuming, to recover passwords from popular applications. For example, the logon password for Windows Vista might be an eight-character string composed of uppercase and lowercase alphabetic characters. There would be about 55 trillion (52 to the eighth power) possible passwords. Windows Vista uses NTLM hashing by default, so using a modern dual-core PC you could test up to 10,000,000 passwords per second, and perform a complete analysis in about two months. With ElcomSoft's new technology, the process would take only three to five days, depending upon the CPU and GPU.

Preliminary tests using Elcomsoft Distributed Password Recovery show that the [brute force password cracking] speed has increased by a factor of twenty, simply by hooking up with a $150 video card's onboard GPU. ElcomSoft expects to find similar results as this new technology is incorporated into their password recovery products for Microsoft Office, PGP, and dozens of other popular applications.

It's fun, and it makes for a shocking "Password Cracking Supercomputers On Every Desktop Make Passwords Irrelevant" headline, but password cracking supercomputers on every desktop doesn't mean the end of password-protected civilization as we know it. Let's do the math.

How many passwords can we attempt per second?

Dual Core CPU     10,000,000
GPU              200,000,000

How many password combinations do we have to try?

52^8 = 53,459,728,531,456

That's a lot of potential passwords. Let's stop playing Quake Wars for a few days and get cracking:

53,459,728,531,456 /  10,000,000 pps / 60 / 60 / 24 = 61.9 days
53,459,728,531,456 / 200,000,000 pps / 60 / 60 / 24 =  3.1 days

As promised by Elcomsoft, that works out to a little over three days at the GPU crack rate, and two months at the CPU crack rate. Oooh. Scary. Worried yet? If so, you shouldn't be. Watch what happens when I add four additional characters to the password:

52^12 / 200,000,000 pps / 60 / 60 / 24 = 22,620,197 days

For those of you keeping score at home, with a 12-character password this hardware-assisted brute force attack would take 61,973 years. Even if we increased the brute force attack rate by a factor of a thousand, it would still take 62 years.

Elcomsoft's idea of an 8-character password is awfully convenient, too. Only lowercase and uppercase letters, a total of 52 possible choices per character. Who has passwords without at least one number? Even MySpace users are smarter than that. If you include a number in your 8-character password, or a non-alphanumeric character like "%", attack times increase substantially. Not enough to mitigate the potential attack completely, mind you, but you'd definitely put a serious dent in any brute forcing effort by switching out a character or two.

62^8 / 200,000,000 pps / 60 / 60 / 24 = 13 days
72^8 / 200,000,000 pps / 60 / 60 / 24 = 42 days

Personally, I think it's easier to go with a passphrase than a bunch of random, difficult-to-remember gibberish characters as a password. Even if your passphrase is in all lowercase-- a mere 26 possible characters-- that exponent is incredibly potent.

26^10 / 200,000,000 pps / 60 / 60 / 24 = 8 days
26^12 / 200,000,000 pps / 60 / 60 / 24 = 15 years
26^14 / 200,000,000 pps / 60 / 60 / 24 = 10,228 years

By the time you get to a mere 14 characters-- even if they're all lowercase letters-- you can pretty much forget about anyone brute forcing your password. Ever.
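
If you want to check these numbers yourself, the math fits in a few lines. Here's a quick back-of-the-envelope sketch in Python, using the GPU attack rate quoted above and the same character sets and lengths from this post:

# Worst-case time to exhaust a password keyspace at a given guess rate.
SECONDS_PER_DAY = 60 * 60 * 24

def days_to_exhaust(alphabet_size, length, passwords_per_second):
    keyspace = alphabet_size ** length
    return keyspace / passwords_per_second / SECONDS_PER_DAY

# 200,000,000 passwords per second is the GPU rate quoted above
for alphabet, length in [(52, 8), (52, 12), (62, 8), (26, 10), (26, 14)]:
    days = days_to_exhaust(alphabet, length, 200_000_000)
    print(f"{alphabet}^{length}: {days:,.1f} days ({days / 365:,.1f} years)")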

So what have we learned?

Brute force attacks, even fancy hardware-assisted brute force attacks, are still for dummies. If this is the best your attackers can do, they're too stupid to be dangerous. Brute forcing is almost always a waste of time, when vastly more effective social vectors and superior technical approaches are readily available.

Hardware-assisted brute force attacks are unlikely to ever be a credible threat against a decently long password. But short, simple passwords are still dangerous. If your password is only 8 alphabetic characters, and it's exposed in a way that allows a hardware-assisted brute force attack, you could be in trouble. All you need to do to sleep soundly at night (well, at least as far as brute force attacks are concerned) is choose a slightly longer password. It's much safer to think of your security in terms of passphrases instead of passwords. And unlike "secure" 8-character passwords, passphrases are easy to remember, too. Have you considered helping me evangelize passphrases?


Virtual Machine Server Hosting

My employer, Vertigo Software, graciously hosted this blog for the last year. But as blog traffic has grown, it has put a noticeable and increasing strain on our bandwidth. Even on an average day, blog traffic consumes a solid 30 percent of our internet connection-- and much more if something happens to be popular. And that's after factoring in all the bandwidth-reducing tricks I could think of.

While I greatly appreciate my employer's generosity, I don't like causing all my coworkers' internet connections to slow to a crawl. So when my friend and co-author Phil Haack mentioned that we could share a dedicated server through a contact of his, I jumped at the chance.

I'm a big believer in virtualization, so I wanted a beefy physical server that could handle running at least four virtual servers. And I wanted it to run a 64-bit host operating system, as 64-bit offers huge performance benefits for servers. Nobody in their right mind should build a 32-bit server today.

The contact he was referring to works at CrystalTech. And boy, did CrystalTech ever hook us up:

  • Windows Server 2003 R2 x64
  • Quad-core Xeon X3210 @ 2.13 GHz
  • 4 GB RAM
  • 300 GB RAID-5 array

Not too shabby. It is, of course, an obscene amount of power for our relatively modest needs. Have I mentioned how much I like my new friends at CrystalTech? Or what great deals they have on hosting?

Powered by CrystalTech

But in all seriousness, it's effectively a new sponsor for this blog, so welcome aboard.

I was already hosting this server as a VM, so here's what I did to switch over to completely new hardware:

  1. shut down my VM
  2. compacted and compressed it
  3. transferred it to the new server
  4. booted it up again

All I had to do was change the IP address in the VM and I was up and running as if nothing had changed. That's the easiest server migration I've ever experienced, all thanks to virtualization.

Phil and I are both Windows ecosystem developers, so we went with what we knew. But virtualization provides total flexibility. I could spin up a new Linux server at a moment's notice if I decided to switch this blog over to the LAMP stack. Or I could play with the latest release candidate of Windows Server 2008. And they can all run in parallel, assuming we have enough memory. That's what I love most about virtualization-- the freedom.

Although Phil and I share admin access to the host machine, we have our own private playgrounds in our virtual servers. We're completely isolated from each other's peculiarities and weirdnesses: nothing we do (well, almost nothing) can affect the other person's virtual machine. Reboot? No problem. Install some stupid software I can't stand? Go for it. Format the drive and start over? Don't care. It's your machine. Do whatever.

The only downside to virtual machine server hosting is that it can be difficult to share IPs between virtual machines. CrystalTech has provided us with a block of 6 public IP addresses, so fortunately we don't have to worry about this. One IP is occupied by the host, but that still leaves five IPs for virtual machines of our creation. That's plenty.

But let's say we only had two public IP addresses-- or we wanted to run lots and lots of virtual machines with a small pool of public IP addresses. What then? How could codinghorror.com and haacked.com share the same IP address (and port 80), when they're on two different virtual machines? They clearly can't occupy the same IP.

codinghorror.com   10.0.0.1:80
haacked.com        10.0.0.1:80

On a single physical server, the answer is easy-- virtual hosting, or host header routing. But that requires our websites to live side by side on the same server. Phil and I don't share our wives, so why would we share a server? No offense intended to either of our wives-- or our respective servers-- but sharing is an unacceptable solution. I like you, Phil... but not that much.

If you want two different machines (physical or virtual) to share an IP, it takes some clever trickery. In the Windows ecosystem, that clever trickery often comes in the form of Microsoft's ISA Server. (I'm not sure what the open source equivalent is, but I'm confident it's out there.)

ISA Server acts as our public interface to the world, talking through a public IP address. All DNS entries, and thus HTTP traffic, would be directed to that single public IP address. As our gatekeeper, ISA Server is in a unique position to do lots of cool stuff for us, like firewalling, caching, and so on. But we only care about one particular feature right now: the ability to share an IP address between multiple machines. This is known as a "web rule" in ISA parlance. With appropriate web rules in effect for both of our sites, ISA Server will shuttle the HTTP requests back and forth to the correct private IP addresses based on the host headers. It basically extends the host header routing concepts we saw in Apache and IIS outside the confines of a particular machine.

ISA Server         10.0.0.1:80
codinghorror.com   192.168.0.1:80
haacked.com        192.168.0.2:80
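
Just to make the routing concrete, here's a toy host-header router sketched in Python. This is an illustration of the concept only-- not ISA Server, and nowhere near production grade-- and the backend addresses are the made-up private IPs from the example above:

# Route incoming HTTP requests to private backends based on the Host header.
# (Error handling, POSTs, and streaming are omitted for brevity.)
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.request import Request, urlopen

BACKENDS = {
    "codinghorror.com": "http://192.168.0.1:80",
    "haacked.com":      "http://192.168.0.2:80",
}

class HostRouter(BaseHTTPRequestHandler):
    def do_GET(self):
        host = self.headers.get("Host", "").split(":")[0]
        backend = BACKENDS.get(host)
        if backend is None:
            self.send_error(404, "No rule for this host")
            return
        # Relay the request to the private backend and copy the response back
        with urlopen(Request(backend + self.path, headers={"Host": host})) as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Type", resp.headers.get("Content-Type", "text/html"))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # One public address and port 80 for every site behind the router
    ThreadingHTTPServer(("0.0.0.0", 80), HostRouter).serve_forever()

A real deployment also has to handle POST requests, errors, and streaming responses, which is exactly why products like ISA Server (or an open source reverse proxy) exist.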

That's one way you can host fifty websites, all running on fifty different machines, with a single public IP address. It's a very clever trick indeed. Unfortunately, ISA Server isn't the simplest of products to configure and administer. I'm glad we have enough public IPs that we don't have to worry about sharing them between multiple machines. But it's definitely something you should be aware of, as virtual servers become increasingly commonplace... and the pool of available IP addresses continues to dwindle.


Let's Play Planning Poker!

One of the most challenging aspects of any software project is estimation-- determining how long the work will take. It's so difficult, some call it a black art. That's why I highly recommend McConnell's book, Software Estimation: Demystifying the Black Art; it's the definitive work on the topic. Anyone running a software project should own a copy. If you think you don't need this book, take the estimation challenge: how good an estimator are you?

How'd you do? If you're like the rest of us, you suck. At estimating, I mean.

Given the uncertainty and variability around planning, it's completely appropriate that there's a game making the rounds in agile development circles called Planning Poker.

Planning Poker card deck

There are even cards for it, which makes it feel a lot more poker-ish in practice. And like poker, the stakes in software development are real money-- although we're usually playing with someone else's money. If you have a distributed team, card games may seem like a cruel joke. But there's a nifty web-based implementation of Planning Poker, too.

Planning Poker is a form of the estimation technique known as Wideband Delphi, itself a refinement of the Delphi method the RAND Corporation developed in the 1950s and 60s. I assume by Delphi they're referring to the oracle at Delphi. If anything says "we have no clue how long this will take", it's naming your estimation process after ancient, gas-huffing priestesses who offered advice in the form of cryptic riddles. It doesn't exactly inspire confidence, but that's probably a good expectation to set, given the risks of estimation.

Planning Poker isn't quite as high concept as Wideband Delphi, but the process is functionally identical:

  1. Form a group of no more than 10 estimators and a moderator. The product owner can participate, but cannot be an estimator.
  2. Each estimator gets a deck of cards: 0, 1, 2, 3, 5, 8, 13, 20, 40, and 100.
  3. The moderator reads the description of the user story or theme. The product owner answers brief questions from the estimators.
  4. Every estimator selects an estimate card and places it face down on the table. After all estimates are in, the cards are flipped over.
  5. If the estimates vary widely, the owners of the high and low estimates discuss the reasons why their estimates are so different. All estimators should participate in the discussion.
  6. Repeat from step 4 until the estimates converge.
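
If it helps to see the loop written down, here's a toy sketch in Python. The cards come from the deck above; the convergence rule-- call it consensus when the high and low cards are adjacent-- is my own simplification, not part of any official process:

CARDS = [0, 1, 2, 3, 5, 8, 13, 20, 40, 100]

def converged(estimates):
    # Call it consensus when the highest and lowest cards played are adjacent
    picked = sorted(set(estimates.values()))
    return CARDS.index(picked[-1]) - CARDS.index(picked[0]) <= 1

def estimate_story(story, estimators, pick_card, max_rounds=5):
    for round_number in range(1, max_rounds + 1):
        # Everyone picks a card privately; nothing is revealed until all are in
        estimates = {name: pick_card(name, story) for name in estimators}
        print("Round %d: %s" % (round_number, estimates))
        if converged(estimates):
            return max(estimates.values())  # or whatever the group agrees on
        # Otherwise the high and low estimators explain themselves and we re-vote
    return None  # still no consensus: split the story or gather more detail

# Example: a wide spread in round one, then convergence after discussion
votes = iter([3, 8, 20, 5, 5, 8])
result = estimate_story("import legacy data", ["alice", "bob", "carol"],
                        lambda name, story: next(votes))
print("Agreed estimate:", result)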

There's nothing magical here; it's the power of group dialog and multiple estimate averaging, delivered in an approachable, fun format.

Planning Poker is a good option, particularly if your current estimation process resembles throwing darts at a printout of a Microsoft Project Gantt chart. But the best estimates you can possibly produce are those based on historical data. Steve McConnell has a whole chapter on this, and here's his point:

If you haven't previously been exposed to the power of historical data, you can be excused for not currently having any data to use for your estimates. But now that you know how valuable historical data is, you don't have any excuse not to collect it. Be sure that when you reread this chapter next year, you're not still saying "I wish I had some historical data!"

In other words, if you don't have historical data to base your estimates on, begin collecting it as soon as possible. There are tools out there that can help you do this. Consider the latest version of FogBugz; its marquee feature is evidence-based scheduling. Armed with the right historical evidence, you can...

Predict when your software will ship. Here you can see we have a 74% chance of shipping by December 17th.

fogbugz 6: predict ship dates

Determine which developers are on the critical path. Some developers are better at estimating than others; you can shift critical tasks to developers with a proven track record of meeting their estimates.

fogbugz 6: developer ship dates

See how accurate an estimator you really are. How close are your estimates landing to the actual time the task took?

fogbugz 6: developer history

See your predicted ship dates change over time. We're seeing the 5%, 50%, and 95% estimates on the same graph here. Notice how they converge as development gets further along; this is evidence that the project will eventually complete, and you won't be stuck in some kind of Duke Nukem Forever limbo.

fogbugz 6: ship date over time

Witness, my friends, the power of historical data on a software project.
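
The idea behind evidence-based scheduling is simple, even if the bookkeeping isn't. Here's a rough sketch of the concept in Python-- a deliberate simplification with made-up numbers, not FogBugz's actual implementation:

import random

# Historical data: (estimated hours, actual hours) for tasks already completed
history = [(4, 6), (8, 8), (2, 5), (16, 20), (6, 6), (3, 9)]
velocities = [estimated / actual for estimated, actual in history]

remaining_estimates = [8, 12, 4, 16, 6]  # hours, for tasks not yet finished

def simulate_total_hours():
    # For each remaining task, divide its estimate by a randomly sampled
    # historical velocity to get one plausible "actual" duration
    return sum(est / random.choice(velocities) for est in remaining_estimates)

trials = sorted(simulate_total_hours() for _ in range(10000))
p5, p50, p95 = (trials[int(len(trials) * p)] for p in (0.05, 0.50, 0.95))
print(f"5%: {p5:.0f}h   50%: {p50:.0f}h   95%: {p95:.0f}h")

Run enough trials and you get exactly the kind of 5%, 50%, and 95% ship date curves shown above-- provided, of course, that the historical data going in is real.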

The dirty little secret of evidence-based scheduling is that collecting this kind of historical data isn't trivial. Garbage in, garbage out. It takes discipline and concerted effort to enter the effort times-- even greatly simplified versions-- and to keep them up to date as you're working on tasks. FogBugz does its darndest to make this simple, but your team has to buy into the time tracking philosophy for it to work.

You don't have to use FogBugz. But however you do it, I urge you to begin capturing historical estimation data, if you're not already. It's a tremendous credit to Joel Spolsky that he made this crucial feature the centerpiece of the new FogBugz. I'm not aware of any other software lifecycle tools that go to such great lengths to help you produce good estimates.

Planning Poker is a reasonable starting point. But the fact that two industry icons, Joel Spolsky and Steve McConnell, are both hammering home the same point isn't a coincidence. Historical estimate data is fundamental to the science of software engineering. Over time, try to reduce your reliance on outright gambling, and begin basing your estimates on real data. Without some kind of institutional estimation memory-- without appreciating the power of historical data-- you're likely to keep repeating the same estimation errors over and over.


Are Features The Enemy?

Mark Minasi is mad as hell, and he's not going to take it any more. In his online book The Software Conspiracy, he examines in great detail the paradox I struggled with yesterday-- new features are used to sell software, but they're also the primary reason that software spoils over time.

If a computer magazine publishes a roundup of word processors, the central piece of that article will be the "feature matrix," a table showing what word processing programs have which features. With just a glance, the reader can quickly see which word processors have the richest sets of features, and which have the least features. You can see an imaginary example in the following table:
                                                MyWord 2.1    BugWord 2.0   SmartWords 3.0
Can boldface text                                   X                            X
Runs on the Atari 520                                              X
Automatically indents first line of a paragraph                    X
Includes game for practicing touch typing                          X             X
Lets you design your own characters                                X             X
Generates document tables of contents               X
Can do rotating 3D bullet points in color                          X             X
Can do bulleted lists                               X
Supports Cyrillic symbol set                                       X
Includes Malaysian translator                                      X             X

It looks like BugWord 2.0 is the clear value -- there are lots more check boxes in its column. However, a closer look reveals that it lacks some very basic and useful word processing features, which MyWord 2.1 has. But the easy-to-interpret visual nature of a feature matrix seems to mean that the magazine's message is: Features are good, and the more the better. As Internet Week senior executive editor Wayne Rash, a veteran of the computer press, says, "Look at something like PC Magazine, you'll see this huge comparison chart. Every conceivable feature any product could ever do shows up, and if a package has that particular feature, then there's a little black dot next to that product. What companies want is to have all the little black dots filled in because it makes their software look better."

Mark maintains that software companies give bugs in their existing software a low priority, while developing new features for the next version is considered critically important. As a result, quality suffers. He trots out this Bill Gates quote as a prime example:

There are no significant bugs in our released software that any significant number of users want fixed... The reason we come up with new versions is not to fix bugs. It's absolutely not. It's the stupidest reason to buy a new version I ever heard... And so, in no sense, is stability a reason to move to a new version. It's never a reason.

It's hard to argue with the logic. Customers will pay for new features. But customers will never pay companies to fix bugs in their software. Unscrupulous software companies can exploit this by fixing bugs in the next version, which just so happens to be jam packed full of exciting new features that will induce customers to upgrade.

Unlike Mark, I'm not so worried about bugs. All software has bugs, and if you accrue enough of them, your users will eventually revolt. Yes, the financial incentives for fixing bugs are weak, but the market seems to work properly when faced with buggy software.

A much deeper concern, for me, is the subtle, creeping feature-itis that destroys my favorite software. It's the worst kind of affliction-- a degenerative disease that sets in over time. As I've regrettably discovered in many, many years of using software, adding more features rarely results in better software. The commercial software market, insofar as it forces vendors to engage in bullet point product feature one-upsmanship, could be actively harming the very users it is trying to satisfy.

And the worst part, the absolute worst part, is that customers are complicit in the disease, too. Customers ask for those new features. And customers will use the dreaded "feature matrix" as a basis for comparing what applications they'll buy. Little do they know that they're slowly killing the very software that they love.

Today, as I was starting up WinAmp, I was blasted by this upgrade dialog.

WinAmp update dialog

Do I care about any of these new features? No, not really. Album art sounds interesting, but the rest are completely useless to me. I don't have to upgrade, of course, and there's nothing forcing me to upgrade. Yet. My concern here isn't for myself, however. It's for WinAmp. For every new all-singing, all-dancing feature, WinAmp becomes progressively slower, ever larger, and more complicated. Add enough well-intentioned "features", and eventually WinAmp will destroy itself.

Sometimes, I wonder if the current commercial software model is doomed. The neverending feature treadmill it puts us on almost always results in extinction. Either the application eventually becomes so bloated and ineffective that smaller, nimbler competitors replace it, or the application slowly implodes under its own weight. In either case, nothing is truly fixed; the cycle starts anew. Something always has to give in the current model. Precious few commercial software packages are still around after 10 years, and most of the ones that are feel like dinosaurs.

Perhaps we should stop blindly measuring software as a bundle of features, as some kind of endless, digital all-you-can-eat buffet. Instead, we could measure software by results-- how productive or effective it makes us at whatever task we're doing. Of course, measuring productivity and results is hard, whereas counting bullets on a giant feature matrix is brainlessly easy. Maybe that's exactly the kind of cop-out that got us where we are today.
