Coding Horror

programming and human factors

The Great Browser JavaScript Showdown

In The Day Performance Didn't Matter Any More, I found that the performance of JavaScript improved a hundredfold between 1996 and 2006. If Web 2.0 is built on a backbone of JavaScript, it's largely possible only because of those crucial Moore's Law performance improvements.

But have we hit a performance wall? Is it possible for browsers to run JavaScript significantly faster than they do today? I've always thought that just-in-time optimizing (or even compiling) JavaScript was an unexplored frontier in browser technology. And now the landscape has shifted:

  1. Apple's WebKit team just announced a great new JavaScript benchmark, SunSpider.
  2. The browser market is more competitive than it has been in years, with Opera 9.5, Firefox 3, Safari 3, and IE 8 all vying for the coveted default browser position.

Perhaps browser teams will begin to consider JavaScript performance a competitive advantage. The last time I looked for common JavaScript benchmarks, I came away deeply disappointed. That's why I'm particularly excited by the SunSpider benchmark: it's remarkably well thought out, easy to run, and comprehensive.

It's based on real code that does interesting things; both things that the web apps of today are doing, and more advanced code of the sorts we can expect as web apps become more advanced. Very few of the tests could be classed as microbenchmarks.

It's balanced between different aspects of the JavaScript language -- not dominated by just a small handful of different things. In fact, we collected test cases from all over the web, including from other benchmarks. But at the same time, we avoided DOM tests and stuck to the core JavaScript language itself.

It's super easy to run in the browser or from the command line, so you can test both pure engine performance, and the results you actually get in the browser.

We included statistical analysis so you can see how stable the results you're getting really are.
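To make the "statistical analysis" point concrete, here is a minimal sketch of timing a workload several times and reporting how stable the result is. It is not SunSpider's actual harness, and it is written in Python rather than JavaScript purely for illustration. The `workload` function, the run count, and the normal-approximation confidence interval are all assumptions of mine.

```python
import statistics
import time


def time_runs(fn, runs=5):
    """Call fn() `runs` times and return each wall-clock duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def summarize(samples, confidence=0.95):
    """Return (mean, half_width) of an approximate confidence interval.

    Uses a normal approximation; with only a handful of runs a
    t-distribution would give a somewhat wider interval.
    """
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * sem


def workload():
    # Hypothetical stand-in for one benchmark test: floating point math in a loop.
    total = 0.0
    for i in range(200_000):
        total += (i % 7) * 0.5
    return total


if __name__ == "__main__":
    mean, half_width = summarize(time_runs(workload))
    print(f"{mean:.1f} ms +/- {100 * half_width / mean:.1f}%")
```

A wide "+/-" figure means the runs disagree with each other, so small differences between two browsers on that test probably shouldn't be trusted.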

Maciej Stachowiak, a member of Apple's WebKit team, graciously explained what each subsection of the benchmark does in the comments:

3d: Pure JavaScript computations of the kind you might use to do 3d rendering, but without the rendering. This ends up mostly hitting floating point math and array access.
access: Array, object property and variable access.
bitops: Bitwise operations, these can be useful for various things including games, mathematical computations, and various kinds of encoding/decoding. It's also the only kind of math in JavaScript that is done as integer, not floating point.
controlflow: Control flow constructs (looping, recursion, conditionals). Right now it mostly covers recursion, as the others are pretty well covered by other tests.
crypto: Real cryptography code, mostly covers bitwise operations and string operations.
date: Performance of JavaScript's "date" objects.
math: Various mathematical type computations.
regexp: Regular expressions. Pretty self-explanatory.
string: String processing, including code to generate a giant "tagcloud", extracting compressed JS code, etc.

SunSpider is the best JavaScript benchmark I've seen, something we desperately need in an era where JavaScript is the lingua franca of the web. I was so excited, in fact, that I ran some quick benchmarks to compare the four major players in the browser market, using the following test machine:

  • Windows Vista 32-bit
  • 4 GB RAM
  • dual-core 3.0 GHz Core 2 Duo CPU
  • all browser extensions disabled (clean install)

Browser JavaScript performance graph, result totals by browser

What surprised me here is that Firefox is substantially slower than IE, once you factor out that wildly anomalous string result. I had to use a beta version of Opera to get something other than invalid (NaN) results for this benchmark, which coincidentally summarizes my opinion of Opera. Great when it works! I expected Opera to do well; it was handily winning JavaScript benchmarks way back in 2005. The new kid on the block, Safari, shows extremely well, particularly considering that it is running outside its native OS X environment. Kudos to Apple. Well, except for that whole font thing.

If you're curious how each browser stacked up in each benchmark area, I broke that down, too:

Browser JavaScript performance graph, breakdown by test area

If you need greater detail-- including variances-- you can download my complete set of SunSpider 0.9 results as a text file.

If I've learned anything from the computer industry, it's that competition benefits everyone. Here's hoping that a great JavaScript browser performance showdown spurs the browser teams on to better performance in this increasingly crucial area.


Nobody Cares What Your Code Looks Like

In The Problems of Perl: The Future of Bugzilla, Max Kanat-Alexander* laments the state of the Bugzilla codebase:

Once upon a time, Bugzilla was an internal application at Netscape, written in TCL. When it was open-sourced in 1998, Terry (the original programmer), decided to re-write Bugzilla in Perl. My understanding is that he re-wrote it in Perl because a lot of system administrators know Perl, so that would make it easier to get contributors.

In 1998, there were few advanced, object-oriented web scripting languages. In fact, Perl was pretty much it. PHP was at version 3.0, python was at version 1.5, Java was just starting to become well-known, ruby was almost unheard of, and some people were still writing their CGI scripts in C or C++.

Perl has many great features, most of all the number of libraries available and the extreme flexibility of the language. However, Perl would not be my first choice for writing or maintaining a large project such as Bugzilla. The same flexibility that makes Perl so powerful makes it very difficult to enforce code quality standards or to implement modern object-oriented designs.

Since 1998 there have been many advances in programming languages. PHP has decent object-oriented features, python has many libraries and excellent syntax, Java has matured a lot, and Ruby is coming up in the world quickly. Nowadays, almost all of our competitors have one advantage: they are not written in Perl. They can actually develop features more quickly than we can, not because of the number of contributors they have, but because the language they're using allows it. There are at least two bug-trackers that I can think of off the top of my head that didn't even exist in 1998 and were developed rapidly up to a point where they could compete with Bugzilla.

In 1998, Perl was the right choice for a language to re-write Bugzilla in. In 2007, though, having worked with Perl extensively for years on the Bugzilla project, I'd say the language itself is our greatest hindrance. Without taking some action, I'm not sure how many more years Bugzilla can stay alive as a product. Currently, our popularity is actually increasing, as far as I can see. So we shouldn't abandon what we're doing now. But I'm seeing more and more products come into the bug-tracking arena, and I'm not sure that we can stay competitive for more than a few more years if we stick with Perl.

It's a credit to Max that he cares enough about the future of his work to surface these important issues. Perhaps it would make sense to rewrite Bugzilla in a friendlier, more modern language.

Neither Perl nor the circa-1998 Bugzilla codebase has aged particularly well over the last 10 years. I don't think Bugzilla is anyone's favorite bug tracking product. It is utilitarian bordering on downright ugly. But-- and here's the important part-- Bugzilla works. It's actively used today by some of the largest and most famous open source projects on the planet, including the Linux Kernel, Mozilla, Apache, and many others.

I have a friend who works for an extremely popular open source database company, and he says their code is some of the absolute worst he's ever seen. This particular friend of mine is no stranger to bad code-- he's been in a position to see some horrifically bad codebases. Adoption of this open source database isn't slowing in the least because their codebase happens to be poorly written and difficult to troubleshoot and maintain. Users couldn't care less whether the underlying code is pretty. All they care about is whether or not it works. And it must work-- otherwise, why would all these people all over the world be running their businesses on it?

I gave Joel Spolsky a hard time for his Wasabi language boondoggle, but I'm now reconsidering that stance. Fog Creek Software isn't funded by the admiration of other programmers. It's funded by selling their software to customers. And to the customer, the user interface is the application. I might point and laugh at an application written in some crazy hand rolled in-house language. But language choice is completely invisible to potential customers. As long as the customers are happy with the delivered application and sales are solid, who gives a damn what I-- or any other programmers, for that matter-- think?

Sure, we programmers are paid to care what the code looks like. We worry about the guts of our applications. It's our job. We want to write code in friendly, modern languages that make our work easier and less error-prone. We'd love any opportunity to geek out and rewrite everything in the newest, sexiest possible language. It's all perfectly natural.

The next time you're knee deep in arcane language geekery, remember this: nobody cares what your code looks like. Except for us programmers. Yes, well-factored code written in a modern language is a laudable goal. But perhaps we should also focus a bit more on things the customer will see and care about, and less on the things they never will.

* I desperately want to provide full name attribution here, but I was unable to find Max's last name on any of his pages-- which drives me absolutely bonkers (see # 3).


Software Registration Keys

Software is digital through and through, and yet there's one unavoidable aspect of software installation that remains thoroughly analog: entering the registration key.

software registration key example #1

software registration key example #2

software registration key example #3

software registration key example #4

The aggravation is intentional. Unique registration keys exist only to prevent piracy. Like all piracy solutions-- short of completely server hosted applications and games, where piracy means you'd have to host your own rogue server-- it's an incomplete client-side solution. How effective is it? One vendor implemented code to detect false registration keys and phone home with some basic information such as the IP address when these false keys are entered. Here's what they found:

Software Connectivity                      Ratio of pirated to legitimate keys
no internet connection required            45 : 1
occasional internet connection necessary   60 : 1
internet must be "always on"               110 : 1

I have no idea how reliable this data is. The vendor is never named, and given that the URL is sharewarejustice.com/software-piracy.htm, I'd expect it to be biased. But it is data, and without the registration key concept (and pervasive internet connectivity), we'd have no data whatsoever to quantify how much piracy actually exists. The BSA estimated 35% of all software was pirated in 2006, but it is just that-- an estimate. I'll choose biased data over no data whatsoever, every time.

I don't have a problem with registration keys. You could, in fact, argue that registration key validation actually works. Microsoft recently stated that the piracy rate of Vista is half that of XP, largely due to improvements in their Windows Genuine Advantage program-- Microsoft's global registration key validation service.

As a software developer, I can empathize with Microsoft to a degree. Unless you oppose the very concept of commercial software, there has to be some kind of enforcement in place. The digital nature of software makes it both easy and impersonal for people to avoid paying (note that I did not say "steal"), which is an irresistible combination for many. Unless you provide some disincentives, that's exactly what people will do-- they'll pay nothing for your software.

Microsoft's history with piracy goes way, way back-- all the way back to the original microcomputers. Witness Bill Gates' Open Letter To Hobbyists, written in 1976.

Almost a year ago, Paul Allen and myself, expecting the hobby market to expand, hired Monte Davidoff and developed Altair BASIC. Though the initial work took only two months, the three of us have spent most of the last year documenting, improving and adding features to BASIC. Now we have 4K, 8K, EXTENDED, ROM and DISK BASIC. The value of the computer time we have used exceeds $40,000.

The feedback we have gotten from the hundreds of people who say they are using BASIC has all been positive. Two surprising things are apparent, however, 1) Most of these "users" never bought BASIC (less than 10% of all Altair owners have bought BASIC), and 2) The amount of royalties we have received from sales to hobbyists makes the time spent on Altair BASIC worth less than $2 an hour.

Why is this? As the majority of hobbyists must be aware, most of you steal your software. Hardware must be paid for, but software is something to share. Who cares if the people who worked on it get paid?

Is this fair? One thing you don't do by stealing software is get back at MITS for some problem you may have had. MITS doesn't make money selling software. The royalty paid to us, the manual, the tape and the overhead make it a break-even operation. One thing you do do is prevent good software from being written. Who can afford to do professional work for nothing? What hobbyist can put 3-man years into programming, finding all bugs, documenting his product and distribute for free? The fact is, no one besides us has invested a lot of money in hobby software. We have written 6800 BASIC, and are writing 8080 APL and 6800 APL, but there is very little incentive to make this software available to hobbyists. Most directly, the thing you do is theft.

Although computers have changed radically in the last thirty years, human behavior hasn't. (Alternately, you could argue that the economics of computing and the emergence of an ad-supported software ecosystem have fundamentally changed the rules of the game since 1976. But that's a topic for another blog post.)

I accept that software registration keys are a necessary evil for commercial software, and I resign myself to manually keeping track of them, and keying them in. But why do they have to be so painful? You do realize a human being has to type this stuff in, right? Here are some things that I've seen vendors get wrong with their registration key process:

  1. Using commonly mistaken characters in the key

    Quick! Is that an 'O' or an '0'? A '6' or a 'G'? An 'I' or an 'l'? A 'B' or an '8'? At least have the courtesy to scour your registration key character set of those characters that are commonly mistaken for other characters. And please print the key in a font that minimizes the chances of confusion.

  2. Excessively long keys

    The most rudimentary grasp of mathematics tells us that a conservative 10 character alphanumeric registration key is good for 197 trillion unique users. Even factoring in the birthday paradox, we can estimate about 14 million random registration key combinations before we have a 50 percent risk of a collision. So why, then, do software developers insist on 20+ character registration keys? It's ridiculous. Are they planning to sell licenses to every grain of sand on every beach?

  3. Not separating the key into blocks

    Rather than smashing your key into one long string, break it into small groups of 4 to 5 characters, separated by a delimiter. It's the same reason phone numbers are listed as 404-555-1212 and not 4045551212: People have an easier time handling and remembering small chunks of information.

  4. Making it difficult to enter the key

    Short of providing every customer a handy USB barcode scanner, at least make the registration key entry form as user friendly as possible:

    • Let the user enter the key in any format. With dashes, without dashes, using spaces, whatever. Be flexible. Accept a variety of formats.
    • Do not provide five input boxes that require us to tab through each one to enter the key. It's death by a thousand tiny textboxes.
    • Tell me as soon as I've entered a bad value in the key. Why should I have to go back and pore over my entry to figure out which letter or number I've screwed up? You're the computer, remember? This is what you're good at.
    • Accept pasting from the clipboard. Once we've installed the software, we'll probably install it again, and nobody likes keying these annoying registration keys in more than once. I've seen some clever software that proactively checks the clipboard and enters the key automatically if it finds it there. (Kudos to you, Beyond Compare.)
    • Don't passive-aggressively inform me that "the key you entered appears to be valid." Is it? Or isn't it? What's the point of unique registration keys if you can't be sure? I guess paying customers can't be trusted.

  5. Where's the %*@# key?

    The key is important. Without it we can't install or use the software. So why is it buried in the back of the manual, or on an easy-to-overlook interior edge of the package? Make it easy to find-- and difficult to lose. Provide multiple copies of the key in different locations, maybe even as a peelable sticker we can place somewhere useful. And if the software was delivered digitally, please keep track of our key for us. We're forgetful.
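Several of the points above (an unambiguous character set, keys split into blocks, flexible entry, and immediate feedback on a bad character) are easy to sketch in code. What follows is only an illustration under my own assumptions: the exact alphabet, the 10 character length, the 5 character blocks, and every function name are hypothetical, and the check covers only the shape of the key, not whether it was ever actually issued.

```python
import secrets

# Assumption: digits and uppercase letters with the look-alikes from item 1
# removed (O/0, 6/G, I/L, B/8), plus 1, which is easily read as I or l.
ALPHABET = "234579ACDEFHJKMNPQRSTUVWXYZ"
KEY_LENGTH = 10  # the "conservative" length from item 2
BLOCK_SIZE = 5   # item 3 suggests blocks of 4 to 5 characters


def generate_key(length=KEY_LENGTH):
    """Pick each character at random from the unambiguous alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def format_key(key, block_size=BLOCK_SIZE, delimiter="-"):
    """Split a raw key into delimiter-separated blocks for display."""
    blocks = [key[i:i + block_size] for i in range(0, len(key), block_size)]
    return delimiter.join(blocks)


def normalize(user_input):
    """Accept dashes, spaces, or lowercase: keep only letters and digits."""
    return "".join(ch for ch in user_input.upper() if ch.isalnum())


def check_key(user_input, length=KEY_LENGTH):
    """Return None if the input is well formed, else a message naming the problem."""
    key = normalize(user_input)
    for position, ch in enumerate(key, start=1):
        if ch not in ALPHABET:
            return f"character {position} ({ch!r}) is never used in a key"
    if len(key) != length:
        return f"expected {length} characters, got {len(key)}"
    return None
```

With this sketch, `check_key("acdef 23457")` returns `None`, while `check_key("ACDEF-2345O")` points straight at the offending tenth character instead of making the user hunt for it.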

Software registration keys are a disconcerting analog hoop we force users to jump through when using commercial software. Furthermore, registration keys are often the user's first experience with our software-- and first impressions matter. If you're delivering software that relies on registration keys, give that part of the experience some consideration. Any negative feelings generated by an unnecessarily onerous registration key entry process will tend to color users' perception of your software.
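For anyone who wants to check the key-length arithmetic in item 2, here is a rough sketch. I don't know which alphabet produces the 197 trillion figure, so the code simply takes that number as given; the function names are my own, and the collision formula is the standard approximation for randomly issued keys.

```python
import math


def keyspace(alphabet_size, length):
    """Number of distinct keys of the given length over the given alphabet."""
    return alphabet_size ** length


def keys_until_collision(total_keys, probability=0.5):
    """Approximate number of random keys issued before a collision is this likely.

    Standard birthday approximation: k ~ sqrt(2 * N * ln(1 / (1 - p))).
    """
    return math.sqrt(2 * total_keys * math.log(1 / (1 - probability)))


if __name__ == "__main__":
    quoted = 197e12  # the "197 trillion" figure, taken as given
    print(f"sqrt(N) rule of thumb: {math.sqrt(quoted):,.0f}")
    print(f"50% collision point:   {keys_until_collision(quoted):,.0f}")
    print(f"36 symbols, 10 chars:  {keyspace(36, 10):,}")
```

With N at 197 trillion, the square root rule of thumb comes to about 14 million, and the approximation puts the 50 percent point a little higher, at roughly 16.5 million. A full 36-symbol alphabet at 10 characters yields a far larger keyspace still. Either way, the conclusion holds: 20+ character keys are overkill.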


On The Meaning of "Coding Horror"

In a recent web search, I found the following comment in a programming.reddit.com thread from eight months ago, completely by accident:

I think prog.reddit will continue to move in phases... a couple of days ago, someone complained about a drop-off in Haskell articles, today there were 4 or 5 ... next time Django or Rails does something worth noting, there'll be a plethora of Python/Ruby stuff. Despite its limb-gnawing tedium, Coding Horror will continue to rank high.

I personally think describing what I do here as "limb-gnawing tedium" is a bit hyperbolic. But it made me laugh.

I can understand where the commenter is coming from; the web is chock full of content that absolutely bores me to tears. If I stopped and wrote a comment bemoaning every boring blog post or web page I've ever found, I'd scarcely have time to do anything else. Such comments would also be a bit of a downer for the author, as I'm sure someone is interested in that particular topic. The whole point of putting content on the internet is to find an audience, however tiny that audience might end up being. Maybe you're not a member of the audience, and that's OK.

I try to avoid blogging about blogging because it's such a cliche. And it's boring. However, after digging a bit deeper in the programming.reddit.com comments, I became concerned:

What I don't like about "Coding Horror": the title promises "Daily WTF" style entertainment, but doesn't deliver. "Coding Horror" ought to be about people coding dynamic web pages entirely in SQL, or having some mission critical system written in a cryptic version of csh.

This is a profound misunderstanding. If you're coming here looking for that sort of entertainment, you're bound to be disappointed. I'd like to think this site is the opposite of The Daily WTF.

I apologize for the confusion. Allow me to explain.

First, the literal explanation. The sidebar of Steve McConnell's seminal book, Code Complete, contains a series of icons denoting particular areas. There's a "Hard Data" icon, a "Key Point" icon, and a "Coding Horror" icon.

Excerpt from the book 'Code Complete', page 340

I have to talk a little bit about the influence this book had on me as a young developer.

I graduated from college in 1992, and entered the field of professional software development at that point, at least in terms of being paid to do so. I loved it, but I really had no idea what I was doing. I was a young, inexperienced developer working in small business, where there aren't a lot of other developers to look to as mentors. Nor was the internet a factor; the internet didn't really hit until '95 for most people. I was living in Denver at the time, and I frequented the Tattered Cover, a great independent bookstore. Code Complete was originally published in May 1993; I stumbled across it while browsing the computer book section at the Tattered Cover sometime in 1994. I was floored. Here's this entire book about becoming a professional software developer, written in this surprisingly friendly, humane voice. And it was backed by rational research and real data, not the typical developer "my brain is bigger than yours" chest-thumping.

I had found my muse. Reading Code Complete was a watershed event in my professional life. I read it three times in one week. It immediately became my Joy of Cooking. I didn't even know it existed, but it showed me that if you loved food enough, it was possible to go from being a mere cook to a real chef.

One of the most striking and memorable things about Code Complete, even to this day, is that Coding Horror illustration in the sidebar. Every time I saw it on the page, I would chuckle. Not because of other people's code, mind you. Because of my own code. That was the revelation. You're an amateur developer until you realize that everything you write sucks.

YOU are the Coding Horror.

The minute you realize that, you've crossed the threshold from being an amateur software developer into the realm of the professionals. Half of being a good, competent software developer is realizing that you're going to make tons of mistakes. You will be your own worst enemy almost all the time. It's a lifestyle. You're living it right now. You, me, all of us. The problems start with us. We're all coding horrors. This story from the Tao that Reginald Braithwaite posted is as good an explanation as any:

There was once a monk who would carry a mirror wherever he went. A priest noticed this one day and thought to himself, "This monk must be so preoccupied with the way he looks that he has to carry that mirror all the time. He should not worry about the way he looks on the outside. It's what's inside that counts." So the priest approached the monk and asked "Why do you always carry that mirror?", thinking this would surely prove his guilt.

The monk took the mirror from his bag and pointed it at the priest. He said, "I use it in times of trouble. I look into it and it shows me the source of my problems as well as the solution to my problems."

If you're horrified by what you see in the mirror, you are not alone.

I chose that title for my blog – with explicit permission from Steve – because it's a clever in-joke about becoming a humble professional programmer. That's what I try to do here. I write to learn and explore topics that deal with computers and programming, and because I'm easily bored, the topics I find most interesting tend to apply to a wide audience of programmers. Maybe even people who don't know they're programmers yet. To steal a phrase from the talented Rich Skrenta, I blog to help others and also to learn. As it turns out both are aided by getting folks to actually read the stuff.

But that's not the complete story. I'd be lying if I didn't admit that there's an element of selfishness at work here. I love computers and programming. I love it so much it borders on obsession. When I saw the movie Into The Wild, I was transfixed by the final note written into the margins of Dr. Zhivago by a doomed Christopher McCandless: "Happiness only real when shared."

I realized, that's it. That's it exactly. That is what is so intensely satisfying about writing here. My happiness only becomes real when I share it with all of you.


Our Fractured Online Identities

Anil Dash has been blogging since 1999. He's been a member of the Movable Type team since its earliest days. As you'd expect from a man who has lived in the trenches for so long, his blog is excellent. It's well worth a visit if you haven't been there already. I was recently reading through his 2002 blog recommendations and marvelling at the hardy few who survived through five long years of the internet. The way I figure it, that's equivalent to thirty-five people years.

I also noticed something interesting lodged in the sidebar of his blog: a long list of Anil Dash's many online identities, spread across no fewer than 29 different websites:

Anil Dash's many online identities

Laurel Krahn created one of the first 30 weblogs back in 1998. Her home page paints a similarly fractured picture of her online identity. I count 21 different websites that represent some part of Laurel:

Laurel Krahn's many online identities

There's no way any one person could truly keep these 20 or 30 websites up to date. So which one of these websites represents the real Laurel Krahn, the real Anil Dash? Or do all these tiny fragments of identity cumulatively sum to a whole? Browsing around their sites, it's fairly easy to determine what is getting the lion's share of attention, and pare away the neglected parts. Still, it's unclear.

I suppose my online identity is similarly fractured, although somewhat less so than Anil and Laurel. I obviously have this primary blog, which represents me professionally. But I also have a twitter stream, which I alternately treat as my inner monologue, a link blog, and as a form of public instant messaging. Then there are my Vertigo blogs, a handful of online games I play semi-frequently, and various other online forums that I regularly participate in for particular special interests. All these things are me.

But which one is the real me? Is my online identity even a reasonable approximation of who I am? I think it could be. What you read here is mostly what you get, minus some corner-case peculiarities that probably aren't interesting to anyone but me (and my wife, but she's bound by law). It's reassuring to have a single central authoritative place that represents me online.

Mostly, I'm just amazed that these veteran bloggers feel they can actually maintain twenty or thirty different facets of their identity across all those disparate websites. I certainly can't. I struggle to write one lousy blog four to five times a week. I'm more interested in shrinking my focus into an ever narrower and sharper point than I am in diluting my effort across dozens of different websites.

There's no right or wrong answer here, of course. You should follow your interests wherever they lead you, and to as many different websites as necessary. But I do think building a strong online identity is an important strategy for distinguishing yourself in an increasingly online world. So choose carefully, and focus on those things that best represent you.
