Coding Horror

programming and human factors

How Not to Conduct an Online Poll

Inside the Precision Hack is a great read. It's all about how the Time Magazine World's Most Influential People poll was gamed. But the actual hack itself is somewhat less impressive when you start digging into the details.

Here's the voting UI for the Time poll in question.

time 100 poll entry

Casting a vote submits an HTTP GET in the form of:

http://www.timepolls.com/contentpolls/Vote.do
?pollName=time100_2009&id=1883924&rating=1

Where id is a number associated with the person being voted for, and rating is how influential you think that person is from 1 to 100. Simple enough, but Time's execution was .. less than optimal.

In the early stages of the poll, Time.com didn't have any authentication or validation -- the door was wide open to any client that wanted to stuff the ballot box.

Soon afterward, it was discovered that the Time.com Poll didn't even range-check its parameters to ensure that the ratings fell within the 1 to 100 range.
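
For reference, here's a minimal sketch of the sort of server-side check that was missing. The handler and function names are hypothetical -- mine, not Time's -- but the id and rating parameters and the 1 to 100 range come straight from the URL above:

    def record_vote(candidate_id: int, rating: int) -> None:
        ...  # persistence elided; hypothetical

    def handle_vote(params: dict[str, str]) -> tuple[int, str]:
        # Reject anything outside the documented 1 to 100 range
        # before it ever touches the tally.
        try:
            candidate_id = int(params["id"])
            rating = int(params["rating"])
        except (KeyError, ValueError):
            return 400, "missing or malformed parameters"
        if not 1 <= rating <= 100:
            return 400, "rating must be between 1 and 100"
        record_vote(candidate_id, rating)
        return 200, "ok"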

The outcome of the 2009 Time 100 World's Most Influential People poll isn't that important in the big scheme of things, but it's difficult to understand why a high profile website would conduct an anonymous worldwide poll without even the most basic of safeguards in place. This isn't high security; this is web 101. Any programmer with even a rudimentary understanding of how the web works would have thought of these exploits immediately.

Without any safeguards, wannabe "hackers" set out to game the poll in every obvious way you can think of. Time eventually responded -- with all the skill and expertise of ... a team who put together the world's most insecure online poll.

Shortly afterward, Time.com changed the protocol to attempt to authenticate votes by requiring a key be appended to the poll submission URL. The key consisted of an MD5 hash of the URL + a secret word (aka 'the salt'). [hackers eventually] discovered that the salt [..] was poorly hidden in Time.com's voting flash application. With the salt extracted, the autovoters were back online, rocking the vote.

So-called secret poorly hidden on the client: check!
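
The scheme itself is easy to sketch. Assuming the key was simply the MD5 of the vote URL with the salt appended (the article says URL + secret word; the exact concatenation below is my guess), the entire defense collapses the moment someone fishes that salt out of the Flash client:

    import hashlib

    SALT = "secret-word-from-the-flash-client"  # hypothetical placeholder

    def vote_key(vote_url: str) -> str:
        # MD5(URL + salt) -- a "secret" that ships inside every client isn't one.
        return hashlib.md5((vote_url + SALT).encode()).hexdigest()

    url = ("http://www.timepolls.com/contentpolls/Vote.do"
           "?pollName=time100_2009&id=1883924&rating=100")
    signed_url = url + "&key=" + vote_key(url)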

Another challenge faced by the autovoters was that if you voted for the same person more often than once every 13 seconds, your IP would be banned from voting. However, it was noticed that you could cycle through votes for other candidates during those 13 seconds. The autovoters quickly adapted to take advantage of this loophole, interleaving up-votes for moot with down-votes for the competition -- ensuring that no candidate received a vote more frequently than once every 13 seconds, maximizing the voting leverage.

Sloppy, incomplete IP throttling: check!
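
The correct version isn't any harder to write. Here's a hedged sketch of a per-IP throttle -- the 13-second window comes from the article, everything else is hypothetical:

    import time

    VOTE_INTERVAL_SECONDS = 13
    last_vote_by_ip: dict[str, float] = {}

    def may_vote(ip: str) -> bool:
        # One vote per IP per window, regardless of candidate. Keying this
        # table on (ip, candidate_id) instead is exactly the loophole the
        # autovoters exploited by interleaving votes for different people.
        now = time.time()
        last = last_vote_by_ip.get(ip)
        if last is not None and now - last < VOTE_INTERVAL_SECONDS:
            return False
        last_vote_by_ip[ip] = now
        return True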

At this point, here's the mental image I had of the web developers running the show at time.com:

a bunch of clowns

Remember my advice from design for evil?

When good is dumb, evil will always triumph.

Well, here's your proof. I'm not sure they come any dumber than these clowns.

The article goes on to document how the "hackers" exploited these truck-sized holes in the Time.com online voting system to not only put moot on top, but spell out a little message, too, for good measure:

Looking at the first letters of each of the top 21 leading names in the poll we find the message "marblecake, also the game". The poll announces (perhaps subtly) to the world, that the most influential are not the Obamas, Britneys or the Rick Warrens of the world, the most influential are an extremely advanced intelligence: the hackers.

It's a nice sentiment, I suppose. But is it really a precision hack when your adversaries are incompetent? If you want to read about a real hack -- one that took "extremely advanced intelligence" in the face of a nearly unstoppable adversary -- try the Black Sunday hack. Now that's a hack.

Update: A second article describing more Time poll hilarity. Now with 100% more CAPTCHA!

Exception-Driven Development

If you're waiting around for users to tell you about problems with your website or application, you're only seeing a tiny fraction of all the problems that are actually occurring. The proverbial tip of the iceberg.

iceberg.jpg

Also, if this is the case, I'm sorry to be the one to have to tell you this, but you kind of suck at your job -- which is to know more about your application's health than your users do. When a user informs me about a bona fide error they've experienced with my software, I am deeply embarrassed. And more than a little ashamed. I have failed to see and address the issue before they got around to telling me. I have neglected to crash responsibly.

The first thing any responsibly run software project should build is an exception and error reporting facility. Ned Batchelder likens this to putting an oxygen mask on yourself before you put one on your child:

When a problem occurs in your application, always check first that the error was handled appropriately. If it wasn't, always fix the handling code first. There are a few reasons for insisting on this order of work:

  1. With the original error in place, you have a perfect test case for the bug in your error handling code. Once you fix the original problem, how will you test the error handling? Remember, one of the reasons there was a bug there in the first place is that it is hard to test it.
  2. Once the original problem is fixed, the urgency for fixing the error handling code is gone. You can say you'll get to it, but what's the rush? You'll be like the guy with the leaky roof. When it's raining, he can't fix it because it's raining out, and when it isn't raining, there's no leak!

You need to have a central place that all your errors are aggregated, a place that all the developers on your team know intimately and visit every day. On Stack Overflow, we use a custom fork of ELMAH.
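
ELMAH is ASP.NET-specific, but the core idea is tiny: one hook that catches every unhandled exception and writes the full stack trace to a store the whole team can query. Here's a rough, hypothetical sketch of that idea in Python, with a SQLite table standing in for the real log:

    import sqlite3
    import sys
    import traceback

    db = sqlite3.connect("exceptions.db")
    db.execute("CREATE TABLE IF NOT EXISTS errors "
               "(at TEXT DEFAULT CURRENT_TIMESTAMP, type TEXT, message TEXT, trace TEXT)")

    def log_unhandled(exc_type, exc_value, exc_tb):
        # Record the exception centrally, then defer to the default handler
        # so the crash is still visible wherever it happened.
        db.execute("INSERT INTO errors (type, message, trace) VALUES (?, ?, ?)",
                   (exc_type.__name__, str(exc_value),
                    "".join(traceback.format_exception(exc_type, exc_value, exc_tb))))
        db.commit()
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = log_unhandled  # every uncaught exception now lands in the log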

stackoverflow exception log

We monitor these exception logs daily; sometimes hourly. Our exception logs are a de facto to-do list for our team. And for good reason. Microsoft has collected similar sorts of failure logs for years, both for themselves and other software vendors, under the banner of their Windows Error Reporting service. The resulting data is compelling:

When an end user experiences a crash, they are shown a dialog box which asks them if they want to send an error report. If they choose to send the report, WER collects information on both the application and the module involved in the crash, and sends it over a secure server to Microsoft.

The mapped vendor of a bucket can then access the data for their products, analyze it to locate the source of the problem, and provide solutions both through the end user error dialog boxes and by providing updated files on Windows Update.

Broad-based trend analysis of error reporting data shows that 80% of customer issues can be solved by fixing 20% of the top-reported bugs. Even addressing 1% of the top bugs would address 50% of the customer issues. The same analysis results are generally true on a company-by-company basis too.

Although I remain a fan of test driven development, the speculative nature of the time investment is one problem I've always had with it. If you fix a bug that no actual user will ever encounter, what have you actually fixed? While there are many other valid reasons to practice TDD, as a pure bug fixing mechanism it's always seemed far too much like premature optimization for my tastes. I'd much rather spend my time fixing bugs that are problems in practice rather than theory.

You can certainly do both. But given a limited pool of developer time, I'd prefer to allocate it toward fixing problems real users are having with my software based on cold, hard data. That's what I call Exception-Driven Development. Ship your software, get as many users in front of it as possible, and intently study the error logs they generate. Use those exception logs to home in on the problem areas of your code. Rearchitect and refactor your code so the top 3 errors can't happen any more. Iterate rapidly, deploy, and repeat the process. This data-driven feedback loop is so powerful you'll have (at least from the users' perspective) a rock-stable app in a handful of iterations.
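
The to-do list part falls straight out of the data: bucket the logged exceptions by type and by the innermost frame of the stack trace, count them, and fix from the top down. A hypothetical sketch, reading rows from the same sort of log as above:

    from collections import Counter

    def top_errors(rows, n=3):
        # rows: (type, message, trace) tuples pulled from the exception log.
        # Bucketing by exception type plus the innermost "File ..." frame means
        # the same bug reported a thousand times shows up as one line with a
        # count -- the de facto to-do list, worst offender first.
        buckets = Counter()
        for exc_type, message, trace in rows:
            frames = [line.strip() for line in trace.splitlines()
                      if line.strip().startswith("File ")]
            location = frames[-1] if frames else "unknown"
            buckets[(exc_type, location)] += 1
        return buckets.most_common(n)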

Exception logs are possibly the most powerful form of feedback your customers can give you. It's feedback based on shipping software that you don't have to ask or cajole users to give you. Nor do you have to interpret your users' weird, semi-coherent ramblings about what the problems are. The actual problems, with stack traces and dumps, are collected for you, automatically and silently. Exception logs are the ultimate in customer feedback.

carnage4life: getting real feedback from customers by shipping is more valuable than any amount of talking to or about them beforehand

Am I advocating shipping buggy code? Incomplete code? Bad code? Of course not. I'm saying that the sooner you can get your code out of your editor and in front of real users, the more data you'll have to improve your software. Exception logs are a big part of that; so is usage data. And you should talk to your users, too. If you can bear to.

Your software will ship with bugs anyway. Everyone's software does. Real software crashes. Real software loses data. Real software is hard to learn, and hard to use. The question isn't how many bugs you will ship with, but how fast you can fix those bugs. If your team has been practicing exception-driven development all along, the answer is -- why, we can improve our software in no time at all! Just watch us make it better!

And that is sweet, sweet music to every user's ears.

Is Open Source Experience Overrated?

I'm a big advocate of learning on the battlefield. And that certainly includes what may be the most epic battle of them all: open source software.

Contribute to an open-source project. There are thousands, so pick whatever strikes your fancy. But pick one and really dig in, become an active contributor. Absolutely nothing is more practical, more real, than working collaboratively with software developers all over the globe from all walks of life.

If you're looking to polish your programming chops, what could possibly be better, more job-worthy experience than immersing yourself in a real live open source software project? There are thousands, maybe hundreds of thousands, and a few of them have arguably changed the world.

Unfortunately, that wasn't what happened for one particular open source developer. In an anonymous email to me, he related his experiences:

I'm a programmer with 14 years of experience both inside academics and in commercial industry currently looking for work. In both my cover letters and my resume I indicate that I am the architect of a couple of open source Java projects where the code, design and applications were available on the web.

One company seemed impressed with my enthusiasm for the job but it was part of their policy to provide coding tests. This seemed perfectly reasonable and I did it by using the first solution I thought about. When I got to the phone interview, the guy spent about five minutes telling me how inefficient my coding solution was and that they were not very impressed. Then I asked whether he had looked at the open source projects I mentioned. He said no - but it seems his impression was already set based on my performance in the coding test. The coding test did not indicate what criteria they were using for evaluation but my solution seemed to kill the interview.

In another call, I was talking with a recruiter who was trying to place someone for a contract Java development assignment. I told her that most of my recent work was open source and that she could inspect it if she wanted to assess my technical competence. Five minutes later she phoned back and said I appeared to lack any recent commercial experience. I had demonstrable open source applications that used the technologies they wanted, but it didn't appear to matter.

With yet another recruiter I told him that even years ago when I had worked on commercial projects, before I went back to school, the proprietary nature of my jobs prevented me from mentioning the specifics about a lot of what I did. The badge of commercial software experience didn't necessarily prove either my technical competence or my relative contribution to the projects. What my experience of working in industry long ago did teach me was how to fill out a time sheet and estimate time for deliverables. But this experience would seem a bit dated now for recruiters.

That's a terrible interview track record for the open source experience that I advocated so strongly. He continues:

I think it's important that I try to see their point of view. A lot of open source projects are probably poorly written and made in response to a neat idea rather than to requirements from some user community. In academia, the goal for development is often more about publishing papers than establishing a user base. Industry people sometimes have the view (sometimes justified and sometimes not) that open source developers who emerge from academic projects lack practical skills. I don't necessarily claim my open source code is the best in the world but it works, it's documented and it's available for scrutiny. One of the reasons I worked so hard on open source projects was to make job interviews easier. By providing prospective employers with large samples of publicly available working code, I thought I would give them something more useful to think about than my performance on a particular coding test or whether the acronyms in the job skills matched my "years spent". I am very aware of the hype behind open source. I've heard it, lived it and even spun some of it myself. But sometimes it's good to take a sobering reality check -- is open-source experience overrated?

It's disheartening to hear so many prospective employers completely disregard experience on open source projects. It's a part of your programming portfolio, and any company not even willing to take a cursory look at your portfolio before interviewing you is already suspect. This reflects poorly on the employers. I'm not sure I'd want to work at a place where a programmer's prior body of work is treated as inconsequential.

On the other hand, perhaps the choice of open source project matters almost as much as the programming itself. How many open source projects labor away in utter obscurity, solving problems that nobody cares about, problems so incredibly narrow that the authors are the only possible beneficiaries? Just as commercial software can't possibly exist without customers, perhaps open source experience is only valid if you work on a project that attains some moderate level of critical mass and user base. Remember, shipping isn't enough. Open source or not, if you aren't building software that someone finds useful, if you aren't convincing at least a small audience of programmers that your project is worthwhile enough to join --

Then what are you really doing?

Death to the Space Infidels!

Ah, spring. What a wonderful time of year. A time when young programmers' minds turn to thoughts of ... neverending last-man-standing filibuster arguments about code formatting.

Naturally.

And there is no argument more evergreen than the timeless debate between tabs and spaces.

On defaultly-configured Unix systems, and on ancient dumb terminals and teletypes, the tradition has been for the TAB character to mean move to the right until the current column is a multiple of 8. This is also the default in the two most popular Unix editors, Emacs and vi.

In many Windows and Mac editors, the default interpretation is the same, except that multiples of 4 are used instead of multiples of 8.

A third interpretation is for the ASCII TAB character to mean indent to the next tab stop, where the tab stops are set arbitrarily: they might not necessarily be equally distanced from each other. Most word processors can do this; Emacs can do this. I don't think vi can do this, but I'm not sure.

With these three interpretations, the ASCII TAB character is essentially being used as a compression mechanism, to make sequences of SPACE-characters take up less room in the file.
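
You can see the "compression" point in a couple of lines: the characters in the file never change, only the width each TAB expands to when displayed. A quick illustration using Python's str.expandtabs (the snippet is mine, not from the quoted text):

    line = "\tif done:\n\t\treturn result"

    print(line.expandtabs(8))  # traditional Unix rendering: tab stops every 8 columns
    print(line.expandtabs(4))  # common Windows/Mac editor default: every 4 columns
    # Same characters on disk, two different on-screen indents.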

So, then, the question: should code* be indented with spaces..

visual-studio-space-indent

or with tabs?

visual-studio-tabs-indent

According to Cyrus, there's a third option: an unholy melding of both tabs and spaces. Apparently you can use tabs for primary indentation and then spaces on top of that for detail alignment. Like so:

visual-studio-space-and-tab-indent.png

This way, in theory at least, the level of indent can be adjusted dynamically without destroying alignment. But I'm more inclined to think of it as combining all the complexity and pitfalls of both approaches, myself.
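
To make the scheme concrete, here's a hypothetical snippet with the whitespace spelled out: one leading tab carries the indent level, and the run of spaces after it handles only the alignment, so re-rendering at a different tab width shifts the block without breaking the alignment.

    snippet = ("\tresult = lookup(key,\n"
               "\t                default)\n")  # tab for indent, spaces for alignment

    print(snippet.expandtabs(4))
    print(snippet.expandtabs(8))
    # "default" stays lined up under "key" at either width; only the indent moves.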

OK, so maybe you're an enlightened coder. You've moved beyond mere earthbound issues like tabs vs. spaces on your personal path to code nirvana. Perhaps you have some kind of fancy auto-formatter that runs on every checkin. Or, maybe you're using a next-next-generation editor that treats code as "data" and the layout (including whitespace) as a "view", making all these concerns largely irrelevant.

But there's a deeper issue here to consider. The only programming project with no disagreement whatsoever on code formatting is the one you work on alone. Wherever there are two programmers working on the same project, there are invariably disagreements about how the code should be formatted. Sometimes serious disagreements. The more programmers you add, the more divisive those disagreements get. And handling those disagreements can be .. tricky. Take this email I received from Philip Leitch:

The place where I work currently has a developer (who is also the head of the development department), who will "clean up" the code of others.

That is -- reformat it, normally without changing what the code does, just changing the variable names, function names, but mainly moving things around to the way they like it.

It is a little perplexing … and I'm interested to see what responses people have on this issue.

One of the absolute worst, worst methods of teamicide for software developers is to engage in these kinds of passive-aggressive formatting wars. I know because I've been there. They destroy peer relationships, and depending on the type of formatting, can also damage your ability to effectively compare revisions in source control, which is really scary. I can't even imagine how bad it would get if the lead was guilty of this behavior. That's leading by example, all right. Bad example.

The depressing thing about all this is that code formatting matters more than you think. Perhaps even enough to justify the endless religious wars that are fought over it. Consider the 1984 study by Soloway and Ehrlich cited in Code Complete:

Our studies support the claim that knowledge of programming plans and rules of programming discourse can have a significant impact on program comprehension. In their book called The Elements of Programming Style, Kernighan and Plauger also identify what we would call discourse rules. Our empirical results put teeth into these rules: It is not merely a matter of aesthetics that programs should be written in a particular style. Rather there is a psychological basis for writing programs in a conventional manner: programmers have strong expectations that other programmers will follow these discourse rules. If the rules are violated, then the utility afforded by the expectations that programmers have built up over time is effectively nullified. The results from the experiments with novice and advanced student programmers and with professional programmers described in this paper provide clear support for these claims.

There's actual data from honest-to-goodness experiments to support the hypothesis that consistent code formatting is worth fighting for. And there are dozens of studies backing it up, too, as Steve McConnell notes:

In their classic paper Perception in Chess, Chase and Simon reported on a study that compared the abilities of experts and novices to remember the positions of pieces in chess. When pieces were arranged on the board as they might be during a game, the experts' memories were far superior to the novices'. When the pieces were arranged randomly, there was little difference between the memories of the experts and the novices. The traditional interpretation of this result is that an expert's memory is not inherently better than a novice's but that the expert has a knowledge structure that helps him or her remember particular kinds of information. When new information corresponds to the knowledge structure -- in this case, the sensible placement of chess pieces -- the expert can remember it easily. When new information doesn't correspond to a knowledge structure -- the chess pieces are randomly positioned -- the expert can't remember it any better than the novice.

A few years later, Ben Shneiderman duplicated Chase and Simon's results in the computer-programming arena and reported his results in a paper called Exploratory Experiments in Programmer Behavior. Shneiderman found that when program statements were arranged in a sensible order, experts were able to remember them better than novices. When statements were shuffled, the experts' superiority was reduced. Shneiderman's results have been confirmed in other studies. The basic concept has also been confirmed in the games Go and bridge and in electronics, music, and physics.

So yes, absurd as it may sound, fighting over whitespace characters and other seemingly trivial issues of code layout is actually justified. Within reason of course -- when done openly, in a fair and consensus-building way, and without stabbing your teammates in the face along the way.

Choose tabs, choose spaces, choose whatever layout conventions make sense to you and your team. It doesn't actually matter which coding styles you pick. What does matter is that you, and everyone else on your team, stick with those conventions and use them consistently.

That said, only a moron would use tabs to format their code.

* unless you happen to be programming in whitespace or Python.

Training Your Users

When it comes to user interface design, I'm no guru, but I do have one golden rule that I always try to follow:

Make the right thing easy to do and the wrong thing awkward to do.

The things you want users to do should be straightforward and clear -- as simple as falling into the pit of success. Make your software easy to use. Duh. Everyone knows that. The less obvious part of this rule is that sometimes there are things you don't want users to do. In those cases, you actually want your application, or at least certain areas of it, to be harder to use. For example, operations that are risky or dangerous should take more steps.
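
As a tiny, hypothetical sketch of that last point: the safe operation is a single call, while the destructive one demands that the caller (or the UI sitting in front of it) re-type the project's name before anything happens.

    from dataclasses import dataclass

    @dataclass
    class Project:
        name: str
        archived: bool = False
        deleted: bool = False

    def archive_project(project: Project) -> None:
        # The right thing: reversible, so keep it a single effortless step.
        project.archived = True

    def delete_project(project: Project, confirm_name: str = "") -> None:
        # The wrong thing: irreversible, so make it deliberately awkward.
        if confirm_name != project.name:
            raise ValueError("Re-type the project name to confirm deletion.")
        project.deleted = True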

What you're doing with this design technique is training your users:

The central lesson I learned from exotic animal trainers is that I should reward behavior I like and ignore behavior I don't. After all, you don't get a sea lion to balance a ball on the end of its nose by nagging.

When you make features easy to use, you are rewarding user behavior you like. You are guiding users through your application, giving them a clear and obvious path of least resistance. And when you intentionally choose not to make a feature easy to use, you are effectively ignoring user behavior you don't like. You are indirectly discouraging users from utilizing those features.

poster trained seals

If you aren't taking advantage of both techniques in your user interface -- rewarding with simplicity, and (judiciously) ignoring with complexity where necessary -- you aren't properly training your users.
