Coding Horror

programming and human factors

Paying Down Your Technical Debt

Every software project I've ever worked on has accrued technical debt over time:

Technical Debt is a wonderful metaphor developed by Ward Cunningham to help us think about this problem. In this metaphor, doing things the quick and dirty way sets us up with a technical debt, which is similar to a financial debt. Like a financial debt, the technical debt incurs interest payments, which come in the form of the extra effort that we have to do in future development because of the quick and dirty design choice. We can choose to continue paying the interest, or we can pay down the principal by refactoring the quick and dirty design into the better design. Although it costs to pay down the principal, we gain by reduced interest payments in the future.

The metaphor also explains why it may be sensible to do the quick and dirty approach. Just as a business incurs some debt to take advantage of a market opportunity, developers may incur technical debt to hit an important deadline. The all too common problem is that development organizations let their debt get out of control and spend most of their future development effort paying crippling interest payments.

No matter how talented and smart the software developers, all these tiny deferments begin to add up and cumulatively weigh on the project, dragging it down. My latest project is no different. After six solid months working on the Stack Overflow codebase, this is exactly where we are. We're digging in our heels and retrenching for a major refactoring of our database. We have to stop working on new features for a while and pay down some of our technical debt.

credit cards

I believe that accruing technical debt is unavoidable on any real software project. Sure, you refactor as you go, and incorporate improvements when you can – but it's impossible to predict exactly how those key decisions you made early on in the project are going to play out. All you can do is roll with the punches, and budget some time into the schedule to periodically pay down your technical debt.

The time you take out of the schedule to make technical debt payments typically doesn't result in anything the customers or users will see. This can sometimes be hard to justify. In fact, I had to defend our decision with Joel, my business partner. He'd prefer we work on some crazy thing he calls revenue generation, whatever that is.

Steve McConnell has a lengthy blog entry examining technical debt. The perils of not acknowledging your debt are clear:

One of the important implications of technical debt is that it must be serviced, i.e., once you incur a debt there will be interest charges. If the debt grows large enough, eventually the company will spend more on servicing its debt than it invests in increasing the value of its other assets. A common example is a legacy code base in which so much work goes into keeping a production system running (i.e., "servicing the debt") that there is little time left over to add new capabilities to the system. With financial debt, analysts talk about the "debt ratio," which is equal to total debt divided by total assets. Higher debt ratios are seen as more risky, which seems true for technical debt, too.

Beyond what Steve describes here, I'd also argue that accumulated technical debt becomes a major disincentive to work on a project. It's a collection of small but annoying things that you have to deal with every time you sit down to write code. But it's exactly these small annoyances, this sand grinding away in the gears of your workday, that eventually cause you to stop enjoying the project. These small things matter.

It can be scary to go in and rebuild a lot of working code that has become crufty over time. But don't succumb to fear.

I must not fear.
Fear is the mind-killer.
Fear is the little-death that brings total obliteration.
I will face my fear.
I will permit it to pass over me and through me.
And when it has gone past I will turn the inner eye to see its path.
Where the fear has gone there will be nothing.
Only I will remain.

When it comes time to pay down your technical debt, don't be afraid to break stuff. It's liberating, even energizing to tear down code in order to build it up stronger and better than it was before. Be brave, and realize that paying your technical debt every so often is a normal, necessary part of the software development cycle to avert massive interest payments later. After all, who wants to live forever?

Discussion

Who's Your Coding Buddy?

I am continually amazed how much better my code becomes after I've had a peer look at it. I don't mean a formal review in a meeting room, or making my code open to anonymous public scrutiny on the internet, or some kind of onerous pair programming regime. Just one brief attempt at explaining and showing my code to a fellow programmer – that's usually all it takes.

This is, of course, nothing new. Karl Wiegers' excellent book Peer Reviews in Software: A Practical Guide has been the definitive guide since 2002.

Peer Reviews in Software: a Practical Guide

I don't think anyone disputes the value of having another pair of eyes on your code, but there's a sort of institutional inertia that prevents it from happening in a lot of shops. In the chapter titled A Little Help from Your Friends, Karl explains:

Busy practitioners are sometimes reluctant to spend time examining a colleague's work. You might be leery of a coworker who asks you to review his code. Does he lack confidence? Does he want you to do his thinking for him? "Anyone who needs his code reviewed shouldn't be getting paid as a software developer," scoff some review resisters.

In a healthy software engineering culture, team members engage their peers to improve the quality of their work and increase their productivity. They understand that the time they spend looking at a colleague's work product is repaid when other team members examine their own deliverables. The best software engineers I have known actively sought out reviewers. Indeed, the input from many reviewers over their careers was part of what made these developers the best.

In addition to the above chapter, you can sample Chapter 3 (pdf) courtesy of the author's own Process Impact website. This isn't just feel-good hand waving. There's actual data behind it. Multiple studies show code inspections are startlingly effective.

the average defect detection rate is only 25 percent for unit testing, 35 percent for function testing, and 45 percent for integration testing. In contrast, the average effectiveness of design and code inspections are 55 and 60 percent.

So why aren't you doing code reviews? Maybe it's because you haven't picked out a coding buddy yet!

Remember those school trips, where everyone was admonished to pick a buddy and stick with them? This was as much to keep everyone out of trouble as to keep them safe. Well, the same rule applies when you're building software. Before you check code in, give it a quick once-over with your buddy. Can you explain it? Does it make sense? Is there anything you forgot?

I am now required by law to link to this cartoon.

the only valid measurement of code quality: WTFs per minute

Thank you, I'll be here all week.

But seriously, this cartoon illustrates exactly the kind of broad reality check we're looking for. It doesn't have to be complicated to be effective. WTFs/minute is a perfectly acceptable unit of measurement to use with your coding buddy. The XP community has promoted pair programming for years, but I think the buddy system is a far more practical way to achieve the same results.

Besides, who wouldn't want to be half of an awesome part-time coding dynamic duo?
Batman and Robin

That's way more exciting than the prospect of being shackled to the same computer with another person. Think about all the other classic dynamic duos out there.

Individuals can do great things, but two highly motivated peers can accomplish even more when they work together. Surely there's at least one programmer you work with who you admire or at least respect enough to adopt the buddy system with. (And if not, you might consider changing your company.)

One of the great joys of programming is not having to do it alone. So who's your coding buddy?

Discussion

Rate Limiting and Velocity Checking

Lately, I've been seeing these odd little signs pop up in storefronts around town.

7-11 rate limiter

All the signs have various forms of this printed on them:

Only 3 students at a time in the store please

We took that picture at a 7-11 convenience store which happens to be near a high school, so maybe the problem is particularly acute there. But even farther into town, the same signs appear with disturbing regularity. I'm guessing the store owners must consider these rules necessary because:

  • teenage students are more likely to shoplift than most customers
  • with many teenage students in the store, it's difficult for the owners to keep an eye on everyone, which further increases the likelihood of shoplifting.

I'm just guessing; I don't own a store. But like the "no elephants" sign, it must be there to address a real problem.

When you go into a restaurant and see a sign that says "No Dogs Allowed," you might think that sign is purely proscriptive: Mr. Restaurant doesn't like dogs around, so when he built the restaurant he put up that sign. If that was all that was going on, there would also be a "No Snakes" sign; after all, nobody likes snakes. And a "No Elephants" sign, because they break the chairs when they sit down. The real reason that sign is there is historical: it is a historical marker that indicates that people used to try to bring their dogs into the restaurant.

All these signs are enough to make me question the ethics of high school students in groups of 3 or more. Although, to be fair, I've seen some really shifty looking graduate students in my day.

In truth, these kinds of limits are everywhere; they're just not as obvious because there's often no signage trail to follow.

  • Most ATMs only allow you to withdraw $300 cash maximum in one day.
  • Free email accounts typically limit how many emails can be sent per day.
  • Internet providers limit individual download and upload speeds to ensure they aren't overselling their bandwidth.
  • There's a maximum on how many Xbox Live Points you can add to your account per day. (All 500+ Rock Band songs aren't going to download themselves, after all.)

I'm sure you can think of lots of other real world examples. They're all around you.

There are people who act like groups of rampaging teenage students online, too, and we deal with them in the same way: by imposing rate limits! Consider how Google limits any IP address that's submitting "too many" search requests:

Several things can trigger the sorry message.

google error: we're sorry, search rate limiter with captcha

Often it's due to infected computers or DSL routers that proxy search traffic through your network - this may be at home or even at a workplace where one or more computers might be infected. Overly aggressive SEO ranking tools may trigger this message, too. In other cases, we have seen self-propagating worms that use Google search to identify vulnerable web servers on the Internet and then exploit them. The exploited systems in turn then search Google for more vulnerable web servers and so on. This can lead to a noticeable increase in search queries and sorry is one of our mechanisms to deal with this.

I did a bit of Google scraping once for a small research project, but I never ran into the CAPTCHA limiter. I think that entry predates its appearance. But it does make you wonder what typical search volumes are, and how they're calculated. Determining how much is "too much" -- that's the art of rate limiting. It's a tricky thing, even for the store owner:

  • Couldn't three morally bankrupt students shoplift just as effectively as four?
  • How do you tell who is a student? Is it based purely on perception of age?
  • Do we expect this rule to be self-enforcing? Will the fourth student walk into the store, identify three other students, and then decide to leave?

Rate limiting isn't always a precise science. But it's necessary, even with the false positives -- consider how dangerous a login entry with no limits on failed attempts could be. This is especially true once your code is connected to the internet. Human students can be a problem, but there's a practical limit to how many students can fit in a store, and how fast they can physically shoplift your inventory. But what if those "students" were an infinite number of computer programs, capable of stealing items from your web store at a rate only limited by network bandwidth? Your store would be picked clean in a matter of minutes. Maybe even seconds!
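The failed-login case mentioned above is the simplest place to start. Here's a minimal sketch of a lockout counter in Python; the class name, thresholds, and in-memory storage are all illustrative assumptions, not a prescription (a real system would persist this state and probably add exponential backoff):

```python
class LoginThrottle:
    """Lock out an account after too many consecutive failed login attempts.

    Illustrative sketch: thresholds and in-memory storage are arbitrary choices.
    """

    def __init__(self, max_failures=5, lockout_seconds=300):
        self.max_failures = max_failures
        self.lockout = lockout_seconds
        self.failures = {}  # username -> (failure_count, last_failure_time)

    def is_locked(self, username, now):
        count, last = self.failures.get(username, (0, 0.0))
        # Locked only if the failure budget is spent AND the lockout hasn't expired
        return count >= self.max_failures and now - last < self.lockout

    def record_failure(self, username, now):
        count, _ = self.failures.get(username, (0, 0.0))
        self.failures[username] = (count + 1, now)

    def record_success(self, username):
        self.failures.pop(username, None)  # a good login resets the counter


throttle = LoginThrottle()
for t in range(5):
    assert not throttle.is_locked("alice", now=float(t))
    throttle.record_failure("alice", now=float(t))
assert throttle.is_locked("alice", now=10.0)       # locked after 5 failures
assert not throttle.is_locked("alice", now=400.0)  # lockout window expired
```

Even something this crude turns an attacker's unlimited guessing into a handful of attempts per lockout window, which is the whole point of the exercise.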

Not having any sort of rate limiting in your web application is an open invitation to abuse. Even the most innocuous of user actions, if done rapidly enough and by enough users, could have potentially disastrous effects.

Even after you've instituted a rate limit, you can still get in trouble. On Stack Overflow, we designed for evil. We have a Google-style rate limiting CAPTCHA in place, along with a variety of other bot defeating techniques. They've been working well so far. But what we failed to consider was that a determined (and apparently ultra-bored) human user could sit there and solve CAPTCHAs as fast as possible to spam the site.

And thus was born a new user based limit. I suppose we could create a little sign and hang it outside our virtual storefront:

Only 1 question per new user every 10 minutes, please.

There are a few classes of rate limiting or velocity checking you can do:

  1. Per user or API key. Ensure that any given user account or API account key holder can only perform (n) actions per minute. This is usually fairly safe, though it won't protect you from a user who automates the creation of 100 puppet accounts to do their bidding. It all depends on how strictly you tie identity to the API key or user; you can easily ban, or in the worst case, track down the culprits and ask them to desist.

  2. Per IP address. Ensure that any given IP address can only perform (n) actions per minute. This works well in the typical case, but can cause problems for multiple users who happen to be behind a proxy that makes them appear to you as the "same" IP address. This is the only method possible on mostly anonymous sites like Craigslist, and it definitely works, because I've been on the receiving end of it. Example implementations are mod_evasive for Apache, or the IIS7 Dynamic IP Restriction module.

  3. Per global action. Ensure that a particular action can only happen (n) times per minute. Kind of the nuclear option, so obviously must be used with care. Can make sense for the "big red launch button" administrator functions which should be extraordinarily rare -- until a malicious user happens to gain administrator rights and starts pushing that big red button over and over.
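All three classes boil down to the same mechanism with a different key: a user id, an IP address, or a single global key. Here's a minimal fixed-window counter in Python that illustrates the idea; the class name, window size, and limits are arbitrary assumptions for the sketch, and a production version would use a shared store like Redis rather than an in-process dict:

```python
import time


class RateLimiter:
    """Fixed-window rate limiter: allow at most `limit` actions per `window_seconds`
    for any given key (a user id, API key, IP address, or a global action name).

    Illustrative sketch only; real deployments share this state across servers.
    """

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # key -> (window_start_time, action_count)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        start, count = self.counters.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired; start a fresh one
        if count >= self.limit:
            self.counters[key] = (start, count)
            return False  # over the limit: reject, delay, or show a CAPTCHA
        self.counters[key] = (start, count + 1)
        return True


# Per-IP limit: at most 5 actions per minute from any single address
limiter = RateLimiter(limit=5, window_seconds=60)
for _ in range(5):
    assert limiter.allow("203.0.113.7", now=1000.0)
assert not limiter.allow("203.0.113.7", now=1030.0)  # sixth request rejected
assert limiter.allow("203.0.113.7", now=1061.0)      # new window, allowed again
```

Swap the key for a username and you have class 1; use a single constant key and you have class 3, the nuclear option.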

I was shocked at how little comprehensive information was out there on rate limiting and velocity checking for software developers, because these techniques are your first and most important line of defense against a broad spectrum of possible attacks. It's amazing how many attacks you can mitigate or even defeat by instituting basic rate limiting.

Take a long, hard look at your own website -- how would it deal with a roving band of bored, morally ambiguous schoolkids?

Discussion

The Bad Apple: Group Poison

A recent episode of This American Life interviewed Will Felps, a professor who conducted a sociological experiment demonstrating the surprisingly powerful effect of bad apples.

Groups of four college students were organized into teams and given 45 minutes to work through a set of basic management decisions. To motivate the teams, they were told that whichever team performed best would be awarded $100 per person. What they didn't know, however, was that in some of the groups, the fourth member of the team wasn't a student. He was an actor hired to play a bad apple, one of these personality types:

  • The Depressive Pessimist will complain that the task that they're doing isn't enjoyable, and make statements doubting the group's ability to succeed.
  • The Jerk will say that other people's ideas are not adequate, but will offer no alternatives himself. He'll say "you guys need to listen to the expert: me."
  • The Slacker will say "whatever", and "I really don't care."

The conventional wisdom in the research on this sort of thing is that none of this should have had much effect on the group at all. Groups are powerful. Group dynamics are powerful. And so groups dominate individuals, not the other way around. There's tons of research, going back decades, demonstrating that people conform to group values and norms.

But Will found the opposite.

Invariably, groups that had the bad apple would perform worse. And this despite the fact that there were people in some groups who were very talented, very smart, very likeable. Felps found that the bad apple's behavior had a profound effect – groups with bad apples performed 30 to 40 percent worse than other groups. On teams with the bad apple, people would argue and fight, they didn't share relevant information, they communicated less.

Even worse, other team members began to take on the bad apple's characteristics. When the bad apple was a jerk, other team members would begin acting like a jerk. When he was a slacker, they began to slack, too. And they wouldn't act this way just in response to the bad apple. They'd act this way to each other, in sort of a spillover effect.

What they found, in short, is that the worst team member is the best predictor of how any team performs. It doesn't seem to matter how great the best member is, or what the average member of the group is like. It all comes down to what your worst team member is like. The teams with the worst person performed the poorest.

The actual text of the study (pdf) is available if you're interested. However, I highly recommend listening to the first 11 minutes of the This American Life show. It's a fascinating, highly compelling recap of the study results. I've summarized, but I can't really do it justice without transcribing it all here.

Ira Glass, the host of This American Life, found Felps' results so striking that he began to question his own teamwork:

I've really been struck at how common bad apples are. Truthfully, I've been kind of haunted by my conversation with Will Felps. Hearing about his research, you realize just how easy it is to poison any group [...] each of us have had moments this week where we wonder if we, unwittingly, have become the bad apples in our group.

As always, self-awareness is the first step. If you can't tell who the bad apple is in your group, it might be you. Consider your own behavior on your own team – are you slipping into any of these negative bad apple behavior patterns, even in a small way?

But there was a solitary glimmer of hope in the study, one particular group that bucked the trend:

There was one group that performed really well, despite the bad apple. There was just one guy, who was a particularly good leader. And what he would do is ask questions, he would engage all the team members, and defuse conflicts. I found out later that he's actually the son of a diplomat. His father is a diplomat from some South American country. He had this amazing diplomatic ability to defuse the conflict that normally would emerge when our actor, Nick, would display all this jerk behavior.

This apparently led Will to his next research project: can a group leader change the dynamics and performance of a group by going around and asking questions, soliciting everyone's opinions, and making sure everyone is heard?

While it's depressing to learn that a group can be so powerfully affected by the worst tendencies of a single member, it's heartening to know that a skilled leader, if you're lucky enough to have one, can intervene and potentially control the situation.

Still, the obvious solution is to address the problem at its source: get rid of the bad apple.

Even if it's you.

Discussion

Are You An Expert?

I think I have a problem with authority. Starting with my own.

It troubles me greatly to hear that people see me as an expert or an authority, and not a fellow amateur.

If I've learned anything in my career, it is that approaching software development as an expert, as someone who has already discovered everything there is to know about a given topic, is the one surest way to fail.

Experts are, if anything, more suspect than the amateurs, because they're less honest. You should question everything I write here, in the same way you question everything you've ever read online – or anywhere else for that matter. Your own research and data should trump any claims you read from anyone, no matter how much of an authority or expert you, I, Google, or the general community at large may believe them to be.

Have you ever worked with software developers who thought of themselves as experts, with almost universally painful results? I certainly have. You might say I've developed an anti-expert bias. Apparently, so has Wikipedia; a section titled warnings to expert editors explains:

  1. Experts can identify themselves on their user page and list whatever credentials and experience they wish to publicly divulge. It is difficult to maintain a claim of expertise while being anonymous. In practice, there is no advantage (and considerable disadvantage) in divulging one's expertise in this way.

  2. Experts do not have any other privileges in resolving edit conflicts in their favor: in a content dispute between a (supposed) expert and a non-expert, it is not permissible for the expert to "pull rank" and declare victory. In short, "Because I say so" is never an acceptable justification for a claim in Wikipedia, regardless of expertise. Likewise, expert contributions are not protected from subsequent revisions from non-experts, nor is there any mechanism to do so. Ideally, if not always in practice, it is the quality of the edits that counts.

  3. There is a strong undercurrent of anti-expert bias in Wikipedia. Thus, if you become recognized as an expert you will be held to higher standards of conduct than non-experts.

Let's stop for a moment to savor the paradox of a free and open encyclopedia written by people who view the contributions of experts with healthy skepticism. How could that possibly work?

I'd argue that's the only way it could work – when all contributions are viewed critically, regardless of source. This is a radical inversion of power. But a radical inversion of power is exactly what is required. There are only a handful of experts, but untold millions of amateurs. And the contributions of these amateurs are absolutely essential when you're trying to generate a website that contains a page for.. well, everything. The world is a fractal place, filled with infinite detail. Nobody knows this better than software developers. The programmers in the trenches, spending every day struggling with the details, are the people who often have the most local knowledge about narrow programming topics. There just aren't enough experts to go around.

So what does it mean to be an expert, then, when expertise is perceived as impractical at best, and a liability at worst? In a recent Google talk, James Bach presented the quintessential postmodern image of an expert performing – Steve McQueen in The Towering Inferno:

[turns to fire commissioner] What do we got here, Kappy?
Fire started, 81st floor, storage room. It's bad. Smoke's so thick, we can't tell how far it's spread.
Exhaust system?
Should've reversed automatically. It must be a motor burnout.
Sprinklers?
They're not working on 81.
Why not?
I don't know.

Steve McQueen in The Towering Inferno

[turns to architect] Jim? Give us a quick refresher on your standpipe system.
Floors have 3 and 1.5 inch outlets.
GPM?
Fifteen hundred from ground to 68, and 1,000 from 68 to 100, and 500 from there to the roof.
Are these elevators programmed for emergencies?
Yes.
What floor are your plans on?
79. My office.
That's two floors below the fire. It'll be our Forward Command. Men, take up the equipment. I want to see all floor plans, 81 through 85.
Gotcha.
[turns to security chief] Give me a list of your tenants.
Don't worry, we're moving them out now.
Not live-ins. Businesses.
We lucked out. Most of them haven't moved in yet. Those that have are off at night.
I want to know who they are, not where.
What's that got to do with anything? Who they are?
Any wool or silk manufacturers? In a fire, wool and silk give off cyanide gas. Any sporting good manufacturers, like table-tennis balls? They give off toxic gases. Now do you want me to keep going?
One tenant list, coming up.
[turns to crew leader] What do we got?
Elevator bank, central core. Service elevators here. Air conditioning ducts, 6 inches.
Pipe alleys here?
One, two, three, four, five.
Have you got any construction on 81? Anything that can blow up, like gasoline, fabric cleaner?
I don't think so.

What does this tell us? I mean, other than … Steve McQueen is a badass? Being an expert isn't telling other people what you know. It's understanding what questions to ask, and flexibly applying your knowledge to the specific situation at hand. Being an expert means providing sensible, highly contextual direction.

What I love about James Bach's presentation is how he spends the entire first half of it questioning and deconstructing everything – his field, his expertise, his own reputation and credentials, even! And then, only then, he cautiously, slowly builds it back up through a process of continual learning.

Level 0: I overcame obliviousness
I now realize there is something here to learn.

Level 1: I overcame intimidation
I feel I can learn this subject or skill. I know enough about it so that I am not intimidated by people who know more than me.

Level 2: I overcame incoherence
I no longer feel that I'm pretending or hand-waving. I feel reasonably competent to discuss or practice. What I say sounds like what I think I know.

Level 3: I overcame competence
Now I feel productively self-critical, rather than complacently good enough. I want to take risks, invent, teach, and push myself. I want to be with other enthusiastic students.

Insight like this is why Mr. Bach is my favorite Buccaneer-Scholar. He leaves us with this bit of advice to New Experts:

  • Practice, practice, practice!
  • Don't confuse experience with expertise.
  • Don't trust folklore – but learn it anyway.
  • Take nothing on faith. Own your methodology.
  • Drive your own education – no one else will.
  • Reputation = Money. Build and protect your reputation.
  • Relentlessly gather resources, materials, and tools.
  • Establish your standards and ethics.
  • Avoid certifications that trivialize the craft.
  • Associate with demanding colleagues.
  • Write, speak, and always tell the truth as you see it.

Of course, Mr. Bach is talking about testing here, but I believe his advice applies equally well to developing expertise in programming, or anything else you might do in a professional capacity. It starts with questioning everything, most of all yourself.

So if you want to be an expert in practice rather than in name only, take a page from Steve McQueen's book. Don't be the guy telling everyone what to do. Be the guy asking all the questions.

Discussion