Coding Horror

programming and human factors

When Understanding means Rewriting

If you ask a software developer what they spend their time doing, they'll tell you that they spend most of their time writing code.

However, if you actually observe what software developers spend their time doing, you'll find that they spend most of their time trying to understand code:

where software developers spend their time

Peter Hallam explains:

Why is 5x more time spent modifying code than writing new code? New code becomes old code almost instantly. Write some new code. Go for coffee. All of a sudden you've got old code. Brand spanking new code reflects at most only the initial design; however, most design doesn't happen up front. Most development projects use the iterative development methodology. Design, code, test, repeat. Repeat a lot. Only the coding in the first iteration qualifies as all new code. After the first iteration, coding quickly shifts to be more and more modifying rather than new coding. Also, almost all code changes made while bug fixing fall into the modifying code category. Look at [the Visual Studio development team]; our stabilization (aka bug fixing) milestones are as long as our new feature milestones. Modifying code consumes much more of a professional developer's time than writing new code.

Why is 3x more time spent understanding code than modifying code? Before modifying code, you must first understand what it does. This is true of any refactoring of existing code - you must understand the behavior of the code so that you can guarantee that the refactoring didn't change anything unintended. When debugging, much more time is spent understanding the problem than actually fixing it. Once you've fixed the problem, you need to understand the new code to ensure that the fix was valid. Even when writing new code, you never start from scratch. You'll be calling existing code to do most of your work. Either user-written code, or a library supplied by Microsoft or a third party for which no source is available. Before calling this existing code you must understand it in precise detail. When writing my first XML-enabled app, I spent much more time figuring out the details of the XML class libraries than I did actually writing code. When adding new features you must understand the existing features so that you can reuse where appropriate. Understanding code is by far the activity at which professional developers spend most of their time.

I think the way most developers "understand" code is to rewrite it. Joel thinks rewriting code is always a bad idea. I'm not so sure it's that cut and dried. According to The Universe in a Nutshell, here's what was written on Richard Feynman's blackboard at the time of his death:

What I cannot create, I do not understand.

It's not that developers want to rewrite everything; it's that very few developers are smart enough to understand code without rewriting it. And as much as I believe in the virtue of reading code, I'm convinced that the only way to get better at writing code is to write code. Lots of it. Good, bad, and everything in between. Nobody wants developers to reinvent the wheel (again), but reading about how a wheel works is a poor substitute for the experience of driving around on a few wheels of your own creation.

Understanding someone else's code-- really comprehending how it all fits together-- takes a herculean amount of mental effort. And, even then, is source code truly the best way to understand an application? After reading Nate Combs' thought-provoking blog entry, I wonder:

Would Martians wishing to understand the rules of the World of Warcraft (WoW) be better off trying to read its source code or watching video of millions of hours of screen capture?

The challenge of Reginald's interview question is this: "If someone were to read the source code, do you think they could learn how to play [Monopoly]?"

In some ways this challenge hints of the reward of "downhill synthesis" over an "uphill analysis": who really knows what the rules of WoW are except by grace of the analysis of a million fan websites and trial and error. Do the developers really know?

I've worked on plenty of applications where, even with the crutch of source code I wrote myself, I had trouble explaining exactly how the application works. Imagine how difficult that explanation becomes with three, five, or twenty developers involved.

Does the source code really tell the story of the application? I'm not so sure. Maybe the best way to understand an application is, paradoxically, to ignore the source code altogether. If you want to know how the application really works, observe carefully how users use it. Then go write your own version.


On Unnecessary Namespacing

Is it really necessary to qualify everything in Windows Vista with the "Windows" namespace?

vista-start-menu-2.png

Hey, guess what operating system this is!

At least the Vista start menu lets me do a containing search, so if I start typing 'fax', the menu dynamically filters itself to show only items containing what I typed. The revamped Start menu is one of my favorite Vista features; it directly addresses XP's abysmal start menu user experience.

But still-- what's with all the Windows noise? Wouldn't that list be so much easier to navigate if we deleted the words "Microsoft" and "Windows" from each entry?

vista-start-menu-3.png

I'm sure the very suggestion of dropping those key branding words will make the marketing weasels apoplectic. But who's more important? The users, or your marketing weasels?* Repeated words, if they're repeated often enough, are just babbling noise.

I have a similar problem with the add reference dialog in Visual Studio.

visual-studio-add-reference-dialog.png

Unfortunately this dialog does not support containing search-- only "starts with" search-- so it's a royal pain to find what I need. This is a concrete example of how unnecessary namespacing hurts usability. Thank goodness the System namespace is actually named System and not "Microsoft.Windows.dotNet.System".
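
Here's a little C# sketch of the difference. It has nothing to do with the actual Start menu or Add Reference code, and the entry list is just made up for illustration -- but it shows why a "starts with" filter is useless when every entry leads with branding noise, and how stripping the noise words fixes it:

using System;
using System.Collections.Generic;

class FilterDemo
{
  static void Main()
  {
    // A made-up list of menu entries, every one prefixed with branding noise.
    List<string> entries = new List<string>(new string[] {
      "Microsoft Windows Media Player",
      "Windows Fax and Scan",
      "Windows Calendar",
      "Windows Photo Gallery"
    });

    string query = "fax";

    // "Starts with" filtering: zero matches, because every entry starts
    // with "Windows" or "Microsoft".
    int startsWith = entries.FindAll(delegate(string e) {
      return e.StartsWith(query, StringComparison.OrdinalIgnoreCase);
    }).Count;

    // "Containing" filtering, like the Vista Start menu search: one match.
    int contains = entries.FindAll(delegate(string e) {
      return e.IndexOf(query, StringComparison.OrdinalIgnoreCase) >= 0;
    }).Count;

    Console.WriteLine("starts-with matches: {0}", startsWith);  // 0
    Console.WriteLine("containing matches:  {0}", contains);    // 1

    // Strip the repeated noise words and "starts with" becomes useful again.
    for (int i = 0; i < entries.Count; i++)
    {
      entries[i] = entries[i].Replace("Microsoft ", "").Replace("Windows ", "").Trim();
    }
    Console.WriteLine(String.Join(", ", entries.ToArray()));
    // Media Player, Fax and Scan, Calendar, Photo Gallery
  }
}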

* a rhetorical question.


Is Your IDE Hot or Not?

Scott Hanselman recently brought up the topic of IDE font and color schemes again. I've been in search of the ideal programming font and the ideal syntax colorization scheme for a while now. Here's my current take on it.

Visual Studio 2005 font and color scheme

As you can see, I've finally given in to the inevitability of ClearType. Someone pointed out the zenburn vim color scheme in the comments. I think it's a nice dark background yin to my light background yang. So I set it up as an alternative for the dark background enthusiasts.

Visual Studio 2005 font and color scheme, Zenburn

Try these IDE color schemes yourself. Download the exported Visual Studio 2005 Fonts and Colors settings files:

To import, use the Tools | Import and Export Settings menu in Visual Studio 2005. But be sure you have the necessary fonts installed first -- Consolas for the main font and Dina for the output console font.

Here's how to export your own IDE font and color settings:

  • Tools | Import and Export Settings...
  • Select Export
  • Click the All Settings node to unselect everything in the tree
  • Expand the tree to "All Settings, Options, Environment"
  • Click the "Fonts and Colors" node
  • Click Next, name the file appropriately, and click Finish.

What we really need is for some enterprising coder to create a "Hot or Not" site for IDE color schemes, where we can post screenshots and downloadable *.settings files for our preferred IDE color and font schemes. Update: Someone set up Studio Styles.

If we're posting comparative screenshots, it might be a good idea to use the same code sample in each one. Here's the code sample I used in the above screenshot, which highlights some potential programming-specific font legibility issues (O vs. 0, I vs. l, etcetera).

#region codinghorror.com
class Program : Object
{
  static int _I = 1;
  /// <summary>
  /// The quick brown fox jumps over the lazy dog
  /// THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
  /// </summary>
  static void Main(string[] args)
  {
    Uri Illegal1Uri = new Uri("http://packmyboxwith/jugs.html?q=five-dozen&t=liquor");
    Regex OperatorRegex = new Regex(@"\S#$", RegexOptions.IgnorePatternWhitespace);
    for (int O = 0; O < 123456789; O++)
    {
      _I += (O % 3) * ((O / 1) ^ 2) - 5;
      if (!OperatorRegex.IsMatch(Illegal1Uri.ToString()))
      {
        Console.WriteLine(Illegal1Uri);
      }
    }
  }
}
#endregion

If you're formulating your own ideal font and color scheme, the only specific advice I have for you is to avoid too much contrast -- don't use pure white on pure black, or vice versa. That's why my background is a light grey and not white.


A Visit from the Metrics Maid

For the last few days, I've been surveying a software project. Landing on a planet populated entirely by an alien ecosystem of source code can be overwhelming. That's why the first thing I do is bust out my software tricorder -- static code analysis tools.

The two most essential static code analysis tools for .NET projects are nDepend and FxCop. Like real software tricorders, they produce reams and reams of data -- lots of raw metrics on the source code.

Even basic metrics can identify potential trouble spots and/or areas of interest in the code, such as..

  • Methods that are too large or too small.
  • Classes that are too large or too small.
  • Methods that are too complex (as measured by cyclomatic complexity).
  • Methods with too many parameters (more than 7 plus or minus 2).
  • Methods with too many local variables.
  • Classes with an excessively deep inheritance structure.
  • Types that are excessively large.

These simple metrics are already quite valuable. You can imagine how valuable more advanced software metrics could be, such as code coverage. Or how quickly you're finding and fixing bugs. And more advanced static analysis tools can offer literally hundreds of recommendations, ranging from mundane to mission-critical.
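
To make the flavor of those checks concrete, here's a rough sketch of the simplest possible metrics pass. It is not how nDepend or FxCop actually work -- just a reflection loop that flags two of the items in the list above, with arbitrary thresholds:

using System;
using System.Reflection;

class MetricsSketch
{
  static void Main(string[] args)
  {
    // Point this at any compiled assembly, e.g. MetricsSketch.exe MyApp.dll
    Assembly assembly = Assembly.LoadFrom(args[0]);

    foreach (Type type in assembly.GetTypes())
    {
      // Depth of inheritance: walk up the base-type chain.
      int depth = 0;
      for (Type t = type.BaseType; t != null; t = t.BaseType)
        depth++;
      if (depth > 5)
        Console.WriteLine("Deep inheritance ({0}): {1}", depth, type.FullName);

      // Parameter counts, per method declared on this type.
      BindingFlags flags = BindingFlags.Instance | BindingFlags.Static |
        BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
      foreach (MethodInfo method in type.GetMethods(flags))
      {
        int parameterCount = method.GetParameters().Length;
        if (parameterCount > 7)
          Console.WriteLine("Too many parameters ({0}): {1}.{2}",
            parameterCount, type.FullName, method.Name);
      }
    }
  }
}

Real tools go much further -- measuring cyclomatic complexity, for example, means analyzing the IL or the source itself -- but the principle is the same: mechanical checks over the whole code base, producing numbers a human then has to interpret.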

Having more data about your software development project can never be bad. The real trick, of course, lies in interpreting all that data, and deciding how to act on it. There's a huge temptation to become a metermaid-- to use the metrics as a reward or punishment system.

A metermaid

If Joe wrote a method with a cyclomatic complexity of 52, then he better get slapped with a complexity ticket, right? No excess complexity in the simplicity zone, you idiot!

Not necessarily. Responsible use of the metrics is just as important as collecting them in the first place. Gregor Hohpe elaborates:

Some of the most hated people in San Francisco must be the meter maids, the DPT people who drive around in golf carts and hand out tickets to anyone who overslept street cleaning or did not have enough quarters for the meter. On some projects, the most hated people are the metric maids, the people who go around and try to sum up a developer's hard work and intellectual genius in a number between 1 and 10.

Many managers love metrics: "You can't manage it if you can't measure it". I am actually a big proponent of extracting and visualizing information from large code bases or running systems (see Visualizing Dependencies). But when one tries to boil the spectrum between good and evil down to a single number we have to be careful as to what this number actually expresses.

Martin Woodward calls this the measurement dilemma.

The reporting aspects of Team Foundation Server are a new, more accurate instrument to take measurements inside your software development process. But you need to be wary about the things you measure. The metrics need to mean something useful rather than just be interesting. The effect of taking the metric should be carefully considered before taking it. This is not a new problem. But because Team Foundation Server makes it so easy to get data out of the system, the temptations are greater.

Martin also references the Heisenberg Uncertainty Principle, which states that you can't measure something without changing it. I believe this is true for software development metrics only if you are using that metric to reward or punish.

Recording metrics on your project can be beneficial even if you don't explicitly act on them. Having a public "wall of metrics" might be a better idea. It can be a focal point for discussion about what the metrics mean to the team. This gives everyone on the project an opportunity to discuss and reflect, and act on the metrics as they deem appropriate. Maybe the team will even remove a few metrics that are of no value.

What metrics do you find helpful on your software projects? What metrics do you find not so helpful? And if you have no project metrics to talk about, well, what are you waiting for?


Vista and the Rise of the Flash Drives

In my recent Windows Vista performance investigation, I discovered the new ReadyBoost feature. ReadyBoost allows you to augment your PC's performance using a USB flash memory drive. It's very easy to use; just plug in a USB flash drive that's 256 megabytes or larger, then navigate to the ReadyBoost tab on the properties dialog for the drive:

vista-readyboost.png

The drive has to meet certain minimum performance characteristics (defined in the ReadyBoost FAQ) to be usable for ReadyBoost. Vista performs a one-time performance benchmark on the drive after it's inserted to determine if the drive is suitable.

But what is ReadyBoost actually doing to improve performance? It's leveraging the unique advantages of flash memory..

  1. decent read and write speeds
  2. extremely fast random access times
  3. very low power consumption

.. by caching the system pagefile on that USB flash drive.* Subsequent accesses hit the cached, compressed pagefile on the flash drive and bypass the hard drive entirely.

If we've gone this far, you might wonder why we don't just go all the way and use a giant 32-gigabyte flash drive as our primary hard drive. I can think of three reasons why you wouldn't want to do that:

  1. Speed. Flash memory is fast, but it's not nearly as fast as a modern hard drive. And it's not even remotely in the same league as system memory.
  2. Cost. Although flash memory pricing has been in freefall for a while, it's still rather expensive on a cost-per-megabyte basis. This will definitely change over time, however.
  3. Durability. Flash memory literally wears out after a fixed number of writes, usually 100,000 or so. Hard drives last many orders of magnitude longer.

Also, the performance benefits of a solid state hard drive-- even one based on ultra-fast battery-backed DDR memory-- aren't as amazing as you might think.

That's why the best solution might be a combination of traditional mechanical hard drives and flash memory-- so-called "hybrid" hard drives with embedded flash cache. For example, the Seagate Momentus 5400 PSD includes 256 megabytes of flash RAM. This feature is called ReadyDrive, and it's even better than ReadyBoost. Unlike a USB flash drive, the flash RAM on a hard drive can be read before the system is booted, and thus can be used to speed up boot and resume times, too.

It's looking more and more like flash memory is the future. But be careful, because not all flash memory is created equal. I researched USB flash drive performance recently and found benchmark roundups at Hardware Secrets, AnandTech, and Ars Technica. In my research, I found that there are at least three distinct tiers of flash drive performance today: mediocre, good, and best. The price difference between the best performers and the worst performers isn't much, so you might as well buy the fast ones. The flash drives that performed the best in the above three benchmarks were the Kingston DataTraveler Elite and the Lexar JumpDrive Lightning.

Cheap flash drives are cheap for a reason-- they skimp on performance. Here's a performance comparison of three USB thumb drives I had on hand: a 1 gigabyte Iomega Micro Mini, a 1 gigabyte Kingston DataTraveler Elite, and a generic no-name 128 megabyte model I got at a trade show.

I ran SiSoft Sandra's flash memory test on these three drives. The results are summarized below. Note that the bars are stacked, so the total transfer rate is only as high as the largest sub-color in the bar.

thumb-drive-graph-read-write-2.png

There's a big disparity between read and write performance on flash drives. And small files are disproportionately painful to transfer through these devices. The cheaper the flash drive, the worse these characteristics will be. When you go for an inexpensive USB flash drive, that's the tradeoff you're making.

I also ran the command line chddspeed utility on these three drives. Here are the results for the random access read test.

thumb-drive-graph-random.png

Flash memory is exceptionally strong at random access; my fast WD Raptor drive can't touch these scores.

Here are the chddspeed results for sequential access.

thumb-drive-graph-sequential.png

Up to 12 MB/sec is nothing to sneeze at, but it's nearly 6 times slower than the 68 MB/sec the Raptor achieves. If you need fast sequential read (or write) speeds, you want a hard drive.
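
If you're curious what these tests are doing under the hood, here's a minimal sketch of a sequential-versus-random read benchmark. It's not the actual Sandra or chddspeed code, the file path is hypothetical, and it doesn't bypass the OS file cache the way a serious benchmark would, so treat the numbers as rough:

using System;
using System.Diagnostics;
using System.IO;

class ReadBenchmarkSketch
{
  const int BlockSize = 64 * 1024;   // 64 KB per read
  const int BlockCount = 1024;       // 64 MB total; the test file must be at least this big

  static void Main()
  {
    string path = @"F:\benchmark.tmp";   // hypothetical test file on the drive being measured
    byte[] buffer = new byte[BlockSize];
    Random random = new Random();

    using (FileStream stream = new FileStream(path, FileMode.Open, FileAccess.Read))
    {
      // Sequential: read the blocks front to back.
      Stopwatch sequential = Stopwatch.StartNew();
      for (int i = 0; i < BlockCount; i++)
        stream.Read(buffer, 0, BlockSize);
      sequential.Stop();

      // Random: seek to a randomly chosen block before every read.
      Stopwatch randomAccess = Stopwatch.StartNew();
      for (int i = 0; i < BlockCount; i++)
      {
        stream.Seek((long)random.Next(BlockCount) * BlockSize, SeekOrigin.Begin);
        stream.Read(buffer, 0, BlockSize);
      }
      randomAccess.Stop();

      double megabytes = (double)BlockSize * BlockCount / (1024 * 1024);
      Console.WriteLine("Sequential: {0:F1} MB/sec", megabytes / sequential.Elapsed.TotalSeconds);
      Console.WriteLine("Random:     {0:F1} MB/sec", megabytes / randomAccess.Elapsed.TotalSeconds);
    }
  }
}

Run it against a file on a hard drive and the sequential number dwarfs the random one; run it against a flash drive and the gap nearly disappears, which is exactly the disparity the graphs above show.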

After all this analysis, it's clear to me that traditional hard drives and flash memory are quite complementary; they're strong in different areas. But flash drives are the future. They will definitely replace hard drives in almost all low end and low power devices-- and future high performance hard drives will need to have a substantial chunk of flash memory on board to stay competitive.

* Yes, it's encrypted, and yes, it is optimized for the limited duty cycle of flash drives. It's even compressed, so that 1 GB flash drive is effectively 2 GB of cache. This is all covered in the excellent ReadyBoost FAQ.
