Coding Horror

programming and human factors

For Best Results, Don't Initialize Variables

I noticed on a few projects I'm currently working on that the developers are maniacal about initializing variables. That is, either they initialize them when they're declared:

private string s = null;
private int n = 0;
private DataSet ds = null;

Or they initialize them in the constructor:

class MyClass
{
    private string s;
    private int n;
    private DataSet ds;

    public MyClass()
    {
        s = null;
        n = 0;
        ds = null;
    }
}

Well, this all struck me as unnecessary work in the .NET world. Sure, maybe that's the convention in the wild and wooly world of C++ and its buffer overruns, but this is managed code. Do we really want to play the "I'm smarter than the runtime" game again?
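
For the record, the CLR guarantees that fields get their type's default value (null for references, zero for numerics, false for booleans) before any constructor code runs, so those explicit assignments are pure redundancy. Here's a quick sketch to illustrate; the class and field names are mine, not from any of the projects in question:

using System;
using System.Data;

class FieldDefaults
{
    // No explicit initialization anywhere; the CLR zeroes these out for us.
    // (The compiler will warn that these fields are never assigned -- more on
    // that warning below.)
    private string s;
    private int n;
    private DataSet ds;

    static void Main()
    {
        FieldDefaults f = new FieldDefaults();
        Console.WriteLine(f.s == null);  // True
        Console.WriteLine(f.n == 0);     // True
        Console.WriteLine(f.ds == null); // True
    }
}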

Ok, so maybe you're a masochist and you like extra typing. What about the performance argument? According to this well-researched CodeProject article, initializing variables actually hurts performance. The author provides some benchmark test code along with his results:

Creating an object and initializing on definition         11% slower
Creating an object and initializing in the constructor    16% slower
Calling a method and initializing variables               25% slower

That's on the author's Pentium M 1.6GHz. I tested the same code (optimizations enabled, release mode) on my Athlon 64 2.1GHz and a Prescott P4 2.8GHz:

                                                           Athlon 64     P4
Creating an object and initializing on definition          30% slower    35% slower
Creating an object and initializing in the constructor     30% slower    36% slower
Calling a method and initializing variables                14% slower    8% slower

I recompiled under VS.NET 2005 beta 2 using the Athlon 64 to see how .NET 2.0 handles this:

Creating an object and initializing on definition          0% slower
Creating an object and initializing in the constructor     20% slower
Calling a method and initializing variables                20% slower
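
For context, here's a minimal sketch of the kind of micro-benchmark being measured; this is not the CodeProject author's actual code, just an illustration of "creating an object and initializing on definition" versus relying on the CLR defaults:

using System;

class Defaulted
{
    public string s; // CLR default: null
    public int n;    // CLR default: 0
}

class Initialized
{
    public string s = null; // redundant explicit initialization
    public int n = 0;
}

class Benchmark
{
    static void Main()
    {
        const int iterations = 10000000;

        int start = Environment.TickCount;
        for (int i = 0; i < iterations; i++)
        {
            Defaulted d = new Defaulted();
        }
        Console.WriteLine("CLR defaults:  {0} ms", Environment.TickCount - start);

        start = Environment.TickCount;
        for (int i = 0; i < iterations; i++)
        {
            Initialized x = new Initialized();
        }
        Console.WriteLine("Explicit init: {0} ms", Environment.TickCount - start);
    }
}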

Clearly there's a substantial performance penalty for initializing variables in both .NET 1.1 and even .NET 2.0 (although the newer compiler appears to optimize away initialization on definition). I recommend avoiding initialization as a general rule, unless you have a compelling reason to do so. If you're only initializing variables to avoid the uninitialized variable compiler warning, check out the new #pragma warning feature to programmatically disable specific warnings in .NET 2.0.
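
For example (a sketch, not a specific recommendation): warning 0649, "field is never assigned to, and will always have its default value", is the one you'd typically see for fields like these; substitute whatever warning number your build is actually emitting.

using System.Data;

class MyClass
{
    // Silence the "field is never assigned" warning for these declarations
    // only, then turn it back on.
#pragma warning disable 0649
    private string s;
    private int n;
    private DataSet ds;
#pragma warning restore 0649
}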


Passwords vs. Pass Phrases

Microsoft security guru Robert Hensing hit a home run his first time at bat with his very first blog post. In it, he advocates that passwords, as we traditionally think of them, should not be used:

So here's the deal - I don't want you to use passwords, I want you to use pass-PHRASES. What is a pass-phrase you ask? Let's take a look at some of my recent pass-phrases that I've used inside Microsoft for my 'password'.
  • "If we weren't all crazy we would go insane" (Jimmy Buffett rules)*
  • "Send the pain below!"
  • "Mean people suck!"

So why are these pass-phrases so great?

  1. They meet all password complexity requirements due to the use of upper / lowercase letters and punctuation (you don't HAVE to use numbers to meet password complexity requirements)
  2. They are so freaking easy for me to remember it's not even funny. For me, I find it MUCH easier to remember a sentence from a favorite song or a funny quote than to remember 'xYaQxrz!' (which b.t.w. is long enough and complex enough to meet our internal complexity requirements, but is weak enough to not survive any kind of brute-force password grinding attack with say LC5, let alone a lookup table attack). That password would not survive sustained attack with LC5 long enough to matter so in my mind it's pointless to use a password like that. You may as well just leave your password blank.
  3. I dare say that even with the most advanced hardware you are not going to guess, crack, brute-force or pre-compute these passwords in the 70 days or so that they were around (remember you only need the password to survive attack long enough for you to change the password).

Windows 2000 and higher support passwords of up to 127 Unicode characters, so this will work on virtually every Windows network in existence. Reggie Burnett, however, has some doubts:

The reason I think that Robert's logic is a bit flawed is that a pass phrase is likely to contain readable words (else it really isn't a pass phrase) and therefore can be attacked not at the letter level but at the word level. According to various sites I visited, the average English speaker knows about 20,000 words but uses only about 2,000 of those in a given week. Since the user is likely to use words they are used to, we can safely say that most pass phrases will contain one of about 5,000 words. And, if a pass phrase contains 4 words, then our possibilities are 5000^4. I'll spare you the math, but you'll see that the cracker that is trying pass phrases has a lot fewer possibilities to try. Now, of course, using more words will increase the security, but we should also note that since the attack is at the word level, the length of the word would not matter. "Mean people suck" would be just as secure as "Extremely important password". They are both 3 words and both use common words.

While I see his point, he's completely ignoring the capitalization and punctuation in "Mean people suck!". I do agree that for the best security, your passphrase should include capitalization, punctuation, and possibly even numbers if you can work them in there in a logical way. Andy Johns elaborates:

As I've often mentioned, I'm a consultant and I see a lot of crap out in the wild. By far the most annoying crap I see is around passwords. The more paranoid the network admins (or security council, or board, or whoever sets the rules) the more obscure the passwords must be, and the more often they need to be changed. What these people fail to realize is the average human worker just wants to do their job, and can't remember Syz8#K3! as a password. So what do they do.... Out comes the post-it-note on the desk, or in the drawer, or under the keyboard, or the file on the desktop called "passwords.txt". Some workers try and be smart by leaving out a letter, or writing it backwards.... but still, if your password is so hard to remember that you have to write it down, then you have no security at all, and a significant portion of your support staff/costs must be spent dealing with resetting passwords.

A pass-phrase of "this is my password and it's for my eyes only" is far easier to remember than Syz8#K3! and also far more secure, and nearly takes the same amount of time to type. Need more security, throw in a few caps, or numbers: "My address is 1234 Main street" or "Jenny's number is 867-5309". Yes, I'm breaking rules about not including personal information in a password, but remember, 1) these are examples, and 2) a pass-phrase is different. A password of "Chris" because your son's name is Chris is a bad password, but a password of: "My oldest son's name is Chris and he is 10 years old" is a good password.

Passphrases are clearly more usable than traditional "secure" passwords. They are also highly likely to be more secure. Even naive worst-case passphrases like "this is my password" aren't all that hackable, at least when compared to their single word equivalents, eg, "password".
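
To put rough numbers on that claim (my arithmetic, using Reggie's 5,000-common-word estimate and ignoring the extra keyspace you get from capitalization and punctuation):

using System;

class PassphraseKeyspace
{
    static void Main()
    {
        const double commonWords = 5000;   // Reggie's estimate of words in common use
        const double printableChars = 95;  // printable ASCII, for a random-character password

        double singleWord = commonWords;                       // e.g. "password"
        double fourWordPhrase = Math.Pow(commonWords, 4);      // e.g. "this is my password"
        double eightCharRandom = Math.Pow(printableChars, 8);  // e.g. "xYaQxrz!"

        Console.WriteLine("Single dictionary word: {0:N0} guesses", singleWord);
        Console.WriteLine("Four-word phrase:       {0:E2} guesses", fourWordPhrase);
        Console.WriteLine("8-char random password: {0:E2} guesses", eightCharRandom);
    }
}

Even under this naive word-level model, a four-word phrase is within an order of magnitude of a fully random eight-character password, and many orders of magnitude beyond any single dictionary word -- and that's before counting the capitalization and punctuation.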

Easier on the user, harder for hackers: that's a total no-brainer. I've adopted passphrases across the board on all the systems I use.

* ugh


A Tribute to the Windows 3.1 "Hot Dog Stand" Color Scheme

Yesterday's post about code syntax color schemes got me thinking about what is perhaps the ultimate color scheme, Windows 3.1's "Hot Dog Stand":

Windows 3.1 'Hot Dog Stand' color scheme

The truly funny thing about this color scheme is that all the other Windows 3.1 color schemes are surprisingly rational, totally reasonable color schemes. And then you get to "Hot Dog Stand". Which is utterly insane. And that makes it quite possibly the greatest color scheme ever devised.

I have to think it was included as a joke. And it is referenced in a joke titled You may be a Microsoft Employee If…:

…your house is decorated like the "Hot Dog Stand" color scheme from Windows.


Code Colorizing and Readability

Most developers, myself included, are content with syntax coloring schemes that are fairly close to Visual Studio's default of black text on a white background. I'll occasionally encounter developers who prefer black backgrounds. And I've even seen developers who prefer the white on blue scheme popularized by DOS WordPerfect.

I vaguely recall reading somewhere that black on white was the most readable of all color schemes. I found two studies with actual data based on real-world tests with users:

  1. Color Test Results
    As you can see, the most readable color combination is black text on white background; overall, there is a stronger preference for any combination containing black. The two least readable combinations were red on green and fuchsia on blue. White on blue and red on yellow were ranked fairly high, while green on yellow and white on fuchsia were ranked fairly low. All others fell somewhere between these extremes. Also, in every color combination surveyed, the darker text on a lighter background was rated more readable than its inverse (e.g. blue text on white background ranked higher than white text on blue background).
  2. Readability Of Websites With Various Foreground/Background Color Combinations, Font Types And Word Styles
    From these results, one can say that contrast affects legibility, but unfortunately, it does not seem to be as simple as high contrast being better than low contrast. In the main experiment, Green on Yellow had the fastest RT's, and in the control experiment, medium gray, and dark gray had the fastest RT's. In neither experiment did the Black on White condition show the fastest RT's. These results show that these participants had faster response times when more median contrasts were used. These results supported Powell (1990), who suggested avoiding sharp contrasts, but did not fully support Rivlen et al (1990), who suggested maintaining high contrast.

    According to a manual by AT&T (1989), the direction of the contrast (dark on light, or light on dark) might also affect legibility. When light text is placed on a dark background the text may seem to glow and become blurred; this is referred to as halation, and it may make the text harder to read. Some evidence for an effect of halation was found in the current experiment.

So, yes, there's definitely data to support the black on white status quo. After a quick trip into the Environment, Fonts and Colors section of the Visual Studio Options dialog, I captured these screenshots. Compare for yourself:

Visual Studio .NET, standard white background color scheme

Visual Studio .NET, black background color scheme

Visual Studio .NET, blue background color scheme

I'll take any of these schemes over the non-colorized Notepad version, but I feel very strongly that black on white color schemes are the way to go for overall readability.

Interestingly, this is also true of bibles.


Gigabit Ethernet and Back of the Envelope Calculations

At work today, we had a problem with a particular workstation. Although it was connected to a gigabit ethernet hub, network file transfers were "too slow". How do you quantify "too slow"?

a gigabit ethernet LAN connection

I was reminded of chapter seven of Programming Pearls -- The Back of the Envelope:

It was in the middle of a fascinating conversation on software engineering that Bob Martin asked me, "How much water flows out of the Mississippi River in a day?" Because I had found his comments up to that point deeply insightful, I politely stifled my true response and said, "Pardon me?" When he asked again I realized that I had no choice but to humor the poor fellow, who had obviously cracked under the pressures of running a large software shop.

My response went something like this. I figured that near its mouth the river was about a mile wide and maybe twenty feet deep (or about one two-hundred-and-fiftieth of a mile). I guessed that the rate of flow was five miles an hour, or a hundred and twenty miles per day. Multiplying

1 mile x 1/250 mile x 120 miles/day ~ 1/2 mile³/day

showed that the river discharged about half a cubic mile of water per day, to within an order of magnitude. But so what?

At that point Martin picked up from his desk a proposal for the communication system that his organization was building for the Summer Olympic games, and went through a similar sequence of calculations. He estimated one key parameter as we spoke by measuring the time required to send himself a one-character piece of mail. The rest of his numbers were straight from the proposal and therefore quite precise. His calculations were just as simple as those about the Mississippi River and much more revealing. They showed that, under generous assumptions, the proposed system could work only if there were at least a hundred and twenty seconds in each minute. He had sent the design back to the drawing board the previous day. (The conversation took place about a year before the event, and the final system was used during the Olympics without a hitch.)

That was Bob Martin's wonderful (if eccentric) way of introducing the engineering technique of "back-of-the-envelope" calculations. The idea is standard fare in engineering schools and is bread and butter for most practicing engineers. Unfortunately, it is too often neglected in computing.

To diagnose the network throughput issues, I busted out my copy of pcattcp and started doing some baseline network speed measurements. It's a great utility, and quite simple to use; just run one instance on a remote machine using the -R flag, then run another instance on the client with -t (remotename) and you're off to the races.

But even before that, I started with a loopback test:

C:\Program Files\ttcp>pcattcp -t -f M localhost
PCAUSA Test TCP Utility V2.01.01.08
TCP Transmit Test
Transmit    : TCP -> 127.0.0.1:5001
Buffer Size : 8192; Alignment: 16384/0
TCP_NODELAY : DISABLED (0)
Connect     : Connected to 127.0.0.1:5001
Send Mode   : Send Pattern; Number of Buffers: 2048
Statistics  : TCP -> 127.0.0.1:5001
16777216 bytes in 0.17 real seconds = 93.02 MB/sec +++
numCalls: 2048; msec/call: 0.09; calls/sec: 11906.98

This is helpful because it establishes an absolute upper bound on network performance. Even with an infinitely fast network, I won't achieve more than 93 megabytes per second throughput-- at least not on my PC. And this is a completely in-memory test; real world network operations may depend on hard disk reads and writes, which will be far slower.

A good rule of thumb for real-world throughput is the following (a quick back-of-the-envelope check appears after the list):

  • 10baseT = 1 megabyte/sec
  • 100baseT = 10 megabytes/sec
  • 1000baseT = 30 megabytes/sec
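
As a back-of-the-envelope check on those numbers (the efficiency factors below are my own rough guesses, not measurements): divide the nominal link speed by eight to convert bits to bytes, then knock off a chunk for protocol, driver, and disk overhead.

using System;

class ThroughputEstimate
{
    static void Estimate(string name, double megabitsPerSec, double efficiency)
    {
        double theoreticalMBs = megabitsPerSec / 8;    // bits -> bytes
        double realWorldMBs = theoreticalMBs * efficiency;
        Console.WriteLine("{0,-10} theoretical {1,6:F1} MB/sec, realistic ~{2:F0} MB/sec",
            name, theoreticalMBs, realWorldMBs);
    }

    static void Main()
    {
        Estimate("10baseT", 10, 0.8);      // ~1 MB/sec
        Estimate("100baseT", 100, 0.8);    // ~10 MB/sec
        Estimate("1000baseT", 1000, 0.25); // ~30 MB/sec; gigabit rarely scales linearly
    }
}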

All my ttcp testing over the last couple years has confirmed these numbers, plus or minus ten percent. I don't have as much experience with gigabit throughput, since I just got my first gigabit router, but you definitely shouldn't expect the perfect scaling we achieved moving from 10baseT to 100baseT. Without any major tweaking, you'll get only a fraction of the tenfold bandwidth improvement you might expect:

I noticed a significant improvement in multicast performance, measured by the time required to send a 690MB disk image to 18 multicast clients in one session. The HP NetServer LT6000r served as the multicast server, and the clients were using Fast Ethernet links to the desktop switch. On the Fast Ethernet network, the task took 19 minutes. On the Gigabit Ethernet network, the time was reduced to 9 minutes.

I measured the transfer of a large (1GB) file between the same hosts over Fast Ethernet and Gigabit Ethernet links with sustained network traffic (streaming media to multiple unicast clients). The file transfer took 230 seconds on Fast Ethernet and 88 seconds on Gigabit Ethernet.

Overall, my tests showed that Gigabit Ethernet provided a tangible performance improvement, but bottlenecks elsewhere kept the overall throughput lower than I had hoped. I was satisfied with Gigabit Ethernet performance relative to Fast Ethernet, and I was particularly impressed that general network responsiveness remained acceptable even during peak network loads. But I was disappointed not to be able to reach much beyond 450Mbps on the Lab's most capable server.

To be fair, that article is from 2002. A typical new desktop PC probably has more bandwidth and power than the author's fastest server. Even with those real world caveats, gigabit ethernet still offers 2 to 3 times the performance of 100baseT, which isn't exactly chopped liver, either.

In the end, our issue at work had nothing to do with the "problem" desktop. After a bit of ad-hoc ttcp testing, we found that nobody could achieve more than about 11 megabytes per second throughput to the server, even when directly connected to the gigabit switch. Download pcattcp and try for yourself. Some other interesting experiments you can run with ttcp are UDP (-u) versus TCP/IP, and varying the packet size (-l 4096).
