Coding Horror

programming and human factors

I Heart Strings

Brad Abrams was a founding member of the .NET common language runtime team way back in 1998. He's also the co-author of many essential books on .NET, including both volumes of the .NET Framework Standard Library Annotated Reference. I was at a presentation Brad gave to the Triangle .NET User's Group early in 2005. During the Q&A period, an audience member (and a friend of mine) asked Brad this question:

What's your favorite class in the .NET 1.1 common language runtime?

His answer? String.

And that's from a guy who will forget more about the .NET runtime than I will ever know about it. I still have my .NET class library reference poster, autographed by Brad right next to the String class.

I've always felt that string is the most noble of datatypes. Computers run on ones and zeros, sure, but people don't. They use words, sentences, and paragraphs to communicate. People communicate with strings. The meteoric rise of HTTP, HTML, REST, serialization, and other heavily string-oriented, human-readable techniques vindicates -- at least in my mind -- my lifelong preference for the humble string.

Or, you could argue that we now have so much computing power and bandwidth available that passing friendly strings around in lieu of opaque binary data is actually practical. But don't be a killjoy.

Guess what my favorite new .NET 2.0 feature is. Go ahead. Guess! Generics? Nope. Partial classes? Nope again. It's the String.Contains method. And I'm awfully fond of String.IsNullOrEmpty, too.
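
To make those two concrete, here's a rough sketch of what each one replaces. The class and method names are mine, purely for illustration; the only framework calls are String.Contains, String.IsNullOrEmpty, and String.IndexOf.

```csharp
using System;

public static class StringChecks
{
    // Pre-2.0 habit: spell out the null check and the length check by hand.
    public static bool HasTextOldStyle(string s)
    {
        return s != null && s.Length > 0;
    }

    // .NET 2.0: one call covers both null and "".
    public static bool HasText(string s)
    {
        return !String.IsNullOrEmpty(s);
    }

    // Pre-2.0 habit: an ordinal IndexOf compared against zero.
    public static bool MentionsOldStyle(string haystack, string needle)
    {
        return haystack.IndexOf(needle, StringComparison.Ordinal) >= 0;
    }

    // .NET 2.0: Contains performs the same ordinal, case-sensitive search.
    public static bool Mentions(string haystack, string needle)
    {
        return haystack.Contains(needle);
    }
}
```

Same answers either way; the 2.0 versions just say what they mean.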

What I love most about strings is that they have a million and one uses. They're the Swiss Army knife of data types. Regular expressions, for example, are themselves strings, as is SQL.*

Regex.IsMatch(s, @"<[a-z]|<!|&#|\Won[a-z]*\s*=|(script\s*\:)|expression\(")
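
Here's a small sketch of that pattern in action. I'm assuming it is meant to be read with its backslash escapes in place (\W, \s, \:, \(), written as a C# verbatim string; the MarkupSniffer wrapper and the sample inputs are hypothetical. Treat it as an illustration of a regex living in a string, not as a complete input filter.

```csharp
using System.Text.RegularExpressions;

public static class MarkupSniffer
{
    // The pattern above, with backslash escapes assumed: an opening tag,
    // a "<!" declaration, a "&#" character reference, an on...= event
    // attribute, a "script:" URL scheme, or a CSS expression( call.
    private const string Pattern =
        @"<[a-z]|<!|&#|\Won[a-z]*\s*=|(script\s*\:)|expression\(";

    // True when any one of the alternatives matches somewhere in the input.
    public static bool LooksLikeMarkup(string s)
    {
        return Regex.IsMatch(s, Pattern);
    }
}
```

With this sketch, LooksLikeMarkup("<b>bold</b>") and LooksLikeMarkup("javascript:alert(1)") come back true, while LooksLikeMarkup("plain text, 2 < 3") comes back false because that "<" is followed by a space rather than a letter.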

One of the classic uses for strings, going way back to the C days, is to specify output formats. Here's an example of basic string formatting in .NET.

"Date is " + DateTime.Now.ToString("MM/dd hh:mm:ss");

You can explicitly use the String.Format method to format, well, almost anything, including our date:

String.Format("Date is {0:MM/dd hh:mm:ss}", DateTime.Now);
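
To show the two approaches side by side, here's a minimal sketch. The FormatSketch class and its method names are made up for illustration, and pinning everything to CultureInfo.InvariantCulture is my own choice so the output doesn't change from machine to machine.

```csharp
using System;
using System.Globalization;

public static class FormatSketch
{
    // Concatenation, as in the first snippet above.
    public static string ViaConcat(DateTime when)
    {
        return "Date is " + when.ToString("MM/dd hh:mm:ss", CultureInfo.InvariantCulture);
    }

    // The same text produced from a single format string.
    public static string ViaFormat(DateTime when)
    {
        return String.Format(CultureInfo.InvariantCulture, "Date is {0:MM/dd hh:mm:ss}", when);
    }

    // Several placeholders at once: a date, a number fixed at two decimals,
    // and an integer right-aligned in a five-character field.
    public static string Describe(DateTime when, decimal price, int count)
    {
        return String.Format(CultureInfo.InvariantCulture,
            "{0:MM/dd hh:mm:ss} | {1:F2} | {2,5}", when, price, count);
    }
}
```

Both ViaConcat and ViaFormat return the same string for the same DateTime. One thing worth noticing: hh is the 12-hour clock, so 3:07 PM formats as 03:07; use HH if you want 15:07.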

As Karl Seguin points out, String.Format is a superior alternative to naive string concatenation:

Surely, I can't be the only one that has a hard time writing and maintaining code like:

d.SelectSingleNode("/graph/data[name='" + name + "']");

When I do write code like the above, I almost always forget my closing quote or square bracket! And as things get more complicated, it becomes a flat out nightmare.

The solution is to make heavy use of string.Format. You'll never EVER see me use plus to concatenate something to a string, and there's no reason you should either. To write the above code better, try:

d.SelectSingleNode(string.Format("/graph/data[name='{0}']", name));

It's a win-win scenario: you get more power and more protection. For a complete rundown of the zillion possible String.Format specifiers, try these links:

String class, you complete me.


The Visual Studio IDE and Regular Expressions

The Visual Studio IDE supports searching and replacing with regular expressions, right? Sure it does. It's right there in grey and black in the find and replace dialog. Just tick the "use Regular expressions" checkbox and we're off to the races.

The Visual Studio 2005 find dialog

However, you're in for an unpleasant surprise when you attempt to actually use regular expressions to find anything in Visual Studio. Apparently the Visual Studio IDE has its own bastardized regular expression syntax. Why? Who knows. Probably for arcane backwards compatibility reasons, although I have no idea why you'd want to perpetually carry forward insanity. Evidently it makes people billionaires, so who am I to judge.

God forbid we all learn one standard* regular expression dialect.

At any rate, some of the Visual Studio IDE regular expressions look awfully similar to standard regex:

                          Visual Studio IDE  Standard
Any single character      .                  .
Zero or more              *                  *
One or more               +                  +
Beginning of line         ^                  ^
End of line               $                  $
Beginning of word         <                  (no equivalent)
End of word               >                  (no equivalent)
Line break                \n                 \n
Any character in set      [ ]                [ ]
Any character not in set  [^ ]               [^ ]
Or                        |                  |
Escape special char       \                  \
Tag expression            { }                ( )
C/C++ identifier          :i                 ([a-zA-Z_$][a-zA-Z0-9_$]*)
Quoted string             :q                 (("[^"]*")|('[^']*'))
Space or Tab              :b                 [ |\t]
Integer                   :z                 [0-9]+

But they certainly don't act related when you try to use them. For example, try something simple, like finding "[A-Za-z]+". That's every run of one or more letters in a row. When I try this via the Visual Studio find dialog with the regex option checked, I get positively bizarre results. It finds a word made up of all letters, true, but as I click "Find Next", it then finds each subsequent letter in the word. Again. What planet are these so-called "regular expressions" from?
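
For comparison, here's roughly what I'd expect from the same pattern in the .NET dialect. This is a minimal sketch; WordFinder and FindLetterRuns are hypothetical names, and the only framework call is Regex.Matches.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Collects every maximal run of one or more ASCII letters.
    // Each run is reported once; the next search resumes after the
    // end of the previous match, not one character further in.
    public static List<string> FindLetterRuns(string text)
    {
        List<string> runs = new List<string>();
        foreach (Match m in Regex.Matches(text, "[A-Za-z]+"))
        {
            runs.Add(m.Value);
        }
        return runs;
    }
}
```

Given "int count42 = total;", this sketch returns three matches: int, count, and total. No letter-by-letter replays.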

The semi-abandoned Microsoft VSEditor blog has a three-part tutorial (part one, part two, part three) on using the crazy Visual Studio dialect of Regex. There's a lot of emphasis on the strange < and > begin/end word match characters, which have no equivalent that I know of in the .NET and Perl dialect of regular expressions.

You might say that searching with regular expressions is such an extreme edge condition for most developers that it's not worth the Visual Studio development team's time. I won't disagree with you. It is rare, but it's hardly esoteric. Every developer should be able to grok the value of searching with the basic regular expressions that are a staple of their toolkit these days. Heck, some developers are so hard core they search through their code with Lisp expressions. Basic regex search functionality is awfully mild compared to that.

To be honest, searching with regular expressions isn't a common task for me either. But I'd be a lot more likely to use it if I didn't have to perform a lot of mental translation gymnastics on the occasions that I needed it. Don't make me think, man. But there is hope. There's a free add-in available which offers real regular expression searching in Visual Studio.

* well, mostly standard, anyway. Certainly JavaScript regex syntax could be considered standard these days.


Power, Surge Protection, PCs, and You

A question recently came up on the internal Vertigo mailing list about surge protection for home equipment and computers:

  • Do you know if the cheap outlet strips work? I'm not sure if they are a good deal (work as good as more expensive strips) or a waste of money.
  • Do UPS provide better surge protection, or are you just paying more for the battery backup?
  • Do you know of any studies that show how well different devices work?

The best source of information on this is Dan of the eponymously named Dan's Data. Let's start with his essential article on power conditioning:

Mains irregularities come in four flavours - surges, sags, spikes and outages.

A surge is a lengthy (2.5 second or longer) increase in the supply voltage. A sag is a similarly lengthy decrease. By and large, computer power supplies deal with both of these quite well, though it of course depends on the severity of the irregularity, not to mention the quality of the power supply, and how much of its capacity is being used by the computer. The closer to maximum capacity a power supply is, the less likely it is to handle a given surge or sag. For this reason, a computer with a 300 watt (W) Power Supply Unit (PSU) is likely to deal better with line irregularities than one with a 235W PSU, although it may not ever need more than 200W of the PSU's possible output.

Outages are plain old blackouts, which are the Russian roulette of computing - you'll probably get away with no damage or only minor system corruption if the power drops out, but if you're writing to the only copy of an important file at the magic moment, you can kiss it goodbye.

Spikes are the real nasties. A spike is a brief increase in the supply voltage - less than 2.5 seconds, often a lot less. For a fraction of a second, a spike can easily subject your equipment to several hundred volts. If this doesn't blow something up outright, it can progressively damage power supply and other components. So, after a few (or a few hundred) more spikes and surges, your PC dies, for no obvious reason. You may lose a power supply or modem; you may lose your motherboard; you may even lose your hard drive and everything on it.

If lightning directly strikes the power lines near your house, you will have a very exciting time and probably lose some gear, unless everything is unplugged. Fortunately, direct strikes to power lines are rare, because, by definition, a power line is well isolated from earth, and the lightning is looking for an earth. Buried power and telephone lines are a different story, though; lightning strikes a long way away can result in large induced spikes on these sorts of cables.

Dan goes on to describe the risks of garden variety cheap, generic surge protectors:

The plain surge/spike filter powerboards you can buy at various electronics, electrical and hardware stores are, arguably, worse than nothing. This is because they give you the impression you're protected, when you probably aren't - well, not for long, anyway.

The chief surge-clamping component in a basic filter-board is a Metal-Oxide Varistor (MOV).

Metal-oxide-varistors

MOVs pass current only when the voltage across them is above a set value, and they react very quickly (in a matter of microseconds, against the tens of milliseconds a circuit breaker takes). That's the good news. The bad news is that MOVs wear out - they're only good for a few uses, and the bigger the spike, the more damage is done.

Cheap power filters seldom give you any indication whether your MOV is alive or not. If the powerboard has an illuminated power switch, the switch light often goes off when the MOV has died. The switch lights generally last for decades, so no light almost definitely means no MOV - but since the light only shows the status of a fuse, and the fuse won't blow if the MOV has been killed by lots of smaller surges, the light can keep glowing merrily when the MOV has long since kicked the bucket.

The key thing to take away from this article is that surge protectors wear out. They aren't good forever.

Personally, I recommend the Tripp Lite ISOBAR Ultra series of surge protectors. And yes, it has to be the Ultra, because of the little green "protection present" LED the Ultra adds. It lets you know that the MOV inside your power strip is still functioning.

tripp lite isobar 6 ultra

The 6-outlet ISOBAR Ultra is about $50 online, so it skews to the expensive side of the surge protection spectrum. But at least you won't get a false sense of security from a cheap power strip with a MOV that blew out three years ago. The HowStuffWorks article on surge protectors mentions that you should look for power strips with a UL 1449 certification, but I think it's more important to look for one with that "protection present" LED.

If you're really serious about protecting that bit of equipment, you won't bother with a surge protector. A surge protector can only protect you from spikes and surges, after all. What about sags and outages? To get full protection from the entire gamut of power problems, you need an Uninterruptible Power Supply.

And that's why, although I own and use many Tripp Lite Ultra power strips, all my home PCs are plugged into UPSes.

Tripp Lite internet office 750 UPS

I don't have enough experience to recommend a specific brand of UPS, other than a general trust of Tripp Lite, but this excellent Computer Power User article (may be behind paywall after first visit) has a few general recommendations for us:

  • Pick a UPS with USB support. Once the UPS is plugged into your PC's USB port, it'll enable those built-in Windows OS power functions you usually see on laptop computers, related to batteries and battery life. It also enables your computer to do a controlled shutdown when power runs out, exactly like a laptop. Note that you do not need to install the software that comes with the device to achieve this basic level of functionality!

  • Think about battery runtime. You'll want to scale the size of the battery to how much runtime you need, and the power draw of what you're plugging into it. For a typical 2003 vintage desktop PC, a UPS rated around 800VA provided at least 10 minutes of runtime under fully-loaded conditions. That should be more than enough for a brief power outage.

  • Is it a UPS or an SPS? If the device is a true UPS, the inverter is running all the time, translating the wall power into clean output. If it's an SPS (standby power supply), the inverter only kicks in when the power is actually out or unstable; at all other times, it's passing the "OK" power signal directly through, as-is. If you get a true UPS, look for "sine wave output". This is the ideal, pure form of power; cheaper devices use "square wave" or "modified square wave" which are harsher on sensitive equipment over time. On an SPS, it doesn't matter so much since the inverter will only be running when the power is actually out.

  • Consider form factor and weight. The more battery power you have, the larger and heavier the unit will be. I have a 1200VA unit at home I inherited through a garage sale, and I can barely lift the thing. Given the choice, I'd opt for something with less power/runtime that is easier to move and less bulky. My home theater PC, for example, is on a modest UPS that's more analogous to a giant power strip.
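
For the battery runtime point above, a back-of-the-envelope sketch might help. It leans on two general relationships: real power in watts is the VA rating times the power factor, and runtime is roughly the usable battery energy divided by the load. Every number in the usage note (0.6 power factor, 100 watt-hour battery, 90% inverter efficiency, 300 watt load) is a made-up placeholder rather than a spec for any real unit, and UpsSizing is a hypothetical name. The manufacturer's runtime chart is the real authority.

```csharp
using System;

public static class UpsSizing
{
    // Real power (watts) a unit can deliver: VA rating times power factor.
    public static double WattsFromVa(double va, double powerFactor)
    {
        return va * powerFactor;
    }

    // Rough runtime in minutes: usable battery energy divided by load.
    // Ignores battery aging and the way capacity drops at high discharge rates.
    public static double RuntimeMinutes(double batteryWattHours, double loadWatts, double inverterEfficiency)
    {
        if (loadWatts <= 0)
        {
            throw new ArgumentOutOfRangeException("loadWatts");
        }
        return batteryWattHours * inverterEfficiency / loadWatts * 60.0;
    }
}
```

With the placeholder numbers, WattsFromVa(800, 0.6) works out to about 480 watts of capacity, and RuntimeMinutes(100, 300, 0.9) to about 18 minutes. Halve the load and the estimate doubles, which is why the power draw of whatever you plug in matters as much as the size of the battery.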

Once you hook your device up to a true UPS, you've basically removed it from the power grid and hooked it into a custom electricity provider. The only use the UPS has for wall power is to charge its batteries. And be sure not to plug your UPS into any surge protection strip! Plug it directly into the wall. I've seen some truly bizarre PC behavior resulting from daisy-chaining UPSes or surge protection strips.

So, in summary: if it's something you really care about, put it on a decent UPS. If it's something you just want to protect, put it behind a decent surge protector with a "protection present" indicator.


Brute Force Key Attacks Are for Dummies

Cory Doctorow recently linked to this fascinating email from Jon Callas, the CTO of PGP Corporation. In it, Jon describes the impossibility of brute force attacks on modern cryptography:

Modern cryptographic systems are essentially unbreakable, particularly if an adversary is restricted to intercepts. We have argued for, designed, and built systems with 128 bits of security precisely because they are essentially unbreakable. It is very easy to underestimate the power of exponentials. 2^128 is a very big number. Burt Kaliski first came up with this characterization, and if he had a nickel for every time I tell it, he could buy a latte or three.

Imagine a computer that is the size of a grain of sand that can test keys against some encrypted data. Also imagine that it can test a key in the amount of time it takes light to cross it. Then consider a cluster of these computers, so many that if you covered the earth with them, they would cover the whole planet to the height of 1 meter. The cluster of computers would crack a 128-bit key on average in 1,000 years.

If you want to brute-force a key, it literally takes a planet-ful of computers. And of course, there are always 256-bit keys, if you worry about the possibility that government has a spare planet that they want to devote to key-cracking.

Each additional bit doubles the number of keys you have to test in a brute force attack, so by the time you get to 128 or 256 bits, you have a staggeringly large number of potential keys to test. The classic illustration of this exponential growth is the fable of the mathematician, the king, and the chess board:

There is an old Persian legend about a clever courtier who presented a beautiful chessboard to his king and requested that the king give him in return 1 grain of rice for the first square on the board, 2 grains of rice for the second square, 4 grains for the third, and so forth. The king readily agreed and ordered rice to be brought from his stores. By the fortieth square a million million rice grains had to be brought from the storerooms. The king's entire rice supply was exhausted long before he reached the sixty-fourth square. Exponential increase is deceptive because it generates immense numbers very quickly.

By the time you get to that 32nd chessboard square, you're facing a very large number indeed.
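
The doubling in the legend is easy to check with a few lines of code. A minimal sketch follows; the Chessboard class and its method names are mine, not part of the legend or of any library.

```csharp
using System;

public static class Chessboard
{
    // Grains on a single square (1-based): 1, 2, 4, ... so square n holds 2^(n-1).
    public static ulong GrainsOnSquare(int square)
    {
        if (square < 1 || square > 64)
        {
            throw new ArgumentOutOfRangeException("square");
        }
        return 1UL << (square - 1);
    }

    // Running total through a square, which is 2^n - 1.
    // Written as 2^(n-1) + (2^(n-1) - 1) so that square 64 fits in a ulong.
    public static ulong TotalThroughSquare(int square)
    {
        ulong onSquare = GrainsOnSquare(square);
        return onSquare + (onSquare - 1);
    }
}
```

Square 32 alone holds 2,147,483,648 grains, and the running total through it is 4,294,967,295, or 2^32 - 1. The running total through square 40 is 1,099,511,627,775, which is the legend's "million million", and the whole board comes to 18,446,744,073,709,551,615 grains.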

chessboard illustration of exponential growth

However, 2^32 isn't necessarily a very large set of keys when you're performing a brute force attack with a worldwide distributed network of computers, such as the RC5 distributed computing project. Here's what they've done so far:

The earliest 56-bit challenge, which ended in 1997, tested keys at a rate of 1.6 billion per second. The ongoing 72-bit challenge is currently testing keys at the rate of 139.2 billion per second. We're testing keys 88 times faster than we were 10 years ago, through natural increases in computing power and additional computers added to the distributed computing network.

And yet the RC5-72 project still has 1,040 years to go before they test the entire keyspace. Remember, that's for a lousy 72-bit key! If we want to double the amount of time the brute force attack will take, all we need to do is tack on one teeny, tiny little bit to our key. 73-bit key? 2,080 years. 74-bit key? 4,160 years.
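
To carry that doubling all the way out to 128 bits, here's a quick sketch. It takes the 1,040-year figure for 72 bits as its baseline and assumes the key-testing rate never changes; KeySearch and YearsToExhaust are hypothetical names, and BigInteger is used only because the results outgrow a 64-bit integer.

```csharp
using System;
using System.Numerics;

public static class KeySearch
{
    // Scales a known search time to a longer key at the same testing rate:
    // every extra bit doubles the keyspace, and therefore the time.
    public static BigInteger YearsToExhaust(int keyBits, int baselineBits, long baselineYears)
    {
        if (keyBits < baselineBits)
        {
            throw new ArgumentOutOfRangeException("keyBits");
        }
        return baselineYears * BigInteger.Pow(2, keyBits - baselineBits);
    }
}
```

YearsToExhaust(73, 72, 1040) gives 2,080 and YearsToExhaust(74, 72, 1040) gives 4,160, matching the figures above. YearsToExhaust(128, 72, 1040) gives 74,939,897,799,445,053,440, or roughly 7.5 x 10^19 years.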

It's painfully clear that a brute force attack on even a 128-bit key is a fool's errand. Even if you're using a planet covered with computers that crack keys at the speed of light.

If you're a smart attacker, you already know that brute force key attacks are strictly for dummies with no grasp of math or time. There are so many other vulnerabilities that are much, much easier to attack:

  • Rootkits
  • Social engineering
  • Keyloggers
  • Obtain the private key file and attack the password on it

Of course, beyond ruling out brute force attacks, I'm barely scratching the surface here. Jon Callas' Black Hat conference presentation Hacking PGP (pdf) goes into much more detail, if you're interested.


In Defense of the "Smackdown" Learning Model

I've occasionally been told that I have a confrontational style of communication. But that's not necessarily a bad thing – as Kathy Sierra points out, the smackdown learning model can be surprisingly effective:

What happens to your brain when you're forced to choose between two different – and potentially conflicting – points of view? Learning. That's what makes the smackdown model such an effective approach to teaching, training, and most other forms of communication.

Wrestlemania III  –  Hulk Hogan vs. Andre the Giant

Whether you're writing user instructions, teaching a class, writing a non-fiction book, or giving a conference presentation, consider including at least some aspect of the smackdown model. It's one of the most engaging ways to cause people's brains to both feel and think – the two elements you need for attention, understanding, retention, and recall.

By presenting different perspectives or views of the topic, the learner's brain is forced into making a decision about which one they most agree with. And as long as the learner is paying attention, you won't even have to ask. In other words, it doesn't have to be a formal exercise where the learner must physically make a choice between multiple things; simply by giving their brain the conflicting message, their brain has no choice. Brains cannot simply leave the conflicts out there without at least trying to make an evaluation.

I think this is also why presentations with two presenters are unusually effective. They're more engaging because you get two viewpoints. There's more back and forth; not one person droning on, but a sort of conversation on the stage.

Although I can recommend the smackdown communication style, it's extremely important that everyone retain their sense of humor. As with "real" wrestling, always remember that you're only fighting for the entertainment value.
