Coding Horror

programming and human factors

Why Can't Microsoft Ship Open Source Software?

In Codeplex wastes six months reinventing wheels, Ryan Davis has a bone to pick with Microsoft:

I saw an announcement [in March, 2007] that CodePlex, Microsoft's version of Sourceforge, has released a source control client.

This infuriates me. This cool thing they spent six months (six!) writing is called Subversion, and it had a 1.0.0 release [in early 2004]. Subversion had its first beta in late 2003, so the Codeplex folks are waaay behind the state of the art on this one.

As a whole, I think the state of software is abysmal. The only way to make it better is to stop writing new code. New code is always full of bugs, and it's an expensive path to get from blank screen to stable program. We need to treat programming more like math; we need to build on our results. Development tools are a special market, as our needs are all very similar, and when we need a tool, we have the skills to make it.

It's a great rant -- you should read the whole thing -- but I'm not sure I entirely agree.

While I do empathize with the overall sentiment that Ryan is expressing here, I also found myself nodding along with Addy Santo, who left this comment:

Author seems to think that all software development is done in basements and dorms. The reality is that software is an industry like any other - and follows the same simple rules of economics. How many brands of sports shoes are there? How many different MP3 players? Flavors of toothpaste? If you can walk down the soft drink aisle and not be "infuriated" by Vanilla Cherry Diet Doctor Pepper then you might just be a hypocrite.

So if you think Microsoft's particular flavor of source control is redundant, you'll really hate Diet Cherry Chocolate Dr. Pepper.

Diet Cherry Chocolate Dr. Pepper

(I am now required by law to link Tay Zonday's Cherry Chocolate Rain video. My apologies in advance. And if that makes no sense to you, see here.)

Are there meaningful differences between Microsoft's Team Foundation flavor of version control and Subversion? The short answer is that there aren't -- if all you're looking for is a carbonated beverage. If all you require is run-of-the-mill, basic centralized source control, they're basically the same product. So why not go with the free one?

But Team Foundation is much more than just source control. Of course there are open source equivalents to much of the functionality offered in Team System, as Ryan is quick to point out.

The Codeplex staff stated they needed to write their own client in order to integrate with the TFS server infrastructure. According to an MSDN article (Get All Your Devs In A Row With Visual Studio 2005 Team System), TFS seems to be a complicated tool to help manage your developers. Reading the description, TFS is an issue tracker, unit tester, continuous integration, source control system, and Visual Studio plugin. So, basically a combination of Trac, NUnit, CruiseControl.NET, Subversion, and a Visual Studio plugin. Why not just write the Visual Studio plugin, and hook into the tools people are already using? All those tools have rich plugin-architectures that would probably support any sensible addition you'd want to make.

The answer, of course, is that Microsoft does all that painful integration work for you -- at a price.

If you have the time to look closer, you'll find more flavorful differences between Subversion and TFS source control. Differences more akin to, say, Dr. Pepper and Mr. Pibb.

Mr. Pibb

I'm not going to enumerate all the subtle and not-so-subtle differences between the two here; picking a fight between two modern centralized version control systems is not my goal. They're both great. Choose whatever modern source control system you prefer, and take the time to learn it in depth. Source control is the bedrock of modern software engineering, and I've found precious few developers that truly understand how it works. All that time we were going to spend arguing whether your source control system can beat up my source control system? I've got a radical idea: let's spend it on learning the damn stuff instead.

Still, there is a much deeper, more endemic problem here that Ryan alludes to, and it deserves to be addressed.

One of Microsoft's biggest challenges in the last few years has been that its competitors are free to ship what are, by now, fairly mature open source components as parts of their operating systems. When was the last time you ever saw any open source anything shipping in a Microsoft product? On some deep, dark corporate level, Microsoft must feel compelled to rewrite everything to completely own the source code. Sometimes -- a more cynical person might say "often" -- this results in poor quality copies instead of actual innovation, such as Microsoft's much-maligned MSTest unit test framework. It's a clone of NUnit with all new bugs and no new features, but it can be included in the box with Visual Studio and integrated into the product. It's a one step forward, two steps back sort of affair.

Everybody I know -- including our own Stack Overflow team -- who has tried to use the MSTest flavor of unit tests has eventually thrown up their hands and gone back to NUnit. It's just too painful; the commercial clone lacks the simplicity, power, and community support of the original open source version. There's simply no reason for MSTest to exist except to satisfy some bizarre corporate directive that Microsoft never ship open source code in their products. Furthermore, this blind spot hampers obvious integration work. Microsoft could build first-class support for NUnit into Visual Studio. But they haven't, and probably never will, because so much effort is poured into maintaining the second-rate MSTest clone.

In fact, the more I think about this, the more I think Microsoft's utter inability to integrate open source software of any kind whatsoever into their products might just end up killing them. It's a huge problem, and it's only going to get worse over time. Open source seems to evolve according to a different power law than commercial software. If I worked in the upper echelons of Microsoft, I'd be looking at the graph of open source software growth from the years of 1999 to 2008 and crapping my pants right about now.

It's a shame, because the best way to "beat" open source is to join 'em -- to integrate with and ship open source components as a part of your product. Unfortunately, that's the one route that Microsoft seems hell bent on never following.

Update: For background, do read Jon Galloway's explanation: Why Microsoft Can't Ship Open Source Code.

Discussion

Alan Turing, the Father of Computer Science

Charles Petzold was kind enough to send me a copy of his new book, The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on Computability and the Turing Machine.

The Annotated Turing: A Guided Tour Through Alan Turing's Historic Paper on Computability and the Turing Machine

One look at the original title page of Turing's paper is enough to convince me that we're fortunate to have a guide as distinguished and patient as Charles. You know you're in trouble when the very first page opens with "Entscheidungsproblem".

On Computable Numbers, with an Application to the Entscheidungsproblem

The computer you're using to read this post is based on the mathematical model laid out in that thirty-six page 1936 paper. As are all other computers in the world. The terms Turing Machine and Turing Complete are both derived from that one historic paper.
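The model laid out in that paper is startlingly small: a finite set of states, an unbounded tape of symbols, and a table of transition rules. Here's a hypothetical sketch in Python -- the state names and the bit-flipping rule table are illustrative inventions of mine, not Turing's own notation:

```python
# A minimal Turing machine simulator: the transition table maps
# (state, symbol) -> (symbol to write, head move, next state).
def run_turing_machine(rules, tape, state="start", blank="_", max_steps=1000):
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    pos = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(pos, blank)
        write, move, state = rules[(state, symbol)]
        cells[pos] = write
        pos += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Example rules: flip every bit on the tape, then halt at the first blank.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_turing_machine(flip, "1011"))  # -> 0100
```

That's the whole machine. Turing's insight was that a single "universal" set of rules can simulate any other rule table -- which is exactly what the device you're reading this on does.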

Needless to say, we owe Alan Turing a lot.

photo of Alan Turing

Not only is Alan Turing the father of all modern computer science, he also was the single individual most responsible for breaking the Enigma code during World War II, and he laid the foundation for artificial intelligence by posing the Turing Test in 1950.

Unfortunately, Alan Turing was also terribly persecuted for the "crime" of being a homosexual. He was arrested in 1952 for having sex with another man. It pains me greatly to read about the degrading and inhumane treatment one of our greatest scientific minds was subjected to. Alan Turing ultimately committed suicide two years later, at the age of 41.

The "Nobel Prize of computing" was established in Turing's name in 1966. Reading the list of Turing Award recipients is humbling indeed, a reminder of not only how far we've come, but how far we have to go.

Alan Turing statue

The Alan Turing Memorial, erected in 2001, bears this Bertrand Russell quote:

Mathematics, rightly viewed, possesses not only truth, but supreme beauty -- a beauty cold and austere, like that of sculpture.

You'll note that the statue depicts Turing holding an apple in his right hand, a reference to the way he chose to end his life -- by eating a cyanide-laced apple. That was Turing's last message to the world, with clear parallels not only to the legendary scientific knowledge of Isaac Newton, but also the biblical interpretation of forbidden love.

Petzold's Annotated Turing is a gripping testament to the amazing mind of Alan Turing. Writing the book was a nine-year labor of love, and it shows. It may be his shortest book -- but it could also be his best yet.

Discussion

Open Wireless and the Illusion of Security

Bruce Schneier is something of a legend in the computer security community. He's the author of the classic, oft-cited 1994 book Applied Cryptography, as well as several well-known cryptography algorithms.

Bruce Schneier knows Alice and Bob's shared secret

The cheeky Norris-esque design above is a reference to the actor names commonly used in examples of shared secret key exchange.

What I find most interesting about Bruce, however, is that he has moved beyond treating computer security as a problem that can be solved with increasingly clever cryptography algorithms:

Schneier now denounces his early success as a naive, mathematical, and ivory tower view of what is inherently a people problem. In Applied Cryptography, he implies that correctly implemented algorithms and technology promise safety and secrecy, and that following security protocol ensures security, regardless of the behavior of others. Schneier now argues that the incontrovertible mathematical guarantees miss the point. As he describes in Secrets and Lies, a business which uses RSA encryption to protect its data without considering how the cryptographic keys are handled by employees on "complex, unstable, buggy" computers has failed to properly protect the information. An actual security solution that includes technology must also take into account the vagaries of hardware, software, networks, people, economics, and business.

This is the programming equivalent of realizing that Peopleware is ultimately a much more important book than The Art of Computer Programming. The shift in focus from algorithms to people is even more evident if you frequent Bruce's excellent blog, or read his newest books Practical Cryptography and Beyond Fear.

As much as I respect Bruce, I was surprised to read that he intentionally keeps his wireless network open.

Whenever I talk or write about my own security setup, the one thing that surprises people – and attracts the most criticism – is the fact that I run an open wireless network at home. There's no password. There's no encryption. Anyone with wireless capability who can see my network can use it to access the internet.

I've advocated WiFi encryption from the day I owned my first wireless router. As I encountered fewer and fewer open WiFi access points over the years, I viewed it as tangible progress. Reading Bruce's opinion is enough to make me question those long held beliefs.

It's a strange position for a respected computer security expert to advocate. But I think I get it. Security is a tough problem. If you take the option of mindlessly flipping a WPA or WEP switch off the table, you're now forced to think more critically about the security of not only your network, but also the fundamental security of the data on your computers. By advocating the radical idea that your wireless network should be intentionally kept open, Bruce is attempting to penetrate the veil of false algorithmic security.

I may understand and even applaud this effort, but I don't agree. Not because I'm worried about the security of my data, or any of the half-dozen other completely rational security arguments you could make against intentionally keeping an open wireless network. My concerns are more prosaic. I desperately want to protect the thin sliver of upstream bandwidth my provider allows me. Some major internet providers are talking about monthly download caps, too. Bruce's position only makes sense if you have effectively unlimited bandwidth in both directions. Basically, I'm worried about the tragedy of the bandwidth commons. As much as I might like my neighbors, they can pay for their own private sliver of bandwidth, or knock on my door and ask to share if they really need it.

So, to me at least, enabling wireless security is my way of ensuring that I get every last byte of the bandwidth I paid for that month.

It's worth realizing, however, that wireless security is no panacea, even in this limited role. Given a sufficiently motivated attacker, every wireless network is crackable.

Aircrack screenshot

With that in mind, here are a few guidelines.

  1. WEP = Worthless Encryption Protocol

    WEP, the original encryption protocol for wireless networks, is so fundamentally flawed and so deeply compromised it should arguably be removed from the firmware of every wireless router in the world. It's possible to crack WEP in under a minute on any vaguely modern laptop. If you choose WEP, you have effectively chosen to run an open wireless network. There's no difference.

  2. WPA requires a very strong password

    The common "personal" (PSK) variant of WPA is quite vulnerable to brute force dictionary attacks. It only takes a trivial amount of wireless sniffing to obtain enough data to attack your WPA password offline – which means an unlimited amount of computing power could potentially be marshalled against your password. Brute force attacks may be for dummies, but most people are, statistically speaking, dummies: they rarely pick good passwords. If ever there was a time to take my advice on using long passphrases, this is it. Experts recommend you shoot for a 33 character passphrase.

  3. Pick a unique SSID (name) for your wireless network

    Default wireless network names just scream I have all default settings! and attract hackers like flies to honey. Also, pre-generated rainbow tables exist for common SSIDs.

  4. Use WPA2 if available

    As of 2006, WPA2 is required on any router that bears the WiFi certification. WPA2, as the name might suggest, is designed to replace WPA. It has stronger and more robust security. There's no reason to use anything less, unless your hardware doesn't support it. And if that's the case, get new hardware.

In the end, perhaps wireless security is more of a deterrent than anything else, another element of defense in depth. It's important to consider the underlying message Bruce was sending: if you've enabled WEP, or WPA with anything less than a truly random passphrase of 33 characters, you don't have security.

You have the illusion of security.

And that is far more dangerous than no security at all.

Discussion

Regular Expressions: Now You Have Two Problems

I love regular expressions. No, I'm not sure you understand: I really love regular expressions.

You may find it a little odd that a hack who grew up using a language with the ain't keyword would fall so head over heels in love with something as obtuse and arcane as regular expressions. I'm not sure how that works. But it does. Regular expressions rock. They should absolutely be a key part of every modern coder's toolkit.

If you've ever talked about regular expressions with another programmer, you've invariably heard this 1997 chestnut:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

The quote is from Jamie Zawinski, a world-class hacker whom I admire greatly. If he's telling us not to use regular expressions, should we even bother? Maybe, if you live and die by soundbites. But there's a bit more to the story than that, as evidenced by Jeffrey Friedl's exhaustive research on the Zawinski quote. Zawinski himself commented on it. Analyzing the full text of Jamie's posts in the original 1997 thread, we find the following:

Perl's nature encourages the use of regular expressions almost to the exclusion of all other techniques; they are far and away the most "obvious" (at least, to people who don't know any better) way to get from point A to point B.

The first quote is too glib to be taken seriously. But this, I completely agree with. Here's the point Jamie was trying to make: not that regular expressions are evil, per se, but that overuse of regular expressions is evil.

I couldn't agree more. Regular expressions are like a particularly spicy hot sauce – to be used in moderation and with restraint, only when appropriate. Should you try to solve every problem you encounter with a regular expression? Well, no. Then you'd be writing Perl, and I'm not sure you need that kind of headache. If you drench your plate in hot sauce, you're going to be very, very sorry later.

hot sauce

In the same way that I can't imagine food without a dash of hot sauce now and then, I can't imagine programming without an occasional regular expression. It'd be a bland, unsatisfying experience.

But wait! Let me guess! The last time you had to read a regular expression and figure it out, your head nearly exploded! Why, it wasn't even code; it was just a bunch of unintelligible Q*Bert line noise!

Calm down. Take a deep breath. Relax.

Let me be very clear on this point: if you find an incredibly complex, impossible-to-decipher regular expression in your codebase, whoever wrote it did it wrong. If you write regular expressions that are difficult to read into your codebase, you are doing it wrong.

Look. Writing so that people can understand you is hard. I don't care if it's code, English, regular expressions, or Klingon. Whatever it is, I can show you an example of someone who has written something that is pretty much indistinguishable from gibberish in it. I can also show you something written in the very same medium that is so beautiful it will make your eyes water. So the argument that regular expressions are somehow fundamentally impossible to write or read, to me, holds no water. Like everything else, it just takes a modicum of skill.

Buck up, soldier. Even Ruby code is hard to read until you learn the symbols and keywords that make up the language. If you can learn to read code in whatever your language of choice is, you can absolutely handle reading a few regular expressions. It's just not that difficult. I won't bore you with a complete explanation of the dozen or so basic elements of regular expressions; Mike already covered this ground better than I can.

I'd like to illustrate with an actual example, a regular expression I recently wrote to strip out dangerous HTML from input. This is extracted from the SanitizeHtml routine I posted on RefactorMyCode.

var whitelist =
@"</?p>|<br\s?/?>|</?b>|</?strong>|</?i>|</?em>|
</?s>|</?strike>|</?blockquote>|</?sub>|</?super>|
</?h(1|2|3)>|</?pre>|<hr\s?/?>|</?code>|</?ul>|
</?ol>|</?li>|</a>|<a[^>]+>|<img[^>]+/?>";

What do you see here? The variable name whitelist is a strong hint. One thing I like about regular expressions is that they generally look like what they're matching. You see a list of HTML tags, right? Maybe with and without their closing tags?

Honestly, is this so hard to understand? To me it's perfectly readable. But we can do better. In most modern regex dialects, you can flip on a mode where whitespace is no longer significant. This frees you up to use whitespace and comments in your regular expression, like so.

var whitelist =
@"</?p>|
<br\s?/?>| (?# allow space at end)
</?b>|
</?strong>|
</?i>|
</?em>|
</?s>|
</?strike>|
</?blockquote>|
</?sub>|
</?super>|
</?h(1|2|3)>| (?# h1,h2,h3)
</?pre>|
<hr\s?/?>|  (?# allow space at end)
</?code>|
</?ul>|
</?ol>|
</?li>|
</a>|
<a[^>]+>|  (?# allow attribs)
<img[^>]+/?> (?# allow attribs)
";

Do you understand it now? All I did was add a smattering of comments and a lot of whitespace. The same exact technique I would use on any code, really.

But how did I cook up this regular expression? How do I know it does what I think it does? How do I test it? Well, again, I do that the same way I do with all my other code: I use a tool. My tool of choice is RegexBuddy.

regexbuddy whitelist example

Now we get syntax highlighting and, more importantly, real time display of matches there at the bottom in our test data as we type. This is huge. If you're wondering why your IDE doesn't automatically do this for you with any regex strings it detects in your code, tell me about it. I've been wondering that very same thing for years.

RegexBuddy is far and away the best regex tool on the market in my estimation. Nothing else even comes close. But it does cost money. If you don't use software that costs money, there are plenty of alternatives out there. You wouldn't read or write code in notepad, right? Then why in the world would you attempt to read or write regular expressions that way? Before you complain how hard regular expressions are to deal with, get the right tools!

This trouble is worth it, because regular expressions are incredibly powerful and succinct. How powerful? I was able to write a no-nonsense, special purpose HTML sanitizer in about 25 lines of code and four regular expressions. Compare that with a general purpose HTML sanitizer which would take hundreds if not thousands of lines of procedural code to do the same thing.
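To make the shape of that approach concrete, here's a hypothetical miniature version in Python rather than the C# of the real SanitizeHtml routine -- a toy whitelist of a handful of tags, shown only to illustrate the technique of matching tags and keeping the whitelisted ones:

```python
import re

# Toy whitelist: only these exact tags survive. The real routine allows
# many more tags (and some attributes); this is an illustrative sketch.
WHITELIST = re.compile(r"</?(?:b|i|em|strong|code|pre)>", re.IGNORECASE)
TAG = re.compile(r"</?[a-zA-Z][^>]*>")

def sanitize(html):
    # Find every tag; keep it only if it exactly matches the whitelist.
    return TAG.sub(
        lambda m: m.group(0) if WHITELIST.fullmatch(m.group(0)) else "",
        html)

print(sanitize('<b>bold</b> <script>alert(1)</script>'))
# -> <b>bold</b> alert(1)
```

Note that the dangerous tag is stripped but its text content remains -- which is the same behavior the whitelist approach above gives you, and usually what you want for user input.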

I do have some tips for keeping your sanity while dealing with regular expressions, however:

  1. Do not try to do everything in one uber-regex. I know you can do it that way, but you're not going to. It's not worth it. Break the operation down into several smaller, more understandable regular expressions, and apply each in turn. Nobody will be able to understand or debug that monster 20-line regex, but they might just have a fighting chance at understanding and debugging five mini regexes.

  2. Use whitespace and comments. It isn't 1997 any more. A tiny ultra-condensed regex is no longer a virtue. Flip on the IgnorePatternWhitespace option, then use that whitespace to make your regex easier for us human beings to parse and understand. Comment liberally.

  3. Get a regular expression tool. I don't stare at regular expressions and try to suss out their meaning through sheer force of will. Neither should you. It's a waste of time. I paste them into my regex tool of choice, RegexBuddy, which not only tells me what the regular expression does, but lets me step through it, and run it through some test data. All in real time as I type.

  4. Regular expressions are not Parsers. Although you can do some amazing things with regular expressions, they are weak at balanced tag matching. Some regex variants have balanced matching, but it is clearly a hack – and a nasty one. You can often make it kinda-sorta work, as I have in the sanitize routine. But no matter how clever your regex, don't delude yourself: it is in no way, shape or form a substitute for a real live parser.
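The first two tips translate directly to other regex dialects. A hypothetical Python sketch -- `re.VERBOSE` is Python's equivalent of .NET's IgnorePatternWhitespace, and the patterns themselves are illustrative examples of mine, not from the sanitizer:

```python
import re

# Tip 2: in re.VERBOSE mode, whitespace in the pattern is ignored,
# freeing you to lay the pattern out and comment it like real code.
phone = re.compile(r"""
    (\d{3})     # area code
    [-.\s]?     # optional separator
    (\d{3})     # exchange
    [-.\s]?     # optional separator
    (\d{4})     # line number
""", re.VERBOSE)

# Tip 1: several small regexes applied in turn, not one uber-regex.
def normalize(text):
    text = re.sub(r"\s+", " ", text)            # collapse runs of whitespace
    text = re.sub(r" ([.,!?])", r"\1", text)    # drop space before punctuation
    return text.strip()

print(phone.search("call 555-867-5309").groups())  # -> ('555', '867', '5309')
print(normalize("Hello ,   world !"))              # -> Hello, world!
```

Each mini regex in `normalize` is trivially testable on its own, which is exactly why the pipeline of small patterns beats the monster single pattern.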

If you're afraid of regular expressions, don't be. Start small. Used responsibly and with the right tooling, they are big, powerful – dare I say, spicy – wins. If you make regular expressions a part of your toolkit, you'll be able to write less code that does more. It'll just... taste better.

You might enjoy them so much, in fact, that you completely forget about that "second problem".

Discussion

Smart Enough Not To Build This Website

I may not be smart enough to join Mensa, but I am smart enough not to build websites like the American Mensa website.

Mensa forgot password form

Do you see the mistake? If so, can you explain why this is a mistake, and why you'd desperately want to avoid visiting websites that make this mistake?

(hat tip to Bob Kaufman for pointing this out)

Discussion