Coding Horror

programming and human factors

How Not To Write a Technical Book

If I told you to choose between two technical books, one by renowned Windows author Charles Petzold, and another by some guy you've probably never heard of, which one would you pick?

That's what I thought too. Until I sat down to read both of them. Take a look for yourself:

Charles Petzold's Applications = Code + Markup:

Petzold WPF book, sample page 1   Petzold WPF book, sample page 2

Adam Nathan's Windows Presentation Foundation Unleashed:

Nathan WPF book, sample page 1   Nathan WPF book, sample page 2

Beyond the obvious benefit of full color printing, which adds another dimension to any text, it's not even close. The Nathan book is the clear winner:

  • It's full of diagrams, screenshots, and illustrations showing the meaning of the code.
  • The text is frequently broken up by helpful color-coded sidebars such as "digging deeper", "FAQ", and "warning".
  • The code/markup snippets are smaller and easier to digest; they don't dominate page upon page of the text.
  • Liberal use of bullets, tables, subheadings, and other textual elements provides excellent scannability.
  • The book has a sense of humor without being obnoxious or cloying.
  • Did I mention it's in color?

The Nathan book is brilliant. It reads like a blog and competes toe-to-toe with anything you'd find on the web. Petzold's book, in contrast, is a greyscale sea of endless text and interminable code. There are so few diagrams in the book that you get a little thrill every time you encounter one. It also artificially segregates code and markup: the first half is all C# code; it's not until the second half that you see any XAML markup whatsoever, even though XAML is one of the most important new features of WPF, and the one developers will be least familiar with.
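
To make the code-versus-markup split concrete, here's a minimal sketch of my own (it isn't an excerpt from either book) showing the same trivial window built purely in C# code, with a rough XAML equivalent in the leading comment. Pairing the two like this is exactly what Petzold's layout postpones until the second half.

  // A trivial WPF window built entirely in C# code, the style used throughout
  // the first half of Petzold's book. The equivalent XAML markup would be roughly:
  //
  //   <Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
  //           Title="Hello, WPF" Width="300" Height="200">
  //       <Button Content="Click Me" Margin="20" />
  //   </Window>

  using System;
  using System.Windows;
  using System.Windows.Controls;

  class HelloWpf
  {
      [STAThread]
      static void Main()
      {
          Button button = new Button();
          button.Content = "Click Me";
          button.Margin = new Thickness(20);

          Window window = new Window();
          window.Title = "Hello, WPF";
          window.Width = 300;
          window.Height = 200;
          window.Content = button;

          Application app = new Application();
          app.Run(window);
      }
  }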

I suppose this sort of old-school treatment is typical Petzold. What do you expect from a guy who thinks Visual Studio rots the minds of software developers? The difference in approach is immediately obvious to anyone who opens both books. One looks compelling, fun, and inviting; the other looks like a painful, textbook slog that's the equivalent of writing code in Notepad. Petzold's an excellent writer, but writing alone can't make up for the massive layout deficiencies of his book.

It's too bad, because I loved Petzold's earlier book Code, which was a love letter to the personal computer filled with wonderful illustrations. As much as I respect Petzold, you should avoid his WPF book. Get the Nathan book instead-- you'll love it. Publishers, take note: I'd sure be buying a heck of a lot more technical books if more of them were like this one.


Where Are All the Open Source Billionaires?

Hugh MacLeod asks, if open source is so great, where are all the open source billionaires?

If Open Source software is free, then why bother spending money on Microsoft Partner stuff? I already know what Microsoft's detractors will say: "There's no reason whatsoever. $40 billion per year is totally wasted."

This, however, is not a very satisfying answer, simply because it doesn't quite ring true. Otherwise there'd be a lot more famous Open Source billionaires out there, being written up in Forbes Magazine or wherever. And Bill Gates would've been ousted years ago.

I can immediately think of one reason there aren't any open-source billionaires:

Linux Distro timeline, 1991-2007

Most competition for open source software comes from other open source software. It's far more cutthroat than the commercial software market could ever be.

Rajesh Setty responded to Hugh's question with a few additional reasons why it's difficult for open source businesses to make money:

If open source is license free, the costs have to be low to work with open source. If cost is one of the reasons for a customer to embrace open source, he or she will pay less than they would have paid for comparable enterprise software to do the same job. An open source company would therefore have to work twice as hard as a comparable enterprise software company to make the same or less money. This means that they have to have a lot more resources than the competing enterprise software company. How can you have a smaller pie but feed a lot more people and still keep everyone happy?

But I think MacLeod is asking the wrong question, so Setty's answers, although well reasoned, are irrelevant. There probably won't ever be any open source billionaires. Just ask JBoss founder Marc Fleury:

To do [open source software] seriously, professionally, in a sustainable fashion you need to make a living. What is clearly compromised is the "instant billionaire" club. I remember the first time I saw Torvalds on a panel and someone asked "why isn't there an open source billionaire", and I immediately thought "because you are distributing FREE SOFTWARE, dummy." And there still isn't an open source billionaire today. There are very few billionaires period. Your average MSFT developer certainly isn't one.

I for one don't believe there will ever be an open source billionaires club. There are and will be many multi-millionaires though. If we execute on our plan without screwing up, we will create a large batch of OS millionaires. We care about the developers and people who create real value in companies getting rewarded.

The lack of open source software billionaires is by design. It's part of the intent of open source software -- to balance the scales by devaluing the obscene profit margins that exist in the commercial software business. Duplicating software is about as close to legally printing money as a company can get; profit margins regularly exceed 80 percent.

To ask where the open source billionaires are is to demonstrate a profound misunderstanding of how open source software works. If you wanted to become obscenely rich by starting an open source software company, I'm sorry, but you picked the wrong industry. You'll make a living, perhaps even a lucrative one. But you won't become Bill Gates rich, or Paul Allen rich, by siphoning away the exorbitant profit margins commercial software vendors have enjoyed for so many years.

But there is a silver lining.

There are real millionaires – even billionaires – who built companies on open source software. Just ask Larry Page and Sergey Brin. Or the YouTube founders. The real money isn't in the software. It's in the service you build with that software.


Welcome to Dot-Com Bubble 2.0

The dot-com bubble was a watershed event for software developers. You simply couldn't work in the field without having something miraculous or catastrophic happen to you. Or both at once.

The "dot-com bubble" was a speculative bubble covering roughly 1995 — 2001 during which stock markets in Western nations saw their value increase rapidly from growth in the new Internet sector and related fields. The period was marked by the founding (and in many cases, spectacular failure) of a group of new Internet-based companies commonly referred to as dot-coms. A combination of rapidly increasing stock prices, individual speculation in stocks, and widely available venture capital created an exuberant environment in which many of these businesses dismissed standard business models, focusing on increasing market share at the expense of the bottom line. The bursting of the dot-com bubble marked the beginning of a relatively mild yet rather lengthy early 2000s recession in the developed world.

Like many others, I saw warning signs all over the place in late 2000:

  • Skyrocketing salaries resulted in a rash of neophytes entering the software development field with giant dollar signs in their eyes.
  • Internet companies with irrational, unsustainable business strategies were being built to cash in, and they were hiring at a frenetic pace.
  • You were never more than two degrees of separation away from a tale of some programmer who became an overnight millionaire.

Despite all the warning signs, it never occurred to me that I was working in a bubble. Until it popped. I don't want to make that mistake again. The three years after the bubble burst were dark, dark times for software developers. Everyone had to scramble to find a place to weather the worst of the storm. And the backlash was severe: rampant offshoring, devaluation of the IT industry as a whole, and diminished salaries and opportunities for everyone.

Seven years later, we're now clearly in the throes of another dot-com bubble. You might argue that the new bubble has been in effect since mid-2006, but the signs are absolutely unmistakable now. The job market for software developers is every bit as hyper-competitive as it was in 1999. The idea that you can found a company on the internet-- and make money-- is taken seriously now. There's a new one every week.

We've had seven long years to think about what the dot-com bubble meant, and where things went wrong. Here's what I think the original bubble got wrong, and what's different in today's bubble:

  1. Most people have an always-on broadband connection to the internet. Broadband penetration was a mere 5 percent in 2000; as of early 2007 it's now over 50 percent. So many dot-com business models were predicated on the mass market of dialup users, conveniently forgetting how brutally painful it was to use the internet on a modem.
  2. The emergence of viable ad networks. Few dot-com companies had revenue models that made any sense. Now there are dozens of potential advertising networks that you can plop on a web page to guarantee income proportionate to the pageviews. This advertising-supported model pioneered on the web is even trickling over into desktop applications.
  3. Moore's law and open source. An internet startup can now scale to thousands of concurrent users on a few cheap, commodity server boxes, running proven open-source solutions like Linux and MySQL. All of this was possible in 2000, but the "whitebox" software and hardware was unproven, and tended to be far behind the expensive, proprietary solutions. Now it's assumed, mature, a known quantity — and the cost for that hardware and software is precipitously close to $0.

But the original bubble wasn't all greed and stupidity — I recommend reading through Paul Graham's What the Bubble Got Right for the upside.

This new bubble does appear to be a bit more sane than the last one, at least initially. The greasy odor of get-rich-quick isn't quite as overpowering as it was in 1999. So far, people seem more interested in building sustainable, useful businesses than in rapid market capitalizations.

Bubbles are exciting times. Fortunes are made and lost; careers built and destroyed. It's great while it lasts. So here's my question to you: what will you do differently in this bubble?


Apparently Bloggers Aren't Journalists

I ran across this blog entry while researching Microsoft's new Silverlight Flash competitor. It makes some disturbing complaints about the limitations of Silverlight, in bold all-caps to boot:

This is where I threw my hands up in disgust. What in the holy name of Scooby-Doo are those people thinking?!?! After poring through the [Silverlight] API, I thought "I must be mistaken. Surely this is a mistake." But then I asked a colleague and he confirmed it for me. Let me skip a couple lines and highlight this so you all can see it clearly.

WPF/E (Silverlight) HAS NO SUPPORT FOR BINDING TO MODELS, BINDING TO DATA, OR EVEN CONNECTING TO NETWORK RESOURCES TO OBTAIN DATA.

So, I will summarize Microsoft's efforts to date around Silverlight. They have created a declarative programming model that uses XAML as an instantiation language for rich 2D (not 3D) content and animations, as well as extended JavaScript to support this model. Using this model, you can create embedded mini-apps that have access to rich animations, graphics, audio, and video objects. However, these mini applications cannot communicate with the outside world, they cannot consume web services, and they cannot bind UI elements to data. In addition, this model doesn't even have support for things that should be considered a stock part of any library such as buttons, checkboxes, list boxes, list views, grids, etc.

Those are serious problems indeed. I found this blog entry because it's referenced by another blog entry on the limitations of Silverlight:

But what are the capabilities of Silverlight itself? I came across this blog entry of someone who has downloaded the SDK, read the documentation, and looked at the code. Microsoft seems to be waiting for the Orcas release cycle before adding data binding, controls, and .Net runtime support to Silverlight - and Orcas could be delayed until 2008.

But before I clicked through to that blog entry, I started by reading this blog post on the limitations of Silverlight:

Although I just found this post about it which points out that [Silverlight] has a lot of pretty major shortcomings.

The idea that Microsoft's new Flash-alike can't even download data via HTTP seemed impossibly wrong to me. Couldn't be. Can't be. Like any large company, Microsoft certainly makes their share of dumb mistakes. But an epic mistake like that stretches the bounds of credibility even for Microsoft.

In short, I didn't believe it. So I downloaded the Silverlight SDK to take a look for myself. Guess what I found in the Silverlight SDK documentation, not five minutes after downloading it?

The Downloader object is a special-purpose WPF/E object that provides the ability to download content, such as XAML content, JavaScript content, or media assets, such as images. By using the Downloader object you do not have to provide all application content when the WPF/E control is instantiated. Rather, you can download content on demand in response to application needs. The Downloader object provides functionality for initiating the data transfer, monitoring the progress of the data transfer, and retrieving the downloaded content.

The properties and methods of the Downloader object are modeled after the XMLHttpRequest (XHR) set of APIs. XMLHttpRequest provides JavaScript and other web browser scripting languages the ability to transfer and manipulate XML data to and from a web server using HTTP.

I'm not out to defend Silverlight here.

It's clear that blogger A posted completely erroneous information; I'm not sure how he could have missed the obviously named and prominently featured Downloader object in the SDK. It really calls into question whether or not he actually used the SDK at all. But let's assume, for the moment, that he did, and it was a simple oversight on his part. The strident tone of his post makes me think otherwise, but let's give him the benefit of the doubt.

The real problem is that this erroneous information was echoed by blogger B, and then echoed again by blogger C. At no point did anyone stop to actually verify the claims of blogger A, even in the most rudimentary way. All they had to do was download the SDK and look for themselves to see whether his complaints were true. I'm talking five minutes, maximum.

But they didn't.

Instead, they blindly parroted blogger A, assumed that all of his claims were valid, and perpetuated his mistake across the internet.

Let's compare that behavior with the Society of Professional Journalists Code of Ethics, which includes the following guidelines:

  • Test the accuracy of information from all sources and exercise care to avoid inadvertent error. Deliberate distortion is never permissible.
  • Diligently seek out subjects of news stories to give them the opportunity to respond to allegations of wrongdoing.
  • Identify sources whenever feasible. The public is entitled to as much information as possible on sources' reliability.

I realize that it's unrealistic to hold every blogger on planet Earth to the same standards as professionally trained journalists. Bloggers, after all, aren't professionals.

But I do believe blog readers have a right to expect that amateur bloggers will:

  1. Do their homework before writing.
  2. Do some basic investigation of other bloggers' claims before linking to their posts or quoting them.

None of these bloggers did any of the above. Don't let their mistakes delude you into thinking this is typical or acceptable behavior. It isn't. We may not be professional journalists-- but we are still accountable for the words we write. It pains me that I even have to say this in 2007, but don't assume everything you read on the internet is true. Check the facts yourself. Putting in that extra bit of effort won't transform you into a journalist, but I can guarantee it'll make you a better blogger.


Sins of Software Security

I picked up a free copy of 19 Deadly Sins of Software Security at a conference last year. I didn't expect the book to be good because it was a free giveaway item from one of the vendor booths. But I paged through it on the flight home, and I was pleasantly surprised. It's actually quite good.

19 Deadly Sins of Software Security

Software security isn't exactly my favorite topic, so holding my interest is no mean feat. It helps that the book is mercifully brief and to the point, and filled with practical examples and citations. It's an excellent cross-platform, language-agnostic checklist of common software security risks.

Here's a brief summary of each of the 19 sins, along with the languages it affects and a count of the vulnerabilities I found in the Common Vulnerabilities and Exposures (CVE) database for each one.

  • Buffer Overflows (C, C++; CVE entries: 3,326): A buffer overrun occurs when a program allows input to write beyond the end of the allocated buffer. The results range from a crash to the attacker gaining complete control of the operating system. Many famous exploits are based on buffer overflows, such as the Morris worm.
  • Format String Problems (C, C++; CVE entries: 411): The standard format string libraries in C/C++ include some potentially dangerous directives (particularly %n). If you allow untrusted user input to pass through a format string, the result can be anything from arbitrary code execution to spoofed user output.
  • Integer Overflows (C, C++, others; CVE entries: 288): Failure to range check integer types. This can cause integer overflow crashes and logic errors. In C/C++, integer overflows can be turned into a buffer overrun and arbitrary code execution, but all languages are prone to the resulting denial of service and logic errors.
  • SQL Injection (All; CVE entries: 2,225): Forming SQL statements with untrusted user input means users can "inject" their own commands into your SQL statements. This puts your data at risk, and can even lead to complete server and network compromise. (See the sketch after this list.)
  • Command Injection (All; CVE entries: 193): Occurs when untrusted user input is passed to a compiler or interpreter, or worse, a command line shell. The potential risk depends on the context.
  • Failing to Handle Errors (Most; CVE entries: 80): A broad category of problems related to a program's error handling strategy; anything that leads to the program crashing, aborting, or restarting is potentially a denial of service issue and therefore can be a security problem, particularly on servers.
  • Cross-Site Scripting (XSS) (Any web-facing; CVE entries: 2,996): A web application takes some input from the user, fails to validate it, and echoes that input directly back to the web page. Because this code runs in the context of your web site, it can do anything your website could do, including retrieving cookies, modifying the HTML DOM, and so forth.
  • Failing to Protect Network Traffic (All; CVE entries: 26): Most programmers underestimate the risk of transmitting data over the network, even if that data is not private. Attackers can eavesdrop, replay, spoof, tamper with, or otherwise hijack any unprotected data sent over the wire.
  • Use of Magic URLs and Hidden Form Fields (Any web-facing; CVE entries: 33): Passing sensitive or secure information via the URL querystring or hidden HTML form fields, sometimes with lousy or ineffectual "encryption" schemes. Attackers can use these fields to hijack or manipulate a browser session.
  • Improper Use of SSL and TLS (All; CVE entries: 123): Using most SSL and TLS APIs requires writing a lot of error-prone code. If programmers aren't careful, they end up with an illusion of security in place of the actual security promised by SSL. Attackers can use certificates from lax authorities, subtly invalid certificates, or stolen and revoked certificates, and it's up to the developer to write the code that checks for all of that.
  • Use of Weak Password-Based Systems (All; CVE entries: 1,235): Anywhere you use passwords, you need to seriously consider the risks inherent to all password-based systems: phishing, social engineering, eavesdropping, keyloggers, brute force attacks, and so on. Then you have to worry about how users choose passwords, and where to store them securely on the server. Passwords are a necessary evil, but tread carefully.
  • Failing to Store and Protect Data Securely (All; CVE entries: 56): Information spends more time stored on disk than in transit. Consider filesystem permissions and encryption for any data you're storing, and try to avoid hardcoding "secrets" into your code or configuration files.
  • Information Leakage (All; CVE entries: 26): The classic trade-off between giving the user helpful information and preventing attackers from learning about the internal details of your system. Was the password invalid, or the username?
  • Improper File Access (All; CVE entries: 5, 58): 1) There is often a window of vulnerability between time of check and time of use (TOCTOU) in the filesystem, so an attacker can slip changes in, particularly if the files are accessed over the network. 2) The "it isn't really a file" problem: you may think you have a file, but attackers may substitute a link to another file, a device name, or a pipe. 3) Allowing users control over the complete filename and path of files used by the program, which can lead to directory traversal attacks.
  • Trusting Network Name Resolution (All; CVE entries: 20): It's simple to override and subvert DNS on a server or workstation with a local HOSTS file. How do you really know you're talking to the real "secureserver.com" when you make an HTTP request?
  • Race Conditions (All; CVE entries: 139): A race condition occurs when two different execution contexts can change a resource and interfere with each other. If attackers can force a race condition, they can mount a denial of service attack. Unfortunately, writing properly concurrent code is incredibly difficult.
  • Unauthenticated Key Exchange (All; CVE entries: 1): Exchanging a private key without properly authenticating the entity, machine, or service you're exchanging the key with. To have a secure session, both parties need to agree on the identity of the opposing party. You'd be shocked how often this doesn't happen.
  • Cryptographically Strong Random Numbers (All; CVE entries: 5): Imagine you're playing poker online. The computer shuffles and deals the cards. You get your cards, and then another program tells you what's in everybody else's hands. Random numbers are similarly fundamental to cryptography; they're used to generate things like keys and session identifiers. An attacker who can predict those numbers-- even with only a slight probability of success-- can often leverage that knowledge to breach the security of a system.
  • Poor Usability (All): Security is always extra complexity and pain for the user. It's up to us software developers to go out of our way to make it as painless as it can reasonably be. Security only works if the secure way also happens to be the easy way.
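
To make the most common of these sins concrete, here's a short C# sketch of SQL injection and its standard fix, a parameterized query. This is my own illustration rather than code from the book; the connection string and the Users table are hypothetical.

  using System;
  using System.Data.SqlClient;

  class SqlInjectionDemo
  {
      static void Main(string[] args)
      {
          string userName = args.Length > 0 ? args[0] : "alice";

          using (SqlConnection conn = new SqlConnection(
              "Data Source=.;Initial Catalog=AppDb;Integrated Security=true"))
          {
              conn.Open();

              // Sinful version: untrusted input becomes part of the SQL text.
              // Passing a name like  x' OR '1'='1  would return every row.
              // string sql = "SELECT Id, Name FROM Users WHERE Name = '" + userName + "'";

              // Fix: the input travels as a parameter value, so it is treated
              // strictly as data and never parsed as SQL.
              SqlCommand cmd = new SqlCommand(
                  "SELECT Id, Name FROM Users WHERE Name = @name", conn);
              cmd.Parameters.AddWithValue("@name", userName);

              using (SqlDataReader reader = cmd.ExecuteReader())
              {
                  while (reader.Read())
                      Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
              }
          }
      }
  }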

It's true that C and C++ have a heavy cross to bear. But only 3 of the 19 sins can be completely lumped on the plate of K&R. The other 16 apply almost everywhere, to any developer writing code on any platform. It's a sobering thought.
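
For instance, integer overflows and predictable random numbers bite managed code just as readily. Here's a small C# sketch of my own (not from the book) showing both sins alongside the usual fixes:

  using System;
  using System.Security.Cryptography;

  class ManagedSinsDemo
  {
      static void Main()
      {
          // Sin: integer overflow. C# arithmetic silently wraps by default, so a
          // size calculation can come out small or negative and sail past a later
          // range check.
          int length = 1 << 30;
          int count = 8;
          int uncheckedTotal = length * count;   // wraps around instead of failing
          Console.WriteLine("unchecked total: " + uncheckedTotal);

          // Fix: do the arithmetic in a checked context (or in a wider type) so
          // the overflow surfaces as an exception rather than a logic error.
          try
          {
              int checkedTotal = checked(length * count);
              Console.WriteLine("checked total: " + checkedTotal);
          }
          catch (OverflowException)
          {
              Console.WriteLine("overflow detected, rejecting input");
          }

          // Sin: predictable "random" numbers. System.Random is seeded from the
          // clock; it's fine for shuffling a playlist, not for secrets.
          Random weak = new Random();
          Console.WriteLine("weak session id: " + weak.Next());

          // Fix: use the cryptographic generator for keys, tokens, and session
          // identifiers.
          byte[] token = new byte[16];
          RandomNumberGenerator strong = RandomNumberGenerator.Create();
          strong.GetBytes(token);
          Console.WriteLine("strong token: " + BitConverter.ToString(token));
      }
  }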

The usability sin is the one that's most interesting to me. Usability is tough under the best of conditions-- and security is the worst of conditions, at least from the user's perspective. It's quite a challenge. The book also includes a few great links on the topic of security usability.

You can certainly find other books that go much deeper on particular aspects of software security. But if you're looking for an excellent primer on the entire gamut of security problems that could potentially afflict your project, 19 Deadly Sins of Software Security is an excellent starting point.
