Coding Horror (Page 124)

1 May 2007

Programming Tip: Learn a Graphics Editor

One lesson I took from MIX is that software development and graphic design are increasingly interrelated disciplines. Although they are very different skillsets, it's important for developers to have some rudimentary design skills, and vice versa. There's a lot of useful cross-pollination going on between developers and designers.

You can't reinvent yourself as a designer overnight, nor should you try. But developers should understand the essentials of graphics editing, such as the difference between JPG and PNG, or vector graphics as represented in the Canvas tag, SVG, and XAML. Most of all, I believe every software developer should have basic competence in a common graphics editor.

Pick your poison:

The GIMP (free, all platforms)
Paint.NET (free, Windows)
Photoshop CS3 ($500+)
Photoshop Elements ($70)
Paint Shop Pro ($99)

No, I won't include Microsoft Paint in this list. And I will add this one warning: although GIMP is both free and powerful, the interface is so excruciatingly difficult to use that by the time you become proficient, you'll be able to handle any graphics editor on the market with ease. I'm a Paint Shop Pro man myself, but there's a broad equivalency between these programs for the type of basic graphic work that most programmers would need. Any of them will do.

Once you've selected a graphics editor, the real challenge is learning how to use it. To get a taste of how complex graphics can be, browse through Ars Technica's Photoshop CS3 review. Fortunately, you'll typically only need to use a fraction of the functionality of these programs. For tips on getting started, browse through this list of graphics software tutorials. Whatever you do, try to wean yourself off crappy graphics tools like Paint. It'll be painful at first, but spend the time to work past the learning curve. Expand your skillset by getting comfortable with a real graphics editor. Get some experience under your belt with the same tools that designers use.

Technically, this has nothing to do with writing code. But competence in a real graphics editor means you'll have a much easier time working with designers because you now share a common toolset. Given enough practice with the tools, you might even be able to copy a good design yourself in a pinch.

Discussion

30 Apr 2007

An Initiate of the Bayesian Conspiracy

An Intuitive Explanation of Bayesian Reasoning is an extraordinary piece on Bayes' theorem that starts with this simple puzzle:

1% of women at age forty who participate in routine screening have breast cancer. 80% of women with breast cancer will get positive mammographies. 9.6% of women without breast cancer will also get positive mammographies. A woman in this age group had a positive mammography in a routine screening. What is the probability that she actually has breast cancer?

This simple puzzle is not all that simple in practice. Only 15% of doctors, when presented with this situation, come up with the correct answer.

Can you come up with the correct answer -- without resorting to Google, the comments to this post, or reading the answer provided in the article?

If so, congratulations. You're a natural initiate of the Bayesian Conspiracy. For the rest of us, Bayes' Theorem is a bit more difficult to grasp:

While there are a few existing online explanations of Bayes' Theorem, my experience with trying to introduce people to Bayesian reasoning is that the existing online explanations are too abstract. Bayesian reasoning is very counterintuitive. People do not employ Bayesian reasoning intuitively, find it very difficult to learn Bayesian reasoning when tutored, and rapidly forget Bayesian methods once the tutoring is over. This holds equally true for novice students and highly trained professionals in a field. Bayesian reasoning is apparently one of those things which, like quantum mechanics or the Wason Selection Test, is inherently difficult for humans to grasp with our built-in mental faculties.

In computer science, it's easy to demonstrate the immense power of Bayes' theorem: it's the basis for almost all spam filters in use today. Bayesian email filtering was first publicized by Paul Graham's A Plan for Spam in mid-2002. Most programmers know about Bayesian filtering now; it's the primary weapon in any modern spam-fighting toolkit.
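To make that concrete, here's a minimal sketch of the combining step Graham describes in A Plan for Spam: each word carries an estimated probability that a message containing it is spam, and those per-word numbers are folded into a single score. The function name and the sample probabilities are made up for illustration; a real filter learns its probabilities from a corpus of mail.

```javascript
// Combine per-word spam probabilities into one score, using the formula
// from A Plan for Spam: prod(p) / (prod(p) + prod(1 - p)).
// The name combineSpamProbabilities is hypothetical, not from any library.
function combineSpamProbabilities(probs) {
  const spamProduct = probs.reduce((acc, p) => acc * p, 1);
  const hamProduct = probs.reduce((acc, p) => acc * (1 - p), 1);
  return spamProduct / (spamProduct + hamProduct);
}

// Invented per-word probabilities: two very spammy words, one innocent one.
const score = combineSpamProbabilities([0.99, 0.95, 0.2]);
console.log(score > 0.9 ? "spam" : "not spam"); // spam
```

A couple of strongly spammy words are enough to outweigh one innocent-looking word, which is why these filters are so hard to fool with padding.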

What you may not know, however, is that there's something even more effective than Bayesian spam filtering. It's eloquently described in William Yerazunis' presentation The Spam Filtering Plateau at 99.9% Accuracy and How to Get Past It (also available as a PDF paper). And it's been implemented as the CRM114 Discriminator for years. That technique is Markovian spam filtering:

How to change a Bayesian spam filter to a Markovian spam filter:

Change the feature generator from single words to spanning multiple words
Change the weighting so that longer features have more weight (ie, longer features generate local probabilities closer to 0.0 and 1.0)
The 2^2n weighting means that the weights are 1, 4, 16, 64, 256, ... for span lengths of 1, 2, 3, 4, 5 ... words

In other words, where Bayesian filters examine the relationship between individual words, Markovian filters expand the scope to examine the relationship between words and phrases. It's a tweak, but a significant one that amplifies the accuracy of the already uncannily accurate Bayes' theorem.
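The recipe above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not CRM114's actual feature generator: it emits every contiguous phrase of one to five words and gives each a weight of 1, 4, 16, 64, or 256 depending on its length. The function name and the sample text are hypothetical.

```javascript
// Generate word-span features with the 1, 4, 16, 64, 256 weighting from the
// quoted recipe. A span of n words gets weight 2^(2(n - 1)).
// spanningFeatures is a hypothetical name; contiguous phrases are a simplification.
function spanningFeatures(text, maxSpan = 5) {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const features = [];
  for (let start = 0; start < words.length; start++) {
    for (let span = 1; span <= maxSpan && start + span <= words.length; span++) {
      features.push({
        phrase: words.slice(start, start + span).join(" "),
        weight: 2 ** (2 * (span - 1)),
      });
    }
  }
  return features;
}

const features = spanningFeatures("buy cheap pills");
// Includes "buy" (weight 1), "buy cheap" (weight 4), "buy cheap pills" (weight 16).
```

A plain Bayesian filter would only ever see the three single-word features here; the longer, heavier phrases are what the Markovian variant adds.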

But the true power of Bayes' theorem extends far beyond merely discriminating between spam and non-spam. As the CRM114 documentation notes, you can use these powerful statistical models to discriminate between... well, just about anything:

Spam is the big target with CRM114, but it's not a specialized Email-only tool. CRM114 has been used to sort web pages, resumes, blog entries, log files, and lots of other things. Accuracy can be as high as 99.9 %. In other words, CRM114 learns, and it learns fast.

Now perhaps you can understand why some people are so excited about Bayes' theorem.

Maybe you see Bayes' theorem, and you understand the theorem, and you can use the theorem, but you can't understand why your friends and/or research colleagues seem to think it's the secret of the universe. Maybe your friends are all wearing Bayes' theorem T-shirts, and you're feeling left out. Maybe you're a girl looking for a boyfriend, but the boy you're interested in refuses to date anyone who "isn't Bayesian". What matters is that Bayes is cool, and if you don't know Bayes, you aren't cool.
Why does a mathematical concept generate this strange enthusiasm in its students? What is the so-called Bayesian Revolution now sweeping through the sciences, which claims to subsume even the experimental method itself as a special case? What is the secret that the adherents of Bayes know? What is the light that they have seen?

It's not intuitive for most people, but look a little more closely, and I think you, too, will become an initiate of the Bayesian conspiracy.
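Spoiler warning: if you've made your attempt at the mammography puzzle at the top of this post, here's the arithmetic as a small sketch. The function and parameter names are mine; the three numbers come straight from the puzzle.

```javascript
// Bayes' theorem for a yes/no test:
// P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
function posterior(prior, truePositiveRate, falsePositiveRate) {
  const truePositives = prior * truePositiveRate;          // sick and flagged
  const falsePositives = (1 - prior) * falsePositiveRate;  // healthy but flagged
  return truePositives / (truePositives + falsePositives);
}

// 1% have cancer, 80% of those test positive, 9.6% of the rest also test positive.
const pCancerGivenPositive = posterior(0.01, 0.80, 0.096);
console.log((pCancerGivenPositive * 100).toFixed(1) + "%"); // 7.8%
```

Put in head counts: of 10,000 women screened, 100 have cancer and 80 of them test positive, while about 950 of the 9,900 healthy women also test positive. Only 80 of the roughly 1,030 positive results are real.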

Discussion

29 Apr 2007

See You At MIX07

I'm heading off to MIX07 today.

MIX is by far my favorite Microsoft conference, because it "mixes" in a liberal dose of traditionally non-Microsoft folks for a broader range of perspectives. It's probably the only Microsoft conference I'll be attending this year.

Vertigo is also presenting something special at MIX: our new Family.Show WPF reference app.

If you're attending MIX this year and you're interested in meeting up, shoot me an email. I'll definitely bring lots of stickers.

I also set up a Coding Horror Twitter stream for MIX related activities, and I'll try to keep it updated throughout the conference, barring any performance meltdowns -- for example, right now Twitter's static asset server appears to be down, so no images or stylesheets appear.

Discussion

26 Apr 2007

JavaScript and HTML: Forgiveness by Default

I've been troubleshooting a bit of JavaScript lately, so I've enabled script debugging in IE7. Whenever the browser encounters a JavaScript error on a web page, instead of the default, unobtrusive little status bar notification..

[Image: default JavaScript status bar error notification]

.. I now get one of these glaring, modal error debug notification dialogs:

[Image: JavaScript debugging error dialog in IE7]

I left this setting enabled out of pure forgetfulness. Browsing the web this way, I quickly realized that the web is full of JavaScript errors. You can barely click through three links before encountering a JavaScript error of one kind or another. Often they come in pairs, triplets, sometimes dozens of them. It's nearly impossible to navigate the web with JavaScript error notification enabled.

JavaScript errors are so pervasive, in fact, that it's easy to understand why IE demotes them to nearly invisible statusbar elements. If they didn't, nobody would be able to browse the web without getting notified to death. Firefox goes even further: there's no visible UI whatsoever for any JavaScript errors on the current web page. You have to open the Tools | Error Console dialog to see them.

The upshot of this is that JavaScript errors, unless they result in obvious functional problems, tend to go unnoticed. Things that would cause showstopping compiler errors in any other language are at worst minor inconveniences in JavaScript. When errors are ignored by default, what you end up with is an incredibly tolerant, extremely permissive programming ecosystem. If it works, it works, errors be damned.

But this unparalleled flexibility has its price. Just ask Dave Murdock, who found out the hard way how flexible JavaScript can be.

So I dug into the code, which I hadn't written, and I saw JavaScript similar to this in the execution path that was causing Firefox to hang:
    var startIndex = 0;
    for (i = startIndex; i < endIndex; i++) {
        // do some stuff here
    }
This works fine in Internet Explorer 7. What happens in Firefox? i is reinitialized to startIndex after every run of the loop. You have to declare the loop like this for it to work:
    var startIndex = 0;
    for (var i = startIndex; i < endIndex; i++) {
        // do some stuff here
    }
Putting the var before i is the way it ought to be as far as I can tell, but both Internet Explorer and Firefox do the wrong thing by developers here. Both browsers should be sticklers about requiring var in a loop variable declaration and produce a clear JavaScript interpreter error before the code has the chance to run.
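Here's one way a missing var can bite, sketched under assumptions: the quoted post doesn't show the loop body, so the doStuff helper below is hypothetical. Without var, i isn't local to the function. Outside strict mode, the assignment creates (or reuses) a single global variable, and any other code that makes the same mistake shares it.

```javascript
// Built with the Function constructor so these bodies run in non-strict mode
// even if this file is loaded as a strict script or an ES module. In strict
// mode, assigning to the undeclared i would throw a ReferenceError instead.
const doStuff = new Function(`
  for (i = 0; i < 3; i++) { /* inner work */ }   // no var: writes the global i
`);

const leakyLoop = new Function("doStuff", `
  var passes = 0;
  for (i = 0; i < 3; i++) {   // no var: the very same global i
    doStuff();                // returns with the global i set to 3
    passes++;
  }
  return passes;
`);

const safeLoop = new Function("doStuff", `
  var passes = 0;
  for (var i = 0; i < 3; i++) {   // var: i is local to this function
    doStuff();
    passes++;
  }
  return passes;
`);

const leakyPasses = leakyLoop(doStuff); // 1: the helper clobbered the counter
const safePasses = safeLoop(doStuff);   // 3: the local i is untouched
```

Here the leaky loop quits after one pass. Depending on the bounds, a clobbered counter can just as easily be knocked back below the limit on every pass, and then the loop never ends.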

It's not just JavaScript. HTML and CSS are incredibly forgiving of errors as well. Ned Batchelder observed bizarrely tolerant behavior when specifying named colors that don't exist. Consider this small snippet of HTML:

<font color='red'>█ This is RED</font>

As you vary the named color, you don't get the error you might expect. What you do get is weird colors:

	Firefox	IE7	Opera
red	█ #ff0000	█ #ff0000	█ #ff0000
seagreen	█ #2e8b57	█ #2e8b57	█ #2e8b57
sea green	█ #0e00ee	█ #0e00ee	█ #0ea00e
sxbxxsreen	█ #0000e0	█ #0000e0	█ #00b000
sxbxxsree	█ #00000e	█ #0b00ee	█ #00b000
sxbxxsrn	█ #000000	█ #0b0000	█ #00b000
sxbxeen	█ #000e00	█ #0bee00	█ #00b0ee
sreen	█ #00ee00	█ #00ee00	█ #00ee00
ffff00	█ #ffff00	█ #ffff00	█ #ffff00
xf8000	█ #0f8000	█ #0f8000	█ #0f8000

(If you're curious how "sea green" can possibly equate to blue, the answers are in the comments to Ned's post.)
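The short version, as a sketch: when a color isn't a recognized name, the browser falls back to treating the text as a sloppy hex value. The HTML standard later wrote these fallback rules down as the "rules for parsing a legacy color value", and the function below is a simplified rendition of them. Its name is mine, and it skips the named-color lookup and a few edge cases, so it only makes sense for the invalid names in the table. It reproduces the IE7 column; the Firefox and Opera builds of the day differed on some rows.

```javascript
// Simplified legacy color fallback: junk characters become 0, the string is
// padded to three equal parts, and each part is cut down to two hex digits.
// parseLegacyColor is a hypothetical name; named colors are not handled here.
function parseLegacyColor(input) {
  let s = input.trim().slice(0, 128);
  if (s.startsWith("#")) s = s.slice(1);
  s = s.replace(/[^0-9a-f]/gi, "0");                 // non-hex characters become 0
  while (s.length === 0 || s.length % 3 !== 0) s += "0";
  const len = s.length / 3;
  let parts = [s.slice(0, len), s.slice(len, 2 * len), s.slice(2 * len)];
  if (len > 8) parts = parts.map(p => p.slice(-8));  // keep the last 8 digits
  // Drop a shared leading zero while all three parts have one to spare.
  while (parts[0].length > 2 && parts.every(p => p[0] === "0")) {
    parts = parts.map(p => p.slice(1));
  }
  parts = parts.map(p => p.slice(0, 2).padStart(2, "0"));
  return "#" + parts.join("").toLowerCase();
}

console.log(parseLegacyColor("sea green")); // #0e00ee
```

So "sea green" becomes 0ea 000 ee0, each part is trimmed to two digits, and the result is red 0e, green 00, blue ee: blue.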

I can't think of any other programming environment that goes to such lengths to avoid presenting error messages, that tries so hard to make broken code work, at least a little. Although there was a push to tighten up HTML into the much more strictly enforced XHTML, it's an utter failure. If you're not convinced, read Mark Pilgrim's thought experiment:

Imagine that you posted a long rant about how [strict XHTML validation] is the way the world should work, that clients should be the gatekeepers of wellformedness, and strictly reject any invalid XML that comes their way. You click 'Publish', you double-check that your page validates, and you merrily close your laptop and get on with your life.
A few hours later, you start getting email from your readers that your site is broken. Some of them are nice enough to include a URL, others simply scream at you incoherently and tell you that you suck. (This part of the thought experiment should not be terribly difficult to imagine either, for anyone who has ever dealt with end-user bug reports.) You test the page, and lo and behold, they are correct: the page that you so happily and validly authored is now not well-formed, and it not showing up at all in any browser. You try validating the page with a third-party validator service, only to discover that it gives you an error message you've never seen before and that you don't understand.

Unfortunately, the Draconians won: when rendering as strict XHTML, any error in your page results in a page that not only doesn't render, but also presents a nasty error message to users.

[Image: XHTML strict rendering error]

They may not have realized it at the time, but the Draconians inadvertently destroyed the future of XHTML with this single, irrevocable decision.

The lesson here, it seems to me, is that forgiveness by default is absolutely required for the kind of large-scale, worldwide adoption that the web enjoys.

The permissive, flexible tolerance designed into HTML and JavaScript is alien to programmers who grew up being regularly flagellated by their compiler for the tiniest of mistakes. Some of us were punished so much that we actually started to like it. We point and laugh at all the awful HTML and JavaScript on the web that barely functions. We scratch our heads and wonder why the browser can't give us the punishment we so richly deserve for our terrible, terrible mistakes.

Even though programmers have learned to like draconian strictness, forgiveness by default is what works. It's here to stay. We should learn to love our beautiful soup instead.

Discussion

25 Apr 2007

Coding Horror on .NET Rocks

It was my great honor to participate in this week's episode of .NET Rocks!

.NET Rocks! is a long-running internet radio talk show for software developers that goes all the way back to 2002. I've listened to their shows off and on for years. They've interviewed some very notable software developers along the way, including Steve McConnell, and many other people far more interesting than myself. One of the earliest interviews (#11, to be precise) was with our CEO, Scott Stanfield.

My interview is 64 minutes long, and explores some common themes I've covered here in my blog. It's available in the following formats:

For some reason, I had trouble opening these links by clicking them directly, so you may want to right-click and choose "Save As". More download options are available on the interview page.

Thanks to Carl Franklin and Richard Campbell for a great interview!

Discussion