Coding Horror

programming and human factors

HCI Remixed

I like to take one or two books with me when I travel, and one of the books I chose for this trip is HCI Remixed.

[Image: HCI Remixed book cover]

Sometimes the books I choose are a bust. Fortunately that didn't happen this time.

HCI Remixed covers all the major milestones in the field of human-computer interaction. And when I say major, I mean it: things like Douglas Engelbart's famous demonstration, now referred to as The Mother of All Demos:

On December 9, 1968, Douglas C. Engelbart and the group of 17 researchers working with him in the Augmentation Research Center at Stanford Research Institute in Menlo Park, CA, presented a 90-minute live public demonstration of the online system, NLS, they had been working on since 1962. The public presentation was a session in the Fall Joint Computer Conference held at the Convention Center in San Francisco, and it was attended by about 1,000 computer professionals. This was the public debut of the computer mouse. But the mouse was only one of many innovations demonstrated that day, including hypertext, object addressing and dynamic file linking, as well as shared-screen collaboration involving two persons at different sites communicating over a network with audio and video interface.

So, all those trappings of modern computing that we take for granted today? Engelbart demonstrated them all two years before I was born. It just took a while for the rest of the world to catch up to his vision.

That's the lesson of many of the groundbreaking HCI discoveries presented in this book. Some people see further. Engelbart was so far ahead of his time in 1968 that his demonstration wasn't taken seriously -- it seemed absurd and impractical. It really makes you wonder which of today's HCI researchers we're ignoring but shouldn't be.

The book also takes an interesting approach: it doesn't summarize the papers; instead, it presents the reflections of current working HCI professionals on those papers. It's a little bit meta. You're hearing the impact of these HCI discoveries -- some big, some small -- as related by young researchers who were heavily influenced by them.

As a primer and overview of the field of human-computer interaction, it's tough to beat. Reading this reminds me how far we've come, and yet how far we have to go.


The Problem With URLs

URLs are simple things. Or so you'd think. Let's say you wanted to detect a URL in a block of text and convert it into a bona fide hyperlink. No problem, right?

Visit my website at http://www.example.com, it's awesome!

To locate the URL in the above text, a simple regular expression should suffice -- we'll look for a string at a word boundary beginning with http://, followed by one or more non-space characters:

\bhttp://[^\s]+
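
In C#, that naive approach is essentially a one-liner. Here's a minimal sketch (the method name is my own, and a real auto-linker would also need to HTML-encode the surrounding text):

using System.Text.RegularExpressions;

// Naive auto-linking: wrap each http:// run of non-space characters in an anchor.
static string AutoLinkNaive(string text)
{
    return Regex.Replace(text, @"\bhttp://[^\s]+", "<a href=\"$0\">$0</a>");
}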

Piece of cake. This seems to work. There's plenty of forum and discussion software out there which auto-links using exactly this approach. Although it mostly works, it's far from perfect. What if the text block looked like this?

My website (http://www.example.com) is awesome.

This URL will be incorrectly captured with the final paren included. And this, by the way, is an extremely common way for average, everyday users to include URLs in their text.

What's truly aggravating is that parens in URLs are perfectly legal. They're part of the spec and everything:

only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

Certain sites, most notably Wikipedia and MSDN, love to generate URLs with parens. The sites are lousy with the damn things:

http://en.wikipedia.org/wiki/PC_Tools_(Central_Point_Software)
http://msdn.microsoft.com/en-us/library/aa752574(VS.85).aspx

URLs with actual parens in them mean we can't take the easy way out and ignore the final paren. You could force users to escape the parens, but that's sort of draconian, and it's a little unreasonable to expect your users to know how to escape characters in a URL.

http://en.wikipedia.org/wiki/PC_Tools_%28Central_Point_Software%29
http://msdn.microsoft.com/en-us/library/aa752574%28VS.85%29.aspx

To detect URLs correctly in most cases, you have to come up with something more sophisticated. Granted, this isn't the toughest problem in computer science, but it's one that many coders get wrong. Even coders with years of experience, like, say, Paul Graham.

If we're more clever in constructing the regular expression, we can do a better job.

\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]

  1. The primary improvement here is that we're only accepting a whitelist of known good URL characters. Allowing arbitrary characters in URLs is setting yourself up for XSS exploits, and I can tell you that from personal experience. Don't do it! (See the example after this list.)
  2. We only allow certain characters to "end" the URL. Common punctuation marks like the period, exclamation point, and semicolon are treated as end-of-hyperlink characters and are not included in the URL.
  3. Parens, if present, are allowed in the URL -- and we absorb the leading paren, if it is there, too.
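
To see why the whitelist matters, consider what the naive "any non-space character" pattern does with a hostile, hypothetical pseudo-URL:

Check out http://example.com/"onmouseover="alert(1) for details!

The naive regex happily matches http://example.com/"onmouseover="alert(1), and any auto-linker that pastes that raw match into href="..." has just injected an attacker-controlled event handler into the page. A whitelist of known good characters (plus HTML-encoding the href) closes that door.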

I couldn't come up with a way for the regex alone to distinguish between URLs that legitimately end in parens (à la Wikipedia), and URLs that the user has enclosed in parens. Thus, we need a bit of postfix code to detect and discard the user-enclosed parens from the matched URLs:

// If the matched URL is wrapped in a matched pair of parens, strip them.
if (s.StartsWith("(") && s.EndsWith(")"))
{
    return s.Substring(1, s.Length - 2);
}
return s;
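
Putting the improved regex and the paren cleanup together, a minimal C# auto-linker along these lines might look like the sketch below. This is illustrative rather than production code, and the class and method names are my own:

using System.Text.RegularExpressions;

static class AutoLinker
{
    static readonly Regex UrlRegex = new Regex(
        @"\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    public static string AutoLink(string text)
    {
        return UrlRegex.Replace(text, match =>
        {
            string url = match.Value;
            // User-enclosed parens: strip the pair, then re-emit the parens
            // as plain text around the hyperlink.
            if (url.StartsWith("(") && url.EndsWith(")"))
            {
                string inner = url.Substring(1, url.Length - 2);
                return "(<a href=\"" + inner + "\">" + inner + "</a>)";
            }
            return "<a href=\"" + url + "\">" + url + "</a>";
        });
    }
}

Note that this still can't tell a legitimate trailing paren from a user's closing paren in every case; it just gets the common ones right.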

That's a whole lot of extra work, just because the URL spec allows parens. We can't fix Wikipedia or MSDN and we certainly can't change the URL spec. But we can ensure that our websites avoid becoming part of the problem. Avoid using parens (or any unusual characters, for that matter) in URLs you create. They're annoying to use, and rarely handled correctly by auto-linking code.


The Web Browser is the New Laptop

I've been reading a lot of good things about the emerging "netbook" category of subnotebooks:

The term netbook refers to a category of small to medium sized, light-weight, low-cost, energy-efficient, Internet-centric laptops, generally optimized for Web surfing and e-mailing.

Like any self-respecting nerd, I already own a laptop, of course, but my wife has taken to surfing the internet at night and doing her Java-based New York Times crosswords in bed. Plus there's the whole pregnancy thing, so it'd be nice for her to have her own "space" laptop-wise. So I pulled the trigger on an Acer Aspire One netbook.

acer aspire one, in pink

The specs are indeed modest, but not bad at all for the $369 sticker price:

  • Intel Atom 1.6 GHz CPU
  • 802.11b/g wireless
  • 1 GB RAM
  • 120 GB hard drive
  • 8.9" 1024x600 display
  • Windows XP Home
  • webcam, mic, 3 USB ports, Ethernet, VGA out

I didn't expect much from this cheap, diminutive laptop; it's mostly for web surfing, light email, maybe a tiny bit of miscellaneous office work. And in case the color choice didn't make it clear, it's not even for me. That's my story, and I'm sticking to it!

As I sat down to configure this machine, I belatedly realized that for most of what I do with a computer, this cute little netbook is perfectly adequate. Sure, the keyboard is a bit cramped, it's no performance powerhouse, and the screen size, at 1024 x 600, is definitely the minimum necessary for it to be practical. It took some adaptation, but it wasn't frustrating or disappointing to use. It delivered (almost) the same web experience I'd get on my desktop or laptop, with no serious compromises. It just... worked.

acer aspire one, screen closeup

As stupid as it sounds, I had fallen in love with this silly little netbook.

But even that's not the whole story -- after spending some time with a netbook, I realized that calling them "small laptops" is a mistake. Netbooks are an entirely different breed of animal. They are cheap, portable web browsers.

The most popular application in the world is the web browser. By far. Number two isn't even close. Just check out the front page of Wakoopa's most used apps:

Wakoopa: most used apps

By my reckoning, six of the top 10 "apps" here are actually web browsers or websites running in web browsers. It's certainly consistent with how my wife and I are increasingly using our computers. Every day, more and more of what we need to do is delivered through a browser, with fewer and fewer compromises. I spend ridiculous, unhealthy amounts of time browsing the web, and this netbook does that with aplomb.

At this point, who cares what operating system you run? Choice of web browser will have a far more profound impact on most people's daily lives. As the prices for netbooks inevitably collapse, they are poised to transform the entire computer market, threatening both Apple and Microsoft.

  1. Apple laptops are beautiful, but I can't imagine the average user who spends all their time in the web browser paying 3 to 4 times the price of a netbook for a Mac laptop. Macs are brilliantly designed, it's true, but that's a hell of a tax to run Safari.

  2. Speaking of taxes, what about the Microsoft Tax? I'm already heavily infatuated with the current iteration of netbooks as represented by the Aspire. And they can only get better and cheaper over time. Imagine a machine with the same specs as the Aspire One but at $299, $199, maybe even $99. It's going to happen. It's inevitable. This is a huge opening for Linux; it's the ideal way to deliver a complete, modern web browser at nearly zero marginal cost to both the vendor and consumer.

  3. The booming growth of netbooks will keep Windows XP alive much longer than expected. As much as I like Vista as a solid (if not stellar) upgrade from XP, the prehistoric 2001-era system requirements of XP still make it a better choice for these kinds of devices. 1 GB of memory is roomy; a measly 16 GB of disk space is plenty. Can't say that for Vista. No sir. It's also an opportunity for Microsoft to play games with the Linux market by reducing the price of XP to crazy low, fire sale, everything-must-go levels. But only for "select" and "preferred" OEM vendors, of course, not for the common folks on the street.

I won't lie. One of the attractions of this particular model is that it runs Windows XP, an operating system I, and every other software vendor on the planet, know by heart. It'll run whatever without me having to think too much about it. But I could easily see myself leaving some of that potential flexibility on the table if the price dropped to $199 or so. If it runs Firefox 3, or Chrome, or Opera, that's about all I need.

I'm quite happy with our Acer Aspire One netbook for now, but I'll probably be picking up one of the next generation of netbooks for myself.

I agree with Omar that netbooks are poised to transform computing. They still have a way to go, of course, but the $299 or $199 no-compromises, go-anywhere, zero-monthly-contract-fees web browser in the palm of your hand -- with the requisite 9" or larger screen -- is almost upon us. I guess I hadn't been paying enough attention, because that's a shocker to me.

Pitching the web browser as a bona fide operating system always seemed stupid to me. Or at least it did, until I sat down with my first netbook. If I were Apple or Microsoft, I'd be watching this category of devices very, very closely.


You're Reading The World's Most Dangerous Programming Blog

Have you ever noticed that blogs are full of misinformation and lies? In particular, I'm referring to this blog. The one you're reading right now. For example, yesterday's post was so bad that it is conclusive proof that I've jumped the shark.

Again.

Apparently, according to one Reddit commenter, the information presented here is downright dangerous:

Jeff Atwood has always held the distinction of having the most dangerous programming blog, in that some young or aspiring developers may actually listen to some of his "advice", but now he's somehow managed to snag the achievement of having the most inane programming blog as well.

To put it in more frank terms Jeff: What you've just written is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. Everyone in this room is now dumber for having read this post. I award you no points, and may God have mercy on your soul.

I enjoyed the Billy Madison quote, but I'm not sure my blog has earned that particular distinction yet. If this blog is the most dangerous content that young, inexperienced developers have ever read, then, well, I'd have to seriously question whether they've ever actually used this thing we call the "world wide web".

Allow me to illustrate with an example.

Today I happened across this blog entry from Mads Kristensen. In it, Mads explains that Deflate is faster than GZip.

First I tested the GZipStream and then the DeflateStream. I expected a minor difference because the two compression methods are different, but the result astonished me. I measured the DeflateStream to be 41% faster than GZip. That's a very big difference. With this knowledge, I'll have to change the HTTP compression module to choose Deflate over GZip.

This was a surprising result to me, because the two compression algorithms are very closely related. On the other hand, we use GZip heavily to cache HTML fragment output strings on the Stack Overflow server, as Scott Hanselman explains. If Deflate really is that much faster, we need to switch to it!

But, like any veteran internet user, I never take what I read on a blog -- or any other site on the internet, for that matter -- as fact. Rather, it's a germ of an intriguing idea, a call to action. I fired up my IDE and built a small test harness to test for myself: is Deflate faster than GZip?

using System;
using System.Diagnostics;
using System.IO;

public static class StopwatchExtensions
{
    // Times the given action over the specified number of iterations.
    public static long Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++) { action(); }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}

class Program
{
    static void Main(string[] args)
    {
        string s = File.ReadAllText(@"c:\test.html");
        byte[] b;
        var sw = new Stopwatch();

        b = CompressGzip(s);
        Console.WriteLine("gzip size: " + b.Length);
        Console.WriteLine(sw.Time(() => CompressGzip(s), 1000));
        Console.WriteLine(sw.Time(() => DecompressGzip(b), 1000));

        b = CompressDeflate(s);
        Console.WriteLine("deflate size: " + b.Length);
        Console.WriteLine(sw.Time(() => CompressDeflate(s), 1000));
        Console.WriteLine(sw.Time(() => DecompressDeflate(b), 1000));
    }
}
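
The CompressGzip, DecompressGzip, CompressDeflate, and DecompressDeflate helpers aren't shown above (the downloadable source linked later in this post has the real versions). A plausible minimal sketch, using GZipStream and DeflateStream from System.IO.Compression, would be:

using System.IO;
using System.IO.Compression;
using System.Text;

// These helpers would live in the Program class, alongside Main.
static byte[] CompressGzip(string text)
{
    // Compress a UTF-8 encoded string into a gzip byte stream.
    byte[] raw = Encoding.UTF8.GetBytes(text);
    using (var memory = new MemoryStream())
    {
        using (var gzip = new GZipStream(memory, CompressionMode.Compress))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        return memory.ToArray();
    }
}

static string DecompressGzip(byte[] compressed)
{
    // Decompress a gzip byte stream back into a string.
    using (var memory = new MemoryStream(compressed))
    using (var gzip = new GZipStream(memory, CompressionMode.Decompress))
    using (var reader = new StreamReader(gzip, Encoding.UTF8))
    {
        return reader.ReadToEnd();
    }
}

// CompressDeflate and DecompressDeflate are identical, with DeflateStream
// substituted for GZipStream.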

The results were surprising: on my box, GZip is just as fast as Deflate. For giant strings, for medium strings, for small strings. In every possible testing combination I can think of, Deflate is nowhere near 40% faster.

gzip size: 3125
242
171
deflate size: 3107
225
149

That's not exactly what Mads' blog entry tells me should happen. Do I think Mads is an idiot for posting this? Well, no. I don't.

  • The original blog entry was posted in late 2006; since then, new versions of the .NET framework have shipped and hardware has gotten faster. Perhaps a significant change in either one produces this different outcome.
  • My test is a bit different from Mads'. I use a random HTML file as the compression target; I can't tell exactly what he's compressing in his benchmark. I also tried small, medium, and large strings. The tests are similar, but they're not the same.

Is this the type of dangerous misinformation that blogs are vilified for? Should I be angry at Mads for posting this? Not at all. I learned a bit more about Deflate and GZip. It provided an opportunity for me to refactor my compression code some. I even learned how to benchmark using lambda syntax. If I hadn't read this post, if it hadn't provided that impetus of an idea for me to ponder, I wouldn't have bothered.

I am a better programmer for having read that blog post. Even though, near as I can tell, it's offering inaccurate advice.

Update: I got a bit more curious about this, so I ran some more tests on different machines. Here are the results, in milliseconds, for a thousand runs each using the Google homepage HTML as the target (it's about 7 KB):

gzip vs. deflate graph

How much faster is Deflate than GZip?

             Core 2 Duo   Core 2 Quad   Athlon X2
             3.5 GHz      1.86 GHz      2.1 GHz
Compress     8% faster    8% faster     50% faster
Decompress   15% faster   17% faster    37% faster

There's the 40% Mads was talking about. That is a little shocking when you consider that GZip is simply Deflate plus a checksum and header/footer! (You can download the source code for this test and try it yourself.)
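
(You can see the "plus a checksum and header/footer" part in the size numbers from my first test, by the way: the gzip output is exactly 18 bytes larger than the deflate output, 3125 versus 3107. That's gzip's 10-byte header plus its 8-byte trailer -- a CRC-32 checksum and the uncompressed length, 4 bytes each.)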

So my point -- and I do have one -- is this: when you say that the information presented on a blog is "dangerous", you're implying the audience is too dumb or inept to read critically.

I, for one, have too much respect for my audience to ever do that. I am continually humbled by the quality of the comments and discussion on the blog entries I post. In fact, I'd say that has been the single most surprising thing I've learned in my four plus years of blogging: the best content always begins where the blog post ends. My audience is far, far smarter than I will ever be.

On second thought, maybe what I promote on this blog is dangerous: thinking for yourself.

But I'm pretty confident you can handle that.


The One Thing Every Software Engineer Should Know

I'm a huge Steve Yegge fan, so it was a great honor to have him on a recent Stack Overflow podcast. One thing I couldn't have predicted, however, was the particular theme from Steve's experience at Google and Amazon that kept coming up, time and time again:

If there was one thing I could teach every engineer, it would be how to market.

Not how to type, not how to write, not how to design a programming language, but marketing.

This is painful for developers to hear, because we love code. But all that brilliant code is totally irrelevant until:

  1. people understand what you're doing
  2. people become interested in what you're doing
  3. people get excited about what you're doing

That, in a nutshell, is marketing. Just because you're a marketer doesn't necessarily mean you're a marketing weasel. Sure, the two things are highly correlated -- but at its core, marketing is little more than an intermediate level course on fundamental human communication. Not something us programmers have historically been so great at.

That's why even the hardest of hard-core programmers should be paying attention to people like Seth Godin. Steve was referring to marketing in the broader, more timeless sense of getting other people interested in your ideas.

After hearing Steve mention this several times on our podcast -- and having seen his related talk How to Ignore Marketing and Become Irrelevant in Two Easy Steps -- I suddenly realized why I was so fascinated with two particular books I recently discovered. Books I kept referring to, over and over, during the development of Stack Overflow.

[Book covers: Whatever You Think, Think the Opposite and It's Not How Good You Are, It's How Good You Want to Be]

I couldn't put down these two small-format books from the late Paul Arden. Guess what Mr. Arden did for a living? That's right, he was an executive creative director for Saatchi & Saatchi -- an advertising firm.

I had been reading dirty books. Marketing books. By choice, even. I'm a bit embarrassed to admit this, because these are exactly the kinds of pithy little business books I usually make fun of other people for reading. But in reading these books, I realized that so much of what we do on Stack Overflow has nothing to do with how awesome our code is -- and everything to do with marketing.

We're all software developers here, so let me put this in terms programmers understand: Dungeons & Dragons character statistics. You know, the classics.

RPG character stats: STR DEX CON INT WIS CHA

If you're a programmer, and you want to get better at your job every year, you might think that the most important character stat to build is coding. Let's call this INT. So at the end of many years of toil, you'll end up with something like this:

str 6
dex 9
con 12
int 51
wis 13
chr 4

OK, you're a genius programmer who can code circles around everyone else. But you may never ship any of your code, for reasons that seem beyond your control. That's an illusion: you can control when, how, and where your code ships. You probably spent too much time building your code and not enough time as an advocate of your code. Did you explain to people what your code does, why it's cool and important? Did you offer reasons why your code is going to make their lives better, at least in some small way? Did you make it easy for people to find and use your code?

I believe most programmers will be better served in their professional career if they shoot for character development more along these lines:

str 16
dex 14
con 15
int 18
wis 16
chr 17

Sometimes, you become a better programmer by choosing not to program. I agree with Steve: if I could teach my fellow software engineers one thing, it would be how to market themselves, their code, and their project.
