Coding Horror

programming and human factors

FizzBuzz: the Programmer's Stairway to Heaven

Evidently writing about the FizzBuzz problem on a programming blog results in a nigh-irresistible urge to code up a solution. The comments here, on Digg, and on Reddit – nearly a thousand in total – are filled with hastily coded solutions to FizzBuzz. Developers are nothing if not compulsive problem solvers.

It certainly wasn't my intention, but a large portion of the audience interpreted FizzBuzz as a challenge. I suppose it's like walking into Guitar Center and yelling 'most guitarists can't play Stairway to Heaven!'*

You might be shooting for a rational discussion of Stairway to Heaven as a way to measure minimum levels of guitar competence. But what you'll get, instead, is a blazing guitarpocalypse.

Eddie Van Halen's Patent

I'm invoking the Wayne's World rule here: Please, No Stairway to Heaven.

FizzBuzz was presented as the lowest level of comprehension required to illustrate adequacy. There's no glory to be had in writing code that establishes a minimum level of competency. Even if you can write it in five different languages or in under 50 bytes of code.

The whole point of the original article was to think about why we have to ask people to write FizzBuzz. The mechanical part of writing and solving FizzBuzz, however cleverly, is irrelevant. Any programmer who cares enough to read programming blogs is already far beyond such a simple problem. FizzBuzz isn't meant for us. It's the ones we can't reach – the programmers who don't read anything – that we're forced to give the FizzBuzz test to.

Good software developers, even the ones who think they are Rockstars, don't play Stairway to Heaven. And instead of writing FizzBuzz code, they should be thinking about ways to prevent us from needing FizzBuzz code in the first place.

* via Jon Galloway and Steven Burch.


Why Can't Programmers.. Program?

I was incredulous when I read this observation from Reginald Braithwaite:

Like me, the author is having trouble with the fact that 199 out of 200 applicants for every programming job can't write code at all. I repeat: they can't write any code whatsoever.

The author he's referring to is Imran, who is evidently turning away lots of programmers who can't write a simple program:

After a fair bit of trial and error I've discovered that people who struggle to code don't just struggle on big problems, or even smallish problems (i.e. write an implementation of a linked list). They struggle with tiny problems.

So I set out to develop questions that can identify this kind of developer and came up with a class of questions I call "FizzBuzz Questions" named after a game children often play (or are made to play) in schools in the UK. An example of a Fizz-Buzz question is the following:

Write a program that prints the numbers from 1 to 100. But for multiples of three print "Fizz" instead of the number and for the multiples of five print "Buzz". For numbers which are multiples of both three and five print "FizzBuzz".

Most good programmers should be able to write out on paper a program which does this in under a couple of minutes. Want to know something scary? The majority of comp sci graduates can't. I've also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.
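
For the record, a passing answer really is this small. Here's one minimal sketch in Python, shown purely for illustration; any mainstream language, and any of a dozen equally boring shapes, will do:

  # Print 1..100, substituting Fizz, Buzz, or FizzBuzz as required.
  for n in range(1, 101):
      if n % 15 == 0:        # multiple of both three and five
          print("FizzBuzz")
      elif n % 3 == 0:
          print("Fizz")
      elif n % 5 == 0:
          print("Buzz")
      else:
          print(n)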

Dan Kegel had a similar experience hiring entry-level programmers:

A surprisingly large fraction of applicants, even those with master's degrees and PhDs in computer science, fail during interviews when asked to carry out basic programming tasks. For example, I've personally interviewed graduates who can't answer "Write a loop that counts from 1 to 10" or "What's the number after F in hexadecimal?" Less trivially, I've interviewed many candidates who can't use recursion to solve a real problem. These are basic skills; anyone who lacks them probably hasn't done much programming.

Speaking on behalf of software engineers who have to interview prospective new hires, I can safely say that we're tired of talking to candidates who can't program their way out of a paper bag. If you can successfully write a loop that goes from 1 to 10 in every language on your resume, can do simple arithmetic without a calculator, and can use recursion to solve a real problem, you're already ahead of the pack!
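
To make the bar concrete, here is a sketch of the sort of answers being asked for, again in Python; the recursive directory-size function is a hypothetical stand-in for "a real problem", not something taken from Dan's interviews. (And the number after F in hexadecimal is 10.)

  import os

  # A loop that counts from 1 to 10.
  for i in range(1, 11):
      print(i)

  # Recursion applied to a real problem: total size of all files under a directory tree.
  def tree_size(path):
      # Sum the sizes of regular files beneath path, recursing into subdirectories.
      total = 0
      for entry in os.scandir(path):
          if entry.is_dir(follow_symlinks=False):
              total += tree_size(entry.path)
          elif entry.is_file(follow_symlinks=False):
              total += entry.stat().st_size
      return total

  print(tree_size("."))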

Between Reginald, Dan, and Imran, I'm starting to get a little worried. I'm more than willing to cut freshly minted software developers slack at the beginning of their career. Everybody has to start somewhere. But I am disturbed and appalled that any so-called programmer would apply for a job without being able to write the simplest of programs. That's a slap in the face to anyone who writes software for a living.

The vast divide between those who can program and those who cannot program is well known. I assumed anyone applying for a job as a programmer had already crossed this chasm. Apparently this is not a reasonable assumption to make. Apparently, FizzBuzz style screening is required to keep interviewers from wasting their time interviewing programmers who can't program.

Lest you think the FizzBuzz test is too easy – and it is blindingly, intentionally easy – a commenter to Imran's post notes its efficacy:

I'd hate interviewers to dismiss [the FizzBuzz] test as being too easy - in my experience it is genuinely astonishing how many candidates are incapable of the simplest programming tasks.

Maybe it's foolish to begin interviewing a programmer without looking at their code first. At Vertigo, we require a code sample before we even proceed to the phone interview stage. And our on-site interview includes a small coding exercise. Nothing difficult, mind you, just a basic exercise to go through the motions of building a small application in an hour or so. Although there have been one or two notable flame-outs, for the most part, this strategy has worked well for us. It lets us focus on actual software engineering in the interview without resorting to tedious puzzle questions.

It's a shame you have to do so much pre-screening to have the luxury of interviewing programmers who can actually program. It'd be funny if it wasn't so damn depressing. I'm no fan of certification, but it does make me wonder if Steve McConnell was on to something with all his talk of creating a true profession of software engineering.



You Want a 10,000 RPM Boot Drive

I don't go out of my way to recommend building your own computer. I do it, but I'm an OCD-addled, pain-loving masochist. You're usually better off buying whatever cut-rate OEM box Dell is hawking at the moment, particularly now that Intel has finally abandoned the awful Pentium 4 CPU series and is back in the saddle with its excellent Core 2 Duo processor. PC parts are so good these days it's difficult to make a bad choice, no matter what you buy.

If you really must build your own computer, sites like Tech Report provide excellent advice in the form of their system guides. However, their guide sets the bar a little too low for my tastes. There are a few baseline requirements for any new computer build that aren't negotiable for me:

  • current dual-core chip, such as the Core 2 Duo or Athlon 64 X2
  • minimum of 2 GB of memory
  • modern PCI Express video card with 256 MB or more of memory, such as the NVIDIA 7600GS or the ATI Radeon X1650. Both of these cards can be found for about $100. Whatever you do, avoid on-board video, because it's universally crappy. The rule of thumb I use is this: if you're spending significantly less than $100 on your video card, you're making a terrible mistake.

It's not expensive. At today's prices, you're looking at around $800 for a new system based on these parts. Build that up and you've got a machine that can handle anything you throw at it, from cutting-edge games to full resolution high definition video playback. Oh yeah, and it compiles code pretty fast, too. If you're an avid gamer you might possibly want to throw another $50 to $100 at the video card for higher resolutions, but that's about it.

But one of the recommendations I make often gets some unexpected resistance. I believe every new PC build should have two hard drives:

  1. small 10,000 RPM boot drive
  2. large 7,200 RPM data/apps/games/media drive

I am a total convert to the Western Digital Raptor series of 10,000 RPM SATA hard drives. Maybe you're skeptical that a hard drive could make that much difference to a computer's performance. Well, I started out as a skeptic, too. But once I sat down and actually used a computer with a 10,000 RPM drive, my opinion did a complete about-face. I was blown away by how responsive and snappy it felt compared to my machine with a 7,200 RPM hard drive. It's a substantial difference that I continue to feel every day in typical use. Don't underestimate the impact of hard drive performance on your everyday use of the computer.

Western Digital Raptor Hard Drive

The difference in performance between a 7,200 RPM boot drive and a 10,000 RPM boot drive is not subtle in any way. But don't take my word for it; surf the benchmarks yourself.

Unfortunately, the Raptors aren't large drives, and they're expensive on a per-megabyte basis. Current pricing is about $140 for the 74 GB model, and $180 for the 150 GB model. But once you factor in the incredible performance, and the fact that you don't need a lot of space on your primary drive because your secondary drive will be the large workhorse storage area, I think it's a completely reasonable tradeoff.
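
To put the premium in perspective, run the numbers on those prices: the 74 GB Raptor works out to roughly $1.90 per gigabyte and the 150 GB model to about $1.20 per gigabyte, a steep multiple of what commodity 7,200 RPM storage costs per gigabyte.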

A number of people have expressed concerns that a 10,000 RPM drive will run hot and noisy. I am a noise fanatic, and I can assure you that this is not the case. According to the StorageReview noise and heat analysis, the Raptor is squarely in the ballpark with its 7,200 RPM peers. I mount all my drives with sorbothane, and I use eggcrate foam on nearby surfaces to further reduce any reflected noise. Once I do this, the Raptor is no noisier than any other 3.5" desktop hard drive I've used.

Setting aside the performance argument for a moment, using two hard drives also provides additional flexibility. Although I cannot recommend RAID 0 on the desktop, there are clear benefits to using two standalone hard drives. You can isolate your essential user data from the operating system by storing it on the larger, secondary drive. This gives you the freedom to blow away your primary OS drive with relative impunity. It's also optimal for virtual machine use, as one drive can be dedicated to OS functions and the other can act exclusively as a virtual disk. There are plenty of usage scenarios where taking advantage of two hard drive spindles can provide a serious performance boost, such as extracting a large archive from one drive to another.

It's gotten to the point now where I won't even consider building a machine without a Raptor as the boot drive. Sure, your computer may have 2 or even 4 gigabytes of memory, but going to disk is inevitable. And every time you go to disk, you'll become thoroughly spoiled by the speed of the Raptor.

You may not know it yet, but you want a 10,000 RPM boot drive, too. In the words of Scott Hanselman: Go on. Treat yourself. I guarantee you won't be disappointed.


Revisiting 7-ZIP

In my previous post, I extolled the virtues of WinRAR and the RAR archive format. I disregarded 7-ZIP because it didn't do well in that particular compression study, and because my previous experiences with it had shown it to be efficient, but brutally slow.

But that's no longer true. Consider the following test I just conducted:

  • Two files: a 587 MB virtual hard disk file and an 11 KB virtual machine file.
  • Test rig is a dual-core Athlon X2 4800+.
  • All default GUI settings were used.
  • All extracting and archiving done from one physical hard drive to another, to reduce impact of disk contention.
                        Extraction    Compression    Size
  WinRAR 3.70 beta 2       0:39          3:09        135 MB
  7-ZIP 4.20                 -           6:04        127 MB
  7-ZIP 4.44 beta          0:40          3:03        125 MB

7-ZIP performance has doubled over the last two years. And it's slightly more efficient at compression, too. That's impressive.

Performance is no longer a reason to choose WinRAR over 7-ZIP. Granted, this is a sample size of one, a single test on a single machine, but it's hard to ignore the dramatic reversal of fortune.

I still like WinRAR's ultra-slick shell integration. But 7-ZIP is a viable competitor now in terms of raw clock time performance, and as always, it tends to produce smaller archives than RAR. This more than addresses my previous criticisms. Mea culpa, 7-ZIP.


Don't Use ZIP, Use RAR

When I wrote Today is "Support Your Favorite Small Software Vendor Day", I made a commitment to spend at least $20 per month supporting my fellow independent software developers. WinRAR has become increasingly essential to my toolkit over the last year, so this month, I'm buying a WinRAR license.

Sure, ZIP support is built into most operating systems, but the support is rudimentary at best. I particularly dislike the limited "compressed folder wizard" I get by default in XP and Vista. In contrast, WinRAR is full-featured, powerful, and integrates seamlessly with the shell. There's a reason WinRAR won the best archive tool roundup at DonationCoder. And WinRAR is very much a living, breathing piece of software. It's frequently updated with neat little feature bumps and useful additions; two I noticed over the last year were dual-core support and real-time stats while compressing, such as estimated compression ratio and predicted completion time.

WinRAR fully supports creating and extracting ZIP archives, so choosing WinRAR doesn't mean you'll be forced into using the RAR compression format. But you should use it, because RAR, as a compression format, clobbers ZIP. It produces much smaller archives in roughly the same time. If you're worried the person on the receiving end of the archive won't have a RAR client, you can create a self-extracting executable archive (or SFX) at a minimal cost of about 60 KB additional filesize.

RAR also supports solid archives, which treat the files being compressed as one continuous stream, so it can exploit redundancy across files; ZIP cannot. This is a big deal, because it can result in a substantially smaller archive when you're compressing a lot of files. When I compressed all the C# code snippets, the difference was enormous:

  ZIP    229 KB
  RAR     73 KB
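
The effect is easy to demonstrate with a toy sketch, using Python's zlib as a stand-in for a real archiver's compression engine; the "files" below are made up purely for illustration. Compressing each one independently models ZIP, while compressing them joined into one stream models a solid archive:

  import zlib

  # One hundred small, highly similar "files", like a folder full of code snippets.
  files = [b"public class Snippet%d { /* boilerplate */ }" % i for i in range(100)]

  # ZIP model: each file compressed independently.
  per_file_total = sum(len(zlib.compress(f)) for f in files)

  # Solid model: all files compressed as one continuous stream.
  solid_total = len(zlib.compress(b"".join(files)))

  print(per_file_total, solid_total)  # the solid stream is far smaller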

But even in an apples-to-apples comparison, RAR offers some of the very best "bang for the byte" of all compression algorithms. Consider this recent, comprehensive multiple file compression benchmark. The author measured both compression size and compression time to produce an efficiency metric:

The most efficient (read: useful) program is calculated by multiplying the compression time (in seconds) it took to produce the archive with the power of the archive size divided by the lowest measured archive size.

2 ^ (((Size / SmallestSize) - 1) / 0.1) * ArchiveTime

The lower the score, the better. The basic idea is a compressor X has the same efficiency as compressor Y if X can compress twice as fast as Y and resulting archive size of X is 10% larger than size of Y.
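
Expressed as code, the scoring function looks roughly like this sketch (the names are mine, not the study's):

  def efficiency(archive_size, smallest_size, archive_time_seconds):
      # Lower is better: 10% of extra size doubles the score, and so does doubling the time.
      return 2 ** (((archive_size / smallest_size) - 1) / 0.1) * archive_time_seconds

  # Hypothetical numbers: X produces a 5% larger archive than the best result,
  # but compresses twice as fast.
  print(efficiency(105, 100, 60))   # ~84.9
  print(efficiency(100, 100, 120))  # 120.0, so X scores better despite the larger archive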

And sure enough, if you sort the results by efficiency, WinRAR rises directly to the top. Its scores of 1871 (Good) and 1983 (Best) rank third and fourth out of 200. The top two spots are held by an archiver I've never heard of, SBC.

WinRAR and SBC 0.970 score very well on efficiency. Both SBC and WinRK are capable of compressing the 301 MB testset down to 82 MB [a ~73% compression ratio] in under 3 minutes. People looking for good (but not ultimate) and fast compression should have a look at those two programs.

The raw data on the comparison page is a little hard to parse, so I pulled the data into Excel and created some alternative views of it. Here's a graph of compression ratio versus time, sorted by compression ratio, for all compared archive programs:

Compression Time vs. Compression Ratio graph

What I wanted to illustrate with this graph is that beyond about 73% compression ratio, performance falls off a cliff. This is something I've noted before in previous compression studies. You don't just hit the point of diminishing returns in compression, you slam into it like a brick wall. That's why the time scale is logarithmic in the above graph. Look at the massive differences in time as you move toward the peak compression ratio:

  Ratio      Time        Archiver
  72.58%     02:54       WinRAR 3.62
  75.24%     11:20       UHARC 0.6b
  77.16%     30:38       DRUILCA 0.5
  78.83%     05:51:19    PAQ8H
  79.70%     08:30:03    WinRK 3.0.3

Note that I cherry-picked the most efficient archivers out of this data, so this represents best-case performance. Is an additional two percent of compression worth taking five times longer? Is an additional four percent worth ten times longer? Under the right conditions, possibly. But the penalty is severe, and the reward minuscule.

If you're interested in crunching the multiple file compression benchmark study data yourself, I converted it to a few different formats for your convenience, including an Excel spreadsheet and a Google spreadsheet version. Personally, I recommend the Excel version; I had major performance problems with the Google spreadsheet version.

After poring over this data, I'm more convinced than ever. RAR offers a nearly perfect blend of compression efficiency and speed across all modern compression formats. And WinRAR is an exemplary GUI implementation of RAR. It's almost a no-brainer. Except in cases where backwards compatibility trumps all other concerns, we should abandon the archaic ZIP format-- and switch to the power and flexibility of WinRAR.
