Coding Horror (Page 132)

6 Mar 2007

Using Amazon S3 as an Image Hosting Service

In Reducing Your Website's Bandwidth Usage, I concluded that my best outsourced image hosting option was Amazon's S3 or Simple Storage Service.

S3 is a popular choice for startups. For example, SmugMug uses S3 as their primary data storage source. There have been a few minor S3-related bumps at SmugMug, but overall the prognosis appears to be good. After experimenting with S3 myself, I'm sold. The costs are reasonable:

No start up fees, no minimum charge
$0.15 per GB for each month of storage
$0.20 per GB of data transferred

It's not exactly unlimited bandwidth, but I was planning to spend $2 a month on image hosting anyway. That buys me 10 GB per month of highly reliable, pure file transfer bandwidth through S3. Beyond that, it's straight pay-as-you-go.

Unfortunately, Amazon doesn't provide a GUI or command-line application for easily transferring files to S3; it's only a set of SOAP and REST APIs.

There is Jungle Disk, which allows S3 to show up as a virtual drive on your computer, but Jungle Disk offers no way to make files accessible through public HTTP. And as I found out later, Jungle Disk also uses a strange, proprietary file naming and storage structure on S3 when you view it directly. Jungle Disk is a fine backup and offline storage tool (particularly considering how cheap S3 disk storage costs are), but it doesn't offer the level of control that I need.

Amazon does provide a number of API code samples in various languages, along with some links to tutorial articles, but beyond that, you're basically on your own. Or so I thought.

That was before I found the S3Fox Organizer for FireFox.

S3Fox is like a dream come true after futzing around with the S3 API. Using S3Fox, I can easily experiment with the S3 service to figure out how it works instead of spending my time struggling with arcane S3 API calls in a development environment. It's a shame Amazon doesn't offer a tool like this to help people understand what the S3 service is and how it works.

At any rate, my goal is to use S3 as an image hosting service. I started by uploading an entire folder of images with S3Fox. I had a few problems where S3Fox would mysteriously fail in the middle of a transfer, forcing me to exit all the way out of Firefox. Fortunately, S3Fox also has folder synchronization support, so I simply restarted the entire transfer and told it to skip all files that were already present in S3. After a few restarts, all the files were successfully uploaded. I then granted anonymous access to the entire folder and all of its contents. This effectively exposed all the uploaded images through the public S3 site URL:

http://s3.amazonaws.com/

All S3 content has to go in what Amazon calls a "Bucket". Bucket names must be globally unique, and you can only create a maximum of 100 Buckets per account. It's easy to see why when you form the next part of the URL:

http://s3.amazonaws.com/codinghorrorimg/

Each Bucket can hold an unlimited number of "Objects" of essentially unlimited size, with as much arbitrary key-value pair metadata as you want attached. Objects default to private access, but they have explicit access control lists (for Amazon accounts only), and you can make them public. Once we've added an Object, if we grant public read permission to it, we can then access it via the complete site / Bucket / Object URL:

http://s3.amazonaws.com/codinghorrorimg/codinghorror-bandwidth-usage.png

There's no concept of folders in S3. You can only emulate folder structures by adding Objects with tricky names like subfolder/myfile.txt. And you can't rename Buckets or Objects, as far as I can tell. But at least I can control the exact filenames, which I was unable to do with any other image hosting service.

In my testing I ended up uploading the entire contents of my /images folder twice. That cost me a whopping two cents according to my real-time S3 account statement:

Amazon s3 account statement

It's almost like micropayments in action.

S3 will probably end up costing me slightly more than the "Unlimited" $25/year accounts available on popular personal photo sharing sites. With S3, there's no illusion of unlimited bandwidth use unconstrained by cost. And personal photo and image sharing sites are often blocked by corporate networks, which makes sense if you consider their intended purpose: informally sharing personal photos. S3 is a more professional image hosting choice; it offers tighter control along with a full set of developer APIs.

Discussion

5 Mar 2007

Reducing Your Website's Bandwidth Usage

Over the last three years, this site has become far more popular than I ever could have imagined. Not that I'm complaining, mind you. Finding an audience and opening a dialog with that audience is the whole point of writing a blog in the first place.

But on the internet, popularity is a tax. Specifically, a bandwidth tax. When Why Can't Programmers.. Program? went viral last week, outgoing bandwidth usage spiked to nearly 9 gigabytes in a single day:

codinghorror bandwidth usage, 2/24/2007 - 3/4/2007

That was enough to completely saturate two T1 lines-- nearly 300 KB/sec-- for most of the day. And that includes the time we disabled access to the site entirely in order to keep it from taking out the whole network.* After that, it was clear that something had to be done. What can we do to reduce a website's bandwidth usage?

1. Switch to an external image provider.

Unless your website is an all-text affair, images will always consume the lion's share of your outgoing bandwidth. Even on this site, which is extremely minimalistic, the size of the images dwarf the size of the text. Consider my last blog post, which is fairly typical:

Size of post text	~4,900 bytes
Size of post image	~46,300 bytes
Size of site images	~4,600 bytes

The text only makes up about ten percent of the content for that post. To make a dent in our bandwidth problem, we must deal with the other ninety percent of the content-- the images-- first.

Ideally, we shouldn't have to serve up any images at all: we can outsource the hosting of our images to an external website. There are a number of free or nearly-free image sharing sites on the net which make this a viable strategy:

Imageshack
ImageShack offers free, unlimited storage, but has a 100 MB per hour bandwidth limit for each image. This sounds like a lot, but do the math: that's 1.66 MB per minute, or about 28 KB per second. And the larger your image is, the faster you'll burn through that meager allotment. But it's incredibly easy to use-- you don't even have to sign up-- and according to their common questions page, anything goes as long as it's not illegal.
Flickr
Flickr offers a free basic account with limited upload bandwidth and limited storage. Download bandwidth is unlimited. Upgrading to a paid Pro account for $25/year removes all upload and storage restrictions. However, Flickr's terms of use warn that "professional or corporate uses of Flickr are prohibited", and all external images require a link back to Flickr.
Photobucket
Photobucket's free account has a storage limit and a download bandwidth limit of 10 GB per month (that works out to a little over 14 MB per hour). Upgrading to a paid Pro account for $25/year removes the bandwidth limit. I couldn't find any relevant restrictions in their terms of service.
Amazon S3
Amazon's S3 service allows you to direct-link files at a cost of 15 cents per GB of storage, and 20 cents per GB transfer. It's unlikely that would add up to more than the ~ $2 / month that seems to be the going rate for the other unlimited bandwidth plans. It has worked well for at least one other site.

I like ImageShack a lot, but it's unsuitable for any kind of load, due to the hard-coded bandwidth limit. Photobucket offers the most favorable terms, but Flickr has a better, more mature toolset. Unfortunately, I didn't notice the terms of use restrictions at Flickr until I had already purchased a Pro account from them. So we'll see how it goes. Update: it looks like Amazon S3 may be the best long-term choice, as many (if not all) of these photo sharing services are blocked in corporate firewalls.

Even though this ends up costing me $25/year, it's still an incredible bargain. I am offloading 90% of my site's bandwidth usage to an external host for a measly 2 dollars a month.

And as a nice ancillary benefit, I no longer need to block image bandwidth theft with URL rewriting. Images are free and open to everyone, whether it's abuse or not. This makes life much easier for legitimate users who want to view my content in the reader of their choice.

Also, don't forget that favicon.ico is an image, too. It's retrieved more and more often by today's readers and browsers. Make favicon.ico as small as possible, because it can have a surprisingly large impact on your bandwidth.

2. Turn on HTTP compression.

Now that we've dealt with the image content, we can think about ways to save space on the remaining content-- the text. This one's a no-brainer. Enable HTTP compression on your webserver for roughly two-thirds reduction in text bandwidth. Let's use my last post as an example again:

Post size	63,826 bytes
Post size with compression	21,746 bytes

We get a 66% reduction in file size for every bit of text served up on our web site-- including all the JavaScript, HTML, and CSS-- by simply flipping a switch on our web server. The benefits of HTTP compression are so obvious it hurts. It's reasonably straightforward to set up in IIS 6.0 , and it's extremely easy to set up in Apache.

Never serve content that isn't HTTP compressed. It's as close as you'll ever get to free bandwidth in this world. If you aren't sure that HTTP compression is enabled on your website, use this handy web-based HTTP compression tester, and be sure.

3. Outsource Your RSS feeds.

Many web sites offer RSS feeds of updated content that users can subscribe to (or "syndicate") in RSS readers. Instead of visiting a website every day to see what has changed, RSS readers automatically pull down the latest RSS feed at regular intervals. Users are free to read your articles at their convenience, even offline. Sounds great, right?

It is great. Until your ealize just how much bandwidth all that RSS feed polling is consuming. It's staggering. Scott Hanselman told me that half his bandwidth was going to RSS feeds. And Rick Klau noted that 60% of his page views were RSS feed retrievals. The entire RSS ecosystem depends on properly coded RSS readers; a single badly-coded reader could pummel your feed, pulling uncompressed copies of your RSS feed down hourly-- even when it hasn't changed since the last retrieval. Now try to imagine thousands of poorly-coded RSS readers, all over the world. That's pretty much where we are today.

Serving up endless streams of RSS feeds is something I'd just as soon outsource. That's where FeedBurner comes in. Although I'll gladly outsource image hosting for the various images I use to complement my writing, I've been hesitant to hand control for something as critical as my RSS feed to a completely external service. I emailed Scott Hanselman, who switched his site over to FeedBurner a while ago, to solicit his thoughts. He was gracious enough to call me on the phone and address my concerns, even walking me through FeedBurner using his login.

I've switched my feed over to FeedBurner as of 3pm today. The switch should be transparent to any readers, since I used some ~~mod_rewrite~~ISAPIRewrite rules to do a seamless, automatic permanent redirect from the old feed URL to the new feed URL:

# do not redirect feedburner, but redirect everyone else
RewriteCond User-Agent: (?!FeedBurner).*
RewriteRule .*index.xml$|.*index.rdf$|.*atom.xml$
http://feeds.feedburner.com/codinghorror/ [I,RP,L]

And the best part is that immediately after I made this change, I noticed a huge drop in per-second and per-minute bandwidth on the server. I suppose that's not too surprising if you consider that the feedburner stats page for this feed are currently showing about one RSS feed hit per second. But even compressed, that's still about 31 KB of RSS feed per second that my server no longer has to deal with.

It's a substantial savings, and FeedBurner brings lots of other abilities to the table beyond mere bandwidth savings.

4. Optimize the size of your JavaScript and CSS

The only thing left for us to do now is reduce the size of our text content, with a special emphasis on the elements that are common to every page on our website. CSS and JavaScript resources are a good place to start, but the same techniques can apply to your HTML as well.

There's a handy online CSS compressor which offers three levels of CSS compression. I used it on the main CSS file for this page, with the following results:

original CSS size	2,299 bytes
after removing whitespace	1,758 bytes
after HTTP compression	615 bytes

We can do something similar to the JavaScript with this online JavaScript compressor, based on Douglas Crockford's JSMin. But before I put the JavaScript through the compressor, I went through and refactored it, using shorter variables and eliminating some redundant and obsolete code.

original JS size	1232 bytes
after refactoring	747 bytes
after removing whitespace	558 bytes
after HTTP compression	320 bytes

It's possible to use similar whitespace compressors on your HTML, but I don't recommend it. I only saw reductions in size of about 10%, which wasn't worth the hit to readability.

Realistically, whitespace and linefeed removal is doing work that the compression would be doing for us. We're just adding a dab of human-assisted efficiency:

	Raw	Compressed
Unoptimized CSS	2,299 bytes	671 bytes
Optimized CSS	1,758 bytes	615 bytes

It's only about a 10 percent savings once you factor in HTTP compression. The tradeoff is that CSS or JavaScript lacking whitespace and linefeeds has to be pasted into an editor to be effectively edited. I use Visual Studio 2005, which automatically "rehydrates" the code with proper whitespace and linefeeds when I issue the autoformat command.

Although this is definitely a micro-optimization, I think it's worthwhile since it reduces the payload of every single page on this website. But there's a reason it's the last item on the list, too. We're just cleaning up a few last opportunities to squeeze every last byte over the wire.

After implementing all these changes, I'm very happy with the results. I see a considerable improvement in bandwidth usage, and my page load times have never been snappier. But, these suggestions aren't a panacea. Even the most minimal, hyper-optimized compressed text content can saturate a 300 KB/sec link if the hits per second are coming fast enough. Still, I'm hoping these changes will let my site weather the next Digg storm with a little more dignity than it did the last one-- and avoid taking out the network in the process.

* the ironic thing about this is that the viral post in question was completely HTTP compressed text content anyway. So of all the suggestions above, only the RSS outsourcing would have helped.

Discussion

2 Mar 2007

Your Code: OOP or POO?

I'm not a fan of object orientation for the sake of object orientation. Often the proper OO way of doing things ends up being a productivity tax. Sure, objects are the backbone of any modern programming language, but sometimes I can't help feeling that slavish adherence to objects is making my life a lot more difficult. I've always found inheritance hierarchies to be brittle and unstable, and then there's the massive object-relational divide to contend with. OO seems to bring at least as many problems to the table as it solves.

Perhaps Paul Graham summarized it best:

Object-oriented programming generates a lot of what looks like work. Back in the days of fanfold, there was a type of programmer who would only put five or ten lines of code on a page, preceded by twenty lines of elaborately formatted comments. Object-oriented programming is like crack for these people: it lets you incorporate all this scaffolding right into your source code. Something that a Lisp hacker might handle by pushing a symbol onto a list becomes a whole file of classes and methods. So it is a good tool if you want to convince yourself, or someone else, that you are doing a lot of work.

Eric Lippert observed a similar occupational hazard among developers. It's something he calls object happiness.

What I sometimes see when I interview people and review code is symptoms of a disease I call Object Happiness. Object Happy people feel the need to apply principles of OO design to small, trivial, throwaway projects. They invest lots of unnecessary time making pure virtual abstract base classes -- writing programs where IFoos talk to IBars but there is only one implementation of each interface! I suspect that early exposure to OO design principles divorced from any practical context that motivates those principles leads to object happiness. People come away as OO True Believers rather than OO pragmatists.

I've seen so many problems caused by excessive, slavish adherence to OOP in production applications. Not that object oriented programming is inherently bad, mind you, but a little OOP goes a very long way. Adding objects to your code is like adding salt to a dish: use a little, and it's a savory seasoning; add too much and it utterly ruins the meal. Sometimes it's better to err on the side of simplicity, and I tend to favor the approach that results in less code, not more.

Given my ambivalence about all things OO, I was amused when Jon Galloway forwarded me a link to Patrick Smacchia's web page. Patrick is a French software developer. Evidently the acronym for object oriented programming is spelled a little differently in French than it is in English: POO.

S.S. Adams gag fake dog poo 'Doggonit'

That's exactly what I've imagined when I had to work on code that abused objects.

But POO code can have another, more constructive, meaning. This blog author argues that OOP pales in importance to POO. Programming fOr Others, that is.

The problem is that programmers are taught all about how to write OO code, and how doing so will improve the maintainability of their code. And by "taught", I don't just mean "taken a class or two". I mean: have pounded into head in school, spend years as a professional being mentored by senior OO "architects" and only then finally kind of understand how to use properly, some of the time. Most engineers wouldn't consider using a non-OO language, even if it had amazing features. The hype is that major.
So what, then, about all that code programmers write before their 10 years OO apprenticeship is complete? Is it just doomed to suck? Of course not, as long as they apply other techniques than OO. These techniques are out there but aren't as widely discussed.
The improvement [I propose] has little to do with any specific programming technique. It's more a matter of empathy; in this case, empathy for the programmer who might have to use your code. The author of this code actually thought through what kinds of mistakes another programmer might make, and strove to make the computer tell the programmer what they did wrong.
In my experience the best code, like the best user interfaces, seems to magically anticipate what you want or need to do next. Yet it's discussed infrequently relative to OO. Maybe what's missing is a buzzword. So let's make one up, Programming fOr Others, or POO for short.

The principles of object oriented programming are far more important than mindlessly, robotically instantiating objects everywhere:

Information hiding and encapsulation
Simplicity
Re-use
Maintainability and empathy

Stop worrying so much about the objects. Concentrate on satisfying the principles of object orientation rather than object-izing everything. And most of all, consider the poor sap who will have to read and support this code after you're done with it. That's why POO trumps OOP: programming as if people mattered will always be a more effective strategy than satisfying the architecture astronauts.

Discussion

1 Mar 2007

Curly's Law: Do One Thing

In Outliving the Great Variable Shortage, Tim Ottinger invokes Curly's Law:

A variable should mean one thing, and one thing only. It should not mean one thing in one circumstance, and carry a different value from a different domain some other time. It should not mean two things at once. It must not be both a floor polish and a dessert topping. It should mean One Thing, and should mean it all of the time.

The late, great Jack Palance played grizzled cowboy Curly Washburn in the 1991 comedy City Slickers. Curly's Law is defined in this bit of dialog from the movie:

Curly: Do you know what the secret of life is?
Curly: This. [holds up one finger]
Mitch: Your finger?
Curly: One thing. Just one thing. You stick to that and the rest don't mean shit.
Mitch: But what is the "one thing?"
Curly: [smiles] That's what you have to find out.

Curly's Law, Do One Thing, is reflected in several core principles of modern software development:

Don't Repeat Yourself
If you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don't, you're guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur. Don't repeat yourself is important if you want flexible and maintainable software.
Once and Only Once
Each and every declaration of behavior should occur once, and only once. This is one of the main goals, if not the main goal, when refactoring code. The design goal is to eliminate duplicated declarations of behavior, typically by merging them or replacing multiple similar implementations with a unifying abstraction.
Single Point of Truth
Repetition leads to inconsistency and code that is subtly broken, because you changed only some repetitions when you needed to change all of them. Often, it also means that you haven't properly thought through the organization of your code. Any time you see duplicate code, that's a danger sign. Complexity is a cost; don't pay it twice.

Although Curly's Law definitely applies to normalization and removing redundancies, Do One Thing is more nuanced than the various restatements of Do Each Thing Once outlined above. It runs deeper. Bob Martin refers to it as The Single Responsibility Principle:

The Single Responsibility Principle says that a class should have one, and only one, reason to change. As an example, imagine the following class:
class Employee
{
public Money calculatePay()
public void save()
public String reportHours()
}
This class violates the SRP because it has three reasons to change:

The business rules having to do with calculating pay.
The database schema.
The format of the string that reports hours.

We don't want a single class to be impacted by these three completely different forces. We don't want to modify the Employee class every time the accounts decide to change the format of the hourly report, or every time the DBAs make a change to the database schema, as well as every time the managers change the payroll calculation. Rather, we want to separate these functions out into different classes so that they can change independently of each other.

Curly's Law is about choosing a single, clearly defined goal for any particular bit of code: Do One Thing. That much is clear. But in choosing one thing, you are ruling out an infinite universe of other possible things you could have done. Curly's Law also means consciously choosing what your code won't do. This is much more difficult than choosing what to do, because it runs counter to all the natural generalist tendencies of software developers. It could mean breaking code apart, violating traditional OOP rules, or introducing duplicate code. It's taking one step backward to go two steps forward.

Each variable, each line of code, each function, each class, each project should Do One Thing. Unfortunately, we usually don't find out what that one thing is until we've reached the end of it.

Discussion

28 Feb 2007

Choosing Anti-Anti-Virus Software

Now that Windows Vista has been available for almost a month, the comparative performance benchmarks are in.

Windows XP vs. Vista: The Benchmark Rundown (Tom's Hardware)
Windows Vista Performance Guide (Anandtech)

It's about what I expected; rough parity with the performance of Windows XP. Vista's a bit slower in some areas, and a bit faster in others. But shouldn't new operating systems perform better than old ones? There are plenty of low-level improvements under the hood. Why does Vista only break even in performance?

To be fair, Vista does a lot more than XP. I don't want to get into the whole XP vs. Vista argument here, but suffice it to say that the list of new features in Vista is quite extensive-- although perhaps not as extensive as some would like. Vista's integrated search alone is enough for me to banish XP from my life forever.

Microsoft has gotten a giant security shiner from Windows XP over the last five years. That's why Windows Vista goes out of its way to radically improve security, with new features like User Account Control (UAC) and Windows Defender. The existing security features in XP, such as Windows Firewall and System Protection (aka restore points) were significantly overhauled and improved for Vista, too. Enhanced security is a good thing, but it's never free. In fact, Vista's new security features will slow your PC down more than almost any other kind of software you can install.

For best performance, the first thing I do on any new Vista install is this:

Turn off Windows Defender
Turn off Windows Firewall
Disable System Protection
Disable UAC

I've had friends remark how "slow" Vista feels compared to XP, but when I ask them whether they've disabled Defender or UAC, the answer is typically no. Of course your system is going to be slower with all these added security checks. Security is expensive, and there ain't no such thing as a free lunch.

You might argue that three out of these four security features wouldn't even be necessary in the first place if Windows had originally followed the well-worn UNIX convention of separating standard users from privileged administrators. I won't disagree with you. But Windows' long historical precedent of setting user accounts up by default as privileged adminstrators is Microsoft's cross to bear. I can't rewrite history, and neither can Microsoft. That's why they came up with these painful, performance-sapping workarounds.

But this doesn't mean you have to give up on security entirely in the name of performance. If you're really serious about security, then create a new user account with non-Administrator privileges, and log in as that user. This isn't the default behavior in Vista, sadly. Post install, you get an Administrator-But-Not-Really-Just-Kidding account which triggers UAC on any action that requires administrator privileges. I'm sure this torturous hack was conceived in the name of backwards compatibility, but that doesn't mean we need to perpetuate it. The good news is that Vista is probably the first Microsoft operating system ever where you can actually work effectively as a standard, non-privileged user. As a standard user, you get all the benefits of UAC, Defender, and System Protection.. without all the performance drain.

Let me be clear here. I'm not against security. I'm against retrograde, band-aid, destroy all my computer's performance security.

Speaking of retrograde, band-aid, destroy all my computer's performance security, the one security feature Vista doesn't bundle is anti-virus software. And nothing cripples your PC's performance quite like anti-virus software. This isn't terribly surprising if you consider what anti-virus software has to do: examine every single byte of data that passes through your computer for evidence of malicious activity. But who needs theory when we have Oli at The PC Spy. Oli conducted a remarkably thorough investigation of the real world performance impact of security software on the PC. The results are truly eye-opening:

	Percent slower
	Boot	CPU	Disk
Norton Internet Security 2006	46%	20%	2369%
McAfee VirusScan Enterprise 8	7%	20%	2246%
Norton Internet Security 2007	45%	8%	1515%
Trend Micro PC-cillin AV 2006	2%	0%	1288%
ZoneAlarm ISS	16%	0%	992%
Norton Antivirus 2002	11%	8%	658%
Windows Live OneCare	11%	8%	512%
Webroot Spy Sweeper	6%	8%	369%
Nod32 v2.5	7%	8%	177%
avast! 4.7 Home	4%	8%	115%
Windows Defender	5%	8%	54%
Panda Antivirus 2007	20%	4%	15%
AVG 7.1 Free	15%	0%	19%

The worst offenders are the anti-virus suites with real-time protection. According to these results, the latest Norton Internet Security degrades boot time by nearly 50 percent. And no, that isn't a typo in the disk column. It also makes all disk access sixteen times slower! Even the better performers in this table would have a profoundly negative impact on your PC's performance. Windows Defender, for example, "only" makes hard drive access 54 percent slower.

And yet, despite the crushing performance penalty, anti-virus software is de rigeur in the PC world. Most PC vendors would no sooner ship a PC without preinstalled anti-virus software than they would ship a PC without an operating system (yeah, you wish). The very thought of running a PC naked, vulnerable, unprotected from viruses sends system administrators screaming from the room in a panic. When you tell a sysadmin you dislike running anti-virus software, they'll look at you mouth agape, as if you've just told them that you hate puppies and flowers.

I don't see why they're so shocked. anti-virus software itself, while not self-propagating like a virus, certainly fits the definition of a Trojan Horse. Once installed on your system, it has a hidden, unadvertised payload: it decimates your computer's performance and your productivity. In my opinion, what we really need is Anti-Anti-Virus software to keep us safe from the ongoing Anti-Virus software pandemic.

I've never run any anti-virus software. And Mac or Linux (aka UNIX) users almost never run anti-virus software, either. Am I irresponsible to run all my computers without anti-virus software? Are Mac and Linux users irresponsible for not participating in the culture of fear that Windows anti-virus software vendors propagate? I think it's braver and more responsible to recognize that anti-virus software vendors are not only telling us to be afraid, they are selling us fear. The entire anti-virus software industry is predicated on a bad architectural decision made by Microsoft fifteen years ago. And why, exactly, would any of these vendors want to solve the virus problem and put themselves out of business?

I'll certainly agree that you can't stop users from clicking on dancing bunnies if they have their mind set on it. You should have a few different security layers in any modern operating system. But we should also be treating the disease first -- too many damn users running as administrators-- instead of the symptoms.

As for remediation strategies, I'm a fan of the virtual machine future. We should treat our operating system like a roll of paper towels. If you get something on it you don't like, you ball it up and throw it away, and rip off a new, fresh one. But if that's too radical for you, I think Jan Goyvaerts is on to something with good old plain common sense backups:

In fact, with a proper backup system in place, you don't have to be afraid of messing up your system. I don't use any anti-virus or anti-spyware software. If my system starts acting up, I'll restore the backup, and have a guaranteed clean system. No spyware remover can beat that. If I want to play with beta software, I don't have to inconvenience myself by running it in a virtual machine. I do use VMware for testing my applications on clean installs of Windows. But when beta testing new versions of tools I use for development, I want to test them in my actual development environment rather. When the beta expires, I wipe it off by restoring the OS backup.

It's not terribly different from my virtual machine solution. Either way, you go back to a known good checkpoint. And I'll take a backup strategy over a computer with hobbled performance any day.

This also begs the question of what safety really means. No matter how much security software you install, nagging users with dozens of security dialogs clearly doesn't make users any safer. We should give users a basic level of protection as standard non-adminstrator users. But beyond that, let users make mistakes, and provide automatic, unlimited undo. That's the ultimate safety blanket.

Discussion