URL Rewriting to Prevent Duplicate URLs

As a software developer, you may be familiar with the DRY principle: don't repeat yourself. It's absolute bedrock in software engineering, and it's covered beautifully in The Pragmatic Programmer, and even more succinctly in this brief IEEE Software article (pdf). If you haven't taken this to heart by now, go read both first. We'll wait.

Scott Hanselman recently found out the hard way that the DRY principle also applies to URLs. Consider the multiple ways you could get to this very page:

  • http://codinghorror.com/blog/
  • http://www.codinghorror.com/blog/
  • http://www.codinghorror.com/blog/index.htm

It's even more problematic for Scott because he has two different domain names that reference the same content.

Having multiple URLs reference the same content is undesirable not only from a sanity-check DRY perspective, but also because it lowers your PageRank. PageRank is calculated per URL. If 50% of your incoming backlinks use one URL and 50% use a different URL, you aren't getting the full PageRank benefit of those backlinks. The link juice is watered down and divvied up between the two URLs instead of being concentrated into one of them.

So the moral of this story, if there is one, is to keep your URLs simple and standard. This is something the REST crowd has been preaching for years. You can't knock simplicity. Well, you can, but you'll be crushed by simplicity's overwhelming popularity eventually, so why fight it?
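To make "simple and standard" concrete, here's a rough Python sketch of what normalizing the three URLs above might look like. It is not how any particular rewrite filter works; the names CANONICAL_HOST, HOST_ALIASES, INDEX_PAGES, and canonicalize are all made up for this illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical settings for this sketch; adjust for your own site.
CANONICAL_HOST = "www.codinghorror.com"
HOST_ALIASES = {"codinghorror.com", "www.codinghorror.com"}
INDEX_PAGES = {"index.htm", "index.html", "default.htm", "default.aspx"}


def canonicalize(url: str) -> str:
    """Collapse equivalent spellings of a URL into one canonical form."""
    parts = urlsplit(url)

    # Fold every known host alias into the single preferred host.
    host = parts.netloc.lower()
    if host in HOST_ALIASES:
        host = CANONICAL_HOST

    # Drop a trailing index page so /blog/index.htm becomes /blog/.
    path = parts.path or "/"
    head, _, tail = path.rpartition("/")
    if tail.lower() in INDEX_PAGES:
        path = head + "/"

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


urls = [
    "http://codinghorror.com/blog/",
    "http://www.codinghorror.com/blog/",
    "http://www.codinghorror.com/blog/index.htm",
]
# All three spellings should collapse to a single entry.
print({canonicalize(u) for u in urls})
```

A rewrite filter does the same job at the server, except it answers the non-canonical spellings with a redirect so browsers and search engines learn the one true URL.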

Normalizing your URLs isn't difficult if you take advantage of URL Rewriting. URL Rewriting has been a de facto standard on Apache for years, but has yet to reach mainstream acceptance in Microsoft's IIS. I'm not even sure if IIS 7 supports URL Rewriting out of the box, although its new, highly modular architecture would make it very easy to add support. It's critical that Microsoft get a good reference implementation of an IIS 7 URL rewriter out there, preferably one that's compatible with the vast, existing library of mod_rewrite rules.

But that doesn't help us today. If you're using IIS today, you have two good options for URL rewriting; they're both installable as ISAPI filters. I'll show samples for both, using a few common URL rewriting rules that I personally use on my website.

The first is ISAPI Rewrite. ISAPI Rewrite isn't quite free, but it's reasonably priced, and most importantly, it's nearly identical in syntax to the Apache mod_rewrite standard. It's also quite mature, as it's been through quite a few revisions by now.

[ISAPI_Rewrite]
# fix missing slash on folders
# note, this assumes we have no folders with periods!
RewriteCond Host: (.*)
RewriteRule ([^.?]+[^.?/]) http://$1$2/ [RP]
# remove index pages from URLs
RewriteRule (.*)/default\.htm$ $1/ [I,RP]
RewriteRule (.*)/default\.aspx$ $1/ [I,RP]
RewriteRule (.*)/index\.htm$ $1/ [I,RP]
RewriteRule (.*)/index\.html$ $1/ [I,RP]
# force proper www. prefix on all requests
RewriteCond %HTTP_HOST ^test\.com [I]
RewriteRule ^/(.*) http://www.test.com/$1 [RP]
# only allow whitelisted referers to hotlink images
RewriteCond Referer: (?!http://(?:www\.good\.com|www\.better\.com)).+
RewriteRule .*\.(?:gif|jpg|jpeg|png) /images/block.jpg [I,O]
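
The hotlink rule is the least obvious of the bunch, so here's a small Python sketch of how that negative lookahead behaves. The patterns mirror the ones in the rules, with the literal dots escaped; should_block is a name invented for this example, and using fullmatch is my assumption about how the filter anchors its patterns, not something taken from its documentation.

```python
import re

# Same shape as the Referer pattern in the rules, with literal dots escaped.
# good.com and better.com are the placeholder hosts used in the rules.
NOT_WHITELISTED = re.compile(
    r"(?!http://(?:www\.good\.com|www\.better\.com)).+", re.IGNORECASE
)
IMAGE = re.compile(r".*\.(?:gif|jpg|jpeg|png)", re.IGNORECASE)


def should_block(path: str, referer: str) -> bool:
    """True when an image request arrives from a non-whitelisted referer."""
    # fullmatch is an assumption about how the filter anchors patterns.
    is_image = IMAGE.fullmatch(path) is not None
    is_foreign = NOT_WHITELISTED.fullmatch(referer) is not None
    return is_image and is_foreign


print(should_block("/images/a.png", "http://evil.example/page"))
print(should_block("/images/a.png", "http://www.good.com/page"))
print(should_block("/images/a.png", ""))
```

Two things fall out of this. An empty referer is never blocked, because the trailing .+ needs at least one character to match. And the whitelist is only a prefix check, so a referer such as http://www.good.com.evil.example/ would slip through.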

The second option, Ionic's ISAPI Rewrite Filter, is completely free. This filter has improved considerably since the last time I looked at it, and it appears to be a viable choice now. However, it uses its own rewrite syntax that is similar to the Apache mod_rewrite standard, but different enough to require some rework.

# fix missing slash on folders
# note, this assumes we have no folders with periods!
RewriteRule (^[^.]+[^/]$) $1/ [I,RP]
# remove index pages from URLs
RewriteRule (.*)/default\.htm$ $1/ [I,RP]
RewriteRule (.*)/default\.aspx$ $1/ [I,RP]
RewriteRule (.*)/index\.htm$ $1/ [I,RP]
RewriteRule (.*)/index\.html$ $1/ [I,RP]
# force proper www. prefix on all requests
RewriteCond %{HTTP_HOST} ^test\.com [I]
RewriteRule ^/(.*) http://www.test.com/$1 [I,RP]
# only allow whitelisted referers to hotlink images
RewriteCond %{HTTP_REFERER} ^(?!HTTP_REFERER)
RewriteCond %{HTTP_REFERER} ^(?!http://www\.good\.com) [I]
RewriteCond %{HTTP_REFERER} ^(?!http://www\.better\.com) [I]
RewriteRule \.(?:gif|jpg|jpeg|png)$ /images/block.jpg [I,L]
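
If the "no folders with periods" caveat on the missing-slash rule seems mysterious, this Python sketch runs the same pattern against a few paths. add_trailing_slash is a made-up helper for illustration, and Python's re module stands in for whatever regex engine the filter actually uses.

```python
import re

# The missing-slash pattern from the Ionic rules: no period anywhere in
# the path, and the last character is not a slash.
MISSING_SLASH = re.compile(r"(^[^.]+[^/]$)", re.IGNORECASE)


def add_trailing_slash(path: str) -> str:
    """Append a slash when the pattern says the path is a bare folder."""
    match = MISSING_SLASH.search(path)
    return match.group(1) + "/" if match else path


for path in ["/blog", "/blog/", "/blog/index.htm", "/v1.0"]:
    print(path, "->", add_trailing_slash(path))
```

Only /blog gets a slash appended. The pattern uses "contains a period" as a cheap test for "this is a file", so a folder named /v1.0 is mistaken for a file and left alone, which is exactly the assumption the comment warns about.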

The Ionic filter still has some quirks, but I loved its default logging capability. I could tell exactly what was happening with my rules, blow by blow, with a quick glance at the log file. However, I had a lot of difficulty getting the Ionic filter to install-- I could only get it to work in IIS 5.0 isolation mode, no matter what I tried. Clearly a work in progress, but a very promising one.

Of course, the few rewrite rules I presented above-- URL normalization and image hotlink prevention-- are merely the tip of the iceberg.

They don't call mod_rewrite the Swiss Army Knife of URL Manipulation for nothing. URL rewriting should be an integral part of every web developer's toolkit. It'll increase your DRYness, it'll increase your PageRank, and it's also central to the concept of REST.
