I talked in a previous post about Unbreakable Links: that is, stating every URL in terms of a Google search rather than an absolute address. Great concept, but how do you determine which words on a web page are most likely to generate a unique search result? Well, wonder no more:
Behold the Incredible LinkTron5000 (tm)!
As you might imagine, this involves quite a bit of Google abuse -- all of which is pre-cached for performance. Well, mostly pre-cached. If you have a page with a lot of words that I can't find in a dictionary, the LinkTron will take a little while to process it.
When researching this project, I found an invaluable source of information at Philipp Lenssen's Google Blogoscoped. One example is this frequency distribution of the 26,000 most-used words online. There's also a cool word frequency colorizer which visually depicts the "uniqueness" of a target URL.
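For the curious, here's a rough sketch of the general idea of picking a page's rarest words from a frequency table. To be clear, this is not the LinkTron's actual code. The function names, the tiny sample table, and the rule that unknown words count as rarest are all assumptions made up for illustration; a real table would be something like the 26,000-word list mentioned above.

```python
import re

# Hypothetical frequency table (word -> how often it appears online).
# These numbers are invented for illustration only.
SAMPLE_FREQUENCIES = {
    "the": 1_000_000,
    "search": 40_000,
    "links": 25_000,
    "unbreakable": 300,
}


def pick_rare_words(text, frequencies, count=3):
    """Return up to `count` distinct words from `text`, rarest first.

    Assumption: a word missing from `frequencies` is treated as having
    frequency 0, so it sorts ahead of every known word.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Sort by frequency, breaking ties alphabetically so results are stable.
    ranked = sorted(set(words), key=lambda w: (frequencies.get(w, 0), w))
    return ranked[:count]


def search_query(text, frequencies, count=3):
    """Join the rarest words into a single search string."""
    return " ".join(pick_rare_words(text, frequencies, count))


if __name__ == "__main__":
    page_text = "The unbreakable links search the LinkTron"
    print(search_query(page_text, SAMPLE_FREQUENCIES, count=2))
```

Rare words don't guarantee a unique search result, of course. You'd still have to run the search and check that the page comes back, which is presumably where all the caching comes in.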