Coding Horror

programming and human factors

The Incredible LinkTron 5000 (tm)!

I talked in a previous post about Unbreakable Links: that is, stating every URL in terms of a Google search rather than an absolute address. Great concept, but how do you determine which words on a web page are most likely to generate a unique search result? Well, wonder no more:

Behold the Incredible LinkTron 5000 (tm)!

As you might imagine, this involves quite a bit of Google abuse, all of which is pre-cached for performance. Well, mostly pre-cached. If you have a page with a lot of words that I can't find in a dictionary, the LinkTron will take a little while to process it.

When researching this project, I found an invaluable source of information in Philipp Lenssen's Google Blogoscoped. For instance, it has a frequency distribution for the 26,000 most-used words online. There's also a cool word frequency colorizer which visually depicts the "uniqueness" of a target URL.
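
To make the idea concrete, here is a minimal sketch of how a word frequency table could be used to pick the words on a page most likely to produce a unique search result. This is not the LinkTron's actual code. The tiny frequency table, the default score for unknown words, and the function names rarest_words and search_query are all assumptions made up for illustration.

```python
import re

# Hypothetical toy table: word -> occurrences per million words.
# A real table would be much larger, like the 26,000-word list mentioned above.
SAMPLE_FREQUENCIES = {
    "the": 60000.0,
    "of": 30000.0,
    "and": 28000.0,
    "search": 350.0,
    "links": 120.0,
    "unbreakable": 1.5,
}

# Assumption: a word missing from the table is treated as very rare.
UNKNOWN_WORD_FREQUENCY = 0.1


def rarest_words(text, frequencies, count=3):
    """Return up to `count` distinct words from `text`, rarest first."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Sort by frequency, breaking ties alphabetically so results are stable.
    ranked = sorted(
        words,
        key=lambda w: (frequencies.get(w, UNKNOWN_WORD_FREQUENCY), w),
    )
    return ranked[:count]


def search_query(text, frequencies, count=3):
    """Join the rarest words into a plain search query string."""
    return " ".join(rarest_words(text, frequencies, count))
```

Calling search_query(page_text, SAMPLE_FREQUENCIES) returns the three rarest words joined by spaces. Treating unknown words as rare is a deliberate simplification: it favors made-up names and typos, which is exactly where a real lookup against a search engine would be needed to confirm that the word actually leads back to the page.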

Written by Jeff Atwood
