Coding Horror

programming and human factors

Disambiguating Search with Quasi-Evil Hierarchies

Let's say I were to search Google for the word Jaguar:

A Google search for the word 'jaguar'

There's an immediate problem. The semantics of Jaguar only exist in my head, not in any search box. Did I mean...

  • Jaguar the car?
  • Mac OS X Jaguar?
  • Jaguar the animal?
  • The Atari Jaguar?
  • Austin Powers' Shaguar?

Whichever it is, Google is displaying a lot of search results that are totally irrelevant to me. Sure, I could type in more words, but that's at odds with the Google philosophy of simplicity. A single word should get me what I want.

Now compare the same search for Jaguar on eBay:

an eBay search for the word 'jaguar'

Although I get the same poor results initially, I can indicate which kind of Jaguar I really meant with an additional click on the categories on the left side of the page. This immediately filters the search results to something relevant with almost no effort on my part.
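To make the idea concrete, here's a rough sketch of that kind of category filter. This is not how eBay actually does it; the listings, category names, and the `search` and `category_counts` helpers are all invented for illustration.

```python
from collections import Counter

# Hypothetical listings: titles and category names are made up for this sketch.
LISTINGS = [
    {"title": "Jaguar XJ6 hood ornament", "category": "Cars & Parts"},
    {"title": "Atari Jaguar console with controller", "category": "Video Games"},
    {"title": "Jaguar plush toy", "category": "Toys"},
    {"title": "Jaguar E-Type repair manual", "category": "Cars & Parts"},
    {"title": "Stuffed bear", "category": "Toys"},
]


def search(listings, query, category=None):
    """Case-insensitive title match, optionally narrowed to one category."""
    needle = query.lower()
    return [
        item
        for item in listings
        if needle in item["title"].lower()
        and (category is None or item["category"] == category)
    ]


def category_counts(results):
    """Tally results per category; this is the clickable list beside the results."""
    return Counter(item["category"] for item in results)


everything = search(LISTINGS, "jaguar")                        # the ambiguous first pass
facets = category_counts(everything)                           # what the sidebar would show
narrowed = search(LISTINGS, "jaguar", category="Video Games")  # one click later
```

The point is that the user never types a second word. The hierarchy only shows up after the search, as a filter over results that already matched.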

Search dominates the web now, and for good reason. My apologies to Yet Another Hierarchically Organized Oracle and the Open Directory Project, but rigid hierarchy is evil. However, a rigid hierarchy is tremendously powerful as a semantic-narrowing filter on search results.

In a brave new Google world of "I'll just type in what I want and hit Enter" search, there may still be room for some quasi-evil hierarchy in there somewhere. For example, if Google is going to suggest "Did you mean..." corrections when I misspell a search term, why doesn't it do the same thing to disambiguate semantics?

proposed Google search semantics suggestions

Unlike the rigid, manual categorizations of eBay and DMOZ, you could probably automate this kind of semantic suggestion engine using Markov chain probabilities on existing web pages.
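Here's a minimal sketch of what that might look like, assuming the simplest possible reading of "Markov chain": a first-order chain built from which word follows which. The `PAGES` data is a toy stand-in for crawled text, and `build_transitions` and `suggest` are hypothetical names, not anything Google exposes.

```python
from collections import Counter, defaultdict

# Toy stand-in for text pulled from existing web pages (made-up data).
PAGES = [
    "jaguar cars unveiled a new sedan",
    "jaguar cars dealer near you",
    "jaguar cars service schedule",
    "the jaguar cat hunts at night",
    "jaguar cat habitat in the rainforest",
    "atari jaguar console games",
]


def build_transitions(pages):
    """Count word -> next-word transitions (a first-order Markov chain)."""
    transitions = defaultdict(Counter)
    for page in pages:
        words = page.lower().split()
        for current, following in zip(words, words[1:]):
            transitions[current][following] += 1
    return transitions


def suggest(term, transitions, limit=3):
    """Return (phrase, probability) pairs for the likeliest words after `term`."""
    counts = transitions.get(term.lower(), Counter())
    total = sum(counts.values())
    return [
        (f"{term} {word}", count / total)
        for word, count in counts.most_common(limit)
    ]


transitions = build_transitions(PAGES)
suggestions = suggest("jaguar", transitions)
```

With this toy data, "jaguar cars" comes out on top with a probability of 0.5, followed by "jaguar cat" and "jaguar console". A real engine would need far more text, stop-word handling, and something smarter than adjacent-word counts, but the suggestions fall out of the pages themselves rather than out of a hand-built category tree.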

Written by Jeff Atwood