Whatever Happened to the META Tag?

Jeff Atwood

30 Nov 2005

When was the last time you saw an HTML header like this?

<head>
<title>GUID World</title>
<meta name="description"
      content="Everything you wanted to know about GUIDs but were afraid to ask">
<meta name="keywords"
      content="GUID, UUID, globally unique identifiers, 128-bit">
</head>

The web is a metadata-free zone. It’s widely known that Google completely ignores metadata in its indexes. The <meta> tag has fallen so far out of favor that it drags the whole concept of metadata down with it. And perhaps rightfully so. Cory Doctorow viciously deconstructs metadata in Metacrap: Putting the torch to seven straw-men of the meta-utopia:

There are at least seven insurmountable obstacles between the world as we know it and meta-utopia. I’ll enumerate them below:

1. People lie

Metadata exists in a competitive world. Suppliers compete to sell their goods, cranks compete to convey their crackpot theories (mea culpa), artists compete for audience. Attention-spans and wallets may not be zero-sum, but they’re damned close. That’s why:

A search for any commonly referenced term at a search-engine like Altavista will often turn up at least one porn link in the first ten results.

Your mailbox is full of spam with subject lines like “Re: The information you requested.”

Publisher’s Clearing House sends out advertisements that holler “You may already be a winner!”

Press-releases have gargantuan lists of empty buzzwords attached to them.

Meta-utopia is a world of reliable metadata. When poisoning the well confers benefits to the poisoners, the meta-waters get awfully toxic in short order.

The other six reasons are equally caustic, and all have a common theme: relying on users to create accurate metadata means you’re betting on an optimistic view of human behavior. And we all know how well that works out.

Which brings me to the complete abandonment of the <meta> tag. Isn’t it ironic that groups still advocate manually adding metadata to web pages? Who, exactly, is adding The Dublin Core Metadata Element Set to the <head> section of their web pages? Nobody, that’s who.

Manual metadata may be suspect, but automated generation of metadata is practically the holy grail. Google’s entire 450 zillion dollar market cap is predicated on one tiny, automatically generated piece of metadata on every web page they index: PageRank. Popularity rules the web. It’s high school all over again: either you’re popular and people link to you, or... well, good luck on that whole prom thing.
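To make the "popularity as metadata" idea concrete, here is a minimal sketch of the textbook power-iteration form of PageRank in Python. It is not Google's production algorithm, which is not described here. The function name `pagerank`, the `links` dictionary, the toy page names, and the damping factor of 0.85 (a commonly cited default) are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration (illustrative sketch only).

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank to everyone.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "popular", nobody links to "loner".
links = {
    "popular": ["friend"],
    "friend": ["popular"],
    "fan1": ["popular"],
    "fan2": ["popular"],
    "loner": [],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Nobody typed a rank into any of these pages. The score falls out of who links to whom, which is exactly why it is harder to lie about than a keywords tag.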

But popularity has some limitations. For one thing, PageRank doesn’t work on an intranet. Office documents are rarely HTML, rarely linked to each other, and you probably don’t have a large enough sample set to do any fancy statistical analysis, either. That’s why the Google Search Appliance not only actively indexes metadata in the <meta> tag, it requires metadata to return relevant results. It’s right in the manual. Just try doing that with the capital-g Google.
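Reading those tags is the easy part. As a rough sketch of what any indexer would have to do first, the following Python uses the standard library's `html.parser` to pull name/content pairs out of a page. This is not how the Google Search Appliance is implemented; the class `MetaTagParser` and the function `extract_meta` are hypothetical names invented for this example.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags such as <meta charset="..."> that carry no name/content pair.
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of the <meta name=... content=...> pairs found in html."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


page = """
<head>
<title>GUID World</title>
<meta name="description"
      content="Everything you wanted to know about GUIDs but were afraid to ask">
<meta name="keywords"
      content="GUID, UUID, globally unique identifiers, 128-bit">
</head>
"""

meta = extract_meta(page)
keywords = [keyword.strip() for keyword in meta["keywords"].split(",")]
print(meta["description"])
print(keywords)
```

Extraction takes a few dozen lines. Deciding whether to believe what you extracted is the hard problem, and on an intranet you can at least assume the authors are not trying to game you.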

Perhaps that’s why Tim Bray steadfastly maintains that some form of metadata is necessary to improve search results.

One of the Web’s distinguishing features is that there’s a big gaping hole where the metadata ought to be. The Web has resources, identified by URI, and you can ask for “representations,” which come with some metadata, but the metadata is about the representation, not the resource. Given a URI, the Web has no built-in way to ask questions about it, for example “What is this about?” or “When does it expire?” or “Is this suitable for children?” or “Is this good?”

I’m not an advocate of the utopian semantic web, mind you, but I sure would like something that can tell the difference between a Jaguar and a Jaguar instead of telling me which one is more popular.
