Coding Horror

programming and human factors

Your Internet Driver's License

Back in summer 2008 when we were building Stack Overflow, I chose OpenID logins for reasons documented in Does The World Really Need Yet Another Username and Password:

I realize that OpenID is far from an ideal solution. But right now, the one-login-per-website problem is so bad that I am willing to accept these tradeoffs for a partial worse is better solution. There's absolutely no way I'd put my banking credentials behind an OpenID. But there are also dozens of sites that I don't need anything remotely approaching banking-grade security for, and I use these sites far more often than my bank. The collective pain of remembering all these logins -- and the way my email inbox becomes a de-facto collecting point and security gateway for all of them -- is substantial.

It always pained me greatly that every rinky-dink website on the entire internet demanded that I create a special username and password just for them. Yes, if you're an alpha geek, then you probably use a combination of special software and USB key from your utility belt to generate secure usernames and passwords for the dozens of websites you frequent. But for the vast, silent majority of normals, who know nothing of security but desire convenience above all, this means one thing: using the same username and password over and over. And it's probably a simple password, too.

This is the status quo of identity on the internet. It is deeply and fundamentally broken.

But it doesn't have to be this way. If you open your wallet (or purse, or man-purse, or whatever), I bet you'll find a variety of credentials you use to prove your identity wherever you go.

Wallet-contents

The average wallet contains a few different forms of identity with varying strengths:

  • Strong: California driver's license, student ID
  • Moderate: credit cards, health insurance card, video rental membership, gym card
  • Weak: Albertson's Preferred Card, Best Buy Rewards Zone Card, Coffee loyalty card

(and sometimes even, uh, cards for free lapdances, apparently)

In the real world, we don't regularly hold two dozen forms of identity like we expect people to on the web. Not only would you be carrying around the freaking Costanza wallet at that point, it would be insane. In the real world, we somehow manage to get by with about two or three strong forms of identity, complemented by a few other weaker forms to taste.

I'm proposing that our web wallets begin to mimic our physical wallets. Whenever a website needs to know who I am, they should ask to see my Internet Driver's License.

Bigfoot-drivers-license

Now, I don't literally mean a driver's license. I'm using this term figuratively to mean online credentials that I can re-use in more than one place on the internet. If all I want to do is leave a comment on a blog -- like, say, this one -- then one of the weaker forms of identity will surely do. If I'm starting a new bank account, or setting up a profile on a dating website, then maybe a stronger credential from my virtual wallet is necessary.

The core concept that users need to get used to is logging in to a website by showing a third party credential to validate their identity. This idea isn't nearly as crazy as it seemed in 2008. How many websites can you log into by showing your Facebook, Google, or Twitter credentials now? Lots!

Disqus-login

The whole online identity situation may seem as impossible as peace in the Middle East at this point. But when faced with a problem that appears intractable, is your solution to throw your hands up, mindlessly embrace the status quo, and wearily sigh "whaddaya gonna do?"

Some people do that. It's their right. Personally, I prefer to be the change I want to see. So for us, on Stack Overflow and the Stack Exchange network, that means aggressively promoting the concept of the Internet Driver's License. Including educating users as necessary.

For example, consider this ATM machine. To use it, do I need to sign up for an account at Shanghai Peking Development Bank? No. I can use any form of trusted third-party credentials the machine supports.

Atm-machine

Similarly, to log into any Stack Exchange site, including Stack Overflow, present any OpenID or OAuth 2.0 compliant identity provider as your Internet Driver's License.

Atm-machine-stackoverflow
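To make the "show a third-party credential" step a little more concrete, here is a minimal sketch of how a site might build the redirect that sends a user off to an OAuth 2.0 identity provider (the authorization code flow). The endpoint, client ID, redirect URI, and scope below are all hypothetical placeholders, not anything Stack Exchange actually uses; a real provider publishes its own values.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values for illustration only. A real identity provider
# documents its own authorization endpoint and issues the client ID
# when the website registers with it.
AUTHORIZE_ENDPOINT = "https://provider.example/oauth2/authorize"
CLIENT_ID = "example-client-id"
REDIRECT_URI = "https://site.example/oauth/callback"


def build_authorization_url(state: str) -> str:
    """Build the URL the website redirects the user's browser to.

    The parameter names follow the OAuth 2.0 authorization request;
    the scope value is an assumption for this sketch.
    """
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,        # identifies the website to the provider
        "redirect_uri": REDIRECT_URI,  # where the provider sends the user back
        "scope": "profile",            # assumed scope name
        "state": state,                # random value checked on return
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)


# A fresh random state per login attempt guards against forged callbacks.
state = secrets.token_urlsafe(16)
url = build_authorization_url(state)
print(url)
```

The website never sees the user's password here; it only hands the browser to the provider and waits for the user to come back with a code.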

When we founded Stack Overflow, we set out with the explicit mission to make the internet better. Adding yet another meaningless username and password to the fabric of the web does not make it better. What does make the internet better is continued pursuit of better, simpler, re-usable forms of third party online identity.

That's why I urge you to join me in supporting OpenID, OAuth 2.0, and any other promising implementations of the Internet Driver's License.


Breaking the Web's Cookie Jar

The Firefox add-in Firesheep caused quite an uproar a few weeks ago, and justifiably so. Here's how it works:

  • Connect to a public, unencrypted WiFi network. In other words, a WiFi network that doesn't require a password before you can connect to it.
  • Install Firefox and the Firesheep add-in.
  • Wait. Maybe have a latte while you're waiting.
  • Click on the user / website icons that appear over time in Firesheep to instantly log in as that user on that website.

Crazy! This guy who wrote Firesheep must be a world-class hacker, right?

Well, no. The work to package this up in a point-and-click way that is (sort of) accessible to power users is laudable, but what Firesheep actually does is far from magical. It's more of an art project and PR stunt than an actual hack of any kind. Still, I was oddly excited to see Firesheep get so much PR, because it highlights a fundamental issue with the architecture of the web.

The web is kind of a primitive medium. The only way websites know who you are is through tiny, uniquely identifying strings your browser sends to the webserver on each and every click:

GET / HTTP/1.1
Host: diy.stackexchange.com
Connection: keep-alive
User-Agent: Chrome/7.0.517.44
Accept-Language: en-US,en;q=0.8
Cookie: diyuser=t=ZlQOG4kege&s=8VO9gjG7tU12s
If-Modified-Since: Tue, 09 Nov 2010 04:41:12 GMT

These are the typical sort of HTTP headers your browser sends to a website on every click. See that little Cookie header? To a website, that's your fingerprint, DNA, and social security number all rolled into one. Some part of the cookie contains a unique user ID that tells the website you are you.
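As a rough illustration of how little work it takes to read that identity out of a request, here is a small sketch that parses the example headers above as plain text. The function names are made up for this example, and the parsing is deliberately simplistic (no folded headers, no duplicate header names); it is not how any particular web server does it.

```python
# The example request from above, as the raw text a server receives.
RAW_REQUEST = (
    "GET / HTTP/1.1\r\n"
    "Host: diy.stackexchange.com\r\n"
    "Connection: keep-alive\r\n"
    "User-Agent: Chrome/7.0.517.44\r\n"
    "Accept-Language: en-US,en;q=0.8\r\n"
    "Cookie: diyuser=t=ZlQOG4kege&s=8VO9gjG7tU12s\r\n"
    "If-Modified-Since: Tue, 09 Nov 2010 04:41:12 GMT\r\n"
    "\r\n"
)


def parse_headers(raw: str) -> dict[str, str]:
    """Return the header fields of a raw HTTP request, keyed by lowercase name."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")[1:]  # skip the request line ("GET / HTTP/1.1")
    headers = {}
    for line in lines:
        # Split on the first colon only, so values containing colons survive.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_cookies(cookie_header: str) -> dict[str, str]:
    """Split a Cookie header into name/value pairs.

    Only the first "=" separates the name from the value, so a value
    that itself contains "=" (as in the example) stays intact.
    """
    cookies = {}
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies[name] = value
    return cookies


headers = parse_headers(RAW_REQUEST)
cookies = parse_cookies(headers["cookie"])
print(cookies)
```

Everything the website knows about who sent this request is sitting in that one dictionary entry.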

And guess what? That cookie is always broadcast in plain text every single time you click a link on any website. Right out in the open where anyone -- well, technically, anyone who happens to be on the same network as you and is in a position to view your network packets -- can just grab it out of the ether and immediately impersonate you on any website you are a member of.

Broken-cookie

Now that you know how cookies work (and I'm not saying it's rocket surgery or anything), you also know that what Firesheep does is relatively straightforward:

  1. Listen to all HTTP traffic.
  2. Wait for HTTP headers from a known website.
  3. Isolate the part of the cookie header that identifies the user.
  4. Launch a new browser session with that cookie. Bam! As far as the target webserver is concerned, you are that user!

All Firesheep has to do, really, is listen. That's pretty much all there is to this "hack". Scary, right? Well, then you should be positively quaking in your boots, because this is the way the entire internet has worked since 1994, when cookies were invented.

So why wasn't this a problem in, say, 2003? Three reasons:

  1. Commodity public wireless internet connections were not exactly common until a few years ago.
  2. Average people have moved beyond mostly anonymous browsing and transferred significant parts of their identity online (aka the Facebook effect).
  3. The tools required to listen in on a wireless network are slightly … less primitive now.

Firesheep came along at the exact inflection point of these three trends. And mind you, it is still not a sure thing -- Firesheep requires a particular set of wireless network chipsets that support promiscuous mode in the lower level WinPcap library that Firesheep relies on. But we can bet that the floodgates have been opened, and future tools similar to this one will become increasingly a one-click affair.

The other reason this wasn't a problem in 2003 is because any website that truly needed security switched to encrypted HTTP -- aka Secure HTTP -- long ago. HTTPS was invented in 1994, at the same time as the browser cookie. This was not a coincidence. The creators of the cookie knew from day one they needed a way to protect them from prying eyes. Even way, way back in the dark, primitive ages of 2003, any banking website or identity website worth a damn wouldn't even consider using plain vanilla HTTP. They'd be laughed off the internet!

The outpouring of concern over Firesheep is justified, because, well, the web's cookie jar has always been kind of broken -- and we ought to do something about it. But what?

Yes, you can naively argue that every website should encrypt all their traffic all the time, but to me that's a "boil the sea" solution. I'd rather see a better, more secure identity protocol than ye olde HTTP cookies. I don't actually care if anyone sees the rest of my public activity on Stack Overflow; it's hardly a secret. But gee, I sure do care if they somehow sniff out my cookie and start running around doing stuff as me! Encrypting everything just to protect that one lousy cookie header seems like a whole lot of overkill to me.

I'm not holding my breath for that to happen any time soon, though. So here's what you can do to protect yourself, right now, today:

  1. We should be very careful how we browse on unencrypted wireless networks. This is the great gift of Firesheep to all of us. If nothing else, we should be thanking the author for this simple, stark warning. It's an unavoidable fact of life: if you must go wireless, seek out encrypted wireless networks. If you have no other choices except unencrypted wireless networks, browse anonymously -- quite possible if all you plan to do is casually surf the web and read a few articles -- and only log in to websites that support https. Anything else risks identity theft.
  2. Get in the habit of accessing your web mail through HTTPS. Email is the de-facto skeleton key to your online identity. When your email is compromised, all is lost. If your webmail provider does not support secure http, they are idiots. Drop them like a hot potato and immediately switch to one that does. Heck, the smart webmail providers already switched to https by default!
  3. Lobby the websites you use to offer HTTPS browsing. I think we're clearly past the point where only banks and finance sites should be expected to use secure HTTP. As more people shift more of their identities online, it makes sense to protect those identities by moving HTTPS from the domain of a massive bank vault door to just plain locking the door. SSL isn't as expensive as it used to be, in every dimension of the phrase, so this is not an unreasonable thing to ask your favorite website for.
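For sites that do offer HTTPS, one related precaution on the server side (not something covered above, so treat it as an aside) is to mark the login cookie so the browser only ever sends it over an encrypted connection. Here is a minimal sketch using Python's standard http.cookies module; the cookie name and value are placeholders.

```python
from http.cookies import SimpleCookie


def build_session_cookie(session_id: str) -> str:
    """Return a Set-Cookie header line for a login cookie.

    "session" is a placeholder cookie name for this sketch.
    """
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["path"] = "/"
    # Secure: the browser only sends this cookie over HTTPS, so it never
    # travels in plain text across an open wireless network.
    cookie["session"]["secure"] = True
    # HttpOnly: page scripts cannot read the cookie.
    cookie["session"]["httponly"] = True
    return cookie.output()


header = build_session_cookie("abc123")
print(header)
```

This only helps if the site actually serves its logged-in pages over HTTPS; the flag tells the browser when to withhold the cookie, it does not encrypt anything by itself.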

This is very broad advice, and there are a whole host of technical caveats to the above. But it's a starting point toward evangelizing the risks and responsible use of open wireless networks. Firesheep may indeed have broken the web's cookie jar. But it was kind of an old, beat up, cracked cookie jar in the first place. I hope the powers that be will use Firesheep as incentive to build a better online identity solution than creaky old HTTP cookies.


The Keyboard Cult

As a guy who spends most of his day typing words on a screen, it's hard for me to take touch computing seriously. I love my iPhone 4, and smartphones are the ultimate utility belt item, but attempting to compose any kind of text on the thing is absolutely crippling. It is a reasonable compromise for a device that fits in your pocket … but that's all.

The minute I switch back to my regular keyboard, I go from being Usain Bolt to the Flash.

Touchscreen-vs-keyboard

Touchscreens are great for passively browsing, as Scott Adams noted:

Another interesting phenomenon of the iPhone and iPad era is that we are being transformed from producers of content into consumers. With my BlackBerry, I probably created as much data as I consumed. It was easy to thumb-type long explanations, directions, and even jokes and observations. With my iPhone, I try to avoid creating any message that are over one sentence long. But I use the iPhone browser to consume information a hundred times more than I did with the BlackBerry. I wonder if this will change people over time, in some subtle way that isn't predictable. What happens when people become trained to think of information and entertainment as something they receive and not something they create?

Because we run an entire network of websites devoted to learning by typing words on a page, it's difficult for me to get past this.

But I'm not here to decry the evils of touchscreen typing. It has its place in the pantheon of computing. I'm here to sing the praises of the humble keyboard. The device that, when combined with the internet, turns every human being into a highly efficient global printing press.

My love affair with the keyboard goes way back:

Maybe I'm biased. As I recently remarked on programmers.stackexchange.com, I can't take slow typists seriously as programmers. When was the last time you saw a hunt-and-peck pianist?

I've been monogamous with the Microsoft Natural Keyboard 4000 for a long time. But in this supposedly happy marriage, I was accidentally neglecting one of the most crucial aspects of the keyboard experience.

The vast majority of keyboards included with white box systems or sold at office supply stores are rubber dome or membrane keyboards.

Membrane-remote-control

They are inexpensive, mass produced, relatively low quality devices that are inconsistent and degrade the user experience. Most users don't know this, or simply don't care. The appeal of cheap rubber dome or membrane keyboards is that they're usually available in a variety of styles, are included "free" with a new system, and they may sport additional features like media controls or wireless connectivity. But these cheap keyboards typically don't provide users with any tactile feedback, the keys feel mushy and may not all actuate at the same point, and the entire keyboard assemblies themselves tend to flex and move around when typed on. Not fun.

All this time, I've been typing on keyboards with least-common-denominator rubber dome innards. I was peripherally aware of higher quality mechanical keyboards, but I never appreciated them until I located this absolutely epic mechanical keyboard guide thread. It's also the source of an entire forum of people at geekhack.org who are mechanical keyboard enthusiasts. These kinds of communities and obsessions, writ so large and with such obvious passion, fascinate me. They are the inspiration for what we are trying to do with Stack Overflow and the Stack Exchange network.

If you don't have time to read that epic guide (but you should!), allow me to summarize:

  1. Almost all computer and laptop keyboards today use cheap, low quality switches -- rubber dome, membrane, scissor, or foam element.
  2. Mechanical switches are considered superior in every way by keyboard enthusiasts.
  3. Because the general public largely doesn't care about keyboard feel or durability, and because mechanical switches are more expensive, mechanical switch keyboards are quite rare these days.

Mechanical switches look, well, mechanical. They're spiritually the same as those old-school arcade buttons we used to mash on in the 1980s. You push down on the key, and the switch physically actuates.

Buckling-spring-switch

Yes, we are rapidly approaching the threshold of esoterica here. Mechanical keyboards were already becoming rare even before the internet, so I'd wager many people now reading this can't possibly know the difference between a typical cheap membrane keyboard and a fancy mechanical model because they've never had the opportunity to try one!

We should rectify that.

If you want to dip your fingers into the world of mechanical switch keyboards, start by asking yourself a few questions:

  • Are you willing to spend $70 to $300 for a keyboard?
  • How noisy do you want your typing to be?
  • Do you want a tactile "snap" when the key is depressed?
  • How much force do you type with -- do you have a light or heavy touch?
  • How much key travel do you want?

Next, there are further subtleties to consider, like how the keys are printed:

  • Pad Printed -- the standard cheap stuff. Little more than stickers. Keycaps will wear off fast.
  • Laser Etched -- permanent, but leaves tiny surface scars on the keys due to the characters being literally burned into the keys. May also be a tiny bit blurry.
  • Dye Sublimated -- dye set into plastic; expensive but nearly optimal.
  • Injection Molded -- two keys in different colors are physically bonded together. Very expensive but considered as close to perfect as you can get. Notably, NeXT keyboards used this method.

And what about the shape of the keycaps? Cylindrical? Spherical? Flat? And if you're an avid keyboard gamer, you might want to consider n-key rollover, too. I warned you this rabbit hole was deep.

Let's start looking at a few likely candidates. The one you may already know is Das Keyboard.

Das-keyboard

Das is a good, reliable brand of mechanical keyboards. They have two primary models. Each is available in a "blank keycaps" version if you are the sort of ninja typist who doesn't need to look at the keyboard -- you type by channeling the Force.

The "silent" mechanical switch distinction is an important one: mechanical switches can be loud. How loud? The DAS website actually sells honest-to-god earplugs as a keyboard accessory. I'm sure it's slightly tongue in cheek. Maybe. But consider yourself warned, and choose the silent model if you aren't a fan of the clickety-clack typing.

If you want the most old-school IBM-esque experience possible, and a true classic buckling spring keyboard, then Unicomp is your huckleberry. The common models are the Customizer 104/105 and SpaceSaver 104/105.

Unicomp-spacesaver

Next up is Elite Keyboards, but I can only recommend the (slightly expensive) Topre Realforce model due to the cheap pad keycap printing used on their other models.

Topre-realforce

Finally, Deck Keyboards -- I remember writing about these guys years ago. They have a full sized keyboard now with a lot of attention to detail: The Deck Legend.

Deck-legend-keyboard

It is also the only keyboard in its class that is backlit, if that's your bag.

Of course, none of these premium fancypants mechanical switch keyboards are really necessary. The most important aspect of writing isn't the keyboard you use, but the simple act of getting out there and writing as much as you can. But if, like me, you accidentally fall in love with the keyboard and everything it represents -- then I think you owe it to yourself to find out what a great keyboard is supposed to feel like.


Because Everyone Needs a Router

Do you remember when a router used to be an exotic bit of network kit?

Those days are long gone. A router is one of those salt-of-the-earth items now; anyone who pays for an internet connection needs a router, for:

  1. NAT and basic hardware firewall protection from internet evildoers
  2. A wired network hub to connect local desktop PCs
  3. A wireless hub to connect laptops, phones, consoles, etcetera

Let me put it this way: my mom – and my wife's mom – both own routers. If that isn't the definition of mainstream, I don't know what is.

Since my livelihood revolves around being on the internet, and because I'm a bit of a tweaker, I have a fancy-ish router. But it is of late 2007 vintage:

Although the DGL-4500 is a nice router, and it has served me well with no real complaints, the last major firmware update for it was a year and a half ago. There have been some desultory minor updates since then, but clearly the vendor has, shall we say, moved on to focusing on newer models.

The router is (literally!) the central component in my overall internet experience, and I was growing increasingly uncomfortable with the status quo. Frankly, the prospect of three year old hardware with year old firmware gives me the heebie-jeebies.

So, I asked the pros at Super User, even going so far as to set up a Recommend Me a Router chat room. (We disallow product recommendation questions as they become uselessly out of date so quickly, but this is a perfect topic for a chat room.) I got some fantastic advice from my fellow Super Users via chat, though much of it was of the far too sane "if it ain't broke don't fix it" variety. Well, that's just not how I work. To be fair, the router market is not exactly a hotbed of excitement at the moment; it is both saturated and heavily commoditized, particularly now that the dust has settled from the whole 802.11 A/B/G/N debacle. There just isn't much going on.

But in the process of doing my router research, I discovered something important, and maybe even revolutionary in its own quiet little way. The best router models all run open source firmware!

That's right, the truly great routers are available in "awesome" edition. (There may be other open source router firmwares out there, but these are the two I saw most frequently.) I learned that these open source firmwares can turn a boring Clark Kent router into Superman. And they are always kept updated by the community, in perpetuity.

In my weaker moments, I toyed with the idea of building a silent mini x86 PC that could run a routing optimized distribution of Linux, but the reality is that current commodity routers have more than enough memory and embedded CPU power – not to mention the necessary wireless and gigabit ethernet hub bits already built in. Dedicating a whole x86 PC to routing is power inefficient, overly complex, and awkward.

Yes, today's router marketplace is commoditized and standardized and boring – but there are still a few clear hardware standouts. I turned to the experts at SmallNetBuilder for their in-depth technical reviews, and found two consensus recommendations:

Update: Though these models are still fine choices, particularly if you can find a great deal on them, I have newer recommendations in Because Everyone (Still) Needs a Router.

Buffalo Nfiniti Wireless-N High Power Router ($80)

Buffalo_wzr-hp-g300nh

NETGEAR WNDR3700 RangeMax Dual Band Wireless-N ($150)

Netgear-wndr3700-router

Both of these models got glowing reviews from the networking experts at SmallNetBuilder, and both are 100% compatible with the all-important open source dd-wrt firmware. You can't go wrong with either, but I chose the less expensive Buffalo Nfiniti router. Why?

  1. It's almost half the price, man!
  2. The "high power" part is verifiably and benchmarkably true, and I have some wireless range problems at my home.
  3. I do most of my heavy network lifting through wired gigabit ethernet, so I can't think of any reason I'd need the higher theoretical wireless throughput of the Netgear model.
  4. Although the Netgear has a 680 MHz embedded CPU and 128 MB of RAM, the Buffalo's 400 MHz embedded CPU and 64 MB of RAM is not exactly chopped liver, either; it's plenty for dd-wrt to work with. I'd almost go so far as to say the Netgear is a bit overkill… if you're into that sort of thing.

I received my Buffalo Nfiniti and immediately installed dd-wrt on it, which was very simple and accomplished through the existing web UI on the router. (Buffalo has a history of shipping rebranded dd-wrt distributions in their routers, so the out-of-box firmware is a kissing cousin.)

After rebooting, I was in love. The (more) modern gigabit hardware, CPU, and chipset were noticeably snappier everywhere, even just dinking around in the admin web pages. And dd-wrt scratches every geek itch I have – putting that newer hardware to great use. Just check out the detailed stats I can get, including that pesky wireless signal strength problem. The top number is the Xbox 360 outside, the bottom number is my iPhone from about 10 feet away.

Dd-wrt-wireless-client-info

Worried your router is running low on embedded CPU grunt, or that 64 megabytes of memory is insufficient? Never fear; dd-wrt has you covered. Just check out the detailed, real time memory and CPU load stats.

Dd-wrt-memory-cpu-stats

Trying to figure out how much WAN/LAN/Wireless bandwidth you're using? How does a real time SVG graph, right from the router admin pages, grab you?

Dd-wrt-bandwidth-graph

It's just great all around. And I haven't even covered the proverbial laundry list of features that dd-wrt offers above and beyond most stock firmware! Suffice it to say that this is one of those times when the "let's support everything" penchant of open source projects works in our favor. Don't worry, it's all (mostly) disabled by default. Those features and tweaks can all safely be ignored; just know that they're available to you when and if you need them.

This is boring old plain vanilla commodity router hardware, but when combined with an open source firmware, it is a massive improvement over my three year old, proprietary high(ish) end router. The magic router formula these days is a combination of commodity hardware and open-source firmware. I'm so enamored of this one-two punch combo, in fact, I might even say it represents the future. Not just of the everyday workhorse routers we all need to access the internet – but the future of all commodity hardware.

Routers; we all need 'em, and they are crucial to our internet experience. Pick whichever router you like – as long as it's compatible with one of the open source firmware packages! Thanks to a wide variety of mature commodity hardware choices, plus infinitely and perpetually updated open source router firmware, I'm happy to report that now everyone can have a great router.


YouTube vs. Fair Use

In YouTube: The Big Copyright Lie, I described my love-hate relationship with YouTube, at least as it existed way back in the dark ages of 2007.

Now think back through all the videos you've watched on YouTube. How many of them contained any original content?

It's perhaps the ultimate case of cognitive dissonance: by YouTube's own rules [which prohibit copyrighted content], YouTube cannot exist. And yet it does.

How do we reconcile YouTube's official hard-line position on copyright with the reality that 90% of the content on their site is clearly copyrighted and clearly used without permission? It seems YouTube has an awfully convenient "don't ask, don't tell" policy-- they make no effort to verify that the uploaded content is either original content or fair use. The copyrighted content stays up until the copyright owner complains. Then, and only then, is it removed.

Today's lesson, then, is be careful what you ask for.

At the time, I just assumed that YouTube would never be able to resolve this problem through technology. The idea that you could somehow fingerprint every user-created uploaded video against every piece of copyrighted video ever created was so laughable to me that I wrote it off as impossible.

A few days ago I uploaded a small clip from the movie Better Off Dead to YouTube, in order to use it in the Go That Way, Really Fast blog entry. This is quintessential fair use: a tiny excerpt of the movie, presented in the context of a larger blog entry. So far, so good.

But then I uploaded a small clip from a different movie that I'm planning to use in another, future blog entry. Within an hour of uploading it, I received this email:

Dear {username},

Your video, {title}, may have content that is owned or licensed by {company}.

No action is required on your part; however, if you are interested in learning how this affects your video, please visit the Content ID Matches section of your account for more information.

Sincerely,

The YouTube Team

This 90 second clip is from a recent movie. Not a hugely popular movie, mind you, but a movie you've probably heard of. This email both fascinated and horrified me. How did they match a random, weirdly cropped (thanks, Windows Movie Maker) clip from the middle of a non-blockbuster movie within an hour of me uploading it? This had to be some kind of automated process that checks uploaded user content against every piece of copyrighted content ever created (or the top n subset thereof), exactly the kind that I thought was impossible.

Uh oh.

I began to do some research. I quickly found Fun with YouTube's Audio Content ID System, which doesn't cover video, but it's definitely related:

I was caught by surprise one day when I received an automated email from YouTube informing me that my video had a music rights issue and it was removed from the site. I didn't really care.

Then a car commercial parody I made (arguably one of my better videos) was taken down because I used an unlicensed song. That pissed me off. I couldn't easily go back and re-edit the video to remove the song, as the source media had long since been archived in a shoebox somewhere. And I couldn't simply re-upload the video, as it got identified and taken down every time. I needed to find a way to outsmart the fingerprinter. I was angry and I had a lot of free time. Not a good combination.

I racked my brain trying to think of every possible audio manipulation that might get by the fingerprinter. I came up with an almost-scientific method for testing each modification, and I got to work.

Further research led me to this brief TED talk, How YouTube Thinks About Copyright.

We compare each upload against all the reference files in our database. This heat map is going to show you how the brain of this system works.

Here we can see the reference file being compared to the user generated content. The system compares every moment of one to the other to see if there's a match. This means we can identify a match even if the copy uses just a portion of the original file, plays it in slow motion, and has degraded audio or video.

The scale and speed of this system is truly breathtaking – we're not just talking about a few videos, we're talking about over 100 years of video every day between new uploads and the legacy scans we regularly do across all of the content on the site. And when we compare those 100 years of video, we're comparing it against millions of reference files in our database. It'd be like 36,000 people staring at 36,000 monitors each and every day without as much as a coffee break.
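To get a feel for the "compare every moment of one to the other" idea in the talk, here is a toy sketch. It is emphatically not YouTube's algorithm: it assumes each moment of video has already been boiled down to a single integer fingerprint (a made-up simplification), and it only finds exact matches, so it would not survive slow motion or degraded audio the way the real system reportedly does. It just finds the longest run of consecutive moments that the upload shares with a reference file, which is the diagonal streak you would see in a heat map.

```python
def longest_matching_run(reference, upload):
    """Return (ref_start, upload_start, length) of the longest run of
    consecutive fingerprints that is identical in both sequences.

    Returns (0, 0, 0) when nothing matches.
    """
    best = (0, 0, 0)
    # prev[j] is the length of the matching run that ends at
    # reference[i - 2] and upload[j - 1] (the previous row of the grid).
    prev = [0] * (len(upload) + 1)
    for i in range(1, len(reference) + 1):
        curr = [0] * (len(upload) + 1)
        for j in range(1, len(upload) + 1):
            if reference[i - 1] == upload[j - 1]:
                # Extend the diagonal run by one moment.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


# Hypothetical per-moment fingerprints: the upload is a short clip cut
# from the middle of the reference, padded with unrelated material.
reference = [11, 42, 7, 7, 93, 58, 21, 64, 30, 12]
upload = [5, 93, 58, 21, 64, 9]

print(longest_matching_run(reference, upload))  # (4, 1, 4)
```

Even this naive version does work proportional to the length of the reference times the length of the upload, which gives some sense of why doing it against millions of reference files is an engineering feat.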

I have to admit that I'm astounded by the scope, scale, and sheer effectiveness of YouTube's new copyright detection system that I thought was impossible! Seriously, watch the TED talk. It's not long. The more I researched YouTube's video identification tool, the more I realized that resistance is futile. It's so good that the only way to defeat it is by degrading your audio and video so much that you have effectively ruined it. And when it comes to copyright violations, if you can achieve mutually assured destruction, then you have won. Absolutely and unconditionally.

This is an outcome so incredible I am still having trouble believing it. But I have the automatically blocked uploads to prove it.

Now, I am in no way proposing that copyright is something we should be trying to defeat or work around. I suppose I was just used to the laissez faire status quo on YouTube, and the idea of a video copyright detection system this effective was completely beyond the pale. My hat is off to the engineers at Google who came up with this system. They aren't the bad guys here; they offer some rather sane alternatives when copyright matches are found:

If Content ID identifies a match between a user upload and material in the reference library, it applies the usage policy designated by the content owner. The usage policy tells the system what to do with the video. Matches can be to only the audio portion of an upload, the video portion only, or both.

There are three usage policies – Block, Track or Monetize. If a rights owner specifies a Block policy, the video will not be viewable on YouTube. If the rights owner specifies a Track policy, the video will continue to be made available on YouTube and the rights owner will receive information about the video, such as how many views it receives. For a Monetize policy, the video will continue to be available on YouTube and ads will appear in conjunction with the video. The policies can be region-specific, so a content owner can allow a particular piece of material in one country and block the material in another.
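The policy description above maps fairly naturally onto a small data structure. The following is only a sketch of how one might model it; the class and function names are invented for illustration and say nothing about how Content ID is actually implemented.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    """The three usage policies a rights owner can choose from."""
    BLOCK = "block"
    TRACK = "track"
    MONETIZE = "monetize"


@dataclass
class UsagePolicy:
    """A rights owner's policy: a default plus optional per-region overrides."""
    default: Policy
    by_region: dict[str, Policy] = field(default_factory=dict)

    def for_region(self, region: str) -> Policy:
        # Fall back to the default when the region has no override.
        return self.by_region.get(region, self.default)


def is_viewable(policy: UsagePolicy, region: str) -> bool:
    """A matched video stays up unless the policy for that region is Block."""
    return policy.for_region(region) is not Policy.BLOCK


def shows_ads(policy: UsagePolicy, region: str) -> bool:
    """Ads accompany a matched video only under a Monetize policy."""
    return policy.for_region(region) is Policy.MONETIZE


# Hypothetical example: monetize everywhere except one blocked region.
policy = UsagePolicy(default=Policy.MONETIZE, by_region={"DE": Policy.BLOCK})
print(is_viewable(policy, "US"), is_viewable(policy, "DE"))
```

The region codes here are arbitrary examples; the point is only that one piece of matched material can be allowed in one country and blocked in another.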

The particular content provider whose copyright I matched chose the draconian block policy. That's certainly not Google's fault, but I guess you could say I'm Feeling Unlucky.

Although the 90 second clip I uploaded is clearly copyrighted content – I would never dispute that – my intent is not to facilitate illegal use, but to "quote" the movie scene as part of a larger blog entry. YouTube does provide recourse for uploaders; they make it easy to file a dispute once the content is flagged as copyrighted. So I dutifully filled out the dispute form, indicating that I felt I had a reasonable claim of fair use.

Youtube-fair-use-dispute

Unfortunately, my fair use claim was denied without explanation by the copyright holder.

Let's consider the four guidelines for fair use I outlined in my original 2007 blog entry:

  1. Is the use transformative?
  2. Is the source material intended for the public good?
  3. How much was taken?
  4. What's the market effect?

While we're clear on 3 and 4, items 1 and 2 are hazy in a mashup. This would definitely be transformative, and I like to think that I'm writing for the erudition of myself and others, not merely to entertain people. I uploaded with the intent of the video being viewed through a blog entry, with YouTube as the content host only. But it was still 90 seconds of the movie viewable on YouTube by anyone, context free.

So I'm torn.

On one hand, this is an insanely impressive technological coup. The idea that YouTube can (with the assistance of the copyright holders) really validate every minute of uploaded video against every minute of every major copyrighted work is unfathomable to me. When YouTube promised to do this to placate copyright owners, I was sure they were stalling for time. But much to my fair-use-loving dismay, they've actually gone and built the damn thing – and it works.

Just, maybe, it works a little too well. I'm still looking for video sharing services that offer some kind of fair use protection.
