Coding Horror

programming and human factors

Lived Fast, Died Young, Left a Tired Corpse

It's easy to forget just how crazy things got during the Web 1.0 bubble in 2000. That was over ten years ago. For context, Mark Zuckerberg was all of sixteen when the original web bubble popped.

[Image: NASDAQ graph of the web bubble]

There's plenty of evidence that we're entering another tech bubble. It's just less visible to people outside the tech industry because there's no corresponding stock market IPO frenzy. Yet.

There are two films which captured the hyperbole and excess of the original dot com bubble especially well.

[Image: Startup.com DVD cover]

The first is the documentary Startup.com. It's about the prototypical web 1.0 company: one predicated on an idea that made absolutely no sense, which proceeded to flame out in a spectacular and all too typical way for the era. This one just happened to occur on digital film. The govworks.com website described in the documentary, the one that burned through $60 million in 18 months, is now one of those ubiquitous domain squatter pages. A sign of the times, perhaps.

The second film was one I had always wanted to see, but wasn't able to until a few days ago: Code Rush. For a very long time, Code Rush was almost impossible to find, but the activism of Andy Baio nudged the director to make the film available under Creative Commons. You can now watch it online — and you absolutely should.

Remember when people charged money for a web browser? That was Netscape.

Code Rush is a PBS documentary recorded at Netscape from 1998 to 1999, focusing on the open sourcing of the Netscape code. As the documentary makes painfully clear, this wasn't an act of strategy so much as an act of desperation. That's what happens when the company behind the world's most ubiquitous operating system decides a web browser should be a standard part of the operating system.

Everyone in the documentary knows they're doomed; in fact, the phrase "we're doomed" is a common refrain throughout the film. But despite the gallows humor and the dark tone, parts of it are oddly inspiring. These are engineers working heroic, impossible schedules for a goal they're not sure they can achieve — or whether they'll even survive as an organization long enough to finish.

The most vivid difference between Startup.com and Code Rush is that Netscape, despite all their other mistakes and missteps, didn't just burn through millions of dollars for no discernible reason. They produced a meaningful legacy:

  • Through Netscape Navigator, the original popularization of HTML and the internet itself.
  • With the release of the Netscape source code on March 31st, 1998, the unlikely birth of the commercial open source movement.
  • Eventually producing the first credible threat to Internet Explorer in the form of Mozilla Firefox 1.0 in 2004.

Do you want money? Fame? Job security? Or do you want to change the world … eventually? Consider how many legendary hackers went on to brilliant careers from Netscape: Jamie Zawinski, Brendan Eich, Stuart Parmenter, Marc Andreessen. The lessons of Netscape live on, even though the company doesn't. Code Rush is ultimately a meditation on the meaning of work as a programmer.

I'd like to think that when Facebook – the next Google and Microsoft rolled into one – goes public in early 2012, the markets will react rationally. More likely, people will all collectively lose their damn minds again and we'll be thrust into a newer, bigger, even more insane tech bubble than the first one.

Yes, you will have incredibly lucrative job offers in this bubble. That's the easy part. As Startup.com and Code Rush illustrate, the hard part is figuring out why you are working all those long hours. Consider carefully, lest the arc of your career mirror that of so many failed tech bubble companies: lived fast, died young, left a tired corpse.


24 Gigabytes of Memory Ought to be Enough for Anybody

Are you familiar with this quote?

640K [of computer memory] ought to be enough for anybody. — Bill Gates

It's amusing, but Bill Gates never actually said that:

I've said some stupid things and some wrong things, but not that. No one involved in computers would ever say that a certain amount of memory is enough for all time … I keep bumping into that silly quotation attributed to me that says 640K of memory is enough. There's never a citation; the quotation just floats like a rumor, repeated again and again.

One of the few killer features of the otherwise unexciting Intel Core i7 platform upgrade* is the subtle fact that Core i7 chips use triple channel memory. That means three memory slots at a minimum, and in practice most Core i7 motherboards have six memory slots.

The price of DDR3 RAM has declined to the point that populating all six slots with 4 GB modules is, well, not cheap -- but quite attainable at $299 and declining.

[Image: 24 gigabytes of DDR3 RAM]

Twenty-four gigabytes of system memory for a mere $299! That's about $12.50 per gigabyte.

(And if you don't have a Core i7 system, they're not expensive to build, either. You can pair an inexpensive motherboard with even the slowest and cheapest triple-channel-compatible CPU, the i7-950, which is plenty speedy – and overclocks well, if you're into that. Throw in the 24 GB of RAM, and it all adds up to about $800 total. Don't forget the power supply and CPU cooler, though.)

Remember when one gigabyte of system memory was considered a lot? For context, our first "real" Stack Overflow database server had 24 GB of memory. Now I have that much in my desktop … just because I can. Well, that's not entirely true, as we do work with some sizable databases while building the Stack Exchange network.

[Image: 24 GB of RAM in use]

I guess having 24 gigabytes of system memory is a little extravagant, but at these prices -- why not? What's the harm in having obscene amounts of memory, making my system effectively future proof?

I have to say that in 1981, making those decisions, I felt like I was providing enough freedom for 10 years. That is, a move from 64k to 640k felt like something that would last a great deal of time. Well, it didn't – it took about only 6 years before people started to see that as a real problem. — Bill Gates

To me, it's more about no longer needing to think about memory as a scarce resource, something you allocate carefully and manage with great care. There's just … lots. As Clay Shirky once related to me, via one of his college computer science professors:

Algorithms are for people who don't know how to buy RAM.
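
In code terms, that luxury looks something like this -- a throwaway sketch of my own (don't blame Shirky's professor for it), cheerfully trading gigabytes for speed by caching every result we've ever computed:

    from functools import lru_cache

    @lru_cache(maxsize=None)          # unbounded cache: who cares, we have 24 GB
    def expensive(n: int) -> int:
        # stand-in for a genuinely expensive computation or database hit
        return sum(i * i for i in range(n))

    expensive(10_000_000)   # slow the first time...
    expensive(10_000_000)   # ...and instant every time thereafter

An unbounded cache is a terrible idea when memory is scarce, and a perfectly sensible one when it isn't.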

I mean, 24 GB of memory should be enough for anybody… right?

* It's only a so-so upgrade on the desktop; on the server, the Nehalem architecture is indeed a monster, and anyone running a server should upgrade to it, stat.


Trouble In the House of Google

Let's look at where stackoverflow.com traffic came from for the year of 2010.

[Image: Stack Overflow 2010 traffic by source]

When 88.2% of all traffic for your website comes from a single source, criticizing that single source feels … risky. And perhaps a bit churlish, like looking a gift horse in the mouth, or saying something derogatory in public about your Valued Business Partner™.

Still, looking at the statistics, it's hard to avoid the obvious conclusion. I've been told many times that Google isn't a monopoly, but they apparently play one on the internet. You are perfectly free to switch to whichever non-viable alternative web search engine you want at any time. Just breathe in that sweet freedom, folks.

Sarcasm aside, I greatly admire Google. My goal is not to be acquired, because I'm in this thing for the long haul – but if I had to pick a company to be acquired by, it would probably be Google. I feel their emphasis on the information graph over the social graph aligns more closely with our mission than almost any other potential suitor I can think of. Anyway, we've been perfectly happy with Google as our de facto traffic sugar daddy since the beginning. But last year, something strange happened: the content syndicators began to regularly outrank us in Google for our own content.

Syndicating our content is not a problem. In fact, it's encouraged. It would be deeply unfair of us to assert ownership over the content so generously contributed to our sites and create an underclass of digital sharecroppers. Anything posted to Stack Overflow, or any Stack Exchange Network site for that matter, is licensed back to the community in perpetuity under Creative Commons cc-by-sa. The community owns their contributions. We want the whole world to teach each other and learn from the questions and answers posted on our sites. Remix, reuse, share – and teach your peers! That's our mission. That's why I get up in the morning.

[Image: pull quote – Jeff Atwood: "Teaching peers is one of the best ways to develop mastery"]

However, implicit in this strategy was the assumption that we, as the canonical source for the original questions and answers, would always rank first. Consider Wikipedia – when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements? Never, right? But it is in theory a completely valid, albeit dumb, business model. That's why Joel Spolsky and I were confident in sharing content back to the community with almost no reservations – because Google mercilessly penalizes sites that attempt to game the system by unfairly profiting on copied content. Remixing and reusing is fine, but mass-producing cheap copies encrusted with ads … isn't.

I think of this as common sense, but it's also spelled out explicitly in Google's webmaster content guidelines.

However, some webmasters attempt to improve their page's ranking and attract visitors by creating pages with many words but little or no authentic content. Google will take action against domains that try to rank more highly by just showing scraped or other auto-generated pages that don't add any value to users. Examples include:

Scraped content. Some webmasters make use of content taken from other, more reputable sites on the assumption that increasing the volume of web pages with random, irrelevant content is a good long-term strategy. Purely scraped content, even from high-quality sources, may not provide any added value to your users without additional useful services or content provided by your site. It's worthwhile to take the time to create original content that sets your site apart. This will keep your visitors coming back and will provide useful search results.

In 2010, our mailboxes suddenly started overflowing with complaints from users – complaints that they were doing perfectly reasonable Google searches, and ending up on scraper sites that mirrored Stack Overflow content with added advertisements. Even worse, in some cases, the original Stack Overflow question was nowhere to be found in the search results! That's particularly odd because our attribution terms require linking directly back to us, the canonical source for the question, without nofollow. Google, in indexing the scraped page, cannot avoid seeing that the scraped page links back to the canonical source. This culminated in, of all things, a special browser plug-in that redirects to Stack Overflow from the ripoff sites. How totally depressing. Joel and I thought this was impossible. And I felt like I had personally failed all of you.
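
For the curious, here's roughly what that requirement amounts to mechanically. The sketch below is hypothetical -- the URLs are made up, and this is not our actual tooling -- but it shows the test: does the syndicated page link back to the canonical question without rel="nofollow"?

    # Hypothetical attribution check -- illustrative only, not our real tooling.
    # The test: does a syndicated copy link back to the canonical question
    # *without* rel="nofollow", so search engines can see the attribution?
    from html.parser import HTMLParser
    from urllib.request import urlopen

    CANONICAL = "https://stackoverflow.com/questions/12345"  # made-up example

    class AttributionChecker(HTMLParser):
        def __init__(self):
            super().__init__()
            self.compliant = False

        def handle_starttag(self, tag, attrs):
            if tag != "a":
                return
            attr = dict(attrs)
            href = attr.get("href") or ""
            rel = attr.get("rel") or ""
            if href.startswith(CANONICAL) and "nofollow" not in rel:
                self.compliant = True

    page = urlopen("https://scraper.example.com/mirror").read().decode("utf-8")
    checker = AttributionChecker()
    checker.feed(page)
    print("attribution OK" if checker.compliant else "attribution terms violated")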

The idea that there could be something wrong with Google was inconceivable to me. Google is gravity on the web, an omnipresent constant; blaming Google would be like blaming gravity for my own clumsiness. It wasn't even an option. I started with the golden rule: it's always my fault. We did a ton of due diligence on webmasters.stackexchange.com to ensure we weren't doing anything overtly stupid, and uber-mensch Matt Cutts went out of his way to investigate the hand-vetted search examples contributed in response to my tweet asking for search terms where the scrapers dominated. Issues were found on both sides, and changes were made. Success!

Despite the semi-positive resolution, I was disturbed. If these dime-store scrapers were doing so well and generating so much traffic on the back of our content – how was the rest of the web faring? My enduring faith in the gravitational constant of Google had been shaken. Shaken to the very core.

Throughout my investigation I had nagging doubts that we were seeing serious cracks in the algorithmic search foundations of the house that Google built. But I was afraid to write an article about it for fear I'd be dismissed as an incompetent kook. I wasn't comfortable sharing that opinion widely, because we might be doing something obviously wrong. Which we tend to do frequently and often. Gravity can't be wrong. We're just clumsy … right?

I can't help noticing that we're not the only site to have serious problems with Google search results in the last few months. In fact, the drumbeat of deteriorating Google search quality has been practically deafening of late.

Anecdotally, my personal search results have also been noticeably worse lately. As part of Christmas shopping for my wife, I searched for "iPhone 4 case" in Google. I had to give up completely on the first two pages of search results as utterly useless, and searched Amazon instead.

People whose opinions I respect have all been echoing the same sentiment -- Google, the once essential tool, is somehow losing its edge. The spammers, scrapers, and SEO'ed-to-the-hilt content farms are winning.

Like any sane person, I'm rooting for Google in this battle, and I'd love nothing more than for Google to tweak a few algorithmic knobs and make this entire blog entry moot. Still, this is the first time since 2000 that I can recall Google search quality ever declining, and it has inspired some rather heretical thoughts in me -- are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of search destined to be less algorithmic and more social?

It's a scary thing to even entertain, but maybe gravity really is broken.


The Dirty Truth About Web Passwords

This weekend, the Gawker network was compromised.

This weekend we discovered that Gawker Media's servers were compromised, resulting in a security breach at Lifehacker, Gizmodo, Gawker, Jezebel, io9, Jalopnik, Kotaku, Deadspin, and Fleshbot. If you're a commenter on any of our sites, you probably have several questions.

It's no Black Sunday or iPod modem firmware hack, but it has release notes – and the story it tells is as epic as Beowulf:

So, here we are again with a monster release of ownage and data droppage. Previous attacks against the target were mocked, so we came along and raised the bar a little. How's this for "script kids"? Your empire has been compromised, your servers, your databases, online accounts and source code have all been ripped to shreds!

You wanted attention, well guess what, You've got it now!

Read those release notes. It'll explain how the compromise unfolded, blow by blow, from the inside.

Gawker is operated by Nick Denton, notorious for the unapologetic and often unethical "publish whatever it takes to get traffic" methods endorsed on his network. Do you remember the iPhone 4 leak? That was Gawker. Do you remember the article about bloggers being treated as virtual sweatshop workers? That was Gawker. Do you remember hearing about a blog lawsuit? That was probably Gawker, too.

Some might say having every account on your network compromised is exactly the kind of unwanted publicity that Gawker was founded on.

Personally, I'm more interested in how we can learn from this hack. Where did Gawker go wrong, and how can we avoid making those mistakes on our projects?

  1. Gawker saved passwords. You should never, ever store user passwords. If you do, you're storing passwords incorrectly. Always store the salted hash of the password – never the password itself! (There's a short sketch of what that looks like just after this list.) It's so easy, even members of Mensa er … can't … figure it out.

  2. Gawker used encryption incorrectly. The odd choice of archaic DES encryption meant that the passwords they saved were all truncated to 8 characters. No matter how long your password actually was, you only had to enter the first 8 characters for it to work. So much for choosing a secure pass phrase. (The second sketch below demonstrates the truncation.) Encryption is only as effective as the person using it. I'm not smart enough to use encryption, either, as you can see in Why Isn't My Encryption… Encrypting?

  3. Gawker asked users to create a username and password on their site. The FAQ they posted about the breach has two interesting clarifications:
    2) What if I logged in using Facebook Connect? Was my password compromised?
    No. We never stored passwords of users who logged in using Facebook Connect.

    3) What if I linked my Twitter account with my Gawker Media account? Was my Twitter password compromised?
    No. We never stored Twitter passwords from users who linked their Twitter accounts with their Gawker Media account.

    That's right, people who used their internet driver's license to authenticate on these sites had no security problems at all! Does the need to post a comment on Gizmodo really justify polluting the world with yet another username and password? It's only the poor users who decided to entrust Gawker with a unique username and 'secure' password who got compromised.
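
So what does storing passwords correctly look like? Here's a minimal sketch using nothing but Python's standard library. I'm using PBKDF2 here, which goes one step beyond a plain salted hash by also stretching the key; the function names are mine, not any particular framework's.

    import hashlib, hmac, os

    ITERATIONS = 200_000  # key stretching: every guess costs the attacker this

    def hash_password(password: str) -> tuple[bytes, bytes]:
        salt = os.urandom(16)  # unique random salt per user beats rainbow tables
        digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                     salt, ITERATIONS)
        return salt, digest    # store the salt and the hash -- never the password

    def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
        candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                        salt, ITERATIONS)
        return hmac.compare_digest(candidate, digest)  # constant-time comparison

Even if your database leaks, the attacker gets salts and stretched hashes, not passwords -- and every guess costs them 200,000 hash iterations per account.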
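
And the DES truncation from point 2 is easy to demonstrate. On a Unix machine, Python's crypt module (deprecated in 3.11 and since removed, which tells you something about DES) wraps the traditional DES scheme. A sketch, not Gawker's actual code:

    import crypt  # Unix-only; deprecated in Python 3.11, removed in 3.13

    salt = "ab"   # a two-character salt selects the traditional DES scheme
    a = crypt.crypt("longpassphrase", salt)
    b = crypt.crypt("longpass", salt)   # only the first 8 characters count
    print(a == b)                       # True: everything past 8 is ignored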

(Beyond that, "don't be a jerk" is good advice to follow in business as well as your personal life. I find that you generally get back what you give. When your corporate mission is to succeed by exploiting every quasi-legal trick in the book, surely you can't be surprised when you get the same treatment in return.)

But honestly, as much as we can point and laugh at Gawker and blame them for this debacle, there is absolutely nothing unique or surprising about any of this. Regular readers of my blog are probably bored out of their minds by now because I just trotted out a whole bunch of blog posts I wrote 3 years ago. Again.

Here's the dirty truth about website passwords: the internet is full of websites exactly like the Gawker network. Let's say you have a good old traditional username and password on 50 different websites. That's 50 different programmers who all have different ideas of how your password should be stored. I hope for your sake you used a different (and extremely secure) password on every single one of those websites. Because statistically speaking, you're screwed.

In other words, the more web sites you visit, the more networks you touch and trust with a username and password combination – the greater the odds that at least one of those networks will be compromised exactly like Gawker was, and give up your credentials for the world to see. At that point, unless you picked a strong, unique password on every single site you've ever visited, the situation gets ugly.
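
The math behind "statistically speaking, you're screwed" is simple compounding. Suppose -- a number I'm inventing purely for illustration -- each site independently has a 2% chance of being breached in a given year:

    p_breach = 0.02   # assumed per-site yearly breach probability (illustrative)
    sites = 50
    p_at_least_one = 1 - (1 - p_breach) ** sites
    print(f"{p_at_least_one:.0%}")      # ~64%: odds that at least one of your
                                        # 50 sites coughs up your credentials

A small per-site risk compounds brutally across fifty sites. If you reused the same password everywhere, one breach is all it takes.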

The bad news is that most users don't pick strong passwords. This has been proven time and time again, and the Gawker data is no different. Even worse, most users re-use these bad passwords across multiple websites. That's how this ugly Twitter worm suddenly appeared on the back of a bunch of compromised Gawker accounts.

[Image: xkcd excerpt]

Now do you understand why I've been so aggressive about promoting the concept of the internet driver's license? That is, logging on to a web site using a set of third party credentials from a company you can actually trust to not be utterly incompetent at security? Sure, we're centralizing risk here to, say, Google, or Facebook – but I trust Google a heck of a lot more than I trust J. Random Website, and this really is no different in practice than having password recovery emails sent to your Gmail account.

I'm not here to criticize Gawker. On the contrary, I'd like to thank them for illustrating in broad, bold relief the dirty truth about website passwords: we're all better off without them. If you'd like to see a future web free of Gawker style password compromises – stop trusting every random internet site with a unique username and password! Demand that they allow you to use your internet driver's license – that is, your existing Twitter, Facebook, Google, or OpenID credentials – to log into their website.


My Holiday in Beautiful Panau

There is a high correlation between "programmer" and "gamer". One of the first Area 51 sites we launched, based on community demand, was gaming.stackexchange.com. Despite my fundamental skepticism about gaming as a Q&A topic -- as expressed on episode 87 of Herding Code -- I have to admit it has far exceeded my expectations.

But then maybe I shouldn't be so surprised. I've talked about the relationship between gamer and programmer before.

I used to recommend games on this very blog that I particularly enjoyed and felt were worthy of everyone's attention. I don't do this a lot any more, now that my blogging schedule has slipped to one post a week, if I'm lucky. (If you're wondering why, it's because running your own business is crazy stupid amounts of work when you turn it up to eleven.) I've recommended quite a few games here in the past.

I haven't had a ton of time to play games, other than the inevitable Rock Band 3, but I've been consumed by another game I had no idea would become so addictive -- Just Cause 2.

[Image: Just Cause 2 cover art]

It's what you might call an open world sandbox game, in the vein of the Grand Theft Autos. But I could never get into the GTA games, even after trying GTA 3 and its sequels Vice City and San Andreas. They just left me cold, somehow.

Where GTA and its ilk often felt a tad too much like work for my tastes, Just Cause 2 is almost the opposite -- it is non-stop, full blown open world pandemonium from start to finish. One of the game's explicit goals is that you advance the plot by blowing stuff up. No, seriously. I'm not kidding. You have an entire 1000+ square kilometer island paradise at your disposal, filled with cities and military bases, spanning the range from snowy mountains to deserts to idyllic beaches -- all just waiting for you to turn them into "chaos points" … by any means necessary.

[Image: a helicopter in Just Cause 2]

Of course, you get around by hijacking whatever vehicles happen by, be they boats, airplanes, jumbo jets, cars, tanks, trucks, buses, monster trucks, motorcycles, scooters, tractors or anything in between. Even on foot it is fun to navigate the island of Panau, because the developers gave us an impossibly powerful personal zipline that you can fire at any object in the game to propel yourself toward it. Combine that with the magical parachute you can deploy anywhere, anytime, and they make for some fascinating diversions (parasailing anyone?). You can also use the zipline to attach any two objects together. Think about that for a second. Have you ever wondered what happens when you zipline a moving vehicle to a tree? Or a pedestrian? Or another vehicle? Hmm. As a result, simply going walkabout on the island is more fun than I ever would have imagined.

Between the 49 plot missions, 9 stronghold takeovers, 104 unique vehicles, the optional boat/plane/car/parachute race missions, the opportunities for insane stunt points, the umpteen zillion upgrade crates and faction objects to collect, and the 360+ locations in the 1000+ square kilometers of Panau -- there's always something interesting happening around every corner. And whatever it is, it's probably beautiful and blows up real good.

[Image: an explosion in Just Cause 2]

In short, Just Cause 2 is deliriously, stupidly, absurdly entertaining. I can't even remember the last game I completed where I felt compelled to go back after finishing the main storyline to discover even more areas I missed during my initial playthrough and get (most of) the in-game achievements. Whatever amount of time you have to play, Just Cause 2 will happily fill it with totally unscripted, awesome open world pandemonium.

Don't take my word for it; even the notoriously acidic game reviewer Yahtzee had almost nothing negative to say about Just Cause 2, which is his version of a positive review. And Metacritic gives Just Cause 2 a solid 84. Not that it can't be improved, of course; after such a sublime sandbox experience, I'm desperately curious to see what they'll add for Just Cause 3.

Luckily for you, the game has been out long enough that it can be picked up for a song on PS3, Xbox, or PC. Steam has Just Cause 2 on sale right now in an 8 player pack for $60, and Amazon has all versions in stock for under $30. Beware, though, as the PC version does require a pretty solid video card along with Windows Vista or newer -- but the upside is that I have mine cranked up to 2048x1152 with almost all the options on, and it rarely dips below 60 fps.

I spent my holidays on the beautiful island of Panau, and I don't regret a second of it. If you're looking for a vacation spot, I heartily recommend the open world sandbox of Panau. But while you're visiting, do be mindful of any errant gunfire, vehicles, and explosions.
