Coding Horror

programming and human factors

Groundhog Day, or, the Problem with A/B Testing

On a recent airplane flight, I happened to catch the movie Groundhog Day. Again.

[Image: Bill Murray in Groundhog Day]

If you aren't familiar with this classic film, the premise is simple: Bill Murray, somehow, gets stuck reliving the same day over and over.

It's been at least 5 years since I've seen Groundhog Day. I don't know if it's my advanced age, or what, but it really struck me on this particular viewing: this is no comedy. There's a veneer of broad comedy, yes, but lurking just under that veneer is a deep, dark existential conundrum.

It might be amusing to relive the same day a few times, maybe even a few dozen times. But an entire year of the same day – an entire decade of the same day – everything happening in precisely, exactly the same way? My back-of-the-envelope calculation easily ran to a decade. But I was wrong. The director, Harold Ramis, thinks it was actually 30 or 40 years.

I think the 10-year estimate is too short. It takes at least 10 years to get good at anything, and allotting for the down time and misguided years [Phil] spent, it had to be more like 30 or 40 years [spent reliving the same day].

We only see bits and pieces of the full experience in the movie, but this time my mind began filling in the gaps. Repeating the same day for decades plays to our secret collective fear that our lives are irrelevant and ultimately pointless. None of our actions – even suicide, in endless grisly permutations – ever change anything. What's the point? Why bother? How many of us are trapped in here, and how can we escape?

This is some dark, scary stuff when you really think about it.

You want a prediction about the weather, you're asking the wrong Phil.

I'll give you a winter prediction.
It's gonna be cold,
it's gonna be gray,
and it's gonna last you for the rest of your life.

Comedy, my ass. I wanted to cry.

But there is a way out: redemption through repetition. If you have to watch Groundhog Day a few times to appreciate it, you're not alone. Indeed, that seems to be the whole point. Just ask Roger Ebert:

"Groundhog Day" is a film that finds its note and purpose so precisely that its genius may not be immediately noticeable. It unfolds so inevitably, is so entertaining, so apparently effortless, that you have to stand back and slap yourself before you see how good it really is.

Certainly I underrated it in my original review; I enjoyed it so easily that I was seduced into cheerful moderation. But there are a few films, and this is one of them, that burrow into our memories and become reference points. When you find yourself needing the phrase This is like "Groundhog Day" to explain how you feel, a movie has accomplished something.

There's something delightfully Ouroboros about the epiphanies and layered revelations in repeated viewings of a movie that is itself about (nearly) endless repetition.

Which, naturally, brings me to A/B testing. That's what Phil spends most of those thirty years doing. He spends it pursuing a woman, technically, but it's how he does it that is interesting:

Rita: This whole day has just been one long setup.

Phil: No it hasn't.

Rita: And I hate fudge! Yuck!

Phil: [making a mental list] No white chocolate. No fudge.

Rita: What are you doing? Are you making some kind of list or something? Did you call up my friends and ask what I like and what I don't like? Is this what love is for you?

Phil: No, this is real. This is love.

Rita: Stop saying that! You must be crazy.

Phil doesn't just go on one date with Rita, he goes on thousands of dates. During each date, he makes note of what she likes and responds to, and drops everything she doesn't. At the end he arrives at – quite literally – the perfect date. Everything that happens is the most ideal, most desirable version of all possible outcomes on that date on that particular day. Such are the luxuries afforded to a man repeating the same day forever.

This is the purest form of A/B testing imaginable. Given two choices, pick the one that "wins", and keep repeating this ad infinitum until you arrive at the ultimate, most scientifically desirable choice. Your marketing weasels would probably collapse in an ecstatic, religious fervor if they could achieve anything even remotely close to the level of perfect A/B testing depicted in Groundhog Day.
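If you'd like to see that loop spelled out, here is a minimal Python sketch of it. Everything in it is an assumption made up for illustration: the date "slots", the options, Rita's LIKES table (only her dislike of fudge and white chocolate comes from the movie dialogue above), and the function names. It is not how any real A/B testing tool works, just the champion-versus-challenger idea in its barest form.

```python
import random

# Hypothetical choices Phil can vary on the date (invented for illustration).
OPTIONS = {
    "dessert": ["fudge", "white chocolate", "rocky road"],
    "topic": ["himself", "the weather", "poetry"],
}

# Hypothetical preferences; the movie only tells us she hates fudge and white chocolate.
LIKES = {"dessert": "rocky road", "topic": "poetry"}


def score(plan):
    """Count how many slots of the plan match what Rita likes."""
    return sum(1 for slot, choice in plan.items() if LIKES[slot] == choice)


def propose(plan, rng):
    """Build variant B by re-rolling one randomly chosen slot of plan A."""
    slot = rng.choice(sorted(OPTIONS))
    variant = dict(plan)
    variant[slot] = rng.choice(OPTIONS[slot])
    return variant


def ab_test_loop(baseline, rounds=200, seed=0):
    """Relive the day `rounds` times, keeping whichever plan scores higher."""
    rng = random.Random(seed)
    champion = dict(baseline)
    for _ in range(rounds):
        challenger = propose(champion, rng)
        if score(challenger) > score(champion):
            champion = challenger
    return champion


first_date = {"dessert": "fudge", "topic": "himself"}
perfect_date = ab_test_loop(first_date)
print(perfect_date, score(perfect_date))
```

Note what the loop optimizes: the score, and nothing else. Whether Phil actually means any of it never enters the function.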

But at the end of this perfect date, something impossible happens: Rita rejects Phil.

Phil wasn't making these choices because he honestly believed in them. He was making these choices because he wanted a specific outcome – winning over Rita – and the experimental data told him which path he should take. Although the date was technically perfect, it didn't ring true to Rita, and that made all the difference.

That's the problem with A/B testing. It's empty. It has no feeling, no empathy, and at worst, it's dishonest. As my friend Nathan Bowers said:

A/B testing is like sandpaper. You can use it to smooth out details, but you can't actually create anything with it.

The next time you reach for A/B testing tools, remember what happened to Phil. You can achieve a shallow local maximum with A/B testing – but you'll never win hearts and minds. If you, or anyone on your team, is still having trouble figuring that out, well, the solution is simple: just watch Groundhog Day again.
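And if you'd rather see a local maximum than take Phil's word for it, here is a rough Python sketch. The `appeal` landscape is entirely hypothetical: a small hill at x=2 and a much taller one at x=8. The climber does what an endless series of A/B tests does: compare the current position with a nearby variant, keep the winner, and stop when no neighbor wins.

```python
def appeal(x):
    """Hypothetical landscape: a small peak at x=2 (height 3), a tall one at x=8 (height 10)."""
    return max(3 - abs(x - 2), 10 - 2 * abs(x - 8))


def hill_climb(x, step=1):
    """Keep moving to the better neighbor; stop when neither neighbor beats the current spot."""
    while True:
        best_neighbor = max((x - step, x + step), key=appeal)
        if appeal(best_neighbor) <= appeal(x):
            return x
        x = best_neighbor


stuck = hill_climb(0)   # starts near the small hill
lucky = hill_climb(5)   # happens to start on the slope of the tall one
print(stuck, appeal(stuck))
print(lucky, appeal(lucky))
```

Starting from 0, every individual comparison is a "win", and the climber still ends up on the small hill. No amount of incremental testing from there will carry it across the valley to the tall one.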

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Exchange and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: http://twitter.com/codinghorror