Coding Horror

programming and human factors

Whatever Happened to Voice Recognition?

Remember that Scene in Star Trek IV where Scotty tried to use a Mac Plus?

Star-trek-4-apple-mac-plus

Using a mouse or keyboard to control a computer? Don't be silly. In the future, clearly there's only one way computers will be controlled: by speaking to them.

There's only one teeny-tiny problem with this magical future world of computers we control with our voices.

Voice-recognition-accuracy-rate-over-time

It doesn't work.

Despite ridiculous, order of magnitude increases in computing power over the last decade, we can't figure out how to get speech recognition accuracy above 80% -- when the baseline human voice transcription accuracy rate is anywhere from 96% to 98%!

In 2001 recognition accuracy topped out at 80%, far short of HAL-like levels of comprehension. Adding data or computing power made no difference. Researchers at Carnegie Mellon University checked again in 2006 and found the situation unchanged. With human discrimination as high as 98%, the unclosed gap left little basis for conversation. But sticking to a few topics, like numbers, helped. Saying “one” into the phone works about as well as pressing a button, approaching 100% accuracy. But loosen the vocabulary constraint and recognition begins to drift, turning to vertigo in the wide-open vastness of linguistic space.

As Robert Fortner explained in Rest in Peas: The Unrecognized Death of Speech Recognition, after all these years, we're desperately far away from any sort of universal speech recognition that's useful or practical.

Now, we do have to clarify that we're talking about universal recognition: saying anything to a computer, and having it reliably convert that into a valid, accurate text representation. When you constrain the voice input to a more limited vocabulary -- say, just numbers, or only the names that happen to be in your telephone's address book -- it's not unreasonable to expect a high level of accuracy. I tend to think of this as "voice control" rather than "voice recognition".

Still, I think we're avoiding the real question: is voice control, even hypothetically perfect voice control, more effective than the lower tech alternatives? In my experience, speech is one of the least effective, inefficient forms of communicating with other human beings. By that, I mean ...

  • typical spoken communication tends to be off-the-cuff and ad-hoc. Unless you're extremely disciplined, on average you will be unclear, rambling, and excessively verbose.
  • people tend to hear about half of what you say at any given time. If you're lucky.
  • spoken communication puts a highly disproportionate burden on the listener. Compare the time it takes to process a voicemail versus the time it takes to read an email.

I am by no means against talking with my fellow human beings. I have a very deep respect for those rare few who are great communicators in the challenging medium of conversational speech. Though we've all been trained literally from birth how to use our voices to communicate, voice communication remains filled with pitfalls and misunderstandings. Even in the best of conditions.

So why in the world -- outside of a disability -- would I want to extend the creaky, rickety old bridge of voice communication to controlling my computer? Isn't there a better way?

Robert's post contains some examples in the comments from voice control enthusiasts:

in addition to extremely accurate voice dictation, there are those really cool commands, like being able to say something like "search Google for Balloon Boy" or something like that and having it automatically open up your browser and enter the search term -- something like this is accomplished many times faster than a human could do it. Or, being able to total up a column of numbers in Microsoft Excel by saying simply "total this column" and seeing the results in a blink of an eye, literally.

That's funny, because I just fired up the Google app on my iPhone, said "balloon boy" into it, and got .. a search for "blue boy". I am not making this up. As for the Excel example, total which column? Let's assume you've dealt with the tricky problem of selecting what column you're talking about with only your voice. (I'm sorry, was it D5? B5?) Wouldn't it be many times faster to click the toolbar icon with your mouse, or press the keyboard command equivalent, to sum the column -- rather than methodically and tediously saying the words "sum this column" out loud?

I'm also trying to imagine a room full of people controlling their computers or phones using their voices. It's difficult enough to get work done in today's chatty work environments without the added burden of a floor full of people saying "zoom ... enhance" to their computers all day long. Wouldn't we all end up hoarse and deaf?

Let's look at another practical example -- YouTube's automatic speech recognition feature. I clicked through to the first UC Berkeley video with this feature, clicked the CC (closed caption) icon, and immediately got .. this.

Uc-berkeley-physics-lecture

"Light exerts force on matter". But according to Google's automatic speech recognition, it's "like the search for some matter". Unsurprisingly, it does not get better from there. You'd be way more confused than educated if you had to learn this lecture from the automatic transcription.

Back when Joel Spolsky and I had a podcast together, a helpful listener suggested using speech recognition to get a basic podcast transcript going. Everything I knew about voice recognition told me this wouldn't help, but harm. What's worse: transcribing everything by hand, from scratch -- or correcting every third or fourth word in an auto-generated machine transcript? Maybe it's just me, but the friction of the huge error rate inherent in the machine transcript seems far more intimidating than a blank slate human transcription. The humans may not be particularly efficient, but they all add value along the way -- collective human judgment can editorially improve the transcript, by removing all the duplication, repetition, and "ums" of a literal, by-the-book transcription.

In 2004, Mike Bliss composed a poem about voice recognition. He then read it to voice recognition software on his PC, and rewrote it as recognized.

a poem by Mike Bliss

like a baby, it listens
it can't discriminate
it tries to understand
it reflects what it thinks you say
it gets it wrong... sometimes
sometimes it gets it right.
One day it will grow up,
like a baby, it has potential
will it go to work?
will it turn to crime?
you look at it indulgently.
you can't help loving it, can you?
a poem by like myth

like a baby, it nuisance
it can't discriminate
it tries to oven
it reflects lot it things you say
it gets it run sometimes
sometimes it gets it right
won't day it will grow bop
Ninth a baby, it has provincial
will it both to look?
will it the two crime?
you move at it inevitably
you can't help loving it, cannot you?

The real punchline here is that Mike re-ran the experiment in 2008, and after 5 minutes of voice training, the voice recognition got all but 2 words of the original poem correct!

I suspect that's still not good enough in the face of the existing simpler alternatives. Remember handwriting recognition? It was all the rage in the era of the Apple Newton.

Doonesbury-newton

It wasn't as bad as Doonesbury made it out to be. I learned Palm's Graffiti handwriting recognition language and got fairly proficient with it. More than ten years later, you'd expect to see massively improved handwriting recognition of some sort in today's iPads and iPhones and iOthers, right? Well, maybe, if by "massively improved" you mean "nonexistent".

While it still surely has its niche uses, I personally don't miss handwriting recognition. Not even a little. And I can't help wondering if voice recognition will go the same way.

Discussion

The Vast and Endless Sea

After we created Stack Overflow, some people were convinced we had built a marginally better mousetrap for asking and answering questions. The inevitable speculation began: can we use your engine to build a Q&A site about {topic}? Our answer was Stack Exchange. Pay us $129 a month (and up), and you too can create a hosted Q&A community on our engine – for whatever topic you like!

Well, I have a confession to make: my heart was never in Stack Exchange. It was a parallel effort in a parallel universe only tangentially related to my own. There's a whole host of reasons why, but if I had to summarize it in a sentence, I'd say that money is poisonous to communities. That $129/month doesn't sound like much – and it isn't – but the commercial nature of the enterprise permeated and distorted everything from the get-go.

(fortunately, the model is changing with Stack Exchange 2.0, but that's a topic for another blog post.)

Yes, Stack Overflow Internet Services Incorporated©®™ is technically a business, even a venture capital backed business now – but I didn't co-found it because I wanted to make money. I co-founded it because I wanted to build something cool that made the internet better. Yes, selfishly for myself, of course, but also in conjunction with all of my fellow programmers, because I know none of us is as dumb as all of us.

Nobody is participating in Stack Overflow to make money. We're participating in Stack Overflow because …

  • We love programming
  • We want to leave breadcrumb trails for other programmers to follow so they can avoid making the same dumb mistakes we did
  • Teaching peers is one of the best ways to develop mastery
  • We can follow our own interests wherever they lead
  • We want to collectively build something great for the community with our tiny slices of effort

I don't care how much you pay me, you'll never be able to recreate the incredibly satisfying feeling I get when demonstrating mastery within my community of peers. That's what we do on Stack Overflow: have fun, while making the internet one infinitesimally tiny bit better every day.

So is it any wonder that some claim Stack Overflow is more satisfying than their real jobs? Not to me.

If this all seems like a bunch of communist hippie bullcrap to you, I understand. It's hard to explain. But there is quite a bit of science documenting these strange motivations. Let's start with Dan Pink's 2009 TED talk.

Dan's talk centers on the candle problem. Given the following three items …

  1. A candle
  2. A box of thumbtacks
  3. A book of matches

… how can you attach the candle to the wall?

It's not a very interesting problem on its own – that is, until you try to incentivize teams to solve it:

Now I want to tell you about an experiment using the candle problem by a scientist from Princeton named Sam Glucksberg. Here's what he did.

To the first group, he said, "I'm going to time you to establish norms, averages for how long it typically takes someone to solve this sort of problem."

To the second group, he said, "If you're in the top 25 percent of the fastest times you get five dollars. If you're the fastest of everyone we're testing here today you get 20 dollars." (This was many years ago. Adjusted for inflation, it's a decent sum of money for a few minutes of work.)

Question: How much faster did this group solve the problem?

Answer: It took them, on average, three and a half minutes longer. Three and a half minutes longer. Now this makes no sense, right? I mean, I'm an American. I believe in free markets. That's not how it's supposed to work. If you want people to perform better, you reward them. Give them bonuses, commissions, their own reality show. Incentivize them. That's how business works. But that's not happening here. You've got a monetary incentive designed to sharpen thinking and accelerate creativity – and it does just the opposite. It dulls thinking and blocks creativity.

It turns out that traditional carrot-and-stick incentives are only useful for repetitive, mechanical tasks. The minute you have to do anything even slightly complex that requires even a little problem solving without a clear solution or rules – those incentives not only don't work, they make things worse!

Pink eventually wrote a book about this, Drive: The Surprising Truth About What Motivates Us.

Drive by Daniel H. Pink

There's no need to read the book; this clever ten minute whiteboard animation will walk you through the main points. If you view only one video today, view this one.

The concept of intrinsic motivation may not be a new one, but I find that very few companies are brave enough to actually implement them.

I've tried mightily to live up to the ideals that Stack Overflow was founded on when building out my team. I don't care when you come to work or what your schedule is. I don't care where in the world you live (provided you have a great internet connection). I don't care how you do the work. I'm not going to micromanage you and assign you a queue of task items. There's no need.

If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.

Antoine de Saint-Exupéry

Because I know you yearn for the vast and endless sea, just like we do.

Discussion

On Working Remotely

When I first chose my own adventure, I didn't know what working remotely from home was going to be like. I had never done it before. As programmers go, I'm fairly social. Which still means I'm a borderline sociopath by normal standards. All the same, I was worried that I'd go stir-crazy with no division between my work life and my home life.

Well, I haven't gone stir-crazy yet. I think. But in building Stack Overflow, I have learned a few things about what it means to work remotely – at least when it comes to programming. Our current team encompasses 5 people, distributed all over the USA, along with the team in NYC.

Usa-stack-overflow-team-map

My first mistake was attempting to program alone. I had weekly calls with my business partner, Joel Spolsky, which were quite productive in terms of figuring out what it was we were trying to do together – but he wasn't writing code. I was coding alone. Really alone. One guy working all by yourself alone. This didn't work at all for me. I was unmoored, directionless, suffering from analysis paralysis, and barely able to get motivated enough to write even a few lines of code. I rapidly realized that I'd made a huge mistake in not having a coding buddy to work with.

That situation rectified itself soon enough, as I was fortunate enough to find one of my favorite old coding buddies was available. Even though Jarrod was in North Carolina and I was in California, the shared source code was the mutual glue that stuck us together, motivated us, and kept us moving forward. To be fair, we also had the considerable advantage of prior history, because we had worked together at a previous job. But the minimum bar to work remotely is to find someone who loves code as much as you do. It's … enough. Anything else on top of that – old friendships, new friendships, a good working relationship – is icing that makes working together all the sweeter. I eventually expanded the team in the same way by adding another old coding buddy, Geoff, who lives in Oregon. And again by adding Kevin, who I didn't know, but had built amazing stuff for us without even being asked to, from Texas. And again by adding Robert, in Florida, who I also didn't know, but spent so much time on every single part of our sites that I felt he had been running alongside our team the whole way, there all along.

The reason remote development worked for us, in retrospect, wasn't just shared love of code. I picked developers who I knew – I had incontrovertible proof – were amazing programmers. I'm not saying they're perfect, far from it, merely that they were top programmers by any metric you'd care to measure. That's why they were able to work remotely. Newbie programmers, or competent programmers who are phoning it in, are absolutely not going to have the moxie necessary to get things done remotely – at least, not without a pointy haired manager, or grumpy old team lead, breathing down their neck. Don't even think about working remotely with anyone who doesn't freakin' bleed ones and zeros, and has a proven track record of getting things done.

While Joel certainly had a lot of high level input into what Stack Overflow eventually became, I only talked to him once a week, at best (these calls were the genesis of our weekly podcast series). I had a strong, clear vision of what I wanted Stack Overflow to be, and how I wanted it to work. Whenever there was a question about functionality or implementation, my team was able to rally around me and collectively make decisions we liked, and that I personally felt were in tune with this vision. And if you know me at all, you know I'm not shy about saying no, either. We were able to build exactly what we wanted, exactly how we wanted.

Bottom line, we were on a mission from God. And we still are.

So, there are a few basic ground rules for remote development, at least as I've seen it work:

  • The minimum remote team size is two. Always have a buddy, even if your buddy is on another continent halfway across the world.
  • Only grizzled veterans who absolutely love to code need apply for remote development positions. Mentoring of newbies or casual programmers simply doesn't work at all remotely.
  • To be effective, remote teams need full autonomy and a leader (PM, if you will) who has a strong vision and the power to fully execute on that vision.

This is all well and good when you have a remote team size of three, as we did for the bulk of Stack Overflow development. And all in the same country. Now we need to grow the company, and I'd like to grow it in distributed fashion, by hiring other amazing developers from around the world, many of whom I have met through Stack Overflow itself.
But how do you scale remote development? Joel had some deep seated concerns about this, so I tapped one of my heroes, Miguel de Icaza – who I'm proud to note is on our all-star board of advisors – and he was generous enough to give us some personal advice based on his experience running the Mono project, which has dozens of developers distributed all over the world.

Time-zone-differences

At the risk of summarizing mercilessly (and perhaps too much), I'll boil down Miguel's advice the best I can. There are three tools you'll need in place if you plan to grow a large-ish and still functional remote team:

  • Real time chat

    When your team member lives in Brazil, you can't exactly walk by his desk to ask him a quick question, or bug him about something in his recent checkin. Nope. You need a way to casually ping your fellow remote team members and get a response back quickly. This should be low friction and available to all remote developers at all times. IM, IRC, some web based tool, laser beams, smoke signals, carrier pigeon, two tin cans and a string: whatever. As long as everyone really uses it.

    We're currently experimenting with Campfire, but whatever floats your boat and you can get your team to consistently use, will work. Chat is the most essential and omnipresent form of communication you have when working remotely, so you need to make absolutely sure it's functioning before going any further.

  • Persistent mailing list

    Sure, your remote team may know the details of their project, but what about all the other work going on? How do they find out about that stuff or even know it exists in the first place? You need a virtual bulletin board: a place for announcements, weekly team reports, and meeting summaries. This is where a classic old-school mailing list comes in handy.

    We're using Google Groups and although it's old school in spades, it works plenty well for this. You can get the emails as they arrive, or view the archived list via the web interface. One word of caution, however. Every time you see something arrive in your inbox from the mailing list you better believe, in your heart of hearts, that it contains useful information. The minute the mailing list becomes just another "whenever I have time to read that stuff", noise engine, or distraction from work … you've let someone cry wolf too much, and ruined it. So be very careful. Noisy, argumentative, or useless things posted to the mailing list should be punishable by death. Or noogies.

  • Voice and video chat

    As much as I love ASCII, sometimes faceless ASCII characters just aren't enough to capture the full intentions and feelings of the human being behind them. When you find yourself sending kilobytes of ASCII back and forth, and still are unsatisfied that you're communicating, you should instill a reflexive habit of "going voice" on your team.

    Never underestimate the power of actually talking to another human being. I know, I know, the whole reason we got into this programming thing was to avoid talking to other people, but bear with me here. You can't be face to face on a remote team without flying 6 plus hours, and who the heck has that kind of time? I've got work I need to get done! Well, the next best thing to hopping on a plane is to fire up Skype and have a little voice chat. Easy peasy. All that human nuance which is totally lost in faceless ASCII characters (yes, even with our old pal *<:-)) will come roaring back if you regularly schedule voice chats. I recommend at least once a week at an absolute minimum; they don't have to be long meetings, but it sure helps in understanding the human being behind all those awesome checkins.

Nobody hates meetings and process claptrap more than I do, but there is a certain amount of process you'll need to keep a bunch of loosely connected remote teams and developers in sync.

  1. Monday team status reports

Every Monday, as in somebody's-got-a-case-of-the, each team should produce a brief, summarized rundown of:

  • What we did last week
  • What we're planning to do this week
  • Anything that is blocking us or we are concerned about

This doesn't have to be (and in fact shouldn't be) a long report. The briefer the better, but do try to capture all the useful highlights. Mail this to the mailing list every Monday like clockwork. Now, how many "teams" you have is up to you; I don't think this needs to be done at the individual developer level, but you could.

  • Meeting minutes

    Any time you conduct what you would consider to be a "meeting" with someone else, take minutes! That is, write down what happened in bullet point form, so those remote team members who couldn't be there can benefit from – or at least hear about – whatever happened.

    Again, this doesn't have to be long, and if you find taking meeting minutes onerous then you're probably doing it wrong. A simple bulleted list of sentences should suffice. We don't need to know every little detail, just the big picture stuff: who was there? What topics were discussed? What decisions were made? What are the next steps?

Both of the above should, of course, be mailed out to the mailing list as they are completed so everyone can be notified. You do have a mailing list, right? Of course you do!

If this seems like a lot of jibba-jabba, well, that's because remote development is hard. It takes discipline to make it all work, certainly more discipline than piling a bunch of programmers into the same cubicle farm. But when you imagine what this kind of intellectual work – not just programming, but anything where you're working in mostly thought-stuff – will be like in ten, twenty, even thirty years … don't you think it will look a lot like what happens every day right now on Stack Overflow? That is, a programmer in Brazil helping a programmer in New Jersey solve a problem?

If I have learned anything from Stack Overflow it is that the world of programming is truly global. I am honored to meet these brilliant programmers from every corner of the world, even if only in a small way through a website. Nothing is more exciting for me than the prospect of adding international members to the Stack Overflow team. The development of Stack Overflow should be reflective of what Stack Overflow is: an international effort of like-minded – and dare I say totally awesome – programmers. I wish I could hire each and every one of you. OK, maybe I'm a little biased. But to me, that's how awesome the Stack Overflow community is.

I believe remote development represents the future of work. If we have to spend a little time figuring out how this stuff works, and maybe even make some mistakes along the way, it's worth it. As far as I'm concerned, the future is now. Why wait?

Discussion

What's Wrong With CSS

We're currently in the midst of a CSS Zen Garden type excerise on our family of Q&A websites, which I affectionately refer to as "the Trilogy":

(In case you were wondering, yes, meta is the Star Wars Holiday Special.)

These sites all run the same core engine, but the logo, domain, and CSS "skin" that lies over the HTML skeleton is different in each case:

serverfault.com screenshot superuser.com screenshot
meta.stackoverflow screenshot stackoverflow.com screenshot

They are not terribly different looking, it's true, but we also want them to be recognizable as a family of sites.

We're working with two amazing designers, Jin Yang and Nathan Bowers, who are helping us whip the CSS and HTML into shape so they can produce a set of about 10 different Zen Garden designs. As new sites in our network get democracied into being, these designs will be used as a palette for the community to choose from. (And, later, the community will decide on a domain name and logo as well.)

Anyway, I bring this up not because my pokemans, let me show you them, but because I have to personally maintain four different CSS files. And that number is only going to get larger. Much larger. That scares me a little.

Most of all, what I've learned from this exercise in site theming is that CSS is kind of painful. I fully support CSS as a (mostly) functional user interface Model-View-Controller. But even if you have extreme HTML hygiene and Austrian levels of discipline, CSS has some serious limitations in practice.

Things in particular that bite us a lot:

  • Vertical alignment is a giant, hacky PITA. (Tables work great for this though!)
  • Lack of variables so we have to repeat colors all over the place.
  • Lack of nesting so we have to repeat huge blocks of CSS all over the place.

In short, CSS violates the living crap out of the DRY principle. You are constantly and unavoidably repeating yourself.

That's why I'm so intrigued by two Ruby gems that attempt to directly address the deficiencies of CSS.

1. Less CSS

/* CSS */
#header {
-moz-border-radius: 5;
-webkit-border-radius: 5;
border-radius: 5;
}
#footer {
-moz-border-radius: 10;
-webkit-border-radius: 10;
border-radius: 10;
}
// LessCSS
.rounded_corners (@radius: 5px) {
-moz-border-radius: @radius;
-webkit-border-radius: @radius;
border-radius: @radius;
}
#header {
.rounded_corners;
}
#footer {
.rounded_corners(10px);
}

2. SASS

/* CSS */
.content_navigation {
border-color: #3bbfce;
color: #2aaebd;
}
.border {
padding: 8px;
margin: 8px;
border-color: #3bbfce;
}
// Sass
!blue = #3bbfce
!margin = 16px
.content_navigation
border-color = !blue
color = !blue - #111
.border
padding = !margin / 2
margin = !margin / 2
border-color = !blue

As you can see, in both cases we're transmogrifying CSS into a bit more of a programming language, rather than the static set of layout rules it currently exists as. Behind the scenes, we're generating plain vanilla CSS using these little dynamic languages. This could be done at project build time, or even dynamically on every page load if you have a good caching strategy.

I'm not sure how many of these improvements CSS3 will bring, never mind when the bulk of browsers in the world will support it. But I definitely feel that the core changes identified in both Less CSS and SASS address very real pain points in practical CSS use. It's worth checking them out to understand why they exist, what they bring to the table, and how you could possibly adopt some of these strategies in your own CSS and your favorite programming language.

Discussion

So You'd Like to Send Some Email (Through Code)

I have what I would charitably describe as a hate-hate relationship with email. I desperately try to avoid sending email, not just for myself, but also in the code I write.

Despite my misgivings, email is the cockroach of communication mediums: you just can't kill it. Email is the one method of online contact that almost everyone -- at least for that subset of "everyone" which includes people who can bear to touch a computer at all -- is guaranteed to have, and use. Yes, you can make a fairly compelling case that email is for old stupid people, but let's table that discussion for now.

So, reluctantly, we come to the issue of sending email through code. It's easy! Let's send some email through oh, I don't know, let's say ... Ruby, courtesy of some sample code I found while browsing the Ruby tag on Stack Overflow.

require 'net/smtp'

def send_email(to, subject = "", body = "")
    from = "my@email.com"
    body= "From: #{from}\r\nTo: #{to}\r\nSubject: #{subject}\r\n\r\n#{body}\r\n"

    Net::SMTP.start('192.168.10.213', 25, '192.168.0.218') do |smtp|
        smtp.send_message body, from, to
    end
end

send_email "my@email.com", "test", "blah blah blah"

There's a bug in this code, though. Do you see it?

Just because you send an email doesn't mean it will arrive. Not by a long shot. Bear in mind this is email we're talking about. It was never designed to survive a bitter onslaught of criminals and spam, not to mention the explosive, exponential growth it has seen over the last twenty years. Email is a well that has been truly and thoroughly poisoned -- the digital equivalent of a superfund cleanup site. The ecosystem around email is a dank miasma of half-implemented, incompletely supported anti-spam hacks and workarounds.

Which means the odds of that random email your code just sent getting to its specific destination is .. spotty. At best.

If you want email your code sends to actually arrive in someone's AOL mailbox, to the dulcet tones of "You've Got Mail!", there are a few things you must do first. And most of them are only peripherally related to writing code.

1. Make sure the computer sending the email has a Reverse PTR record

What's a reverse PTR record? It's something your ISP has to configure for you -- a way of verifying that the email you send from a particular IP address actually belongs to the domain it is purportedly from.

Not every IP address has a corresponding PTR record. In fact, if you took a random sampling of addresses your firewall blocked because they were up to no good, you'd probably find most have no PTR record - a dig -x gets you no information. That's also apt to be true for mail spammers, or their PTR doesn't match up: if you do a dig -x on their IP you get a result, but if you look up that result you might not get the same IP you started with.

That's why PTR records have become important. Originally, PTR records were just intended as a convenience, and perhaps as a way to be neat and complete. There still are no requirements that you have a PTR record or that it be accurate, but because of the abuse of the internet by spammers, certain conventions have grown up. For example, you may not be able to send email to some sites if you don't have a valid PTR record, or if your pointer is "generic".

How do you get a PTR record? You might think that this is done by your domain registrar - after all, they point your domain to an IP address. Or you might think whoever handles your DNS would do this. But the PTR record isn't up to them, it's up to the ISP that "owns" the IP block it came from. They are the ones who need to create the PTR record.

A reverse PTR record is critical. How critical? Don't even bother reading any further until you've verified that your ISP has correctly configured the reverse PTR record for the server that will be sending email. It is absolutely the most common check done by mail servers these days. Fail the reverse PTR check, and I guarantee that a huge percentage of the emails you send will end up in the great bit bucket in the sky -- and not in the email inboxes you intended.

2. Configure DomainKeys Identified Mail in your DNS and code

What's DomainKeys Identified Mail? With DKIM, you "sign" every email you send with your private key, a key only you could possibly know. And this can be verified by attempting to decrypt the email using the public key stored in your public DNS records. It's really quite clever!

The first thing you need to do is generate some public-private key pairs (one for every domain you want to send email from) via OpenSSL. I used a win32 version I found. Issue these commands to produce the keys in the below files:

$ openssl genrsa -out rsa.private 1024
$ openssl rsa -in rsa.private -out rsa.public -pubout -outform PEM

These public and private keys are just big ol' Base64 encoded strings, so plop them in your code as configuration string resources that you can retrieve later.

Next, add some DNS records. You'll need two new TXT records.

  1. _domainkey.example.com
    "o=~; r=contact@example.com"
  2. selector._domainkey.example.com
    "k=rsa; p={public-key-base64-string-here}"

The first TXT DNS record is the global DomainKeys policy and contact email.

The second TXT DNS record is the public base64 key you generated earlier, as one giant unbroken string. Note that the "selector" part of this record can be anything you want; it's basically just a disambiguating string.

Almost done. One last thing -- we need to sign our emails before sending them. In any rational world this would be handled by an email library of some kind. We use Mailbee.NET which makes this fairly painless:

smtp.Message = dk.Sign(smtp.Message,
null, AppSettings.Email.DomainKeyPrivate, false, "selector");

3. Set up a SPF / SenderID record in your DNS

To be honest, SenderID is a bit of a "nice to have" compared to the above two. But if you've gone this far, you might as well go the distance. SenderID, while a little antiquated and kind of.. Microsoft/Hotmail centric.. doesn't take much additional effort.

SenderID isn't complicated. It's another TXT DNS record at the root of, say, example.com, which contains a specially formatted string documenting all the allowed IP addresses that mail can be expected to come from. Here's an example:

"v=spf1 a mx ip4:10.0.0.1 ip4:10.0.0.2 ~all"

You can use the Sender ID SPF Record Wizard to generate one of these for each domain you send email from.

That sucked. How do I know all this junk is working?

I agree, it sucked. Email sucks; what did you expect? I used two methods to verify that all the above was working:

  1. Test emails sent to a GMail account.

    Use the "show original" menu on the arriving email to see the raw message content as seen by the email server. You want to verify that the headers definitely contain the following:
    Received-SPF: pass
    Authentication-Results: ... spf=pass ... dkim=pass

    If you see that, then the Reverse PTR and DKIM signing you set up is working. Google provides excellent diagnostic feedback in their email server headers, so if something isn't working, you can usually discover enough of a hint there to figure out why.

  2. Test emails sent to the Port25 email verifier

    Port25 offers a really nifty public service -- you can send email to check-auth@verifier.port25.com and it will reply to the from: address with an extensive diagnostic! Here's an example summary result from a test email I just sent to it:
    SPF check:          pass
    DomainKeys check:   fail
    DKIM check:         pass
    Sender-ID check:    pass
    SpamAssassin check: ham
    

    You want to pass SPF, DKIM, and Sender-ID. Don't worry about the DomainKeys failure, as I believe it is spurious -- DKIM is the "newer" version of that same protocol.

Yes, the above three steps are quite a bit of work just to send a lousy email. But I don't send email lightly. By the time I've reached the point where I am forced to write code to send out email, I really, really want those damn emails to arrive. By any means necessary.

And for those who are the unfortunate recipients of these emails: my condolences.

Discussion