Coding Horror

programming and human factors

What does Stack Overflow want to be when it grows up?

I sometimes get asked by regular people in the actual real world what it is that I do for a living, and here's my 15 second answer:

We built a sort of Wikipedia website for computer programmers to post questions and answers. It's called Stack Overflow.

As of last month, it's been 10 years since Joel Spolsky and I started Stack Overflow. I do other stuff now, and have since 2012, but if I am known for anything when I'm dead, clearly it is going to be good old Stack Overflow.

Here's where I'd normally segue into a bunch of rah-rah stuff about how great Stack Overflow is, and thus how implicitly great I am by association for being a founder, and all.

bragging

I do not care about any of that.

What I do care about, though, is whether Stack Overflow is useful to working programmers. Let's check in with one of my idols, John Carmack. How useful is Stack Overflow, from the perspective of what I consider to be one of the greatest living programmers?

I won't lie, September 17th, 2013 was a pretty good day. I literally got chills when I read that, and not just because I always read the word "billions" in Carl Sagan's voice. It was also pleasantly the opposite of pretty much every other day I'm on Twitter, scrolling through an oppressive, endless litany of shared human suffering and people screaming at each other. Which reminds me, I should check my Twitter and see who else is wrong on the Internet today.

I am honored and humbled by the public utility that Stack Overflow has unlocked for a whole generation of programmers. But I didn't do that.

  • You did, when you contributed a well researched question to Stack Overflow.
  • You did, when you contributed a succinct and clear answer to Stack Overflow.
  • You did, when you edited a question or answer on Stack Overflow to make it better.

All those "fun size" units of Q&A collectively contributed by working programmers from all around the world ended up building a Creative Commons resource that truly rivals Wikipedia within our field. That's ... incredible, actually.

stack-overflow-homepage-oct-2018

But success stories are boring. The world is filled with people that basically got lucky, and subsequently can't stop telling people how it was all of their hard work and moxie that made it happen. I find failure much more instructive, and when building a business and planning for the future, I take on the role of Abyss Domain Expert™ and begin a staring contest. It's just a little something I like to do, you know ... for me.

abyss-oc

Thus, what I'd like to do right now is peer into that glorious abyss for a bit and introspect about the challenges I see facing Stack Overflow for the next 10 years. Before I begin, I do want to be absolutely crystal clear about a few things:

  1. I have not worked at Stack Overflow in any capacity whatsoever since February 2012 and I've had zero day to day operational input since that date, more or less by choice. Do I have opinions about how things should be done? Uh, have you met me? Do I email people every now and then about said opinions? I might, but I honestly do try to keep it to an absolute minimum, and I think my email archive track record here is reasonable.

  2. The people working at Stack are amazing and most of them (including much of the Stack Overflow community, while I'm at it) could articulate the mission better — and perhaps a tad less crankily — than I could by the time I left. Would I trust them with my life? No. But I'd trust them with Joel's life!

  3. The whole point of the Stack Overflow exercise is that it's not beholden to me, or Joel, or any other Great Person. Stack Overflow works because it empowers regular everyday programmers all over the world, just like you, just like me. I guess in my mind it's akin to being a parent. The goal is for your children to eventually grow up to be sane, practicing adults who don't need (or, really, want) you to hang around any more.

  4. Understand that you're reading the weak opinions strongly held, er, the strong opinions weakly held of a co-founder who spent prodigious amounts of time working with the community in the first four years of Stack Overflow's life to shape the rules and norms of the site to fit their needs. These are merely my opinions. I like to think they are informed opinions, but that doesn't necessarily mean I can predict the future, or that I am even qualified to try. But I've never let being "qualified" stop me from doing anything, and I ain't about to start tonight.

Stack Overflow is a wiki first

Stack Overflow ultimately has much more in common with Wikipedia than a discussion forum. By this I mean questions and answers on Stack Overflow are not primarily judged by their usefulness to a specific individual, but by how many other programmers that question or answer can potentially help over time. I tried as hard as I could to emphasize this relationship from launch day in 2008. Note who has top billing in this venn diagram.

stack-overflow-venn-diagram

Stack Overflow later added a super neat feature to highlight this core value in user profiles, where it shows how many other people you have potentially helped with your contributed questions and answers so far.

stackoverflow-people-reached-profile-stat-1

The most common complaints I see about Stack Overflow are usually the result of this fundamental misunderstanding about who the questions and answers on the site are ultimately for, and why there's so much strictness involved in the whole process.

I'm continually amazed at the number of people, even on Hacker News today, who don't realize that every single question and answer is editable on Stack Overflow, even as a completely anonymous user who isn't logged in. Which makes sense, right, because Stack Overflow is a wiki, and that's how wikis work. Anyone can edit them. Go ahead, try it right now if you don't believe me — press the "improve this answer" or "improve this question" button on anything that can be improved, and make it so.

stack-overflow-edit-question

The responsibility for this misunderstanding is all on Stack Overflow (and by that I also mean myself, at least up until 2012). I guess the logic is that "every programmer has surely seen, used, and understands Stack Overflow by now, 10 years in" but ... I think that's a risky assumption. New programmers are minted every second of every day. Complicating matters further, there are three tiers of usage at Stack Overflow, from biggest to smallest, in inverted pyramid style:

  1. I passively search for programming answers.

    Passively searching and reading highly ranked Stack Overflow answers as they appear in web search results is arguably the primary goal of Stack Overflow. If Stack Overflow is working like it's supposed to, 98% of programmers should get all the answers they need from reading search result pages and wouldn't need to ask or answer a single question in their entire careers. This is a good thing! Great, even!

  2. I participate on Stack Overflow when I get stuck on a really hairy problem and searching isn't helping.

    Participating only at those times when you are extra stuck is completely valid. However, I feel this level is where most people tend to run into difficulty on Stack Overflow, because it involves someone who may not be new to Stack Overflow per se, but is new to asking questions, and is asking at precisely the moment of stress and tension when they must get an answer to a problem they're facing … and they don't have the time or inclination to deal with Stack Overflow's strict wiki type requirements for research effort, formatting, showing previous work, and referencing what they found in prior searches.

  3. I participate on Stack Overflow for professional development.

    At this level you're talking about experienced Stack Overflow users who have contributed many answers and thus have a pretty good idea of what makes a great question, the kind they'd want to answer themselves. As a result, they don't tend to ask many questions because they self-medicate through exhaustive searching and research, but when they do ask one, their questions are exemplary.

(There's technically a fourth tier here, for people who want to selflessly contribute creative commons questions and answers to move the entire field of software development forward for the next generation of software developers. But who has time for saints 😇, y'all make the rest of us look bad, so knock it off already Skeet.)

It wouldn't shock me at all if people spent years happily at tier 1 and then got a big unpleasant surprise when reaching tier 2. The primary place to deal with this, in my opinion, is a massively revamped and improved ask page. It's also fair to note that maybe people don't understand that they're signing up for a sizable chunk of work by implicitly committing to the wiki standard of "try to make sure it's useful to more people than just yourself" when asking a question on Stack Overflow, and are then put off by the negative reaction to what others view as an insufficiently researched question.

Stack Overflow absorbs so much tension from its adoption of wiki standards for content. Even if you know about that requirement up front, it is not always clear what "useful" means, in the same way it's not always clear what topics, people, and places are deserving of a Wikipedia page. Henrietta Lacks, absolutely, but what about your cousin Dave in Omaha with his weirdo PHP 5.6 issue?

Over time, duplicates become vast landmine fields

Here's one thing I really, really saw coming and to be honest with you I was kinda glad I left in 2012 before I had to deal with it because of the incredible technical difficulty involved: duplicates. Of all the complaints I hear about Stack Overflow, this is the one I am most sympathetic to by far.

If you accept that Stack Overflow is a wiki type system, then for the same reasons that you obviously can't have five different articles about Italy on Wikipedia, Stack Overflow can't allow duplicate questions on the exact same programming problem. While there is a fair amount of code to do pre-emptive searches as people type in questions, plus many exhortations to search before you ask, with an inviting search field and button right there on the mandatory page you see before asking your first question ...

stack-overflow-how-to-ask

... locating and identifying duplicate content is an insanely difficult problem even for a company like Google that's done nothing but specialize in this exact problem for, what, 20 years now, with a veritable army of the world's most talented engineers.

When you're asking a question on a site that doesn't allow duplicate questions, the problem space of a site with 1 million existing questions is rather different from a site with 10 million existing questions ... or 100 million. Asking a single unique question goes from mildly difficult to mission almost impossible, because your question needs to thread a narrow path through this vast, enormous field of prior art questions without stepping on any of the vaguely similar looking landmines in the process.

stackoverflow-asking-duplicate-question

But wait! It gets harder!

  • Some variance in similar-ish questions is OK, because 10 different people will ask a nearly identical question using 10 different sets of completely unrelated words with no overlap. I know, it sounds crazy, but trust me: humans are amazing at this. We want all those duplicates to exist so they can point to the primary question they are a duplicate of, while still being valid search targets for people who ask questions with unusual or rare word choices.

  • It can be legitimately difficult to determine if your question is a true duplicate. How much overlap is enough before one programming question is a duplicate of another? And by whose definition? Opinions vary. This is subject to human interpretation, and humans are.. unreliable. Nobody will ever be completely happy with this system, pretty much by design. That tension is baked in permanently and forever.
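To make both of those bullets concrete, here's a minimal sketch of the most naive duplicate detector imaginable: word overlap (Jaccard similarity) between two question titles. To be clear, this is an illustration, not how Stack Overflow actually finds duplicates, and the titles are invented for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a title and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Fraction of distinct words the two titles share (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical titles: the same problem, asked with zero words in common.
same_problem = jaccard(
    "How do I reverse a string in Python?",
    "Flipping text backwards with slicing",
)

# Hypothetical titles: nearly identical wording, but a different problem.
different_problem = jaccard(
    "How do I reverse a string in Python?",
    "How do I reverse a list in Python?",
)

print(same_problem)       # 0.0
print(different_problem)  # roughly 0.78 (7 shared words out of 9)
```

Word overlap scores the true duplicate at zero and the non-duplicate near the top of the scale, which is exactly backwards. Anything better needs some notion of meaning, and that is where the human judgment (and the tension) comes in.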

I don't have any real answers on the duplicate problem, which only gets worse over time. But I will point out that there is plenty of precedent on the Stack Exchange network for splitting sites into "expert" and "beginner" areas with slightly different rulesets. We've seen this for Math vs. MathOverflow, English vs. English Learners, Unix vs. Ubuntu... perhaps it's time for a more beginner focused Stack Overflow where duplicates are less frowned upon, and conversational rules are a bit more lenient?

Stack Overflow is a competitive system of peer review

Stack Overflow was indeed built to be a fairly explicitly competitive system, with the caveat that "there's always more than one way to do it." This design choice was based on my perennial observation that the best way to motivate any programmer .. is to subtly insinuate that another programmer could have maybe done it better.

geek-hero-motivating-programmers

This is manifested in the public reputation system on Stack Overflow, the incredible power of a number printed next to someone's name, writ large. All reputation in Stack Overflow comes from the recognition of your peers, never the "system".

stack-overflow-top-rep-by-year

Once your question is asked, or your answer is posted, it can then be poked, prodded, edited, flagged, closed, opened, upvoted, downvoted, folded and spindled by your peers. The intent is for Stack Overflow to be a system of peer review and friendly competition, like a code review from a coworker you've never met at a different division of the company. It's also completely fair for a fellow programmer to question the premise of your question, as long as it's done in a nice way. For example, do you really want to use that regular expression to match HTML?

I fully acknowledge that competitive peer review systems aren't for everyone, and thus the overall process of having peers review your question may not always feel great, depending on your circumstances and background in the field — particularly when combined with the substantial tensions around utility and duplicates Stack Overflow already absorbed from its wiki elements. Kind of a double whammy there.

I've heard people describe the process of asking a question on Stack Overflow as anxiety inducing. To me, posting on Stack Overflow is supposed to involve a healthy kind of minor "let me be sure to show off my best work" anxiety:

  • the anxiety of giving a presentation to your fellow peers
  • the anxiety of doing well on a test
  • the anxiety of showing up to a new job with talented coworkers you admire
  • the anxiety of attending your first day at school with other students at your level

I imagine systems where there is zero anxiety involved and I can only think of jobs where I had long since stopped caring about the work and thus had no anxiety about whether I even showed up for work on any given day. How can that be good? Let's just say I'm not a fan of zero-anxiety systems.

Maybe competition just isn't your jam. Could there be a less competitive Q&A system, a system without downvotes, a system without close votes, where there was never any anxiety about posting anything, just a network of super supportive folks who believe in you and want you to succeed no matter what? Absolutely! I think many alternative sites should exist on the internet so people can choose an experience that matches their personal preferences and goals. Should Stack build that alternative? Has it already been built? It's an open question; feel free to point out examples in the comments.

Stack Overflow is designed for practicing programmers

Another point of confusion that comes up a fair bit is who the intended audience for Stack Overflow actually is. That one is straightforward, and it's been the same from day one:

stackoverflow-for-business-description

Q&A for professional and enthusiast programmers. By that we mean

People who either already have a job as a programmer, or could potentially be hired as a programmer today if they wanted to be.

Yes, in case you're wondering, part of this was an overt business decision. To make money you must have an audience of people already on a programmer's salary, or in the job hunt to be a programmer. The entire Stack Overflow network may be Creative Commons licensed, but it was never a non-profit play. It was planned as a sustainable business from the outset, and that's why we launched Stack Overflow Careers only one year after Stack Overflow itself ... to be honest far sooner than we should have, in retrospect. Careers has since been smartly subsumed into Stack Overflow proper at stackoverflow.com/jobs for a more integrated and most assuredly way-better-than-2009 experience.

The choice of audience wasn't meant to be an exclusionary decision in any way, but Stack Overflow was definitely designed as a fairly strict system of peer review, which is great (IMNSHO, obviously) for already practicing professionals, but pretty much everything you would not want as a student or beginner. This is why I cringe so hard I practically turn myself inside out when people on Twitter mention that they have pointed their students at Stack Overflow. What you'd want for a beginner or a student in the field of programming is almost the exact opposite of what Stack Overflow does at every turn:

  • one on one mentoring
  • real time collaborative screen sharing
  • live chat
  • theory and background courses
  • starter tasks and exercises
  • playgrounds to experiment in

These are all very fine and good things, but Stack Overflow does NONE of them, by design.

Can you use Stack Overflow to learn how to program from first principles? Well, technically you can do anything with any software. You could try to have actual conversations on Reddit, if you're a masochist. But the answer is yes. You could learn how to program on Stack Overflow, in theory, if you are a prodigy who is comfortable with the light competitive aspects (reputation, closing, downvoting) and also perfectly willing to define all your contributions to the site in terms of utility to others, not just yourself as a student attempting to learn things. But I suuuuuuper would not recommend it. There are far better websites and systems out there for learning to be a programmer. Could Stack Overflow build beginner and student friendly systems like this? I don't know, and it's certainly not my call to make. 🤔

And that's it. We can now resume our normal non-abyss gazing. Or whatever it is that passes for normal in these times.

I hope all of this doesn't come across as negative. Overall I'd say the state of the Stack is strong. But does it even matter what I think? As it was in 2008, so it is in 2018.

Stack Overflow is you.

This is the scary part, the great leap of faith that Stack Overflow is predicated on: trusting your fellow programmers. The programmers who choose to participate in Stack Overflow are the “secret sauce” that makes it work. You are the reason I continue to believe in developer community as the greatest source of learning and growth. You are the reason I continue to get so many positive emails and testimonials about Stack Overflow. I can’t take credit for that. But you can.

I learned the collective power of my fellow programmers long ago writing on Coding Horror. The community is far, far smarter than I will ever be. All I can ask — all any of us can ask — is to help each other along the path.

And if your fellow programmers decide to recognize you for that, then I say you’ve well and truly earned it.

The strength of Stack Overflow begins, and ends, with the community of programmers that power the site. What should Stack Overflow be when it grows up? Whatever we make it, together.

stackoverflow-none-of-us-is-as-dumb-as-all-of-us

p.s. Happy 10th anniversary Stack Overflow!


Also see Joel's take on 10 years of Stack Overflow with The Stack Overflow Age, A Dusting of Gamification, and Strange and Maddening Rules.


There is no longer any such thing as Computer Security

Remember "cybersecurity"?

its-cybersecurity-yay

Mysterious hooded computer guys doing mysterious hooded computer guy .. things! Who knows what kind of naughty digital mischief they might be up to?

Unfortunately, we now live in a world where this kind of digital mischief is literally rewriting the world's history. For proof of that, you need look no further than this single email that was sent March 19th, 2016.

podesta-hack-email-text

If you don't recognize what this is, it is a phishing email.

phishing-guy

This is by now a very, very famous phishing email, arguably the most famous of all time. But let's consider how this email even got sent to its target in the first place:

  • An attacker slurped up lists of any public emails of 2008 political campaign staffers.

  • One 2008 staffer was also hired for the 2016 political campaign.

  • That particular staffer had non-public campaign emails in their address book, and one of them was a powerful key campaign member with an extensive email history.

One successful phish leads to an even wider address book attack net down the line. Once attackers gain access to a person's inbox, they use it to prepare their next attack. They'll harvest existing email addresses, subject lines, content, and attachments to construct plausible looking boobytrapped emails and mail them to all of that person's contacts. How sophisticated and targeted to a particular person this effort is determines whether it's so-called "spear" phishing or not.

phishing-vs-spear-phishing

In this case it was not at all targeted. This is a remarkably unsophisticated, absolutely generic routine phishing attack. There is zero focused attack effort on display here. But note the target did not immediately click the link in the email!

podesta-hack-email-link-1

Instead, he did exactly what you'd want a person to do in this scenario: he emailed IT support and asked if this email was valid. But IT made a fatal mistake in their response.

podesta-it-support-response

Do you see it? Here's the kicker:

Mr. Delavan, in an interview, said that his bad advice was a result of a typo: He knew this was a phishing attack, as the campaign was getting dozens of them. He said he had meant to type that it was an “illegitimate” email, an error that he said has plagued him ever since.

One word. He got one word wrong. But what a word to get wrong, and in the first sentence! The email did provide the proper Google address to reset your password. But the lede was already buried since the first sentence said "legitimate"; the phishing link in that email was then clicked. And the rest is literally history.

What's even funnier (well, in the way of gallows humor, I guess) is that public stats were left enabled for that bit.ly tracking link, so you can see exactly what crazy domain that "Google login page" resolved to, and that it was clicked exactly twice, on the same day it was mailed.

bitly-podesta-tracking-link

As I said, these were not exactly sophisticated attackers. So yeah, in theory an attentive user could pay attention to the browser's address bar and notice that after clicking the link, they arrived at

http://myaccount.google.com-securitysettingpage.tk/security/signinoptions/password

instead of

https://myaccount.google.com/security

Note that the phishing URL is carefully constructed so the most "correct" part is at the front, and weirdness is sandwiched in the middle. Unless you're paying very close attention and your address bar is long enough to expose the full URL, it's … tricky. See this 10 second video for a dramatic example.
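If you'd rather let a computer do the squinting, here's a minimal sketch that pulls the hostname out of each of the two URLs above and checks whether it actually sits under google.com. The helper name is_host_under is made up for this example, and the suffix check is deliberately naive; a production check would consult the Public Suffix List rather than trust a simple string comparison.

```python
from urllib.parse import urlsplit


def is_host_under(url: str, trusted_domain: str) -> bool:
    """True only if the URL's hostname is trusted_domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    return host == trusted_domain or host.endswith("." + trusted_domain)


phish = "http://myaccount.google.com-securitysettingpage.tk/security/signinoptions/password"
real = "https://myaccount.google.com/security"

# The hostname is everything between "//" and the first "/".
print(urlsplit(phish).hostname)            # myaccount.google.com-securitysettingpage.tk
print(is_host_under(phish, "google.com"))  # False
print(is_host_under(real, "google.com"))   # True
```

Domains are read right to left: the part that matters is the end of the hostname (.tk here), which is precisely the part a short address bar hides.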

(And if you think that one's good, check out this one. Don't forget all the unicode look-alike trickery you can pull, too.)
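For a taste of the look-alike trick, here's a small sketch that flags non-ASCII characters in a hostname and names them. The hostname is an invented example, and the check is crude on purpose: it would also flag perfectly legitimate internationalized domains, so treat it as a demonstration, not a filter.

```python
import unicodedata


def suspicious_characters(hostname: str) -> list[str]:
    """Return the Unicode names of any non-ASCII characters in a hostname."""
    return [unicodedata.name(ch, "UNNAMED") for ch in hostname if ord(ch) > 127]


# Illustrative example: the first letter is Cyrillic, not the Latin "a".
lookalike = "\u0430pple.com"

print(lookalike == "apple.com")            # False
print(suspicious_characters(lookalike))    # ['CYRILLIC SMALL LETTER A']
print(suspicious_characters("apple.com"))  # []
```

In most fonts the two strings are indistinguishable on screen, which is the whole problem.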

I originally wrote this post as a presentation for the Berkeley Computer Science Club back in March, and at that time I gathered a list of public phishing pages I found on the web.

nightlifesofl.com
ehizaza-limited.com
tcgoogle.com
appsgoogie.com
security-facabook.com

Of those five examples from 6 months ago, one is completely gone, one loads just fine, and three present an appropriately scary red interstitial warning page that strongly advises you not to visit the page you're trying to visit, courtesy of Google's safe browsing API. But of course this kind of shared blacklist domain name protection will be completely useless on any fresh phishing site. (Don't even get me started on how blacklists have never really worked anyway.)

google-login-phishing-page

It doesn't exactly require a PhD degree in computer science to phish someone:

  • Buy a crazy long, realistic looking domain name.
  • Point it to a cloud server somewhere.
  • Get a free HTTPS certificate courtesy of our friends at Let's Encrypt.
  • Build a realistic copy of a login page that silently transmits everything you type in those login fields to you – perhaps even in real time, as the target types.
  • Harvest email addresses and mass mail a plausible looking phishing email with your URL.

I want to emphasize that although clearly mistakes were made in this specific situation, none of the people involved here were amateurs. They had training and experience. They were working with IT and security professionals. Furthermore, they knew digital attacks were incoming.

The … campaign was no easy target; several former employees said the organization put particular stress on digital safety.

Work emails were protected by two-factor authentication, a technique that uses a second passcode to keep accounts secure. Most messages were deleted after 30 days and staff went through phishing drills. Security awareness even followed the campaigners into the bathroom, where someone put a picture of a toothbrush under the words: “You shouldn’t share your passwords either.”

The campaign itself used two factor auth extensively, which is why personal gmail accounts were targeted, because they were less protected.

The key takeaway here is that it's basically impossible, statistically speaking, to prevent your organization from being phished.

Or is it?

techsolidarity-logo

Nobody is doing better work in this space right now than Maciej Ceglowski and Tech Solidarity. Their list of basic security precautions for non-profits and journalists is pure gold and has been vetted by many industry professionals with security credentials that are actually impressive, unlike mine. Everyone should read this list very closely, point by point.

Everyone?

Computers, courtesy of smartphones, are now such a pervasive part of average life for average people that there is no longer any such thing as "computer security". There is only security. In other words, these are normal security practices everyone should be familiar with. Not just computer geeks. Not just political activists and politicians. Not just journalists and nonprofits.

Everyone.

It is a fair bit of reading, so because I know you are just as lazy as I am, and I am epically lazy, let me summarize what I view as the three important takeaways from the hard work Tech Solidarity put into these resources. These three short sentences are the 60 second summary of what you want to do, and what you want to share with others so they do, too.

1) Enable Two Factor authentication through an app, and not SMS, everywhere you can.

google-2fa-1

Logging in with only a password, no matter how long and unique you attempt to make that password, will never be enough. A password is what you know; you need to add the second factor of something you have (or something you are) to achieve significant additional security. SMS can famously be intercepted, social engineered, or sim-jacked all too easily. If it's SMS, it's not secure, period. So install an authenticator app, and use it, at least for your most important credentials such as your email account and your bank.

Have I mentioned that Discourse added two factor authentication support in version 2.0, and our just released 2.1 adds printed backup codes, too? There are two paths forward: you can talk about the solution, or you can build the solution. I'm trying to do both to the best of my ability. Look for the 2FA auth option in your user preferences on your favorite Discourse instance. It's there for you.

(This is also a company policy at Discourse; if you work here, you 2FA everything all the time. No other login option exists.)

2) Make all your passwords 11 characters or more.

It's a long story, but anything under 11 characters is basically the same as having no password at all these days. I personally recommend at least 14 characters, maybe even 16. But this won't be a problem for you, because...
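The short version of that long story is arithmetic. Here's a back-of-the-envelope sketch of how the search space grows with length. Both constants are my assumptions, not measurements: a 95 character printable ASCII alphabet, and a hypothetical attacker making ten billion guesses per second. It also assumes truly random passwords, which human-chosen ones are not, so real passwords fall much faster than this.

```python
import math

ALPHABET_SIZE = 95           # assumption: printable ASCII characters
GUESSES_PER_SECOND = 1e10    # assumption: hypothetical offline attacker
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(length: int, alphabet: int = ALPHABET_SIZE) -> float:
    """Bits of entropy in a uniformly random password of this length."""
    return length * math.log2(alphabet)


def years_to_exhaust(length: int, alphabet: int = ALPHABET_SIZE,
                     rate: float = GUESSES_PER_SECOND) -> float:
    """Years needed to try every possible password at the assumed rate."""
    return alphabet ** length / rate / SECONDS_PER_YEAR


for length in (8, 11, 14, 16):
    print(f"{length:2d} chars: {entropy_bits(length):5.1f} bits, "
          f"{years_to_exhaust(length):.3g} years to exhaust")
```

Every extra character multiplies the attacker's work by 95 under these assumptions, so the jump from 8 to 11 characters is a factor of roughly 857,000.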

3) Use a password manager.

If you use a password manager, you can simultaneously avoid the pernicious danger of password re-use and the difficulty of coming up with unique and random passwords all the time. It is my hope in the long run that cloud based password management gets deeply built into Android, iOS, OSX, and Windows so that people don't need to run a weird melange of third party apps to achieve this essential task. Password management is foundational and should not be the province of third parties on principle, because you never outsource a core competency.
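The "unique and random" half of that job is not magic. Here's a minimal sketch of roughly what a password generator does, using Python's secrets module, which draws from the operating system's cryptographic random source. The function name, the alphabet, and the 16 character default are my choices for illustration, not any particular product's behavior.

```python
import secrets
import string

# Assumption: letters, digits, and punctuation are all allowed by the site.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a random password drawn uniformly from ALPHABET."""
    if length < 11:
        raise ValueError("anything under 11 characters is too short")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# A fresh, unrelated password for every site, so one breach can't spread.
print(generate_password())
print(generate_password(20))
```

The hard part a password manager solves is not generating these, it's storing and filling them so you never have to remember or retype one.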

Bonus rule! For the particularly at-risk, get and use a U2F key.

In the long term, two factor through an app isn't quite secure enough due to the very real (and growing) specter of real-time phishing. Authentication apps offer timed keys that expire after a minute or two, but if the attacker can get you to type an authentication key and relay it to the target site fast enough, they can still log in as you. If you need ultimate protection, look into U2F keys.
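Here's why the relay works. The sketch below is a small implementation of the standard time-based one-time password algorithm (TOTP, RFC 6238) that authenticator apps generally follow, assuming the common defaults of a 30 second step, 6 digits, and HMAC-SHA1. The secret is the published RFC test secret, not anything real.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, at_time: float, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password per RFC 6238, using HMAC-SHA1."""
    counter = int(at_time // step)                    # which time window we are in
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"  # test secret from the RFC, for illustration only

typed_into_fake_page = totp(secret, at_time=35)   # victim types the code at t=35s
relayed_to_real_site = totp(secret, at_time=59)   # attacker replays it at t=59s
next_window = totp(secret, at_time=60)            # the code only changes at t=60s

print(typed_into_fake_page, relayed_to_real_site, next_window)
```

The code depends only on the shared secret and the clock. Nothing in it knows which website you typed it into, so anyone who relays it within the same window gets in. Many servers also accept the neighboring windows to tolerate clock drift, which makes the attacker's deadline even more generous.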

u2f-keys

I believe U2F support is still too immature at the moment, particularly on mobile, for this to be practical for the average person right now. But if you do happen to fall into those groups that will be under attack, you absolutely want to set up U2F keys where you can today. They're cheap, and the good news is that they literally make phishing impossible at last. Given that Google had 100% company wide success against phishing with U2F, we know this works.
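The reason a hardware key resists phishing is that the response is bound to the origin the browser is actually talking to. Below is a toy model of that one idea, and only that idea. It is emphatically not the real U2F protocol: real keys use per-site public-key signatures, while this sketch substitutes HMAC with a shared secret to stay short. The class and function names are invented for the example.

```python
import hashlib
import hmac
import secrets


class ToySecurityKey:
    """Toy stand-in for a U2F key. NOT the real protocol: HMAC replaces signatures."""

    def __init__(self) -> None:
        self._master = secrets.token_bytes(32)  # never leaves the device

    def _site_secret(self, origin: str) -> bytes:
        # A different secret for every origin, derived inside the key.
        return hmac.new(self._master, origin.encode(), hashlib.sha256).digest()

    def register(self, origin: str) -> bytes:
        # Toy shortcut: the site stores this secret (a real key hands over a public key).
        return self._site_secret(origin)

    def sign(self, origin: str, challenge: bytes) -> bytes:
        # The browser supplies the origin from the address bar; the user never types it.
        return hmac.new(self._site_secret(origin), challenge, hashlib.sha256).digest()


def site_accepts(site_secret: bytes, challenge: bytes, response: bytes) -> bool:
    """The real site checks the response against the secret it stored at registration."""
    expected = hmac.new(site_secret, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)


key = ToySecurityKey()
REAL = "https://myaccount.google.com"
FAKE = "http://myaccount.google.com-securitysettingpage.tk"

stored_by_real_site = key.register(REAL)
challenge = secrets.token_bytes(32)

# Normal login: the browser is on the real origin.
honest_login = site_accepts(stored_by_real_site, challenge, key.sign(REAL, challenge))

# Phishing relay: the victim's browser is on the fake origin, so the response is useless.
relayed_login = site_accepts(stored_by_real_site, challenge, key.sign(FAKE, challenge))

print(honest_login, relayed_login)  # True False
```

Unlike a six digit code, there is nothing here for a human to read, type, or be tricked into handing over. The fake page can relay the challenge all it wants; the key answers for the wrong site.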

In today's world, computers are now so omnipresent that there is no longer any such thing as cybersecurity, online security, or computer security – there's only security. You either have it, or you don't. If you follow and share these three rules, hopefully you too can have a modicum of security today.


To Serve Man, with Software

I didn't choose to be a programmer. Somehow, it seemed, the computers chose me. For a long time, that was fine, that was enough; that was all I needed. But along the way I never felt that being a programmer was this unambiguously great-for-everyone career field with zero downsides. There are absolutely occupational hazards of being a programmer, and one of my favorite programming quotes is an allusion to one of them:

It should be noted that no ethically-trained software engineer would ever consent to write a DestroyBaghdad procedure. Basic professional ethics would instead require him to write a DestroyCity procedure, to which Baghdad could be given as a parameter.

Which reminds me of another joke that people were telling in 2015:

Donald Trump is basically a comment section running for president

Which is troubling because technically, technically, I run a company that builds comment sections.

Here at the tail end of 2017, from where I sit neither of these jokes seems particularly funny to me any more. Perhaps I have lost the capacity to feel joy as a human being? Haha just kidding! ... kinda.

Remember in 2011 when Marc Andreessen said that "software is eating the world"?

software is eating the world, Marc Andreessen

That used to sound all hip and cool and inspirational, like "Wow! We software developers really are making a difference in the world!" and now for the life of me I can't read it as anything other than an ominous warning that we just weren't smart enough to translate properly at the time. But maybe now we are.

to-serve-man

I've said many, many times that the key to becoming an experienced software developer is to understand that you are, at all times, your own worst enemy. I don't mean this in a negative way – you have to constantly plan for and design around your inevitable human mistakes and fallibility. It's fundamental to good software engineering because, well, we're all human. The good-slash-bad news is that you're only accidentally out to get yourself. But what happens when we're infinitely connected and software is suddenly everywhere, in everyone's pockets every moment of the day, starting to approximate a natural extension of our bodies? All of a sudden those little collective social software accidents become considerably more dangerous:

The issue is bigger than any single scandal, I told him. As headlines have exposed the troubling inner workings of company after company, startup culture no longer feels like fodder for gentle parodies about ping pong and hoodies. It feels ugly and rotten. Facebook, the greatest startup success story of this era, isn’t a merry band of hackers building cutesy tools that allow you to digitally Poke your friends. It’s a powerful and potentially sinister collector of personal data, a propaganda partner to government censors, and an enabler of discriminatory advertising.

I'm reminded of a particular Mitchell and Webb skit: "Are we the baddies?"

On the topic of unanticipated downsides to technology, there is no show more essential than Black Mirror. If you haven't watched Black Mirror yet, do not pass go, do not collect $200, go immediately to Netflix and watch it. Go on! Go ahead!

⚠ Fair warning: please DO NOT start with season 1 episode 1 of Black Mirror! Start with season 3, and go forward. If you like those, dip into season 2 and the just-released season 4, then the rest. But humor me and please at least watch the first episode of season 3.

The technology described in Black Mirror can be fanciful at times, but several episodes portray disturbingly plausible scenarios with today's science and tech, much less what we'll have 20 to 50 years from now. These are very real cautionary tales, and some of this stuff is well on its way toward being realized.

Programmers don't think of themselves as people with the power to change the world. Most programmers I know, including myself, grew up as nerds, geeks, social outcasts. Did I ever tell you about the time I wrote a self-destructing Apple // boot disk program to let a girl in middle school know that I liked her? I was (and still am) a terrible programmer, but oh man did I ever test the heck out of that code before copying it onto her school floppy disk. But I digress. What do you do when you wake up one day and software has kind of eaten the world, and it is no longer clear if software is in fact an unambiguously good thing, like we thought, like everyone told us … like we wanted it to be?

Months ago I submitted a brief interview for a children's book about coding.

I recently received a complimentary copy of the book in the mail. I paged to my short interview, alongside the very cool Kiki Prottsman. I had no real recollection of the interview questions after the months of lead time it takes to print a physical book, but reading the printed page, I suddenly hit myself over the head with the very answer I had been searching my soul for these past 6 months:

Jeff Atwood quote: what do you love most about coding?

In attempting to simplify my answers for an audience of kids, I had concisely articulated the one thing that keeps me coming back to software: to serve man. Not on a platter, for bullshit monetization – but software that helps people be the best version of themselves.

And you know why I do it? I need that help, too. I get tired, angry, upset, emotional, cranky, irritable, frustrated and I need to be reminded from time to time to choose to be the better version of myself. I don't always succeed. But I want to. And I believe everyone else – for some reasonable statistical value of everyone else – fundamentally does, too.

That was the not-so-secret design philosophy behind Stack Overflow, that by helping others become better programmers, you too would become a better programmer. It's unavoidable. And, even better, if we leave enough helpful breadcrumbs behind for those that follow us, we collectively advance the whole of programming for everyone.

I apologize for not blogging much in 2017. I've certainly been busy with Discourse which is actually going great; we grew to 21 people and gave $55,000 back this year to the open source ecosystem we build on. But that's no excuse. The truth is that it's been hard to write because this has been a deeply troubling year in so many dimensions — for men, for tech, for American democracy. I'm ashamed of much that happened, and I think one of the first and most important steps we can take is to embrace explicit codes of conduct throughout our industry. I also continue to believe, if we start to think more holistically about what our software can do to serve all people, not just ourselves personally (or, even worse, the company we work for) — that software can and should be part of the solution.

I tried to amplify on these thoughts in recent podcasts:

 Community Engineering Report with Kim Crayton
 Developer on Fire with Dave Rael
 Dorm Room Tycoon with William Channer

Software is easy to change, but people ... aren't. So in the new year, as software developers, let's make a resolution to focus on the part we can change, and keep asking ourselves one very important question: how can our software help people become the best version of themselves?

Discussion

The Existential Terror of Battle Royale

It's been a while since I wrote a blog post, I guess in general, but also a blog post about video games. Video games are probably the single thing most responsible for my career as a programmer, and everything else I've done professionally after that. I still feel video games are one of the best ways to learn and teach programming, if properly scoped, and furthermore I take many cues from video games in building software.

I would characterize my state of mind for the last six to eight months as … poor. Not only because of current events in the United States, though the neverending barrage of bad news weighs heavily on me, and I continue to be profoundly disturbed by the erosion of core values that I thought most of us stood for as Americans. Didn't we used to look out for each other, care about each other, and fight to protect those that can't protect themselves?

In times like these, I sometimes turn to video games for escapist entertainment. One game in particular caught my attention because of its meteoric rise in player count over the last year.

pubg-steam-stats-nov-2017

That game is Player Unknown's Battlegrounds. I was increasingly curious why it was so popular, and kept getting more popular every month. Calling it a mere phenomenon seems like underselling it; something truly unprecedented is happening here. I finally broke down and bought a copy for $30 in September.

player-unknown-battleground

After a few hours in, I had major flashbacks to the first time I played Counter-Strike in 1998. I realized that we are witnessing the birth of an entirely new genre of game: the Battle Royale. I absolutely believe that huge numbers of people will still be playing some form of this game 20 years from now, too.

steam-top-games-by-player-count-nov-2017

I've seen the Japanese movie, and it's true that there were a few Battle Royale games before PUBG, but this is clearly the defining moment and game for the genre, the one that sets a precedent for everyone else to follow.

It's hard to explain why Battlegrounds is so compelling, but let's start with the loneliness.

Although you can play in squads (and I recommend it), the purest original form of the game is 100 players, last man standing. You begin with nothing but the clothes on your back, in a cargo aircraft, flying over an unknown island in a random trajectory.

battlegrounds-cargo-plane

It's up to you to decide when to drop, and where to land on this huge island, full of incredibly detailed cities, buildings and houses – but strangely devoid of all life.

playerunknown-battleground-drop

What happened to everyone? Where did they go? The sense of apocalypse is overwhelming. It's you versus the world, but where did the rest of the world go? You'll wander this vast deserted island, scavenging for weapons and armor in near complete silence. You'll hear nothing but the wind blowing and the occasional buzzing of flies. But then, suddenly the jarring pak-pak-pak of gunfire off in the distance, reminding you that other people are here. And they aren't your friends.

battle-royale-vista

The dread of never knowing when another of the 100 players on this enormous island is going to suddenly appear around a corner or over a hill is intense. You'll find yourself wearing headphones, cranking the volume, constantly on edge listening for the implied threat of footfalls. Wait, did I hear someone just now, or was that me? You clench, and wait. I've had so many visceral panic moments playing this game, to the point that I had to stop playing just to calm down.

pubg-combat

PUBG is, in its way, the scariest zombie movie I've ever seen, though it lacks a single zombie. It dispenses with the pretense of a story, so you can realize much sooner that the zombies, as terrible as they may be, are nowhere as dangerous to you as your fellow man.

Meanwhile, that huge cargo airplane still roars overhead every so often, impassive, indifferent, occasionally dropping supply crates with high powered items to fight over. Airstrikes randomly target areas circled in red on the map, masking footfalls, and forcing movement while raining arbitrary death and terror.

pubg-map

Although the island is huge and you can land anywhere, after a few minutes a random circle is overlaid on the map, and a slowly moving wall of deadly energy starts closing in on that circle. Stay outside that circle at your peril; if you find yourself far on the opposite side of the map from a circle, you better start hunting for a vehicle or boat (they're present, but rare) quickly. These terrordome areas are always shrinking, always impending, in an ever narrowing cone, forcing the remaining survivors closer and closer together. The circles get tighter and deadlier and quicker as the game progresses, ratcheting up the tension and conflict.

Eventually the circle becomes so small that it's impossible for the handful of remaining survivors to avoid contact, and one person, one out of the hundred that originally dropped out of the cargo plane, emerges as the winner. I've never won solo, but I have won squad, and even finishing first out of 25 squads is an unreal, euphoric experience. The odds are so incredibly against you from the outset, plus you quickly discover that 85% of the game is straight up chance: someone happens to roll up behind you, a sniper gets the drop on you, or you get caught in the open with few options. Wrong place, wrong time, game over. Sucks to be you.

pubg-vehicle-shooting

You definitely learn to be careful, but there's only so careful you can be. Death comes quickly, without warning, and often at random. What else can you expect from a game mode where there are 100 players but only 1 eventual winner?

There haven't been many Battle Royale games, so this game mode is a relatively new phenomenon. If you'd like to give it a try for free, I highly recommend Fortnite's Battle Royale mode which is 100% free, a near-clone of PUBG, and quite good in its own right. They added their Battle Royale mode well after the fact; the core single player "save the world" gameplay of building stuff and fighting zombie hordes is quite fun too, though a bit shallow. It also has what is, in my opinion, some of the most outstanding visual style I've ever seen in a game – a cool, hyperbolic cartoon mix of Chuck Jones, Sam & Max, and Cloudy with a Chance of Meatballs. It's also delightfully diverse in its character models.

fortnite-battle-royale

(The only things you'll give up over PUBG are the realistic art style, vehicles, and going prone. But the superb structure building system in Fortnite almost makes up for that. If nothing else it is a demonstration of how incredibly compelling the Battle Royale game mode is, because that part of the game is wildly successful in a way that the core game, uh, wasn't. Also it's free!)

I didn't intend for this to happen, but to me, the Battle Royale game mode perfectly captures the zeitgeist of the current moment, and matches my current state of mind to a disturbing degree. It's an absolutely terrifying experience of every human for themselves, winner takes all, with impossible odds. There are moments it can be thrilling, even inspiring, but mostly it's brutal and unforgiving. To succeed you need to be exceedingly cautious, highly skilled, and just plain lucky. Roll the dice again, but know that everyone will run towards the sound of gunfire in hopes of picking off survivors and looting their corpses. Including you.

Battle Royale is not the game mode we wanted, it's not the game mode we needed, it's the game mode we all deserve. And the best part is, when we're done playing, we can turn it off.

Discussion

Hacker, Hack Thyself

We've read so many sad stories about communities that were fatally compromised or destroyed due to security exploits. We took that lesson to heart when we founded the Discourse project; we endeavor to build open source software that is secure and safe for communities by default, even if there are thousands, or millions, of them out there.

However, we also value portability, the ability to get your data into and out of Discourse at will. This is why Discourse, unlike other forum software, defaults to a Creative Commons license. As a basic user on any Discourse you can easily export and download all your posts right from your user page.

Discourse Download All Posts

As a site owner, you can easily back up and restore your entire site database from the admin panel, right in your web browser. Automated weekly backups are set up for you out of the box, too. I'm not the world's foremost expert on backups for nothing, man!

Discourse database backup download

Over the years, we've learned that balancing security and data portability can be tricky. You bet your sweet ASCII a full database download is what hackers start working toward the minute they gain any kind of foothold in your system. It's the ultimate prize.

To mitigate this threat, we've slowly tightened restrictions around Discourse backups in various ways:

  • Administrators have a minimum password length of 15 characters.

  • Both backup creation and backup download administrator actions are formally logged.

  • Backup download tokens are single use and emailed to the address of the administrator, to confirm that user has full control over the email address.

The name of the security game is defense in depth, so all these hardening steps help … but we still need to assume that Internet Bad Guys will somehow get a copy of your database. And then what? Well, what's in the database?

  • Identity cookies

    Cookies are, of course, how the browser can tell who you are. Cookies are usually stored as hashes, rather than the actual cookie value, so having the hash doesn't let you impersonate the target user. Furthermore, most modern web frameworks rapidly cycle cookies, so they are only valid for a brief 10 to 15 minute window anyway.

  • Email addresses

    Although users have reason to be concerned about their emails being exposed, very few people treat their email address as anything particularly precious these days.

  • All posts and topic content

    Let's assume for the sake of argument that this is a fully public site and nobody was posting anything particularly sensitive there. So we're not worried, at least for now, about trade secrets or other privileged information being revealed, since they were all public posts anyway. If we were, that's a whole other blog post I can write at a later date.

  • Password hashes

    What's left is the password hashes. And that's … a serious problem indeed.
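The point about identity cookies in the list above is easy to show in miniature. Here's a rough sketch, in Python rather than the Ruby that Discourse actually uses, of storing only a hash of a session token. The function names `issue_token` and `token_matches` are made up for illustration, and plain SHA-256 is an assumption about the hash, not a description of any particular framework.

```python
import hashlib
import hmac
import secrets


def issue_token():
    """Return (raw_token, stored_hash).

    The raw token goes to the browser as the cookie value;
    only the hash would be written to the database.
    """
    raw = secrets.token_urlsafe(32)
    stored = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw, stored


def token_matches(presented, stored_hash):
    """Hash the presented cookie value and compare in constant time."""
    candidate = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)


raw, stored = issue_token()
print(token_matches(raw, stored))     # the real cookie value is accepted
print(token_matches(stored, stored))  # the leaked database hash is not
```

Someone holding only the database copy has `stored`, and presenting that value as a cookie fails the check, which is why a leaked cookie hash is much less useful than a leaked cookie.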

Now that the attacker has your database, they can crack your password hashes with large scale offline attacks, using the full resources of any cloud they can afford. And once they've cracked a particular password hash, they can log in as that user … forever. Or at least until that user changes their password.

⚠️ That's why, if you know (or even suspect!) your database was exposed, the very first thing you should do is reset everyone's password.

Discourse database password hashes

But what if you don't know? Should you preemptively reset everyone's password every 30 days, like the world's worst bigco IT departments? That's downright user hostile, and leads to serious pathologies of its own. The reality is that you probably won't know when your database has been exposed, at least not until it's too late to do anything about it. So it's crucial to slow the attackers down, to give yourself time to deal with it and respond.

Thus, the only real protection you can offer your users is just how resistant to attack your stored password hashes are. There are two factors that go into password hash strength:

  1. The hashing algorithm. As slow as possible, and ideally designed to be especially slow on GPUs for reasons that will become painfully obvious about 5 paragraphs from now.

  2. The work factor or number of iterations. Set this as high as possible, without opening yourself up to a possible denial of service attack.

I've seen guidance that said you should set the overall work factor high enough that hashing a password takes at least 8ms on the target platform. It turns out Sam Saffron, one of my Discourse co-founders, made a good call back in 2013 when he selected the NIST recommendation of PBKDF2-HMAC-SHA256 and 64k iterations. We measured, and that indeed takes roughly 8ms using our existing Ruby login code on our current (fairly high end, Skylake 4.0 GHz) servers.
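If you want to see what that work factor costs on your own hardware, a measurement like the following is one way to do it. This is a minimal sketch using Python's standard library, not our Ruby login code, so the absolute numbers will differ from the 8ms quoted above; `hash_password` and `time_hash_ms` are hypothetical helper names.

```python
import hashlib
import os
import time

ITERATIONS = 64_000  # the work factor quoted in the text


def hash_password(password, salt, iterations=ITERATIONS):
    """Derive a 32-byte PBKDF2-HMAC-SHA256 hash of the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def time_hash_ms(iterations=ITERATIONS, rounds=5):
    """Average wall-clock milliseconds for one hash at this work factor."""
    salt = os.urandom(16)
    start = time.perf_counter()
    for _ in range(rounds):
        hash_password("correct horse battery staple", salt, iterations)
    return (time.perf_counter() - start) * 1000 / rounds


elapsed_ms = time_hash_ms()
print(f"{ITERATIONS} iterations: {elapsed_ms:.1f} ms per hash")
```

Raising `iterations` until the measured time reaches whatever budget your login path can tolerate is the basic tuning loop; the denial of service concern is simply that every login attempt pays this cost.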

But that was 4 years ago. Exactly how secure are our password hashes in the database today? Or 4 years from now, or 10 years from now? We're building open source software for the long haul, and we need to be sure we are making reasonable decisions that protect everyone. So in the spirit of designing for evil, it's time to put on our Darth Helmet and play the bad guy – let's crack our own hashes!

We're gonna use the biggest, baddest single GPU out there at the moment, the GTX 1080 Ti. As a point of reference, for PBKDF2-HMAC-SHA256 the 1080 achieves 1180 kH/s, whereas the 1080 Ti achieves 1640 kH/s. In a single video card generation the attack hash rate has increased nearly 40 percent. Ponder that.

First, a tiny hello world test to see if things are working. I downloaded hashcat. I logged into our demo at try.discourse.org and created a new account with the password 0234567890; I checked the database, and this generated the following values in the hash and salt database columns for that new user:

hash
93LlpbKZKficWfV9jjQNOSp39MT0pDPtYx7/gBLl5jw=
salt
ZWVhZWQ4YjZmODU4Mzc0M2E2ZDRlNjBkNjY3YzE2ODA=

Hashcat requires the following input file format: one line per hash, with the hash type, number of iterations, salt and hash (base64 encoded) separated by colons:

type   iter  salt                                         hash
sha256:64000:ZWVhZWQ4YjZmODU4Mzc0M2E2ZDRlNjBkNjY3YzE2ODA=:93LlpbKZKficWfV9jjQNOSp39MT0pDPtYx7/gBLl5jw=
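For more than one user you'd want to generate these lines rather than paste them together by hand. A small sketch, assuming you already have the salt and hash as base64 strings; the helper name `hashcat_pbkdf2_line` is made up, and the only thing it encodes is the colon-separated layout shown above.

```python
import base64


def hashcat_pbkdf2_line(salt_b64, hash_b64, iterations=64000, digest="sha256"):
    """Build one hashcat input line: digest:iterations:salt:hash."""
    # Fail early if either field is not valid base64.
    base64.b64decode(salt_b64, validate=True)
    base64.b64decode(hash_b64, validate=True)
    return f"{digest}:{iterations}:{salt_b64}:{hash_b64}"


line = hashcat_pbkdf2_line(
    salt_b64="ZWVhZWQ4YjZmODU4Mzc0M2E2ZDRlNjBkNjY3YzE2ODA=",
    hash_b64="93LlpbKZKficWfV9jjQNOSp39MT0pDPtYx7/gBLl5jw=",
)
print(line)
```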

Let's hashcat it up and see if it works:

./h64 -a 3 -m 10900 .\one-hash.txt 0234567?d?d?d

Note that this is an intentionally tiny amount of work, it's only guessing three digits. And sure enough, we cracked it fast! See the password there on the end? We got it.

sha256:64000:ZWVhZWQ4YjZmODU4Mzc0M2E2ZDRlNjBkNjY3YzE2ODA=:93LlpbKZKficWfV9jjQNOSp39MT0pDPtYx7/gBLl5jw=:0234567890

Now that we know it works, let's get down to business. But we'll start easy. How long does it take to brute force attack the easiest possible Discourse password, 8 numbers? That's "only" 10^8 combinations, one hundred million.

Hash.Type........: PBKDF2-HMAC-SHA256
Time.Estimated...: Fri Jun 02 00:15:37 2017 (1 hour, 0 mins)
Guess.Mask.......: ?d?d?d?d?d?d?d?d [8]

Even with a top of the line GPU that's … OK, I guess. Remember this is just one hash we're testing against, so you'd need one hour per row (user) in the table. And I have more bad news for you: Discourse hasn't allowed 8 character passwords for quite some time now. How long does it take if we try longer numeric passwords?

?d?d?d?d?d?d?d?d?d [9]
Fri Jun 02 10:34:42 2017 (11 hours, 18 mins)

?d?d?d?d?d?d?d?d?d?d [10]
Tue Jun 06 17:25:19 2017 (4 days, 18 hours)

?d?d?d?d?d?d?d?d?d?d?d [11]
Mon Jul 17 23:26:06 2017 (46 days, 0 hours)

?d?d?d?d?d?d?d?d?d?d?d?d [12]
Tue Jul 31 23:58:30 2018 (1 year, 60 days)
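The arithmetic behind these estimates is just keyspace divided by guess rate. Here's a back-of-the-envelope sketch for the digit masks; the rate of 27,000 guesses per second is an assumption borrowed from the figure quoted later in this post for a single GTX 1080 Ti at this work factor, and the helper names are made up.

```python
GUESSES_PER_SECOND = 27_000  # assumed single-GPU rate at 64k iterations


def keyspace(charset_size, length):
    """Number of candidate passwords for a fixed-length mask."""
    return charset_size ** length


def worst_case_seconds(charset_size, length, rate=GUESSES_PER_SECOND):
    """Time to try every candidate once, for one hash."""
    return keyspace(charset_size, length) / rate


def humanize(seconds):
    """Render a duration in the largest convenient unit."""
    days = seconds / 86_400
    if days >= 365:
        return f"{days / 365:.1f} years"
    if days >= 1:
        return f"{days:.1f} days"
    return f"{seconds / 3600:.1f} hours"


for length in range(8, 13):
    print(f"{length} digits: {humanize(worst_case_seconds(10, length))}")
```

The results land in the same ballpark as hashcat's own estimates above, and each extra digit multiplies the time by ten. Swapping the charset size from 10 to something larger is where the growth gets really unfriendly to the attacker.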

But all digit passwords are easy mode, for babies! How about some real passwords that use at least lowercase letters, or lowercase + uppercase + digits?

Guess.Mask.......: ?l?l?l?l?l?l?l?l [8]
Time.Estimated...: Mon Sep 04 10:06:00 2017 (94 days, 10 hours)

Guess.Mask.......: ?1?1?1?1?1?1?1?1 [8] (-1 = ?l?u?d)
Time.Estimated...: Sun Aug 02 09:29:48 2020 (3 years, 61 days)

A brute force try-every-single-letter-and-number attack is not looking so hot for us at this point, even with a high end GPU. But what if we divided the number by eight by putting eight video cards in a single machine? That's well within the reach of a small business budget or a wealthy individual. Unfortunately, dividing 38 months by 8 isn't such a dramatic reduction in the time to attack. Instead, let's talk about nation state attacks where they have the budget to throw thousands of these GPUs at the problem (1.1 days), maybe even tens of thousands (2.7 hours), then … yes. Even allowing for 10 character password minimums, you are in serious trouble at that point.

If we want Discourse to be nation state attack resistant, clearly we'll need to do better. Hashcat has a handy benchmark mode, and here's a sorted list of the strongest (slowest) hashes that Hashcat knows about benchmarked on a rig with 8 Nvidia GTX 1080 GPUs. Of the things I recognize on that list, bcrypt, scrypt and PBKDF2-HMAC-SHA512 stand out.

My quick hashcat results gave me some confidence that we weren't doing anything terribly wrong with the Discourse password hashes stored in the database. But I wanted to be completely sure, so I hired someone with a background in security and penetration testing to, under a signed NDA, try cracking the password hashes of two live and very popular Discourse sites we currently host.

I was provided two sets of password hashes from two different Discourse communities, containing 5,909 and 6,088 hashes respectively. Both used the PBKDF2-HMAC-SHA256 algorithm with a work factor of 64k. Using hashcat, my Nvidia GTX 1080 Ti GPU generated these hashes at a rate of ~27,000/sec.

Common to all Discourse communities are various password requirements:

  • All users must have a minimum password length of 10 characters.
  • All administrators must have a minimum password length of 15 characters.
  • Users cannot use any password matching a blacklist of the 10,000 most commonly used passwords.
  • Users can choose to create a username and password or use various third party authentication mechanisms (Google, Facebook, Twitter, etc). If this option is selected, a secure random 32 character password is autogenerated. It is not possible to know whether any given password is human entered, or autogenerated.
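The first three requirements above boil down to a length check and a set lookup. A minimal sketch follows; this is not Discourse's actual validation code, `password_problems` is a hypothetical name, the three-entry set stands in for the real 10,000-entry list, and exact-match comparison is an assumption.

```python
MIN_USER_LENGTH = 10
MIN_ADMIN_LENGTH = 15

# Stand-in for the 10,000 most common passwords; illustrative entries only.
COMMON_PASSWORDS = {"password123", "qwertyuiop", "1234567890"}


def password_problems(password, is_admin=False, blacklist=COMMON_PASSWORDS):
    """Return a list of policy violations; an empty list means acceptable."""
    problems = []
    minimum = MIN_ADMIN_LENGTH if is_admin else MIN_USER_LENGTH
    if len(password) < minimum:
        problems.append(f"must be at least {minimum} characters")
    if password in blacklist:
        problems.append("is on the common password list")
    return problems


print(password_problems("1234567890"))
print(password_problems("tenletters", is_admin=True))
```

Note what a policy like this can and can't do: it rejects the short and the notorious, but a password that is merely predictable sails straight through.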

Using common password lists and masks, I cracked 39 of the 11,997 hashes in about three weeks, 25 from the ████████ community and 14 from the ████████ community.

This is a security researcher who commonly runs these kinds of audits, so all of the attacks used wordlists, along with known effective patterns and masks derived from the researcher's previous password cracking experience, instead of raw brute force. That recovered the following passwords (and one duplicate):

007007bond
123password
1qaz2wsx3e
A3eilm2s2y
Alexander12
alexander18
belladonna2
Charlie123
Chocolate1
christopher8
Elizabeth1
Enterprise01
Freedom123
greengrass123
hellothere01
I123456789
Iamawesome
khristopher
l1ghthouse
l3tm3innow
Neversaynever
password1235
pittsburgh1
Playstation2
Playstation3
Qwerty1234
Qwertyuiop1
qwertyuiop1234567890
Spartan117
springfield0
Starcraft2
strawberry1
Summertime
Testing123
testing1234
thecakeisalie02
Thirteen13
Welcome123

If we multiply this effort by 8, and double the amount of time allowed, it's conceivable that a very motivated attacker, or one with a sophisticated set of wordlists and masks, could eventually recover 39 × 16 = 624 passwords, or about five percent of the total users. That's reasonable, but higher than I would like. We absolutely plan to add a hash type table in future versions of Discourse, so we can switch to an even more secure (read: much slower) password hashing scheme in the next year or two.

bcrypt $2*$, Blowfish (Unix)
  20273 H/s

scrypt
  886.5 kH/s

PBKDF2-HMAC-SHA512
  542.6 kH/s 

PBKDF2-HMAC-SHA256
 1646.7 kH/s 

After this exercise, I now have a much deeper understanding of our worst case security scenario, a database compromise combined with a professional offline password hashing attack. I can also more confidently recommend and stand behind our engineering work in making Discourse secure for everyone. So if, like me, you're not entirely sure you are doing things securely, it's time to put those assumptions to the test. Don't wait around for hackers to attack you — hacker, hack thyself!

Discussion