Coding Horror

programming and human factors

Top 25 Most Dangerous Programming Mistakes

I don't usually do news and current events here, but I'm making an exception for the CWE/SANS Top 25 Most Dangerous Programming Errors list. This one is important, and deserves a wide audience, so I'm repeating it here -- along with a brief hand-edited summary of each error.

If you work on software in any capacity, at least skim this list. I encourage you to click through for greater detail on anything you're not familiar with, or that piques your interest.

  1. Improper Input Validation
    Ensure that your input is valid. If you're expecting a number, it shouldn't contain letters. Nor should the price of a new car be allowed to be a dollar. Incorrect input validation can lead to vulnerabilities when attackers can modify their inputs in unexpected ways. Many of today's most common vulnerabilities can be eliminated, or at least reduced, with strict input validation.
  2. Improper Encoding or Escaping of Output
    Insufficient output encoding is at the root of most injection-based attacks. An attacker can modify the commands that you intend to send to other components, possibly leading to a complete compromise of your application - not to mention exposing the other components to exploits that the attacker would not be able to launch directly. When your program generates outputs to other components in the form of structured messages such as queries or requests, be sure to separate control information and metadata from the actual data.
  3. Failure to Preserve SQL Query Structure (aka 'SQL Injection')
    If attackers can influence the SQL that you send to your database, they can modify the queries to steal, corrupt, or otherwise change your underlying data. If you use SQL queries in security controls such as authentication, attackers could alter the logic of those queries to bypass security.
  4. Failure to Preserve Web Page Structure (aka 'Cross-site Scripting')
    Cross-site scripting (XSS) is a result of combining the stateless nature of HTTP, the mixture of data and script in HTML, lots of data passing between web sites, diverse encoding schemes, and feature-rich web browsers. If you're not careful, attackers can inject Javascript or other browser-executable content into a web page that your application generates. Your web page is then accessed by other users, whose browsers execute that malicious script as if it came from you -- because, after all, it did come from you! Suddenly, your web site is serving code that you didn't write. The attacker can use a variety of techniques to get the input directly into your server, or use an unwitting victim as the middle man.
  5. Failure to Preserve OS Command Structure (aka 'OS Command Injection')
    Your software acts as a bridge between an outsider on the network and the internals of your operating system. When you invoke another program on the operating system, and you allow untrusted inputs to be fed into the command string, you are inviting attackers into your operating system.
  6. Cleartext Transmission of Sensitive Information
    Information sent across a network crosses many different nodes in transit to its final destination. If your software sends sensitive, private data or authentication credentials, beware: attackers could sniff them right off the wire. All they need to do is control one node along the path to the final destination, any node within the same networks of those transit nodes, or plug into an available interface. Obfuscating traffic using schemes like Base64 and URL encoding offers no protection.
  7. Cross-Site Request Forgery (CSRF)
    Cross-site request forgery is like accepting a package from a stranger -- except the attacker tricks a user into activating an HTTP request "package" that goes to your site. The user might not even be aware that the request is being sent, but once the request gets to your server, it looks as if it came from the user -- not the attacker. The attacker has masqueraded as a legitimate user and gained all the potential access that the user has. This is especially handy when the user has administrator privileges, resulting in a complete compromise of your application's functionality.
  8. Race Condition
    A race condition involves multiple processes in which the attacker has full control over one process; the attacker exploits the process to create chaos, collisions, or errors. Data corruption and denial of service are the norm. The impact can be local or global, depending on what the race condition affects - such as state variables or security logic - and whether it occurs within multiple threads, processes, or systems.
  9. Error Message Information Leak
    Chatty error messages can disclose secrets to any attacker who misuses your software. The secrets could cover a wide range of valuable data, including personally identifiable information (PII), authentication credentials, and server configuration. They might seem like harmless secrets useful to your users and admins, such as the full installation path of your software -- but even these little secrets can greatly simplify a more concerted attack.
  10. Failure to Constrain Operations within the Bounds of a Memory Buffer
    The scourge of C applications for decades, buffer overflows have been remarkably resistant to elimination. Attack and detection techniques continue to improve, and today's buffer overflow variants aren't always obvious at first or even second glance. You may think that you're completely immune to buffer overflows because you write your code in higher-level languages instead of C. But what is your favorite "safe" language's interpreter written in? What about the native code you call? What languages are the operating system APIs written in? How about the software that runs Internet infrastructure?
  11. External Control of Critical State Data
    If you store user state data in a place where an attacker can modify it, this reduces the overhead for a successful compromise. Data could be stored in configuration files, profiles, cookies, hidden form fields, environment variables, registry keys, or other locations, all of which can be modified by an attacker. In stateless protocols such as HTTP, some form of user state information must be captured in each request, so it is exposed to an attacker out of necessity. If you perform any security-critical operations based on this data (such as stating that the user is an administrator), then you can bet that somebody will modify the data in order to trick your application.
  12. External Control of File Name or Path
    When you use an outsider's input while constructing a filename, the resulting path could point outside of the intended directory. An attacker could combine multiple ".." or similar sequences to cause the operating system to navigate out of the restricted directory. Other file-related attacks are simplified by external control of a filename, such as symbolic link following, which causes your application to read or modify files that the attacker can't access directly. The same applies if your program is running with raised privileges and it accepts filenames as input. Similar rules apply to URLs and allowing an outsider to specify arbitrary URLs.
  13. Untrusted Search Path
    Your software depends on you, or its environment, to provide a search path (or working path) to find critical resources like code libraries or configuration files. If the search path is under attacker control, then the attacker can modify it to point to resources of the attacker's choosing.
  14. Failure to Control Generation of Code (aka 'Code Injection')
    While it's tough to deny the sexiness of dynamically-generated code, attackers find it equally appealing. It becomes a serious vulnerability when your code is directly callable by unauthorized parties, if external inputs can affect which code gets executed, or if those inputs are fed directly into the code itself.
  15. Download of Code Without Integrity Check
    If you download code and execute it, you're trusting that the source of that code isn't malicious. But attackers can modify that code before it reaches you. They can hack the download site, impersonate it with DNS spoofing or cache poisoning, convince the system to redirect to a different site, or even modify the code in transit as it crosses the network. This scenario even applies to cases in which your own product downloads and installs updates.
  16. Improper Resource Shutdown or Release
    When your system resources have reached their end-of-life, you dispose of them: memory, files, cookies, data structures, sessions, communication pipes, and so on. Attackers can exploit improper shutdown to maintain control over those resources well after you thought you got rid of them. Attackers may sift through the disposed items, looking for sensitive data. They could also potentially reuse those resources.
  17. Improper Initialization
    If you don't properly initialize your data and variables, an attacker might be able to do the initialization for you, or extract sensitive information that remains from previous sessions. If those variables are used in security-critical operations, such as making an authentication decision, they could be modified to bypass your security. This is most prevalent in obscure errors or conditions that cause your code to inadvertently skip initialization.
  18. Incorrect Calculation
    When attackers have control over inputs to numeric calculations, math errors can have security consequences. It might cause you to allocate far more resources than you intended - or far fewer. It could violate business logic (a calculation that produces a negative price), or cause denial of service (a divide-by-zero that triggers a program crash).
  19. Improper Access Control (Authorization)
    If you don't ensure that your software's users are only doing what they're allowed to, then attackers will try to exploit your improper authorization and exercise that unauthorized functionality.
  20. Use of a Broken or Risky Cryptographic Algorithm
    Grow-your-own cryptography is a welcome sight to attackers. Cryptography is hard. If brilliant mathematicians and computer scientists worldwide can't get it right -- and they're regularly obsoleting their own techniques -- then neither can you.
  21. Hard-Coded Password
    Hard-coding a secret account and password into your software is extremely convenient -- for skilled reverse engineers. If the password is the same across all your software, then every customer becomes vulnerable when that password inevitably becomes known. And because it's hard-coded, it's a huge pain to fix.
  22. Insecure Permission Assignment for Critical Resource
    Beware critical programs, data stores, or configuration files with default world-readable permissions. While this issue might not be considered during implementation or design, it should be. Don't require your customers to secure your software for you! Try to be secure by default, out of the box.
  23. Use of Insufficiently Random Values
    You may depend on randomness without even knowing it, such as when generating session IDs or temporary filenames. Pseudo-Random Number Generators (PRNG) are commonly used, but a variety of things can go wrong. Once an attacker can determine which algorithm is being used, he can guess the next random number often enough to launch a successful attack after a relatively small number of tries.
  24. Execution with Unnecessary Privileges
    Your software may need special privileges to perform certain operations; wielding those privileges longer than necessary is risky. When running with extra privileges, your application has access to resources that the application's user can't directly reach. Whenever you launch a separate program with elevated privileges, attackers can potentially exploit those privileges.
  25. Client-Side Enforcement of Server-Side Security
    Don't trust the client to perform security checks on behalf of your server. Attackers can reverse engineer your client and write their own custom clients. The consequences will vary depending on what your security checks are protecting, but some of the more common targets are authentication, authorization, and input validation.
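Several of these mistakes reduce to a few lines of defensive code. Taking #1 (input validation) as an example, here's a minimal whitelist-style sketch in Python -- the field name and the business bounds are hypothetical:

```python
import re

def validate_price(raw: str) -> int:
    """Whitelist validation for a 'new car price' field.
    The regex and the bounds below are assumed business rules."""
    if not re.fullmatch(r"\d{1,7}", raw):
        raise ValueError("price must be 1 to 7 digits, nothing else")
    price = int(raw)
    if not 1_000 <= price <= 500_000:
        raise ValueError("price outside the plausible range")
    return price
```

Rejecting anything that doesn't match the expected shape -- rather than trying to strip out known-bad characters -- is the strict validation the list recommends.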
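For #2, the standard library already knows how to encode for the most common output contexts; the trick is picking the encoder that matches the context you're writing into:

```python
import html
import urllib.parse

user_input = '<script>alert(1)</script>'

# Writing into an HTML body: escape the markup metacharacters
safe_html = html.escape(user_input)   # &lt;script&gt;alert(1)&lt;/script&gt;

# Writing into a URL query string: percent-encode everything unsafe
safe_url = urllib.parse.quote(user_input, safe="")
```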
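Mistake #3 has a well-known mechanical fix: parameterized queries, which keep attacker-supplied data from ever becoming query structure. A sketch using Python's sqlite3 module (table and data invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1)")

attacker_input = "alice' OR '1'='1"

# Vulnerable: concatenation lets the input rewrite the query itself
# conn.execute("SELECT * FROM users WHERE name = '" + attacker_input + "'")

# Safe: the placeholder passes the input as data, never as SQL
rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()
print(rows)  # → [] -- the malicious string matches no user
```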
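For #5, the usual advice in Python is to avoid the shell entirely: pass an argument list, so hostile input stays a single argument instead of becoming a second command:

```python
import subprocess

filename = "report.txt; rm -rf /"  # hostile input

# Vulnerable: shell=True would let ';' start a second command
# subprocess.run("echo " + filename, shell=True)

# Safer: no shell is involved; the whole string is one argv entry
result = subprocess.run(["echo", filename], capture_output=True, text=True)
print(result.stdout)  # the hostile string is printed literally
```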
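Mistake #12 can be caught by resolving the combined path and then checking that it still lives under the intended directory. A sketch, where BASE_DIR is a made-up restricted root:

```python
import os

BASE_DIR = os.path.realpath("/var/app/uploads")  # assumed restricted root

def safe_path(user_supplied: str) -> str:
    # Resolve '..' sequences and symlinks, then verify containment
    candidate = os.path.realpath(os.path.join(BASE_DIR, user_supplied))
    if os.path.commonpath([candidate, BASE_DIR]) != BASE_DIR:
        raise ValueError("path escapes the allowed directory")
    return candidate
```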
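And for #23, Python makes the distinction explicit: the random module is a predictable PRNG, while the secrets module draws from the operating system's cryptographic source. Which one you reach for is the whole game:

```python
import random
import secrets

# Predictable: same seed, same "random" numbers -- fine for simulations,
# dangerous for anything an attacker might want to guess
assert random.Random(42).getrandbits(64) == random.Random(42).getrandbits(64)

# Unpredictable: suitable for session IDs, tokens, temp file names
session_id = secrets.token_urlsafe(32)
```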

Of course there's nothing truly new here; I essentially went over the same basic list in Sins of Software Security almost two years ago. The only difference is the relative priorities, as web applications start to dominate mainstream computing.

This list of software security mistakes serves the same purpose as McConnell's list of classic development mistakes: to raise awareness. A surprisingly large part of success is recognizing the most common mistakes and failure modes. So you can -- at least in theory -- realize when your project is slipping into one of them. Ignorance is the biggest software project killer of them all.

Heck, even if you are aware of these security mistakes, you might end up committing them anyway. I know I have.

Have you?

Discussion

If You Don't Change the UI, Nobody Notices

I saw a screenshot a few days ago that made me think Windows 7 Beta might actually be worth checking out.

Windows 7 Calculator in programmer mode

That's right, Microsoft finally improved the calculator app! We've been complaining for years that Microsoft ships new operating systems with the same boring old default applets the previous version had, which makes the entire operating system look bad:

I know it sounds trivial. But isn't the fit and finish of little applets like these -- Notepad, Calculator, Character Map, Paint, Disk Cleanup, Compressed Folders, and dozens of others -- indicative of the care and design that goes into the entire operating system? If Microsoft can't be bothered to bundle a version of Notepad that has basic amenities like a toolbar, what hope does the rest of the operating system have?

If you visually compare Calculator and Notepad in 2001-era Windows XP with their 2007 Windows Vista equivalents, you might conclude they're identical. But, as Raymond Chen notes, this isn't so:

I find it ironic when people complain that Calc and Notepad haven't changed. In fact, both programs have changed. (Notepad gained some additional menu and status bar options. Calc got a severe workover.) I wouldn't be surprised if these are the same people who complain, "Why does Microsoft spend all its effort on making Windows 'look cool'? They should spend all their efforts on making technical improvements and just stop making visual improvements."

And with Calc, that's exactly what happened: Massive technical improvements. No visual improvement. And nobody noticed. In fact, the complaints just keep coming. "Look at Calc, same as it always was."

The innards of Calc - the arithmetic engine - was completely thrown away and rewritten from scratch. The standard IEEE floating point library was replaced with an arbitrary-precision arithmetic library. This was done after people kept writing ha-ha articles about how Calc couldn't do decimal arithmetic correctly, that for example computing 10.21 - 10.2 resulted in 0.0100000000000016. Today, Calc's internal computations are done with infinite precision for basic operations (addition, subtraction, multiplication, division) and 32 digits of precision for advanced operations (square root, transcendental operators).
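Raymond's 10.21 - 10.2 example is easy to reproduce: IEEE binary floating point can't represent either operand exactly, while a decimal (or arbitrary-precision) type can. In Python, for instance:

```python
from decimal import Decimal

# IEEE 754 doubles store the nearest binary fraction, so the
# representation error surfaces in the subtraction
print(10.21 - 10.2)  # prints something like 0.0100000000000016, not 0.01

# A decimal type represents the operands exactly
print(Decimal("10.21") - Decimal("10.2"))  # → 0.01
```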

It's arguably the perfect Raymond Chen post -- technically dead on, while simultaneously proving that being technically dead on is utterly irrelevant. That's Raymond Chen for you: he's a riddle wrapped in a mystery inside an enigma, slathered in delicious secret sauce.

This is why the screenshot of the Windows 7 Calculator, although seemingly trivial, is so exciting to me. It's evidence that Microsoft is going to pay attention to the visible parts of the operating system this time around. I'm a fan of Vista, despite all the nerd rage on the topic, but I'll be the first to admit that Vista had all the polish of a particularly dull rock. Let's just say the overall user experience was... uninspiring. This led many people to shrug, sigh "why bother?", and stick with crusty old XP.

This was unfortunate, because if you dug into Vista, you'd find quite a few substantive technical improvements over the now-ancient Windows XP. But many of those improvements were under the hood, and thus invisible to the typical user.

If the user can't find it, the function's not there

Remember, if the user can't find it, the function's not there. Don't bother improving your product unless it results in visible changes the user can see, find, and hopefully appreciate.

Discussion

Overnight Success: It Takes Years

Paul Buchheit, the original lead developer of GMail, notes that the success of GMail was a long time in coming:

We started working on Gmail in August 2001. For a long time, almost everyone disliked it. Some people used it anyway because of the search, but they had endless complaints. Quite a few people thought that we should kill the project, or perhaps "reboot" it as an enterprise product with native client software, not this crazy Javascript stuff. Even when we got to the point of launching it on April 1, 2004 -- two and a half years after starting work on it -- many people inside of Google were predicting doom. The product was too weird, and nobody wants to change email services. I was told that we would never get a million users.

Once we launched, the response was surprisingly positive, except from the people who hated it for a variety of reasons. Nevertheless, it was frequently described as "niche", and "not used by real people outside of silicon valley".

Now, almost 7 1/2 years after we started working on Gmail, I see [an article describing how Gmail grew 40% last year, compared to 2% for Yahoo and -7% for Hotmail].

Paul has since left Google and now works at his own startup, FriendFeed. Many industry insiders have not been kind to FriendFeed. Stowe Boyd even went so far as to call FriendFeed a failure. Paul takes this criticism in stride:

Creating an important new product generally takes time. FriendFeed needs to continue changing and improving, just as Gmail did six years ago. FriendFeed shows a lot of promise, but it's still a "work in progress".

My expectation is that big success takes years, and there aren't many counter-examples (other than YouTube, and they didn't actually get to the point of making piles of money just yet). Facebook grew very fast, but it's almost 5 years old at this point. Larry and Sergey started working on Google in 1996 -- when I started there in 1999, few people had heard of it yet.

This notion of overnight success is very misleading, and rather harmful. If you're starting something new, expect a long journey. That's no excuse to move slow though. To the contrary, you must move very fast, otherwise you will never arrive, because it's a long journey! This is also why it's important to be frugal -- you don't want to starve to death halfway up the mountain.

Stowe Boyd illustrated his point about FriendFeed with a graph comparing Twitter and FriendFeed traffic. Allow me to update Mr. Boyd's graph with another data point of my own.

twitter vs. friendfeed vs. stackoverflow web traffic

I find Paul's attitude refreshing, because I take the same attitude toward our startup, Stack Overflow. I have zero expectation or even desire for overnight success. What I am planning is several years of grinding through constant, steady improvement.

This business plan isn't much different from my career development plan: success takes years. And when I say years, I really mean it! Not as some cliched regurgitation of "work smarter, not harder." I'm talking actual calendar years. You know, of the 12-month, 365-day variety. You will literally have to spend multiple years of your life grinding away at this stuff, waking up every day and doing it over and over, practicing and gathering feedback each day to continually get better. It might be unpleasant at times and even downright un-fun occasionally, but it's necessary.

This is hardly unique or interesting advice. Peter Norvig's classic Teach Yourself Programming in Ten Years already covered this topic far better than I.

Researchers have shown it takes about ten years to develop expertise in any of a wide variety of areas, including chess playing, music composition, telegraph operation, painting, piano playing, swimming, tennis, and research in neuropsychology and topology. The key is deliberative practice: not just doing it again and again, but challenging yourself with a task that is just beyond your current ability, trying it, analyzing your performance while and after doing it, and correcting any mistakes. Then repeat. And repeat again.

There appear to be no real shortcuts: even Mozart, who was a musical prodigy at age 4, took 13 more years before he began to produce world-class music. The Beatles seemed to burst onto the scene with a string of #1 hits and an appearance on the Ed Sullivan show in 1964. But they had been playing small clubs in Liverpool and Hamburg since 1957, and while they had mass appeal early on, their first great critical success, Sgt. Pepper's, was released in 1967.

Honestly, I look forward to waking up someday two or three years from now and doing the exact same thing I did today: working on the Stack Overflow code, eking out yet another tiny improvement or useful feature. Obviously we want to succeed. But on some level, success is irrelevant, because the process is inherently satisfying. Waking up every day and doing something you love -- even better, surrounded by a community who loves it too -- is its own reward. Despite being a metric ton of work.

The blog is no different. I often give aspiring bloggers this key piece of advice: if you're starting a blog, don't expect anyone to read it for six months. If you do, I can guarantee you will be sorely disappointed. However, if you can stick to a posting schedule and produce one or two quality posts every week for an entire calendar year... then, and only then, can you expect to see a trickle of readership. I started this blog in 2004, and it took a solid three years of writing 3 to 5 times per week before it achieved anything resembling popularity within the software development community.

I fully expect to be writing on this blog, in one form or another, for the rest of my life. It is a part of who I am. And with that bit of drama out of the way, I have no illusions: ultimately, I'm just the guy on the internet who writes that blog.

blog comic 255548 full

That's perfectly fine by me. I never said I was clever.

Whether you ultimately achieve readers, or pageviews, or whatever high score table it is we're measuring this week, try to remember it's worth doing because, well -- it's worth doing.

And if you keep doing it long enough, who knows? You might very well wake up one day and find out you're an overnight success.

Discussion

Dictionary Attacks 101

Several high profile Twitter accounts were recently hijacked:

An 18-year-old hacker with a history of celebrity pranks has admitted to Monday's hijacking of multiple high-profile Twitter accounts, including President-Elect Barack Obama's, and the official feed for Fox News.

The hacker, who goes by the handle GMZ, told Threat Level on Tuesday he gained entry to Twitter's administrative control panel by pointing an automated password-guesser at a popular user's account. The user turned out to be a member of Twitter's support staff, who'd chosen the weak password "happiness."

Cracking the site was easy, because Twitter allowed an unlimited number of rapid-fire log-in attempts.

"I feel it's another case of administrators not putting forth effort toward one of the most obvious and overused security flaws," he wrote in an IM interview. "I'm sure they find it difficult to admit it."

If you're a moderator or administrator it is especially negligent to have such an easily guessed password. But the real issue here is the way Twitter allowed unlimited, as-fast-as-possible login attempts.

Given the average user's password choices -- as documented by Bruce Schneier's analysis of 34,000 actual MySpace passwords captured from a phishing attack in late 2006 -- this is a pretty scary scenario.

MySpace phishing password statistics: character sets

MySpace phishing password statistics: password length

Based on this data, the average MySpace user has an 8-character alphanumeric password. Which isn't great, but doesn't sound too bad. That is, until you find out that 28 percent of those alphanumerics were all lowercase with a single final digit -- and two-thirds of the time that final digit was 1!

Yes, brute force attacks are still for dummies. Even the typically terrible MySpace password -- eight characters, all lowercase, ending in 1 -- would require around 8 billion login attempts:

26 x 26 x 26 x 26 x 26 x 26 x 26 x 1  = 8,031,810,176

At one attempt per second, that would take more than 250 years. Per user!

But a dictionary attack, like the one used in the Twitter hack? Well, that's another story. The entire Oxford English Dictionary contains around 171,000 words. As you might imagine, the average person only uses a tiny fraction of those words, by some estimates somewhere between 10 and 40 thousand. At one attempt per second, we could try every word in the Oxford English Dictionary in slightly less than two days.
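The arithmetic behind both of those numbers is worth checking for yourself:

```python
SECONDS_PER_DAY = 60 * 60 * 24

# Brute force: 7 free lowercase letters, final character fixed at '1'
brute_force_attempts = 26 ** 7
print(brute_force_attempts)                            # → 8031810176
print(brute_force_attempts / (SECONDS_PER_DAY * 365))  # ≈ 254 years

# Dictionary attack: every word in the OED
oed_words = 171_000
print(oed_words / SECONDS_PER_DAY)                     # ≈ 1.98 days
```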

Clearly, the last thing you want to do is give attackers carte blanche to run unlimited login attempts. All it takes is one user with a weak password to provide attackers a toehold in your system. In Twitter's case, the attackers really hit the jackpot: the user with the weakest password happened to be a member of the Twitter administrative staff.

Limiting the number of login attempts per user is security 101. If you don't do this, you're practically setting out a welcome mat for anyone to launch a dictionary attack on your site, an attack that gets statistically more effective every day the more users you attract. In some systems, your account can get locked out if you try and fail to log in a certain number of times in a row. This can lead to denial of service attacks, however, and is generally discouraged. It's more typical for each failed login attempt to take longer and longer, like so:

1st failed login: no delay
2nd failed login: 2 sec delay
3rd failed login: 4 sec delay
4th failed login: 8 sec delay
5th failed login: 16 sec delay

And so on. Alternately, you could display a CAPTCHA after the fourth attempt.

There are endless variations of this technique, but the net effect is the same: attackers can only try a handful of passwords each day. A brute force attack is out of the question, and a broad dictionary attack becomes impractical, at least in any kind of human time.
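The doubling-delay scheme in that table takes only a few lines to sketch. This is in-memory state with hypothetical function names; a real site would persist this per account, and probably throttle per source IP as well:

```python
from collections import defaultdict

failed_attempts = defaultdict(int)  # username -> consecutive failures

def login_delay(username: str) -> int:
    """Seconds to wait before the next attempt: 0, 2, 4, 8, 16, ..."""
    n = failed_attempts[username]
    return 0 if n == 0 else 2 ** n

def record_failure(username: str) -> None:
    failed_attempts[username] += 1

def record_success(username: str) -> None:
    failed_attempts[username] = 0
```

Because the delay keeps doubling, by the 17th consecutive failure the attacker is waiting over a day per guess -- exactly the "handful of passwords each day" effect described above.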

It's tempting to blame Twitter here, but honestly, I'm not sure they're alone. I forget my passwords a lot. I've made at least five or six attempts to guess my password on multiple websites and I can't recall ever experiencing any sort of calculated delay or account lockouts. I'm reasonably sure the big commercial sites have this mostly figured out. But since every rinky-dink website on the planet demands that I create unique credentials especially for them, any of them could be vulnerable. You better hope they're all smart enough to throttle failed logins -- and that you're careful to use unique credentials on every single website you visit.

Maybe this was less of a problem in the bad old days of modems, as there were severe physical limits on how fast data could be transmitted to a website, and how quickly that website could respond. But today, we have the one-two punch of naive websites running on blazing fast hardware, and users with speedy broadband connections. Under these conditions, I could see attackers regularly achieving up to two password attempts per second.

If you thought of dictionary attacks as mostly a desktop phenomenon, perhaps it's time to revisit that assumption. As Twitter illustrates, the web now offers ripe conditions for dictionary attacks. I urge you to test your website, or any websites you use -- and make sure they all have some form of failed login throttling in place.

Discussion

Are You Creating Micromanagement Zombies?

Do you manage other programmers, in any capacity? Then take Kathy Sierra's quiz:

  1. Do you pride yourself on being "on top of" the projects or your direct reports? Do you have a solid grasp of the details of every project?

  2. Do you believe that you could perform most of the tasks of your direct reports, and potentially do a better job?

  3. Do you pride yourself on frequent communication with your employees? Does that communication include asking them for detailed status reports and updates?

  4. Do you believe that being a manager means that you have more knowledge and skills than your employees, and thus are better equipped to make decisions?

  5. Do you believe that you care about things (quality, deadlines, etc.) more than your employees?

A "yes" to any of these -- even a half-hearted "maybe" -- means you might be creating Micromanagement Zombies.

still from Night of the Living Dead

That's right, Zombies. Mindless automatons who can barely do anything except exactly what they are ordered to do, and even then, only when someone is strictly monitoring what they're doing and how they're doing it. Micromanaging the people you work with is arguably the exact opposite of what a competent team leader or manager should be spending their time doing. So if you're micromanaging at all, even the teeny tiniest little bit, step back and take a long, hard look. It's a sign of deeper problems.

Beyond that, who the heck wants to work with zombies anyway? Shouldn't you endeavor to work with the type of people who are good enough at their jobs that they can make sensible decisions about what they're doing? And they're not constantly trying to eat your brain? Well, figuratively speaking.

want ad: Zombies Seeking Brains

Building teams is like building software. It's easier to describe what not to do than it is to identify the intangibles that make good software development teams jell. But it's pretty clear that micromanagement is one of the biggest risks. In Peopleware, DeMarco and Lister establish seven anti-patterns they dubbed Teamicide:

  1. Defensive Management
  2. Bureaucracy
  3. Physical Separation
  4. Fragmentation of People's Time
  5. Quality Reduction of the Product
  6. Phony Deadlines
  7. Clique Control

Wondering what number one encompasses? You guessed it: micromanagement.

If you're the manager, of course you're going to feel that your judgment is better than that of people under you. You have more experience and perhaps a higher standard of excellence than they have; that's how you got to be the manager. At any point in the project where you don't interpose your own judgment, your people are more likely to make a mistake. So what? Let them make some mistakes. That doesn't mean you can't override a decision (very occasionally) or give specific direction to the project. But if the staff comes to believe it's not allowed to make any errors of its own, the message that you don't trust them comes through loud and clear. There is no message you can send that will better inhibit team formation.

Most managers give themselves excellent grades on knowing when to trust their people and when not to. But in our experience, too many managers err on the side of mistrust. They follow the basic premise that their people may operate completely autonomously, as long as they operate correctly. This amounts to no autonomy at all. The only freedom that has any meaning is the freedom to proceed differently from the way your manager would have proceeded. This is true in a broader sense, too: The right to be right (in your manager's eyes or in your government's eyes) is irrelevant; it's only the right to be wrong that makes you free.

The most obvious defensive management ploys are prescriptive Methodologies ("My people are too dumb to build systems without them") and technical interference by the manager. Both are doomed to fail in the long run. In addition, they make for efficient teamicide. People who feel untrusted have little inclination to bond together into a cooperative team.

In the end, isn't trust what this is about? If you don't trust the people you work with -- and most importantly, actively demonstrate that trust through your actions -- should you really be working with them at all?

Discussion