Coding Horror

programming and human factors

Of Spaces, Underscores and Dashes

I try to avoid using spaces in filenames and URLs. They're great for human readability, but they're remarkably inconvenient in computer resource locators:

  1. A filename with spaces has to be surrounded by quotes when referenced at the command line:

     XCOPY "c:\test files\reference data.doc" d:
     XCOPY c:\test-files\reference-data.doc d:
    
  2. Any spaces in URLs are converted by the web browser to the encoded space sequence, %20:

     http://domain.com/test%20files/reference%20data.html
     http://domain.com/test-files/reference-data.html
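To see both problems from code, here is a minimal Python sketch using only the standard library. It assumes a POSIX shell for the command-line case: shlex.quote() applies single-quote shell quoting, which stands in here for the double quotes that cmd.exe needs around the XCOPY path. The helper names url_path and shell_arg are hypothetical.

```python
import shlex
from urllib.parse import quote


def url_path(name: str) -> str:
    # Percent-encode a path for use in a URL; "/" stays literal (quote's default safe="/").
    return quote(name)


def shell_arg(name: str) -> str:
    # Quote an argument for a POSIX shell; strings with no special characters
    # (letters, digits, "-", ".", "/" and a few others) pass through unchanged.
    return shlex.quote(name)


for name in ["test files/reference data.html", "test-files/reference-data.html"]:
    print(url_path(name), shell_arg(name))
```

The spaced name comes back as test%20files/reference%20data.html and needs quoting for the shell, while the dashed name passes through both functions untouched.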
    

So it behooves us to use something other than a space in file and folder names. Historically, I've used the underscore, but I recently discovered that the correct character to substitute for space is the dash. Why?

The short answer is, that's what Google expects:

If you use an underscore '_' character, then Google will combine the two words on either side into one word. So bla.com/kw1_kw2.html wouldn't show up by itself for kw1 or kw2. You'd have to search for kw1_kw2 as a query term to bring up that page.

The slightly longer answer is, the underscore is traditionally considered a word character by the \w regex operator.

Here's RegexBuddy matching the \w operator against multiple ASCII character sets:

Figure: Result of a regex match for \w (word characters)

As you can see, the dash is not matched, but the underscore is. This_is_a_single_word, but this-is-multiple-words.
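The same behavior is easy to reproduce with Python's re module. This sketch shows only how \w treats the two separators; it makes no claim about how Google's own tokenizer is implemented. Note that for str patterns Python's \w is Unicode-aware, so it also matches non-ASCII letters, but the dash and underscore behave exactly as in the figure. The helper name words is hypothetical.

```python
import re


def words(text: str) -> list[str]:
    # \w matches letters, digits and underscore, so underscore-joined text
    # stays a single token, while every dash ends a token.
    return re.findall(r"\w+", text)


print(words("This_is_a_single_word"))   # ['This_is_a_single_word']
print(words("this-is-multiple-words"))  # ['this', 'is', 'multiple', 'words']
```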

Like NutraSweet and Splenda, neither is really an acceptable substitute for a space, but we might as well follow the established convention instead of inventing our own. Inventing our own is how we ended up with the backslash as a path separator.
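If you want to apply the dash convention when naming files or URL slugs, a small helper along these lines would do. It is a hypothetical sketch, not a standard function: dashify folds runs of whitespace and underscores into a single dash and trims dashes from the ends, which also avoids a leading dash that a command line might mistake for an option.

```python
import re


def dashify(name: str) -> str:
    # Replace each run of whitespace and/or underscores with one dash,
    # then trim dashes from both ends (this also drops any leading dash
    # that was already in the name).
    return re.sub(r"[\s_]+", "-", name).strip("-")


print(dashify("reference data.doc"))       # reference-data.doc
print(dashify("  test   files "))          # test-files
print(dashify("reference_data doc.html"))  # reference-data-doc.html
```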

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Overflow and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: https://infosec.exchange/@codinghorror