Coding Horror

programming and human factors

Are You an XML Bozo?

Here's a helpful article that documents some common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, a co-editor of the XML specification, right?

There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don'ts helps developers to move from the first group to the latter.

  1. Don't think of XML as a text format
  2. Don't use text-based templates
  3. Don't print
  4. Use an isolated serializer
  5. Use a tree or a stack (or an XML parser)
  6. Don't try to manage namespace declarations manually
  7. Use unescaped Unicode strings in memory
  8. Use UTF-8 (or UTF-16) for output
  9. Use NFC
  10. Don't expect software to look inside comments
  11. Don't rely on external entities on the Web
  12. Don't bother with CDATA sections
  13. Don't bother with escaping non-ASCII
  14. Avoid adding pretty-printing white space in character data
  15. Don't use text/xml
  16. Use XML 1.0
  17. Test with astral characters
  18. Test with forbidden control characters
  19. Test with broken UTF-*
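
To make items 2 through 4 and 17 a bit more concrete, here's a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The element names (`entry`, `title`) and the helper functions are made up for illustration; they aren't from the article. The idea is just to contrast a text template with a real serializer when the data contains markup characters and an astral character.

```python
import xml.etree.ElementTree as ET


def build_naive(title: str) -> bytes:
    # Anti-pattern (items 2 and 3): a text template that "prints" XML.
    # Nothing escapes '&' or '<', so the output is only well-formed by luck.
    return f"<entry><title>{title}</title></entry>".encode("utf-8")


def build_with_serializer(title: str) -> bytes:
    # Items 4 and 5: build a tree, keep the string unescaped in memory
    # (item 7), and let the serializer handle escaping and encoding (item 8).
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = title
    return ET.tostring(entry, encoding="utf-8")


def is_well_formed(data: bytes) -> bool:
    # A document is well-formed if a conforming parser accepts it.
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Hypothetical test input: markup characters, non-ASCII, and an astral
# character (U+1F600), as suggested by item 17.
title = "Fish & Chips <caf\u00e9> \U0001F600"

naive_ok = is_well_formed(build_naive(title))
serialized = build_with_serializer(title)
serialized_ok = is_well_formed(serialized)
round_tripped = ET.fromstring(serialized).find("title").text
```

With that input, the template version fails the well-formedness check because of the bare `&`, while the serializer version parses and hands back the original string unchanged. Other serializers will differ in the details, but the division of labor is the same.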

I'm a little ambivalent about XML, largely due to what John Lam calls "The Angle Bracket Tax". I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath. It's sort of like SQL with automatic, joinless parent-child relationships.
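
Here's roughly what I mean by "joinless", as a small sketch using Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The document structure and the names in it (`blog`, `post`, `comment`, the `author` attribute) are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: posts contain their comments directly,
# so the parent-child relationship is part of the data's shape.
doc = ET.fromstring("""
<blog>
  <post id="1" author="jeff">
    <comment>first</comment>
    <comment>second</comment>
  </post>
  <post id="2" author="tim">
    <comment>third</comment>
  </post>
</blog>
""")

# "Comments on posts written by jeff": in SQL this would be a join between
# a posts table and a comments table. Here the path expression walks it.
jeff_comments = [c.text for c in doc.findall("./post[@author='jeff']/comment")]

# "Every comment anywhere in the document", regardless of depth.
all_comments = [c.text for c in doc.findall(".//comment")]
```

The first query returns the two comments under jeff's post, and the second returns all three. A full XPath engine can do a lot more than this, but even the subset covers the everyday cases.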

But XML is generally the least of all available evils, and if you're going to use it, you might as well follow the rules.

Written by Jeff Atwood

Indoor enthusiast. Co-founder of Stack Overflow and Discourse. Disclaimer: I have no idea what I'm talking about. Find me here: https://infosec.exchange/@codinghorror