Are You an XML Bozo?
Here’s a helpful article that documents common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, the co-editor of the XML specification, right?
There seem to be developers who think that well-formedness is awfully hard, if not impossible, to get right when producing XML programmatically, and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don’ts helps developers move from the first group to the second.
- Don’t think of XML as a text format
- Don’t use text-based templates
- Don’t print
- Use an isolated serializer
- Use a tree or a stack (or an XML parser)
- Don’t try to manage namespace declarations manually
- Use unescaped Unicode strings in memory
- Use UTF-8 (or UTF-16) for output
- Use NFC
- Don’t expect software to look inside comments
- Don’t rely on external entities on the Web
- Don’t bother with CDATA sections
- Don’t bother with escaping non-ASCII
- Avoid adding pretty-printing white space in character data
- Don’t use text/xml
- Use XML 1.0
- Test with astral characters
- Test with forbidden control characters
- Test with broken UTF-*
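
Several of these items (use an isolated serializer, keep unescaped Unicode strings in memory, use NFC, output UTF-8, and test with astral characters, forbidden control characters, and broken UTF-8) can be illustrated together. What follows is only a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The language choice, the element names `entry`, `title`, and `body`, and the helper functions are assumptions made for illustration; none of them come from the article.

```python
import unicodedata
import xml.etree.ElementTree as ET


def is_xml10_char(ch: str) -> bool:
    """Return True if ch matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def decode_utf8(raw: bytes) -> str:
    """Decode strictly so broken UTF-8 fails at the input boundary."""
    return raw.decode("utf-8", errors="strict")


def clean_text(value: str) -> str:
    """Normalize to NFC and reject characters XML 1.0 cannot carry."""
    value = unicodedata.normalize("NFC", value)
    for ch in value:
        if not is_xml10_char(ch):
            raise ValueError(f"U+{ord(ch):04X} is not allowed in XML 1.0")
    return value


def build_document(title: str, body: str) -> bytes:
    """Build a tree from unescaped strings; the serializer does the escaping."""
    root = ET.Element("entry")  # hypothetical element names
    ET.SubElement(root, "title").text = clean_text(title)
    ET.SubElement(root, "body").text = clean_text(body)
    # No string concatenation and no manual escaping: output is UTF-8 bytes.
    return ET.tostring(root, encoding="utf-8")


if __name__ == "__main__":
    # Markup-significant characters plus an astral character (U+1F41F).
    title = "Fish & <chips> \U0001F41F"
    data = build_document(title, "cafe\u0301")  # decomposed e + combining acute
    parsed = ET.fromstring(data)
    assert parsed.find("title").text == title
    assert parsed.find("body").text == "caf\u00e9"  # composed by NFC
```

The point of the design is that application code never touches angle brackets or ampersands: strings stay unescaped in memory and only the serializer turns them into markup. The checks for forbidden control characters and broken UTF-8 are separate steps here because they concern the input data, which is why they are worth their own test cases.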
I’m a little ambivalent about XML, largely due to what John Lam calls “The Angle Bracket Tax.” I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath – it’s sort of like SQL with automatic, joinless parent-child relationships.
But XML is generally the least of all available evils, and if you’re going to use it, you might as well follow the rules.