Here's a helpful article that documents some common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, the co-editor of the XML specification, right?
There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don'ts helps developers to move from the first group to the latter.
- Don't think of XML as a text format
- Don't use text-based templates
- Don't print
- Use an isolated serializer
- Use a tree or a stack (or an XML parser)
- Don't try to manage namespace declarations manually
- Use unescaped Unicode strings in memory
- Use UTF-8 (or UTF-16) for output
- Use NFC
- Don't expect software to look inside comments
- Don't rely on external entities on the Web
- Don't bother with CDATA sections
- Don't bother with escaping non-ASCII
- Avoid adding pretty-printing white space in character data
- Don't use text/xml
- Use XML 1.0
- Test with astral characters
- Test with forbidden control characters
- Test with broken UTF-*
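To make a few of these concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is my own illustration, not code from the article. The `feed` and `entry` element names and the `to_xml_text` and `build_feed` helpers are hypothetical. The idea is to build a tree rather than print strings, keep unescaped Unicode in memory, normalize to NFC, let the serializer handle escaping, and write UTF-8. The explicit character check is there because a serializer will not necessarily reject forbidden control characters for you.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production is forbidden in a document:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def to_xml_text(text):
    """Hypothetical helper: reject forbidden characters, then normalize to NFC."""
    if _NOT_XML_CHAR.search(text):
        raise ValueError("text contains a character forbidden in XML 1.0")
    return unicodedata.normalize("NFC", text)


def build_feed(titles):
    """Hypothetical helper: build a tree and serialize it once, as UTF-8 bytes."""
    root = ET.Element("feed")
    for title in titles:
        # Plain, unescaped strings go in; the serializer escapes & and <.
        ET.SubElement(root, "entry").text = to_xml_text(title)
    return ET.tostring(root, encoding="utf-8")


# Markup-significant characters, a decomposed accent, and an astral character.
titles = ["Fish & Chips <1>", "cafe\u0301", "clef \U0001D11E"]
data = build_feed(titles)

# Round-trip through a real parser to confirm the output is well-formed.
round_tripped = [e.text for e in ET.fromstring(data).findall("entry")]
```

Feeding the output back through a parser is the cheap test here: if the round trip returns the same strings (after NFC normalization), the escaping and encoding were handled correctly without a single hand-written `&amp;`.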
I'm a little ambivalent about XML, largely due to what John Lam calls "The Angle Bracket Tax". I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath; it's sort of like SQL with automatic, joinless parent-child relationships.
But XML is generally the least of all available evils, and if you're going to use it, you might as well follow the rules.