Here's a helpful article that documents some common pitfalls to avoid when composing XML documents. Nobody wants to be called an XML Bozo by Tim Bray, the co-editor of the XML specification, right?
There seem to be developers who think that well-formedness is awfully hard -- if not impossible -- to get right when producing XML programmatically and developers who can get it right and wonder why the others are so incompetent. I assume no one wants to appear incompetent or to be called names. Therefore, I hope the following list of dos and don'ts helps developers to move from the first group to the latter.
- Don't think of XML as a text format
- Don't use text-based templates
- Don't print
- Use an isolated serializer
- Use a tree or a stack (or an XML parser)
- Don't try to manage namespace declarations manually
- Use unescaped Unicode strings in memory
- Use UTF-8 (or UTF-16) for output
- Use NFC
- Don't expect software to look inside comments
- Don't rely on external entities on the Web
- Don't bother with CDATA sections
- Don't bother with escaping non-ASCII
- Avoid adding pretty-printing white space in character data
- Don't use text/xml
- Use XML 1.0
- Test with astral characters
- Test with forbidden control characters
- Test with broken UTF-*
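To make a few of those items concrete, here's a minimal sketch of the "tree plus isolated serializer" approach. It assumes Python's standard `xml.etree.ElementTree` as the serializer; the helper names `prepare_text` and `serialize_entry`, and the `entry`/`title`/`body` element names, are made up for illustration and don't come from the article. One caveat that motivates the extra check: as far as I can tell, ElementTree escapes markup characters for you but does not reject control characters that XML 1.0 forbids, so the sketch validates strings itself.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET

# Anything outside the XML 1.0 "Char" production: most C0 controls,
# lone surrogates, U+FFFE and U+FFFF.
_FORBIDDEN = re.compile(
    "[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def prepare_text(text: str) -> str:
    """Normalize to NFC and refuse characters XML 1.0 cannot represent."""
    text = unicodedata.normalize("NFC", text)
    match = _FORBIDDEN.search(text)
    if match:
        raise ValueError(
            f"character U+{ord(match.group()):04X} is not allowed in XML 1.0"
        )
    return text


def serialize_entry(title: str, body: str) -> bytes:
    """Build a tree from unescaped strings; the serializer does the escaping."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "title").text = prepare_text(title)
    ET.SubElement(entry, "body").text = prepare_text(body)
    # UTF-8 bytes out; no hand-written angle brackets anywhere.
    return ET.tostring(entry, encoding="utf-8")


if __name__ == "__main__":
    # Markup characters, a decomposed accent, and an astral character (U+1D11E).
    data = serialize_entry("AT&T <news>", "cafe\u0301 \U0001D11E")
    print(data.decode("utf-8"))

    # Round-trip through a real parser to confirm well-formedness.
    print(ET.fromstring(data).find("title").text)

    # A forbidden control character should be rejected before output.
    try:
        serialize_entry("ok", "bad\x0bchar")
    except ValueError as err:
        print("rejected:", err)
```

The point of the design is that the strings stay unescaped in memory and escaping happens in exactly one place. Feeding the output back through a parser is a cheap test for the astral-character and control-character cases from the list.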
I'm a little ambivalent about XML, largely due to what John Lam calls "The Angle Bracket Tax". I think XSLT is utterly insane for anything except the most trivial of tasks, but I do like XPath -- it's sort of like SQL with automatic, joinless parent-child relationships.
But XML is generally the least of all available evils, and if you're going to use it, you might as well follow the rules.