September 08, 2004

XML Prefixes and Namespaces

Arguably, namespaces and encodings are the two most complex beasts to handle when processing XML. Namespaces are actually a graft onto the original XML 1.0 specification (see XML 1.0 and namespaces). The namespace spec restricted the allowable names in an XML document so that a prefix could be bound to a namespace. Originally an XML name could contain any number of colons; for example, "x:a:chris" was a legal name. The namespace spec allows at most a single colon in an element or attribute name. For example, "x:chris" is a name whose local part is "chris" and whose namespace is the URI bound to the prefix "x".
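To make the single-colon rule concrete, here is a minimal Python sketch that splits a qualified name into its prefix and local part. The function name split_qname is made up for illustration, and the check covers only the colon rule, not the full set of characters allowed in an XML name.

```python
def split_qname(qname):
    """Split a namespace-aware name into (prefix, local part).

    Hypothetical helper: it only enforces the single-colon rule and does
    not validate which characters are legal in an XML name.
    """
    prefix, colon, local = qname.partition(":")
    if not colon:
        # No colon: the name has no prefix at all.
        return None, qname
    if not prefix or not local or ":" in local:
        # Empty prefix, empty local part, or more than one colon.
        raise ValueError(f"not a valid qualified name: {qname!r}")
    return prefix, local


print(split_qname("x:chris"))
print(split_qname("chris"))
try:
    split_qname("x:a:chris")
except ValueError as err:
    print(err)
```

Note that the prefix on its own says nothing about the namespace; it still has to be resolved against the declarations in scope.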

The following fragment illustrates some things that you can do with namespaces:

<doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">
<pro:A pre:a="foo" b="foo">
<C>
<B xmlns="" pro:a="foo" xmlns:pro="bar"/>
</C>
</pro:A>
</doc>

The following output (from the stream API) shows the namespace of each element and attribute in square brackets. Some things to note: if you undeclare the default namespace, the element that carries that undeclaration is itself not in the default namespace. Also, you can use a prefix in an attribute list before the declaration that binds it appears in the same start tag.

START_ELEMENT [<['myNamespace']:doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">]
START_ELEMENT [<['myNamespace']:pro:A ['myNamespace']:pre:a="foo" b="foo">]
START_ELEMENT [<['myNamespace']:C>]
START_ELEMENT [<B xmlns="" xmlns:pro="bar" ['bar']:pro:a="foo">]
END_ELEMENT [</B>]
END_ELEMENT [</['myNamespace']:C>]
END_ELEMENT [</['myNamespace']:pro:A>]
END_ELEMENT [</['myNamespace']:doc>]

The first thing to note is that namespaces are scoped: a prefix can be rebound on any start element, and the new binding applies to that element and everything inside it. Anything in that scope that references the prefix is bound to the new URI. This allows documents to be nested inside other documents without their namespaces changing. The default namespace applies to any element that is not written with a prefix. You can undeclare the default namespace by binding it to "". You can also have multiple prefixes bound to the same URI. This creates a situation where, unless you know the prefix that was used to bind the URI to the local name, you can never recreate the document exactly.
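The same bindings can be checked with a namespace-aware parser. The sketch below assumes Python's xml.etree.ElementTree rather than the stream API that produced the output above. ElementTree reports each name as {uri}local and drops the prefix, which also shows why the original prefixes cannot be recovered afterwards.

```python
import xml.etree.ElementTree as ET

# The fragment from this post, unchanged.
FRAGMENT = """\
<doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">
<pro:A pre:a="foo" b="foo">
<C>
<B xmlns="" pro:a="foo" xmlns:pro="bar"/>
</C>
</pro:A>
</doc>"""

root = ET.fromstring(FRAGMENT)

# Collect (element name, attributes) in document order.
resolved = [(el.tag, dict(el.attrib)) for el in root.iter()]

for tag, attrs in resolved:
    print(tag, attrs)
```

Here pre:a and pro:A both resolve to myNamespace even though they were written with different prefixes, and pro:a on B picks up the inner binding to bar.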


Attributes and Namespaces

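Many people think that attributes are automatically put into the namespace of the element that declares the attribute. This isn't true. The default namespace does not apply to attributes either: an attribute without a prefix is in no namespace, even when a default namespace is active (note b="foo" in the output above). An attribute with a prefix is in the namespace that is bound to that prefix.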
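A quick way to check how attribute names resolve is to ask a namespace-aware parser. This is a small sketch assuming Python's xml.etree.ElementTree; the order and item elements and the urn:example URIs are invented for the example. ElementTree reports the unprefixed attribute with no namespace and the prefixed attribute in the namespace bound to its prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a default namespace plus one prefixed namespace.
doc = ET.fromstring(
    '<order xmlns="urn:example:orders" xmlns:t="urn:example:tracking">'
    '<item id="42" t:code="ZX9"/>'
    '</order>'
)

item = doc.find("{urn:example:orders}item")

# The element itself picks up the default namespace...
print(item.tag)
# ...but the unprefixed attribute does not.
print(item.get("id"))
print(item.get("{urn:example:orders}id"))
# The prefixed attribute is in the namespace bound to its prefix.
print(item.get("{urn:example:tracking}code"))
```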


QName

Prefixes are part of the XML Infoset: http://www.w3.org/TR/xml-infoset/#infoitem.element

Qualified names, including their prefixes, are defined here: http://www.w3.org/TR/REC-xml-names/#ns-qualnames

They are explicitly referenced in the XPath specification: http://www.w3.org/TR/xpath#predicates


Why do prefixes matter?

1) Philosophical: prefixes are explicitly part of the XML Infoset. Therefore parsers and processors should be required to always report them.

2) A large majority of XML applications actually use prefixes. It is extremely common for developers to rely on prefixes in their applications, even when they strictly don't need to or shouldn't. The prefix is treated like an alias for the namespace URI. (Of course, it's not a good idea for developers to treat it as an alias, since a prefix is only valid within its declared scope, but some still do.) Therefore, since prefixes are widely used, it doesn't make sense to burden the majority of developers with a prefix-free QName for the sake of the minority who in practice don't use prefixes.


Prefixes and Canonicalization

The Canonical XML recommendation explains why canonicalization leaves prefixes alone (http://www.w3.org/TR/xml-c14n#NoNSPrefixRewriting):

4.4 No Namespace Prefix Rewriting

The C14N-20000119 Canonical XML draft described a method for rewriting namespace prefixes such that two documents having logically equivalent namespace declarations would also have identical namespace prefixes. The goal was to eliminate dependence on the particular namespace prefixes in a document when testing for logical equivalence. However, there now exist a number of contexts in which namespace prefixes can impart information value in an XML document. For example, an XPath expression in an attribute value or element content can reference a namespace prefix. Thus, rewriting the namespace prefixes would damage such a document by changing its meaning (and it cannot be logically equivalent if its meaning has changed).

More formally, let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Let D2 = D1, then modify the namespace prefixes in D2 and modify the XPath expression's references to namespace prefixes such that D2 and D1 remain logically equivalent. Since namespace rewriting does not include occurrences of namespace references in attribute values and element content, the canonical form of D1 does not equal the canonical form of D2 because the XPath will be different. Thus, although namespace rewriting normalizes the namespace declarations, the goal eliminating dependence on the particular namespace prefixes in the document is not achieved.

Moreover, it is possible to prove that namespace rewriting is harmful, rather than simply ineffective. Let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Now let D2 be the canonical form of D1. Clearly, the canonical forms of D1 and D2 are equivalent (since D2 is the canonical form of the canonical form of D1), yet D1 and D2 are not logically equivalent because the aforementioned XPath works in D1 and doesn't work in D2.

Note that an argument similar to this can be leveled against the XML canonicalization method based on any of the cases in the Limitations, the problems cannot easily be fixed in those cases, whereas here we have an opportunity to avoid purposefully introducing such a limitation.

Applications that must test for logical equivalence must perform more sophisticated tests than mere octet stream comparison. However, this is quite likely to be necessary in any case in order to test for logical equivalencies based on application rules as well as rules from other XML-related recommendations, working drafts, and future works.
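The damage described in section 4.4 is easy to reproduce with any tool that regenerates prefixes on output. The sketch below assumes Python's xml.etree.ElementTree, which does not keep the original prefixes when serializing; the list and item elements, the urn:example:items URI, and the select attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose "select" attribute holds an XPath that
# depends on the prefix "p" being declared.
SOURCE = (
    '<p:list xmlns:p="urn:example:items" select="p:item">'
    '<p:item/>'
    '</p:list>'
)

root = ET.fromstring(SOURCE)

# ElementTree keeps only the namespace URI of each name, so it has to
# generate a prefix of its own when writing the document back out.
rewritten = ET.tostring(root, encoding="unicode")
print(rewritten)

# The attribute value is plain text to the serializer and is left alone,
# so it still mentions "p" although "p" is no longer declared.
still_references_p = 'select="p:item"' in rewritten
p_still_declared = 'xmlns:p=' in rewritten
print(still_references_p, p_still_declared)
```

The element names survive the round trip because they are resolved to URIs, but the XPath in the attribute value now refers to a prefix with no binding, which is exactly the case the quoted section warns about.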

Posted by Chris at September 8, 2004 02:24 PM
