
September 21, 2004

StAX based WebService engine

James Strachan is building a StAX-based web service engine; check out his post here.

Posted by Chris at 10:26 AM | Comments (0)

September 16, 2004

Loose coupling

For all the talk of loose coupling, I'm surprised how tightly coupled web services have become. I think there is room for a completely interpreted stack that does no binding to Java types but enforces XML Schema constraints and SOAP QoS for reliability, security and routing. It would be cool to implement this in Python or Perl and insert it as an adaptive layer between the world and the application.

Posted by Chris at 10:37 AM | Comments (2)

September 08, 2004

Lessons Learned

I've learned a lot of lessons this release. They mostly reinforce my assumptions about good project management.

I was trying to learn how to have mission critical external dependencies, to work within an unclear reporting structure, and to do a double ended rewrite/integration.

What I learned was:
1) try to reduce impact of external dependencies (don't put them on the critical path) and use fixed code v 2.0
2) create a logical reporting structure and put authority with responsibility
3) integration is hard and takes a long time (don't underestimate it)
4) document so that you can negotiate at the endgame
5) clearly communicate state throughout the project
6) be careful what you ask valuable people to work on, because if you burn them out you have lost a valuable person
7) losing key people during the release can really be hard to recover from
8) be careful with team chemistry.

Posted by Chris at 03:00 PM | Comments (0)

Specs, planning and software

"Detailed plans usually fail, because circumstances inevitably change" is a quote from Karl von Clausewitz.
One should set a few clear, overarching goals. Then people are free to seize opportunities to further those goals.

Posted by Chris at 02:55 PM | Comments (0)

Self Organizing Systems for Software

The great challenge of building a software company, or any company, is creating a self-organizing system that heals itself, innovates and produces without top-down control. The job of management is selection, branching and pruning, not top-down direction.

Posted by Chris at 02:53 PM | Comments (0)

Programming as evolution

Writing software is more like evolution. A great testbed becomes the fitness function. It’s not top-down design, it’s bottom-up. Changes are made, introduced to the test environment, and either live or die. Because the testbed is the fitness function, your app is only as good as your environment is harsh.

Posted by Chris at 02:52 PM | Comments (0)

Finishing a project

When I was young I was drawing a picture. I kept bringing the picture to my mother and she said it looked great. Finally, after the tenth time (or so), she said, “a great artist knows when the picture is finished”. I always think about this when I get the urge to tinker endlessly on a project.

Posted by Chris at 02:51 PM | Comments (0)

XML Prefixes and Namespaces

Namespaces and encodings are arguably the two most complex beasts to handle when processing XML. Namespaces are actually a graft onto the original XML 1.0 specification (see XML 1.0 and namespaces). The namespace spec restricted the names allowed in an XML document so that a prefix can be separated from the rest of the name. Originally an XML name could contain any number of colons; for example, "x:a:chris" was a legal name. The namespace spec allows at most a single colon. For example, "x:chris" is a name with the local part "chris" in the namespace whose URI is bound to the prefix "x".
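
As a rough illustration of the single-colon rule, here is a minimal Python sketch. The split_qname helper is hypothetical (it is not part of any XML library), and it only checks the number of colons, not the full set of characters allowed in an XML name.

```python
def split_qname(name):
    """Split a namespace-aware XML name into (prefix, local_part).

    Hypothetical helper for illustration: it enforces only the
    "at most one colon" rule, not the full XML name grammar.
    """
    parts = name.split(":")
    if len(parts) == 1:
        return None, name              # no prefix at all
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]      # exactly one colon, both sides non-empty
    raise ValueError(f"not a valid qualified name: {name!r}")


print(split_qname("x:chris"))   # ('x', 'chris')
print(split_qname("chris"))     # (None, 'chris')
try:
    split_qname("x:a:chris")    # legal in plain XML 1.0, rejected under namespaces
except ValueError as err:
    print(err)
```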

The following fragment illustrates some things that you can do with namespaces:

<doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">
  <pro:A pre:a="foo" b="foo">
    <C><B xmlns="" pro:a="foo" xmlns:pro="bar"/></C>
  </pro:A>
</doc>

The following output (from the stream API) shows the namespace of each element and attribute in brackets. Two things to note: if you undeclare the default namespace, the element that carries that declaration is itself no longer in the default namespace. Also, a start tag can use a prefix in its attribute list before the declaration of that prefix appears in the same tag.

START_ELEMENT [<['myNamespace']:doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">]
START_ELEMENT [<['myNamespace']:pro:A ['myNamespace']:pre:a="foo" b="foo">]
START_ELEMENT [<['myNamespace']:C>]
START_ELEMENT [<B xmlns="" xmlns:pro="bar" ['bar']:pro:a="foo">]
END_ELEMENT [</['myNamespace']:C>]
END_ELEMENT [</['myNamespace']:pro:A>]
END_ELEMENT [</['myNamespace']:doc>]

The first thing to note is that namespaces are scoped. A declaration applies to the start element that carries it and to everything inside that element, and a nested element can rebind the same prefix to a different URI. Inside the nested element, anything that references the prefix is bound to the new URI. This allows documents to be nested inside other documents without their namespaces changing. The default namespace is attached to any element that is written without a prefix. You can undeclare the default namespace by binding it to "". You can also have multiple prefixes bound to the same URI. This creates a situation where, unless you know the prefix that was used with each local name, you can never recreate the original document exactly.
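
The scoping rules can be checked with a small Python sketch that uses the standard library's xml.etree.ElementTree instead of the stream API used for the output above. The document string is the fragment from this post with the C element and closing tags filled in from that output; split_tag is a hypothetical helper. ElementTree reports names as "{uri}local" and drops the prefixes, which also shows why the original prefixes cannot be recovered after parsing.

```python
import xml.etree.ElementTree as ET

# The fragment from the post, completed with the <C> element and the
# closing tags that the stream output implies.
DOC = """<doc xmlns="myNamespace" xmlns:pre="myNamespace" xmlns:pro="myNamespace">
  <pro:A pre:a="foo" b="foo">
    <C><B xmlns="" pro:a="foo" xmlns:pro="bar"/></C>
  </pro:A>
</doc>"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag  # the element is in no namespace


root = ET.fromstring(DOC)

# Walk the tree in document order and record each element's namespace.
element_namespaces = [split_tag(el.tag) for el in root.iter()]
for uri, local in element_namespaces:
    print(f"{local}: {uri!r}")

# Serializing again keeps the namespace URIs, but ElementTree picks its
# own prefixes, so "pre" and "pro" are gone.
serialized = ET.tostring(root, encoding="unicode")
print(serialized)
```

B is reported with no namespace because it undeclares the default namespace on its own start tag, while C, which sits between A and B, still inherits it.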

Attributes and Namespaces

Many people think that attributes are automatically put into the namespace of the element that carries them. This isn't true. The default namespace does not apply to attributes either: an attribute without a prefix is in no namespace (see b="foo" in the output above), and an attribute with a prefix is in the namespace that is bound to that prefix.
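
A quick way to see how a namespace-aware parser reports attribute names is the following Python sketch with xml.etree.ElementTree. The document, URIs and attribute names are made up for illustration. The unprefixed attribute comes back with no namespace even though a default namespace is declared, which matches b="foo" in the stream output above.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the URIs and names are made up for this example.
XML = ('<item xmlns="urn:example:default" xmlns:p="urn:example:p" '
       'plain="1" p:marked="2"/>')

item = ET.fromstring(XML)

# The element itself picks up the default namespace.
print(item.tag)

# Attribute names as the parser reports them: the unprefixed one has no
# "{uri}" part, the prefixed one carries the URI bound to "p".
attribute_names = set(item.attrib)
print(sorted(attribute_names))
```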


Prefixes are part of the XML Infoset http://www.w3.org/TR/xml-infoset/#infoitem.element

They are defined here http://www.w3.org/TR/REC-xml-names/#ns-qualnames

They are explicitly referenced in the XPath API http://www.w3.org/TR/xpath#predicates

Why do prefixes matter?

1) Philosophical: prefixes are explicitly part of the XML Infoset. Therefore parsers/processors should be required to always report them.

2) A large majority of XML applications actually use prefixes. It is extremely common for developers to use prefixes in their applications, even if they strictly don't need to or shouldn't. The prefix is treated like an alias for the namespace URI. (Of course, it's not a good idea for developers to treat it as an alias, since the prefix is only valid in its declared scope, but some still do.) Therefore, since prefixes are widely used, it doesn't make sense to burden the majority of developers with a prefix-free QName for the sake of the minority who in practice don't use prefixes.

Prefixes and Canonicalization


4.4 No Namespace Prefix Rewriting

The C14N-20000119 Canonical XML draft described a method for rewriting namespace prefixes such that two documents having logically equivalent namespace declarations would also have identical namespace prefixes. The goal was to eliminate dependence on the particular namespace prefixes in a document when testing for logical equivalence. However, there now exist a number of contexts in which namespace prefixes can impart information value in an XML document. For example, an XPath expression in an attribute value or element content can reference a namespace prefix. Thus, rewriting the namespace prefixes would damage such a document by changing its meaning (and it cannot be logically equivalent if its meaning has changed).

More formally, let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Let D2 = D1, then modify the namespace prefixes in D2 and modify the XPath expression's references to namespace prefixes such that D2 and D1 remain logically equivalent. Since namespace rewriting does not include occurrences of namespace references in attribute values and element content, the canonical form of D1 does not equal the canonical form of D2 because the XPath will be different. Thus, although namespace rewriting normalizes the namespace declarations, the goal eliminating dependence on the particular namespace prefixes in the document is not achieved.

Moreover, it is possible to prove that namespace rewriting is harmful, rather than simply ineffective. Let D1 be a document containing an XPath in an attribute value or element content that refers to namespace prefixes used in D1. Further assume that the namespace prefixes in D1 will all be rewritten by the canonicalization method. Now let D2 be the canonical form of D1. Clearly, the canonical forms of D1 and D2 are equivalent (since D2 is the canonical form of the canonical form of D1), yet D1 and D2 are not logically equivalent because the aforementioned XPath works in D1 and doesn't work in D2.

Note that an argument similar to this can be leveled against the XML canonicalization method based on any of the cases in the Limitations, the problems cannot easily be fixed in those cases, whereas here we have an opportunity to avoid purposefully introducing such a limitation.

Applications that must test for logical equivalence must perform more sophisticated tests than mere octet stream comparison. However, this is quite likely to be necessary in any case in order to test for logical equivalencies based on application rules as well as rules from other XML-related recommendations, working drafts, and future works.

Posted by Chris at 02:24 PM | Comments (0)

September 01, 2004

Learning Python

So I finally decided to learn Python. There are some funny things about Python: 1) whitespace matters; 2) it's really hard to figure out what a method returns; 3) in a class's methods you have to explicitly declare and use the self reference (like "this"). Other than these strange things, Python is a lot of fun. I was able to write a program to invoke the Google search API in about 15 minutes, so that was fun and productive.
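
For anyone who hasn't seen these quirks, here is a tiny made-up example (the Counter class is purely illustrative and has nothing to do with the search program):

```python
class Counter:
    """Tiny class used only to illustrate the three points above."""

    def __init__(self, start=0):
        # "self" has to be listed as the first parameter of every method
        self.count = start

    def increment(self):
        # instance state is always reached through self;
        # a bare "count" here would be a local name instead
        self.count += 1
        # nothing in the signature says what comes back,
        # you have to read the body to find out
        return self.count


counter = Counter()
for _ in range(3):
    # the indentation alone marks this line as the loop body
    counter.increment()
print(counter.count)  # 3
```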

Posted by Chris at 10:31 PM | Comments (0)