Andreas Almer: Improving data quality in XML documents of desktop applications (Slides, in German)
Andreas works in a DaimlerChrysler lab. The problem: In specialised areas, like the automobile industry, XML documents are quite often directly exchanged between end users. Data quality then often suffers because there’s no strong validation in the chain.
He looks at different XML schema languages. DTDs are simple, but lack important features. XML Schema is powerful. Relax NG has a simple syntax and regular expressions, but the syntax leads to ugly deep nesting. Schematron is simple, but writing complex schemas is too much work. Examplotron is self-explanatory, but cardinalities and datatypes are hard [?].
To help users, a schema language is needed that is simple, easy to learn, self-explanatory and writable by hand.
He proceeds to introduce a new schema language. There are only two elements, <element> and <attribute>. Names, cardinalities, types etc. are given in attributes. There’s an assertion feature for simple rules. There are comments for validation rules. The last few slides have lots of examples.
[I think the validation rules are nice. They use XPath/XForms style expressions.]
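To make the two-element idea concrete, here is a rough sketch of how such a schema and a tiny validator might look. This is not the syntax from the slides: the attribute names (name, min, max, type, assert), the defaults, and the helper functions validate and check are all assumptions made up for illustration. The assertion is evaluated with the limited XPath subset of Python's ElementTree rather than full XPath/XForms expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: only <element> and <attribute>, with everything else
# (name, cardinality, type, assertion) carried in attributes. These attribute
# names are illustrative assumptions, not the language shown in the talk.
SCHEMA = """
<element name="car" assert="wheel[@size]">
  <attribute name="vin" type="string" min="1"/>
  <element name="wheel" min="4" max="4">
    <attribute name="size" type="integer" min="1"/>
  </element>
</element>
"""

# Assumed mapping from type names to Python converters.
TYPES = {"string": str, "integer": int}


def validate(schema_el, doc_el, path=""):
    """Return a list of error messages for doc_el checked against schema_el."""
    errors = []
    here = f"{path}/{doc_el.tag}"

    # Declared attributes: presence (min > 0 means required) and datatype.
    for attr in schema_el.findall("attribute"):
        name = attr.get("name")
        value = doc_el.get(name)
        if value is None:
            if int(attr.get("min", "0")) > 0:
                errors.append(f"{here}: missing attribute '{name}'")
            continue
        try:
            TYPES[attr.get("type", "string")](value)
        except ValueError:
            errors.append(f"{here}/@{name}: '{value}' is not {attr.get('type')}")

    # Child elements: cardinality first, then recurse into each occurrence.
    for child in schema_el.findall("element"):
        name = child.get("name")
        found = doc_el.findall(name)
        lo = int(child.get("min", "1"))
        hi = int(child.get("max", "1"))
        if not lo <= len(found) <= hi:
            errors.append(f"{here}: expected {lo}..{hi} <{name}>, found {len(found)}")
        for item in found:
            errors.extend(validate(child, item, here))

    # Assertion: a path expression that must match at least one node.
    rule = schema_el.get("assert")
    if rule is not None and doc_el.find(rule) is None:
        errors.append(f"{here}: assertion failed: {rule}")
    return errors


def check(schema_text, doc_text):
    """Parse both documents and validate the document root against the schema."""
    schema_root = ET.fromstring(schema_text)
    doc_root = ET.fromstring(doc_text)
    if doc_root.tag != schema_root.get("name"):
        return [f"root is <{doc_root.tag}>, expected <{schema_root.get('name')}>"]
    return validate(schema_root, doc_root)


GOOD = '<car vin="X1">' + '<wheel size="17"/>' * 4 + "</car>"
BAD = '<car><wheel size="big"/></car>'

if __name__ == "__main__":
    print(check(SCHEMA, GOOD))
    for message in check(SCHEMA, BAD):
        print(message)
```

Running check on a document before it is passed on to another user is the kind of validation step the talk says is often missing from the chain.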
Q: Why not get rid of the XML syntax to make writing by hand easier? A: Because we live in a world of XML tools, so it’s good to have the schema in XML too. But a non-XML syntax would be good for end users.