Erik Wilde, ETH Zürich: Towards Conceptual Modeling for XML (Slides)
XML schemas don’t contain enough semantic information. Too much meaning is only present in the documentation. Erik wants a conceptual model for modeling with XML. Like the ER model in the database world, but better suited to XML — hierarchical and referential.
This becomes more important as XML moves from a pure data exchange format to an integral part of many applications. XML handling is moving out of libraries and right into the core of programming languages.
No one can understand an XML schema just from looking at the source. A higher-level visual notation would be good.
Erik cites a paper from WWW2005 whose authors collected lots of XML schemas from the web and analyzed them. Main findings: the schemas were either broken or didn’t use more than the basic DTD stuff. I believe it’s [this paper].
There are two ways to relate entities in XML: hierarchical (nesting) and referential (using IDs).
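To make the two kinds of relationship concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The document, its element names (library, author, book, chapter) and the id/author attributes are made up for illustration and are not from Erik's talk. ElementTree also does no validation, so the ID reference is resolved by hand here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are illustrative only.
DOC = """
<library>
  <author id="a1"><name>Ada</name></author>
  <book title="Notes" author="a1">
    <chapter title="Intro"/>
  </book>
</library>
"""

root = ET.fromstring(DOC.strip())
book = root.find("book")

# Hierarchical relationship: a chapter belongs to a book because it is nested in it.
chapters = [c.get("title") for c in book.findall("chapter")]

# Referential relationship: the book points at its author through an ID value.
# Without a DTD or schema these are plain attributes, so we build the lookup ourselves.
authors_by_id = {a.get("id"): a for a in root.iter("author")}
author_name = authors_by_id[book.get("author")].findtext("name")

print(chapters, author_name)
```

Nothing in the markup itself says that the author attribute is a reference while title is just a string. That knowledge lives in the schema or the documentation, which is the kind of gap a conceptual model would make explicit.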
He reviews the existing approaches. Most look similar to ER models with a few differences, like relationships going from attributes to entities (taking into account XML’s hierarchical structure). All have some limitations: they target a specific schema language, don’t support mixed content, have weak formal foundations, etc.
He wants to create a better model. It’s work in progress. He has worked out a list of requirements and is looking for feedback.