Smushing vs. untangling ambiguous tags

How do we find out that two bits of RDF describe the same resource? The usual approaches are either to agree on a common URI scheme (which is often impractical), or to use inverse functional properties and smushing (which is complex and brittle). Phil Dawes describes another approach:

The recent folksonomy phenomenon has shown us that it is possible for serendipitous linking to happen on a large scale. This is achieved by leveraging existing real-world semantic grounding in shared (and well known) terms, and then requiring that clients do their own work in using context to disambiguate terms. […] Instead of having lots of unconnected data that must be painstakingly merged centrally [which incidentally is what’s going on now when we attempt to convert other data to RDF, and when we create owl mapping statements], you have the opposite problem: lots of over-linked data which the consumer must disambiguate (and choose which links to follow) based on an operating context.

Phil goes on to say that this disambiguation is much simpler than the task of joining things together: just ask for the company tagged “apple”, or the fruit tagged “apple”.
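To make the idea concrete, here is a minimal sketch of consumer-side disambiguation. The data and the `ask_for` helper are hypothetical; the point is only that the consumer, not a central merger, resolves the ambiguous tag by intersecting it with a context tag:

```python
# Hypothetical over-linked resources: both carry the ambiguous tag "apple".
# The consumer disambiguates with a context tag instead of unique URIs.
resources = [
    {"uri": "http://example.org/apple-inc", "tags": {"apple", "company", "tech"}},
    {"uri": "http://example.org/apple-fruit", "tags": {"apple", "fruit", "food"}},
]

def ask_for(tag, context):
    """Return URIs of resources carrying both the tag and the context tag."""
    return [r["uri"] for r in resources
            if tag in r["tags"] and context in r["tags"]]

print(ask_for("apple", "company"))  # the company tagged "apple"
print(ask_for("apple", "fruit"))    # the fruit tagged "apple"
```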

I’m not sure what to think about this. The idea is “un-semwebby”. When I think about a “working” semantic web, I envision an entirely automated system. Got a new bit of RDF? Just push it into the pipe, and the system will digest it and index it and hash it and p2p it and sparql it and everybody can use it. But maybe another level of human decision-making and intervention is necessary. (Just as a stop-gap?)

By adding tags, you could publish your RDF into a certain “pool” of information, watched by the parties interested in that tag. The great thing about tags is that they are a social solution to a technical problem: making web resources findable seems like a technical issue, but assigning and discovering tags is a purely social activity.

Maybe it’s just me, but the idea of a del.icio.us where you tag RDF files, RDF services or RDF resources, complete with an API, sounds awfully sweet.

Posted in General, Semantic Web | Comments Off on Smushing vs. untangling ambiguous tags

Hell froze over! Slashdot switches to valid HTML and CSS

Hell froze over!

After 8 years of my nasty, crufty, hodge podged together HTML, last night we finally switched over to clean HTML 4.01 with a full complement of CSS. While there are a handful of bugs and some lesser used functionality isn’t quite done yet, the transition has gone very smoothly.

Good to see this happening. Slashdot is the epicenter of the geek world, and web standards advocates have complained about its sloppy tag-soup HTML for years. Ugly <font> tags everywhere, and we’re in 2005!

Besides increasing Slashdot’s nerd credibility, this will probably reduce their bandwidth usage by a nice percentage, and user-made alternative stylesheets will surely start to crop up soon.

Posted in General | Comments Off on Hell froze over! Slashdot switches to valid HTML and CSS

Don “Flaming” Knuth

Read this. It’s priceless.

Look folks, I know that software rot (sometimes called “progress”) keeps growing, and backwards compatibility is not always possible. At one point I changed my TeX78 system to TeX82 and refused to support the older conventions. […] But in this case I see absolutely no reason why system people who are supposedly committed to helping the world’s users from all the various cultures are suddenly blasting me in the face and telling me that you no longer support things that every decent browser understands perfectly well.

(via Danny)

Posted in General | Comments Off on Don “Flaming” Knuth

[bxmlt2005] Meike Klettke

Meike Klettke, Uni Rostock: XML schema evolution and incremental validation (Slides, in German)

XML documents and document schemas change over time; that’s just a fact. Compare SQL databases: data is updated all the time, schema changes happen, columns are added to tables, etc.

Update languages for XML are far from being standardized, and evolution of XML schemas hasn’t been considered much.

Schema changes are especially hard because existing instance data must be updated in order to remain valid against the new schema.

[skip over hard-to-summarize details]

(The simpler alternative pushed by vendors is schema versioning. Have multiple versions of schemas, validate against all (must match one), and downstream tools like XSLT scripts must be able to cope with multiple versions.)
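A minimal sketch of that versioning approach, with stand-in validators (a real system would use an XML Schema processor; the version checks and element names here are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Stand-in conformance checks for two schema versions. In reality these
# would be full XML Schema validations; here each just probes structure.
def valid_v1(root):
    return root.find("name") is not None

def valid_v2(root):
    # Hypothetical v2: <name> was renamed to <title>, mandatory <id> added.
    return root.find("title") is not None and root.find("id") is not None

SCHEMA_VERSIONS = {"1.0": valid_v1, "2.0": valid_v2}

def validate(doc):
    """Accept the document if it matches any known schema version;
    return the list of versions it conforms to."""
    root = ET.fromstring(doc)
    return [ver for ver, check in SCHEMA_VERSIONS.items() if check(root)]

old_doc = "<item><name>widget</name></item>"
new_doc = "<item><id>7</id><title>widget</title></item>"
print(validate(old_doc))  # conforms to version 1.0
print(validate(new_doc))  # conforms to version 2.0
```

Downstream tools then branch on the reported version instead of requiring all instance data to be migrated at once.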

Oracle’s XML component supports a limited form of schema evolution. MS SQL Server supports schema versioning.

Posted in General | Comments Off on [bxmlt2005] Meike Klettke

[bxmlt2005] Daniel Fötsch

Daniel Fötsch, Uni Leipzig: Operator hierarchy concept for XML transformation

Daniel reviews transformation methods for XML. You can do it at the plain-text level with regular expressions etc., with APIs like DOM and SAX, or with specialized languages like XSLT, STX, XUL and XUpdate. He talks about this last category.

The idea behind “operator hierarchies” is to create low-level transformation files automatically from high-level transformation files. You can write transformations on a higher level of abstraction, which makes the transformation file shorter and more readable and maintainable. Then it’s transformed through an “operator file”, and expanded into a low-level transformation file. In fact, you can have several levels of this operator expansion, starting with domain-specific operators, then generic operators, then low-level operators.

So you could have something like this:

   (much XML)
   <isif condition="...">
       ... (XML) ...
   <iselse/>
       ... (XML) ...
   </isif>
   (more XML)

And this could be expanded to a much longer XSLT fragment.

Mostly it’s about avoiding repetition in XSLT files by factoring out common code.
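As a rough illustration of such an operator expansion, here is a sketch that rewrites the hypothetical `<isif>`/`<iselse>` operator from above into an equivalent `xsl:choose` block. The operator names and the expansion rule are my reconstruction from the talk summary, not the actual tool:

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"

def expand_isif(elem):
    """Rewrite every <isif condition="..."> ... <iselse/> ... </isif> child
    of elem into an <xsl:choose>/<xsl:when>/<xsl:otherwise> block."""
    for i, child in enumerate(list(elem)):
        expand_isif(child)  # expand nested operators first
        if child.tag == "isif":
            choose = ET.Element(f"{{{XSL}}}choose")
            target = ET.SubElement(choose, f"{{{XSL}}}when",
                                   test=child.get("condition"))
            for grandchild in list(child):
                if grandchild.tag == "iselse":
                    # everything after <iselse/> goes into the otherwise branch
                    target = ET.SubElement(choose, f"{{{XSL}}}otherwise")
                else:
                    target.append(grandchild)
            elem.remove(child)
            elem.insert(i, choose)

doc = ET.fromstring(
    '<root><isif condition="$x &gt; 0"><yes/><iselse/><no/></isif></root>')
expand_isif(doc)
print(ET.tostring(doc, encoding="unicode"))
```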

Posted in General | Comments Off on [bxmlt2005] Daniel Fötsch

[bxmlt2005] Andreas Almer

Andreas Almer: Improving data quality in XML documents of desktop applications (Slides, in German)

Andreas works in a DaimlerChrysler lab. The problem: In specialised areas, like the automobile industry, XML documents are quite often directly exchanged between end users. Data quality then often suffers because there’s no strong validation in the chain.

He looks at different XML schema languages. DTDs are simple, but lack important features. XML Schema is powerful. Relax NG has a simple syntax and regular expressions, but the syntax leads to ugly deep nesting. Schematron is simple, but complex schemas are too much work. Examplotron is self-explaining, but cardinalities and datatypes are hard [?].

To help users, a simple, easy to learn, self-explaining and hand-writable schema language is needed.

He proceeds to introduce a new schema language. There are only two elements, <element> and <attribute>. Names, cardinalities, types etc. are given in attributes. There’s an assertion feature for simple rules, and comments can carry validation rules. Lots of examples in the last few slides.

[I think the validation rules are nice. They use XPath/XForms style expressions.]
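A purely hypothetical sketch of what such a schema might look like, reconstructed only from the description above (two elements, with names, cardinalities and types in attributes, plus an XPath-style assertion); the real proposal’s concrete syntax may differ:

```python
import xml.etree.ElementTree as ET

# Invented example in the described style: only <element> and <attribute>,
# everything else expressed through attributes. Not the actual syntax.
schema = """
<element name="book" min="1" max="unbounded">
  <attribute name="isbn" type="string" required="true"/>
  <element name="title" type="string" min="1" max="1"/>
  <element name="price" type="decimal" min="0" max="1"
           assert="price &gt;= 0"/>
</element>
"""

root = ET.fromstring(schema)
# Walk the schema to show it stays trivially machine-readable.
declared = [(e.tag, e.get("name")) for e in root.iter()]
print(declared)
```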

Q: Why not get rid of the XML syntax to make hand-writing easier? A: Because we live in a world of XML tools, it’s good to have the schema in XML too, though a non-XML syntax would indeed be nicer for end users.

Posted in General | Comments Off on [bxmlt2005] Andreas Almer

[bxmlt2005] Thomas Müller

Thomas Müller: Passing XQuery sequence sections

Thomas talks about the roadmap of SQL with regard to XML support. (Slides, in German)

The current version of SQL (SQL 2003) has an XML datatype and some functions for converting between XML bits and relational data.

SQL 2007 will adapt the XQuery data model for the XML datatype and will allow embedding of XQueries into SQL (!).

SELECT C.Name, XMLQuery('$e//...' PASSING C.Procurement AS "e")
FROM Customer C
WHERE C.CustId > 2105

[The meat of the talk was over my head.]

Posted in General | 1 Comment

Why XML is the wrong technology for modeling information

Very timely after Erik Wilde’s talk about conceptual models for XML, Elias Torres cites from the journal Nature Biotechnology:

The [above] problem originates from the limited expressiveness of the XML language. This claim may appear to contradict the often proclaimed ’self-descriptive’ nature of XML. But XML, designed as a language for message encoding, is only self-descriptive about the following structural relationships: containment, adjacency, co-occurrence, attribute and opaque reference. All these relationships are indeed useful for serialization, but are not optimal for modeling objects of a problem domain.

This is the best articulation of the “RDF vs. XML” argument I’ve seen so far.

Posted in General, Semantic Web | Comments Off on Why XML is the wrong technology for modeling information

Office 12 screenshots — ewww!

Look at these Office 12 screenshots.

Office 12 screenshot

Microsoft imitates the Brushed Metal look of Mac OS X, just as Apple seems to be moving away from this look.

I just don’t understand why anybody would want their OS to look like a cheap ripoff of the previous version of someone else’s OS. Microsoft’s UI design team has no sense of taste or fashion.

(via Signal vs. Noise)

Posted in General | Comments Off on Office 12 screenshots — ewww!

[bxmlt2005] Erik Wilde

Erik Wilde, ETH Zürich: Towards Conceptual Modeling for XML (Slides)

XML schemas don’t contain enough semantic information. Too much meaning is only present in the documentation. Erik wants a conceptual model for modeling with XML. Like the ER model in the database world, but better suited to XML — hierarchical and referential.

This becomes more important as XML moves from a pure data exchange format to an integral part of many applications. XML is moving from a library thing right into the core of programming languages.

No one can understand an XML schema just from looking at the source. A higher-level visual notation would be good.

Erik cites a paper from WWW2005 where the authors had collected lots of XML schemas from the web and analyzed them. Main findings: Either they were broken, or they didn’t use more than the basic DTD stuff. I believe it’s [this paper].

There are two ways to relate entities in XML: hierarchical (nesting) and referential (using IDs).
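A small sketch of the two linking styles side by side, with an invented document; the referential link has to be resolved by hand here, since without a DTD there is no built-in ID/IDREF machinery:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <address> is related hierarchically (nested inside
# <person>), while the employer is related referentially via an ID.
doc = ET.fromstring("""
<staff>
  <company id="c1"><name>ACME</name></company>
  <person employer="c1">
    <address><city>Berlin</city></address>
  </person>
</staff>
""")

person = doc.find("person")
# Hierarchical: just navigate the nesting.
city = person.find("address/city").text
# Referential: resolve the ID reference manually.
emp_id = person.get("employer")
employer = doc.find(f"company[@id='{emp_id}']/name").text
print(city, employer)
```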

He reviews the existing approaches. Most look similar to ER models with a few differences, like relationships going from attributes to entities (taking into account XML’s hierarchical structure). All have some limitations: they target a specific schema language, don’t support mixed content, have weak formal foundations, etc.

He wants to create a better model. It’s work in progress. He has worked out a list of requirements and is looking for feedback.

Posted in General | Comments Off on [bxmlt2005] Erik Wilde