Namespaces in queries, part 2

One of the nice things about blogging is that you get people to review your ideas, for free.

Yesterday I claimed that you don’t need URI prefixes in RDF queries. Instead of writing this:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX doap: <http://usefulinc.com/ns/doap#>
SELECT DISTINCT ?projectName ?personName
WHERE { 
  ?person foaf:name ?personName .
  ?person doap:project ?project .
  ?project doap:name ?projectName .
}

one could write this:

SELECT DISTINCT ?projectName ?personName
WHERE {
  ?person :name ?personName .
  ?person :project ?project .
  ?project :name ?projectName .
}

and have the undeclared default namespace match any namespace in the data. It’s shorter, it’s more readable, and there’s no more hunting around to copy and paste those namespace URIs. What’s not to like?
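To make the intended semantics concrete, here is a rough sketch in Python rather than in any real SPARQL engine. The triples, the `ex:` subject identifiers and the function names are made up for illustration, and the local-name split (everything after the last `#` or `/`) is a simplification. It hand-codes the second query above by comparing only the local part of each predicate:

```python
# Toy data: (subject, predicate, object) triples with full predicate URIs.
# The subjects and literal values are invented for this illustration.
FOAF = "http://xmlns.com/foaf/0.1/"
DOAP = "http://usefulinc.com/ns/doap#"

TRIPLES = [
    ("ex:alice", FOAF + "name", "Alice"),
    ("ex:alice", DOAP + "project", "ex:d2rq"),
    ("ex:d2rq", DOAP + "name", "D2RQ"),
]


def local_name(uri):
    """Return the part after the last '#' or '/' (a simplification)."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def objects(subject, local):
    """Objects of `subject` for any predicate whose local name is `local`."""
    return [o for s, p, o in TRIPLES if s == subject and local_name(p) == local]


def project_and_person_names():
    """Hand-coded equivalent of the three-pattern query with ':' wildcards."""
    rows = set()  # a set gives the DISTINCT behaviour
    for person, pred, project in TRIPLES:
        if local_name(pred) != "project":
            continue
        for person_name in objects(person, "name"):
            for project_name in objects(project, "name"):
                rows.add((project_name, person_name))
    return sorted(rows)


print(project_and_person_names())
```

Note that `:name` matches both the FOAF and the DOAP property here, which is exactly the lumping-together that the proposal relies on.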

Well, everybody hates the idea. See the comments to the piece linked above, and Ora’s rant and comments there. So I’m probably wrong.

I found most of the cited reasons pretty lame though, ranging from dogma to tortured SQL analogies to performance concerns. Some good bits:

Evan:

Lumping [properties from different namespaces] together should only be done if the user explicitly specifies that it should be done,

Spot on: an optional, well-defined and compact way for specifying that the user wants them lumped together. That’s what I want.

and it is likely there would be far more accidental collisions than purposeful collisions.

I disagree about the prediction, but it’s hard to prove either way.

Richard Newman:

I assume the proposal is just the fevered ramblings of a man sick of writing

PREFIX foaf: <http://xmlns.com/foaf/0.1/>

every time he wanted to query on names.

My answer: environment support.

While Richard may be quite right about my motivation, I don’t see environment support happening. SPARQL is usually entered into web forms, text editors and IDEs. I’m not aware of any support in any of these environments (well, except in D2R Server (example)), and I’m not going to wait two years for a solution.
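For what it’s worth, the kind of environment support Richard has in mind could be fairly small. The following is only a sketch of a hypothetical helper, not a feature of any existing tool: `add_missing_prefixes` and the `KNOWN_PREFIXES` table are names I made up, the table holds just the two URIs used in this post, and the regular expressions ignore details such as colons inside string literals.

```python
import re

# Assumed lookup table; only the two URIs from this post are filled in.
KNOWN_PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "doap": "http://usefulinc.com/ns/doap#",
}

# A prefixed name such as foaf:name. The lookbehind rejects matches that
# would start in the middle of a longer token.
PREFIXED_NAME = re.compile(r"(?<![\w<:/])([A-Za-z][\w-]*):[A-Za-z_]")
# Prefixes the query already declares itself.
DECLARED = re.compile(r"^\s*PREFIX\s+([A-Za-z][\w-]*):", re.IGNORECASE | re.MULTILINE)


def add_missing_prefixes(query):
    """Prepend PREFIX lines for known prefixes that are used but not declared."""
    # Drop <...> IRIs first so "http:" inside them is not taken for a prefix.
    without_iris = re.sub(r"<[^>]*>", "", query)
    declared = set(DECLARED.findall(query))
    used = []
    for prefix in PREFIXED_NAME.findall(without_iris):
        if prefix not in used:
            used.append(prefix)  # keep first-use order
    header = [
        f"PREFIX {p}: <{KNOWN_PREFIXES[p]}>"
        for p in used
        if p not in declared and p in KNOWN_PREFIXES
    ]
    return "\n".join(header + [query])


print(add_missing_prefixes("SELECT ?n WHERE { ?p foaf:name ?n . ?p doap:project ?x . }"))
```

Prefixes that are not in the table are left alone, so the query would still fail on them just as it does today.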

Ora and drewp pointed out that, if I don’t want to query for full URIs, I should query for rdfs:labels. That’s a good point. A query language that lets me do

SELECT ?personName ?projectName WHERE
?project a project .
?project name ?projectName .
?project developer ?person .
?person name ?personName .
?person a person .

would indeed be cool. This would be halfway between SPARQL and NLP projects like Ginseng. The downside is that one would have to design an all-new language, and apart from the namespace issue, I like SPARQL just fine.

Danny mentioned microformats, where namespaces are unnecessary because the community has to agree on a schema before it can be used. But I don’t want to change anything on the data side; URIs for properties and classes are great. That doesn’t necessarily mean we need to do the same on the query side.

Finally, Henry:

I think you just posted this to get attention ;-)

Uh, no. Though it certainly worked ;-)

A question to those still interested in the discussion: 100% of widely deployed RDF vocabularies follow the convention of namespace plus mnemonic local parts. Why is it wrong to exploit a universally accepted convention in a query language?


3 Responses to Namespaces in queries, part 2

  1. Laurens: Yes, RDF uses full URIs, but there is a clear convention for separating URIs into namespace and localname: Everything up to the last non-NCName character goes into the namespace. Sesame, for example, stores URIs internally as namespace/localname pairs.

    You say that a change in the data might break the query result. That’s a good point. This would cause problems for queries that will be re-executed multiple times over a longer period of time, so for that kind of query it’s certainly safer to use explicit namespaces.

    But it would not cause problems for the kind of queries I’m interested in: one-off, interactive queries. For these, number of characters typed matters much more, and if the results are wrong then the user can correct the query right away.

    So I guess my suggestion only makes sense in this particular context. Thanks for your help in clarifying this.

  2. I think that the main risk is that you are depending on current content of the store conforming to some constraints. If at some point the store is expanded in some other part of the software that uses the same store (e.g. a fathers:name property is added), then it will suddenly start to get buggy. After all, in good modular programming models, a change in one module of a software should not require a change in another. And these bugs will be difficult to hunt down, because no changes were made to that relevant part of the software.

    Also, testing methods would be much more complex. With this, you would basically need to query the entire store and check the results just to make sure that in all cases there is no naming conflict, whereas if you’re using namespaces (and the store integrity is dependable) there can never be a conflict. Also, you would need to repeat this check for everything that adds new properties to the database.

    So it doesn’t seem like a very good idea. And those namespaces are copy/paste anyway, right. I mean, as a programmer there are more tedious tasks than just that (debugging for example :)).

    ~Grauw

  3. Also, I don’t really see how this would work, given that RDF is using full (concatenated) URIs, not namespace/key pairs. A substring match on the last part of the string? Then ‘me’ would also match ‘name’, ‘name’ would also match ‘fullname’, etc.

    ~Grauw
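The splitting convention from the first response, and the collision risk raised in the second, can be sketched together in a few lines of Python. This is an illustration under assumptions, not how Sesame or any other store does it: the NCName test is an ASCII-only approximation, the function names are invented, and the `http://example.org/...` URIs are hypothetical stand-ins for the `fathers:name` and `fullname` examples.

```python
import re

# ASCII-only approximation of "not an NCName character"; the real XML
# definition also admits many non-ASCII letters.
NON_NCNAME = re.compile(r"[^A-Za-z0-9_.\-]")


def split_uri(uri):
    """Split at the last non-NCName character: (namespace, local name)."""
    cut = 0
    for match in NON_NCNAME.finditer(uri):
        cut = match.end()
    return uri[:cut], uri[cut:]


def namespaces_for(local, predicates):
    """All namespaces in which `local` occurs as a complete local name."""
    return sorted({ns for ns, name in map(split_uri, predicates) if name == local})


PREDICATES = [
    "http://xmlns.com/foaf/0.1/name",
    "http://usefulinc.com/ns/doap#name",
    "http://example.org/fathers#name",      # hypothetical later addition
    "http://example.org/people#fullname",   # hypothetical
]

print(namespaces_for("name", PREDICATES))  # three namespaces: ambiguous
print(namespaces_for("me", PREDICATES))    # none: whole local names only
```

Because whole local names are compared, ‘me’ does not match ‘name’ and ‘name’ does not match ‘fullname’. The `fathers` case does show up as a third namespace, though, so an interactive tool could at least warn when a local name resolves to more than one namespace.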
