This is in response to a discussion in the comments to my recent post on geonames.org. I criticized the use of the same URI for concepts and documents in the Geonames RDF output; this post describes in detail how to fix that kind of issue.
The problem: Geonames uses URIs to identify places. For example, this is the URI for Berlin:
This URI identifies both a concept (the city of Berlin) and a document (which provides some information about that city). This is a problem because, as TimBL points out, the URI now identifies both something that is located in Germany and something that mentions the class Feature. This ambiguity can cause a lot of trouble down the road.
How to fix it: Use different URIs for the concept and the document.
Let’s say we keep the URI above for the document and pick a new URI for the concept. When I create an RDF link from my profile to my home town, for example, I would use the new URI.
But we have to set things up in a certain way: When we retrieve the concept URI, we want to get to the contents of the document! There are two ways to do that, and we have to pick one:
The hash approach: By adding a fragment identifier to the document URI, we get a new concept URI. For example, this could be the concept URI for Berlin:
Just as with HTML, a hash in the URI means that the part before the hash is the document to be retrieved, and the part after the hash identifies something within the document. Consequently, one gets the document when trying to retrieve the hash URI.
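To make the hash behaviour concrete, here is a minimal Python sketch of what a client does with a hash URI before it goes to the network. The example.org URI and the helper name document_uri_for are made up for illustration; they are not Geonames URIs.

```python
from urllib.parse import urldefrag

def document_uri_for(concept_uri: str) -> str:
    """Return the URI a client actually requests for a given hash URI."""
    # urldefrag splits off everything after the '#'. The fragment is never
    # sent to the server; only the document part is retrieved.
    document_uri, _fragment = urldefrag(concept_uri)
    return document_uri

# Hypothetical concept URI, for illustration only.
concept = "http://example.org/places/berlin.rdf#place"
print(document_uri_for(concept))  # http://example.org/places/berlin.rdf
```

So the concept URI and the document URI differ, yet the server needs no extra configuration: every request it sees is for the document.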
The 303 approach: Here we pick a completely new URI for the concept. For example:
(I’ve tried to pick a clean URI. Your URIs are your site’s prime real estate; it’s better not to clutter them.)
Whenever any HTTP URI is accessed, the web server responds with a status code, e.g. 200 for “OK, here’s the page”, 404 for “Sorry, not found”, or 302 for “The document has temporarily moved to this other URI” (where the other URI is provided in a Location: HTTP header; this is called a redirection).
Now we have to set up the server to respond to requests for the concept URI with a 303 (“See Other”) status code, and put the document URI in the Location: HTTP header. The client can then fetch the Location: URI to retrieve the document.
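As a rough sketch of the server side, here is a toy setup using Python's standard http.server module. The two paths are hypothetical and chosen only for the demo; a real site would more likely configure this in its web server or framework. The script starts the server on a free local port, requests both URIs without following redirects, and prints the status code and Location: header for each.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths, for illustration only.
CONCEPT_PATH = "/place/berlin"     # identifies the city itself
DOCUMENT_PATH = "/doc/berlin.rdf"  # identifies the document about the city

class ConceptHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == CONCEPT_PATH:
            # The concept cannot be sent over the wire, so point the
            # client at the document that describes it.
            self.send_response(303)
            # A relative reference is used here for brevity; an absolute
            # URI works as well.
            self.send_header("Location", DOCUMENT_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == DOCUMENT_PATH:
            body = b"<!-- RDF describing the place would go here -->\n"
            self.send_response(200)
            self.send_header("Content-Type", "application/rdf+xml")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet

def fetch_status(port, path):
    """GET a path without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()
    result = (resp.status, resp.getheader("Location"))
    conn.close()
    return result

# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), ConceptHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

print(fetch_status(port, CONCEPT_PATH))   # the 303 plus the document path
print(fetch_status(port, DOCUMENT_PATH))  # the 200 for the document itself

server.shutdown()
server.server_close()
```

Most HTTP client libraries follow the 303 automatically, so from the client's point of view the second request usually happens behind the scenes.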
Which one to pick? To be honest, I don’t really know. The 303 approach has the disadvantage of requiring an additional HTTP request to fetch the redirected document, and it may be harder to set up. The hash approach has the disadvantage that it feels a bit hackish when there is just a single concept described in the document. Do whatever works for you.
But is it a problem at all? Some researchers still quarrel about this whole issue. Some think it’s no problem at all; others think that something completely different should be done. I’m a toolsmith, not a philosopher, and therefore try to avoid these debates. The W3C has said that we should do the things above, and I’m happy to comply and move on to more important issues.
Background reading: This piece is getting way too long already and I’m tired; so I’ll leave it up to you to provide interesting linkage in the comments.