cygri’s notes on web data

Debugging Semantic Web sites with cURL

Posted on February 6, 2007 by Richard Cyganiak

Here at our group we spend a lot of time preaching the benefits of dereferenceable URIs. We often want to know if a certain URI supports all the fancy HTTP tricks that are the cornerstones of RDF publishing best practices, like 303 redirects and content negotiation.

My tool of choice for this is cURL, a command-line HTTP client that makes a useful addition to any Semantic Web developer’s toolbox. This tutorial shows how to use cURL to test Semantic Web URIs and to diagnose some common problems.

Getting cURL: Windows users can get cURL binaries from the cURL download page; the first “non-SSL binary” version will work. Find curl.exe in the archive and drop it somewhere on the path, e.g. in C:\Windows. On Mac OS X and most Linux distributions, cURL is pre-installed.

To test cURL, open a command prompt and invoke

curl http://example.com/

You should see the HTML source code of the Example Web Page.

So let’s see some of the things we can do with cURL.

Checking content types: On the Web, content types are used to distinguish between content in different formats, e.g. human-readable HTML (Content-Type: text/html) and machine-readable RDF/XML data (Content-Type: application/rdf+xml). When you request a URI, the server sends the content type and other HTTP headers along with the response. Many Semantic Web clients don’t work properly unless RDF content is served with the correct content type.

To check this with cURL, use the -I parameter. It makes cURL send a HEAD request and print only the HTTP headers sent by the server.

curl -I http://sites.wiwiss.fu-berlin.de/suhl/bizer/foaf.rdf

The URL is the FOAF file of Chris Bizer. Result:

HTTP/1.1 200 OK
Content-Length: 13746
Content-Type: application/rdf+xml
Last-Modified: Thu, 18 Jan 2007 10:27:22 GMT
Accept-Ranges: bytes
ETag: "bf3d723deb3ac71:54d"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Tue, 06 Feb 2007 10:52:51 GMT

The important line is the Content-Type header. We see that the file is served as application/rdf+xml, just as it should be. If we saw text/plain here, or if the Content-Type header were missing, the server configuration would need fixing.
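If you check many files, the content type test can be scripted. What follows is only a minimal sketch under a few assumptions: the function names content_type_of and check_rdf_type are made up for this example (they are not part of cURL), and a POSIX shell with tr and awk is assumed. Both functions read response headers on standard input, so they can be fed directly from curl -sI.

```shell
#!/bin/sh
# Hypothetical helper (not a cURL feature): print the media type from the
# Content-Type header in HTTP response headers read on stdin.
# Prints nothing if the header is missing.
content_type_of() {
    tr -d '\r' | awk -F': *' '
        tolower($1) == "content-type" && !seen {
            split($2, parts, ";")          # drop parameters such as charset
            gsub(/[ \t]/, "", parts[1])    # trim stray whitespace
            print tolower(parts[1])
            seen = 1
        }'
}

# Hypothetical helper: succeed only if the headers on stdin declare
# application/rdf+xml as the content type.
check_rdf_type() {
    ctype=$(content_type_of)
    if [ "$ctype" = "application/rdf+xml" ]; then
        echo "OK: served as $ctype"
    else
        echo "PROBLEM: Content-Type is '${ctype:-missing}'"
        return 1
    fi
}

# Usage (needs network access):
#   curl -sI http://sites.wiwiss.fu-berlin.de/suhl/bizer/foaf.rdf | check_rdf_type
```

The -s flag only silences cURL's progress meter. The comparison is deliberately strict, so other RDF serializations would be reported as a problem by this sketch.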

Checking for 303 redirects: RDF publishers often use 303 redirects to distinguish between URLs for Web documents and URIs for Semantic Web resources. The idea is that when I fetch the URI of a non-document thing (e.g. a person or country or OWL class), then the response will send me to the location of a document describing the thing. Let’s see if the FOAF vocabulary correctly implements 303 redirects. What happens if I fetch foaf:knows?

curl -I http://xmlns.com/foaf/0.1/knows

Response:

HTTP/1.1 303 See Other
Date: Mon, 05 Feb 2007 19:09:55 GMT
Server: Apache/1.3.37 (Unix)
Location: http://xmlns.com/foaf/0.1/
Content-Type: text/html; charset=iso-8859-1

There’s the 303 status code, and the Location header gives the URL of the document that describes the foaf:knows property: in this case, the FOAF specification.

If we got a 200 OK status code instead, then the URI would need fixing because foaf:knows is an RDF property and not a document.
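The same test can be automated by pulling the status code and the Location header out of the response. This is a rough sketch, not a definitive tool: status_of, location_of and check_303 are hypothetical names chosen for illustration, and a POSIX shell with tr and awk is assumed. As before, the functions read the headers on standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric status code from the status line
# (the first line) of HTTP response headers read on stdin.
status_of() {
    tr -d '\r' | awk 'NR == 1 { print $2 }'
}

# Hypothetical helper: print the value of the first Location header, if any.
location_of() {
    tr -d '\r' | awk 'tolower($1) == "location:" && !seen { print $2; seen = 1 }'
}

# Report whether the headers on stdin describe a 303 redirect, as expected
# for the URI of a non-document thing.
check_303() {
    headers=$(cat)
    status=$(printf '%s\n' "$headers" | status_of)
    if [ "$status" = "303" ]; then
        echo "OK: 303 redirect to $(printf '%s\n' "$headers" | location_of)"
    else
        echo "PROBLEM: expected 303, got ${status:-no status line}"
        return 1
    fi
}

# Usage (needs network access):
#   curl -sI http://xmlns.com/foaf/0.1/knows | check_303
```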

Content negotiation: Good Semantic Web servers are configured to do another trick: They will redirect Semantic Web browsers to RDF documents, while plain old Web browsers are sent to HTML documents. To simulate a Semantic Web browser, we have to send an HTTP header Accept: application/rdf+xml along with the request. This is done using cURL’s -H parameter:

curl -I -H "Accept: application/rdf+xml" http://www4.wiwiss.fu-berlin.de/dblp/resource/person/103481

Response:

HTTP/1.1 303 See Other
Date: Tue, 06 Feb 2007 11:23:55 GMT
Server: Jetty/5.1.10 (Windows 2003/5.2 x86 java/1.5.0_09
Location: http://www4.wiwiss.fu-berlin.de/dblp/sparql?query=DESCRIBE+%3Chttp%3A%2F%2Fwww4.wiwiss.fu-berlin.de%2Fdblp%2Fresource%2Fperson%2F103481%3E
Content-Type: text/plain

If we send the same request without the header, we get:

HTTP/1.1 303 See Other
Date: Tue, 06 Feb 2007 11:25:20 GMT
Server: Jetty/5.1.10 (Windows 2003/5.2 x86 java/1.5.0_09
Location: http://www4.wiwiss.fu-berlin.de/dblp/page/person/103481
Content-Type: text/plain

Checking the two locations, we find that the first one serves RDF/XML, while the second one serves HTML.
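Comparing the two redirect targets by eye gets tedious, so here is a small sketch that does it in one go. It relies on cURL's -w (write-out) option with the %{redirect_url} variable, which is empty when the response is not a redirect; this needs a reasonably recent cURL. The function names redirect_target and check_conneg are assumptions made for this example.

```shell
#!/bin/sh
# Print the redirect target that a URI announces, optionally sending an
# Accept header given as the second argument. Headers are discarded; only
# the write-out variable %{redirect_url} is printed.
redirect_target() {
    uri=$1
    accept=$2
    if [ -n "$accept" ]; then
        curl -s -I -o /dev/null -w '%{redirect_url}' -H "Accept: $accept" "$uri"
    else
        curl -s -I -o /dev/null -w '%{redirect_url}' "$uri"
    fi
}

# Hypothetical check: a server that negotiates content should redirect
# RDF clients and plain browsers to different documents.
check_conneg() {
    rdf_target=$(redirect_target "$1" "application/rdf+xml")
    html_target=$(redirect_target "$1")
    echo "RDF clients:   ${rdf_target:-none}"
    echo "Other clients: ${html_target:-none}"
    [ -n "$rdf_target" ] && [ "$rdf_target" != "$html_target" ]
}

# Usage (needs network access):
#   check_conneg http://www4.wiwiss.fu-berlin.de/dblp/resource/person/103481
```

The function only succeeds when the two targets differ. It does not verify what the targets actually serve, so the content types of both locations still need a separate look with curl -I.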

Summary: So here’s how to examine URIs with cURL.

Check the contents that a normal web browser will see:

curl <uri>

Check the response headers that a normal web browser will see:

curl -I <uri>

Check the contents that a Semantic Web browser will see:

curl -H "Accept: application/rdf+xml" <uri>

Check the response headers that a Semantic Web browser will see:

curl -I -H "Accept: application/rdf+xml" <uri>

You can’t tell if a URI will work on the Semantic Web just by opening it in a Web browser. But you can tell with cURL.

Posted in General, Semantic Web | 9 Comments

The Web in five minutes

Posted on February 4, 2007 by Richard Cyganiak

This excellent (in content and style) short video by Michael Wesch is making the rounds. It perfectly captures the essence of the Web as it stands in 2007.

I’d love to see this extended with another 30 seconds devoted to RDF. Or maybe not, because RDF on the Web is still more a vision than a reality, but we are getting there …

(And a whacky prediction: This kind of fast, visual propaganda flick will be the PowerPoint of the future.)

(via Christian Katzenbach)

(Oops, I got Michael’s name wrong, now fixed.)

Posted in General, Semantic Web | 3 Comments

Advogato adds FOAF support

Posted on February 4, 2007 by Richard Cyganiak

Free software community site Advogato now generates FOAF profiles for all users, thanks to work by Steve Rainwater.

Posted in General, Semantic Web | Comments Off

Now, how to make lists with a point?

Posted on January 30, 2007 by Richard Cyganiak

Yaron Koren comments on our dbpedia project:

I created a query to get a list (including image) of all extinct birds whose name contains the letter “d”. Does that seem like a pointless list? Well, it is, though on the other hand this is also, as far as I know, the first time in human history that someone could create such a list without doing any installing, computing, or any research on the actual subject matter.

Posted in General, Semantic Web | Comments Off

Test post, please ignore

Posted on January 29, 2007 by Richard Cyganiak

FeedBurner, please purge the feed cache …

Posted in General | 1 Comment

Call it Web and they will buy it

Posted on January 27, 2007 by Richard Cyganiak

Nick Gall, a VP at Gartner, complains that middleware vendors simply slapped a “Web” label onto their overcomplicated products and, thanks to W3C’s blessing, managed to create yet another wave of hype:

Unfortunately, Web Services, at least the WS-* style, are “Web” in name only. While WS-* enables tunneling over HTTP (used merely as an XML message transport), in almost every important aspect, WS-* violates (or at best ignores) the architectural principles of the Web as described in the W3C’s Architecture of the World Wide Web, Volume One and in Tim Berners-Lee’s personal design notes.

It is my position that the W3C should extricate itself from further direct work on SOAP, WSDL, or any other WS-* specifications and redirect its resources into evangelizing and standardizing identifiers, formats, and protocols that exemplify Web architectural principles. This includes educating enterprise application architects how to design “applications” that are “native” web applications.

And WS-* is not the only case where stuff that has almost nothing to do with the Web got hyped after getting a “Web” label slapped on. (Web Ontologies?)

Meanwhile, Web innovation continues to happen elsewhere.

(via Bill de hÓra)

Posted in General, Semantic Web | 1 Comment

Ohloh: an open source directory

Posted on January 24, 2007 by Richard Cyganiak

Recently I came across ohloh.net, a Web 2.0-ish directory of open source projects. It seems to aggregate information from at least SourceForge, Freshmeat, user-provided RSS feeds, and possibly other sources.

The most interesting aspect: To gather information about a project, Ohloh also connects to its CVS repository and displays statistical and historical information about the project’s development, and even generates information on individual contributors. Thus, it’s a public website that provides a hosted service somewhat similar to tools like StatCVS.

A good example is the Ohloh page for StatCVS itself. Here’s the page about me as a contributor to StatCVS – Ohloh determines that I have 1.2 years of Java experience and four months of CSS experience. Neat, it uses SIMILE Timeline.

The developers also have a blog.

So, what can Ohloh do for people involved with open source? I think the clearest story is this: People who want to quickly evaluate a project can use Ohloh to get a one-stop overview. For example, we learn from the StatCVS page that the project has “5 active developers”, “increasing development activity”, an “established codebase”, and a project cost of $139K (huh?).

In the future Ohloh could add some more sources of data (project mailing lists, issue trackers), and could become a one-stop dashboard for people involved with the project, to stay on top of day-to-day developments. A bit like a hosted version of Trac.

I see this as one more unexpected advantage of the open source development process: Since the source code and other data are available on the Internet, third parties can build services that add value to the development process.

Posted in General | Tagged StatCVS | 1 Comment

Semantic Web tools list in Exhibit

Posted on January 23, 2007 by Richard Cyganiak

Mike Bergman has set up a list of Semantic Web tools that can be viewed, searched and filtered through SIMILE Exhibit. He also gives some background on how he did it: it combines Google Spreadsheets, WordPress and Exhibit. Pretty cool.

If you haven’t seen Exhibit in action yet, then check it out, and keep in mind that everything happens on the client side, and the data to feed the mashup can come from just about anywhere.

Update: David Huynh, the creator of Exhibit, is quite excited about Mike’s work. Is this really “the beginning of something great?”

Posted in General, Semantic Web | Comments Off

Endless Bar (tomorrow, Friday, January 19th)

Posted on January 18, 2007 by Richard Cyganiak

Tomorrow the St. Oberholz in Berlin will join the Endless Bar. Arne has the details.

Come join us for a drink between 7.30 and 9.30pm, either in person in Berlin, or virtually if you are at any other place with booze, music, an internet connection and a webcam. (We use Skype and iChat; my contact details are in the sidebar.)

Posted in General | Comments Off

Idea of the week

Posted on January 11, 2007 by Richard Cyganiak

David Weinberger:

In fact, perhaps we could use a microformat for technical problems and solutions.

The first thing I do when I get some funky error message is to google for it. Even for the most arcane problem, there’s a forum thread somewhere that will shed some light on the issue. A little more structure could add a lot of usefulness to that kind of information.

Posted in General, Semantic Web | 2 Comments
