cygri’s notes on web data

Sony Ericsson K750i

Posted on August 29, 2005 by Richard Cyganiak

The phone situation is sorted out. I’m now the proud owner of a Sony Ericsson K750i. It’s a nice phone.

The camera is quite impressive (two megapixels) and will make my cheap Konica-Minolta x31 digicam obsolete. It’s quite cleverly done — you turn the phone by 90 degrees after sliding back the lens cover, and hold it like a small camera. The menu and other interface elements on the display are also rotated by 90 degrees when the phone is in camera mode.

The UI seems a bit weird after using Nokias and Motorolas for years, but I’ll get used to it. Syncing to iCal and Address Book via Bluetooth works without a hitch, as does USB file transfer. I can also use the phone as a remote for iTunes and Keynote thanks to Salling’s Remote Basics.

Some things don’t work yet. I can’t use the phone as a Bluetooth GPRS modem with my PowerBook, and making calls and sending SMS from Address Book don’t work either.

Still, I’m quite happy with this purchase.

I’ve put the number into the blog’s sidebar, by the way.

Posted in General | Comments Off

Ruby one-liners

Posted on August 27, 2005 by Richard Cyganiak

Martin Fowler shows off what you can do with collections and closures in Ruby. I am quite impressed. I already love PHP over Java for its succinctness. Ruby seems to be even better:

managers, plebs = employees.partition{|e| e.manager?}

The same code in PHP 4 would look like this:

$managers = array();
$plebs = array();
foreach ($employees as $employee) {
    if ($employee->isManager()) {
        $managers[] = $employee;
    } else {
        $plebs[] = $employee;
    }
}

And in Java 1.4:

List managers = new ArrayList();
List plebs = new ArrayList();
Iterator it = employees.iterator();
while (it.hasNext()) {
    Employee employee = (Employee) it.next();
    if (employee.isManager()) {
        managers.add(employee);
    } else {
        plebs.add(employee);
    }
}

Quite an improvement, eh? It seems like Ruby can eliminate a large percentage of iterate-over-collection code, and in a way that seems more natural to me than the equivalent in functional languages like Haskell, and plainly more beautiful than Perl.
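For anyone who wants to try the one-liner, here is a minimal self-contained sketch. The Employee struct, its manager? method and the sample names are made up for illustration; only the partition call comes from the example above.

```ruby
# Hypothetical Employee type, invented for this example.
Employee = Struct.new(:name, :is_manager) do
  def manager?
    is_manager
  end
end

employees = [
  Employee.new("Alice", true),
  Employee.new("Bob", false),
  Employee.new("Carol", false),
]

# partition returns two arrays: the elements for which the block is
# true, followed by the elements for which it is false.
managers, plebs = employees.partition { |e| e.manager? }

puts managers.map(&:name).inspect
puts plebs.map(&:name).inspect
```

Unlike the PHP and Java versions, both result collections always exist afterwards, even when one of them is empty.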

I’ll sit down with a Ruby book one of these evenings.

Posted in General | Comments Off

Slides: “SPARQL and relational databases”

Posted on August 26, 2005 by Richard Cyganiak

Slides: SPARQL and relational databases (PDF, 224k)

These are the slides from a short talk I gave two days ago at HP Labs. It’s a rough summary of the work I did there. I will talk more about this stuff as my writeups proceed toward publishing (as HP Labs tech reports).

Posted in General, Semantic Web | Comments Off

Back in Berlin

Posted on August 26, 2005 by Richard Cyganiak

After five months at HP Labs Bristol I’m now back in Berlin. I’ll be busy for a few days catching up with all the things that have accumulated here, but then a lazy student lifestyle waits to be re-adopted for another couple of months. First I need to sort out the mobile phone situation — I don’t have a handset that works with my German contract, and using an English prepaid phone in Germany is obviously not the most cost-effective solution. Fortunately, email setup was as easy as getting out the laptop and asking Anja for the ID of her wireless network.

She is putting me up in a corner of her tiny one-room flat while we are looking for something bigger to move into. We are in the middle of hard negotiations over cupboard shelf space, desk real estate and AC outlet use.

Good to be back.

Posted in General | Comments Off

sparql2sql: A query engine for SPARQL over a Jena triple store

Posted on July 27, 2005 by Richard Cyganiak

Google, please pick this up: sparql2sql is a SPARQL implementation on top of a triple store in a relational database. It’s reasonably fast because it translates SPARQL queries directly into SQL. This is the result of the first half of my internship at HP Labs.
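To give a rough idea of what translating SPARQL into SQL can mean, here is a toy sketch in Ruby. It is not the sparql2sql code and does not use the Jena database layout; it assumes a hypothetical single table triples(subj, pred, obj) and only handles a list of plain triple patterns with at least one variable. The function name bgp_to_sql is made up.

```ruby
# Toy translation of a basic graph pattern into one SQL self-join.
# Assumption: a hypothetical table triples(subj, pred, obj).
# Each pattern is [subject, predicate, object]; terms starting with
# "?" are variables, everything else is treated as a constant.
def bgp_to_sql(patterns)
  columns = %w[subj pred obj]
  bindings = {}     # variable name => first column that binds it
  conditions = []

  patterns.each_with_index do |pattern, i|
    pattern.zip(columns).each do |term, column|
      ref = "t#{i}.#{column}"
      if term.start_with?("?")
        if bindings.key?(term)
          # A variable seen before becomes a join condition.
          conditions << "#{ref} = #{bindings[term]}"
        else
          bindings[term] = ref
        end
      else
        # Constants become equality tests; single quotes are doubled.
        conditions << "#{ref} = '#{term.gsub("'", "''")}'"
      end
    end
  end

  select = bindings.map { |var, ref| "#{ref} AS #{var[1..-1]}" }.join(", ")
  from = patterns.each_index.map { |i| "triples t#{i}" }.join(", ")
  sql = "SELECT #{select} FROM #{from}"
  sql += " WHERE #{conditions.join(' AND ')}" unless conditions.empty?
  sql
end

puts bgp_to_sql([
  ["?p", "foaf:name", "Richard"],
  ["?p", "foaf:knows", "?q"],
])
```

The point is that the whole pattern becomes a single SQL query, so the database can do the joining instead of the query engine fetching triples one pattern at a time.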

Posted in General, Semantic Web | Comments Off

New desktop background

Posted on July 26, 2005 by Richard Cyganiak

Posted in General | 18 Comments

Redland contexts

Posted on July 26, 2005 by Richard Cyganiak

How contexts work in Dave Beckett’s Redland RDF library:

When a triple is created, any RDF node (URI, blank node, literal) can be attached to the triple. This becomes the context node.
A triple can be added to a graph multiple times with different contexts (Redland graphs are bags of triples, not sets).
A context node can be specified when searching for triples matching a pattern.
A context node can be specified when triples are removed.
I’m not sure what happens when triples are searched or removed and no context node is specified. Does it match triples with context? I suppose yes, but the docs don’t say.
I’m not sure if two statements with different contexts are considered equal. I suppose yes, but the docs don’t say.
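To make the list above concrete, here is a toy model in Ruby. It is not the Redland API; the class ContextGraph and its methods are made up. It also bakes in one of my guesses as an assumption: searching or removing without a context matches triples in any context.

```ruby
# Toy model of a graph with context nodes. NOT the Redland API.
class ContextGraph
  Entry = Struct.new(:triple, :context)

  def initialize
    @entries = []   # an Array keeps duplicates: a bag, not a set
  end

  # Any value can serve as the context node.
  def add(triple, context = nil)
    @entries << Entry.new(triple, context)
  end

  # pattern is [s, p, o]; nil acts as a wildcard.
  # Assumption: a nil context matches entries in any context.
  def find(pattern, context = nil)
    @entries.select { |e| matches?(e, pattern, context) }.map(&:triple)
  end

  def remove(pattern, context = nil)
    @entries.reject! { |e| matches?(e, pattern, context) }
  end

  def size
    @entries.size
  end

  private

  def matches?(entry, pattern, context)
    return false unless context.nil? || entry.context == context
    pattern.zip(entry.triple).all? { |want, have| want.nil? || want == have }
  end
end

g = ContextGraph.new
t = [":richard", ":knows", ":paolo"]
g.add(t, "http://example.org/a")
g.add(t, "http://example.org/b")
puts g.size                                                  # same triple, stored twice
puts g.find([":richard", nil, nil], "http://example.org/a").size
```

Whether real Redland behaves like the nil-context case here is exactly the open question from the list.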

I believe this is largely compatible with SPARQL’s RDF datasets, except for two things:

Graph names can only be URIs, while contexts can be any kind of RDF node.
Blank nodes can never be shared between multiple graphs, but can be shared between contexts.

Redland graphs have always allowed duplicate triples. This design predates the Working Group decision that graphs are sets of triples. I wonder how the design of contexts would have turned out if Redland graphs were sets.

Posted in General, Semantic Web | Comments Off

RDF and semantic equality

Posted on July 26, 2005 by Richard Cyganiak

This is a braindump about an issue that came up repeatedly in recent discussions among Jena developers.

In basic RDF, these two triples are semantically different, that is, they don’t mean the same thing:

:richard :age "26"^^xsd:int .
:richard :age "+00026"^^xsd:int .

That’s because basic RDF uses syntactic equality for literals. Two literals are equal if and only if they have the same lexical form, language tag, and datatype. But this is not useful for most applications.

Enter datatype semantics. This part of RDF semantics defines how all the datatype stuff in RDF works. It also defines the built-in XSD types.

With datatype semantics, each of these pairs is semantically identical:

"foo" and "foo"^^xsd:string
"1"^^xsd:int and "01"^^xsd:int
"1"^^xsd:int and "1"^^xsd:unsignedByte

Also, this is legal:

:foo rdfs:range xsd:unsignedByte .
:x :foo "1"^^xsd:int .

In implementations, there are two issues: equality testing and roundtripping.

Equality testing is nontrivial because of all sorts of corner cases, rounding issues and so forth. There’s a section in the Best Practices WG’s XML Schema Datatypes in RDF and OWL document on this.

Roundtripping: Users want to get stuff out of the system in the same way they put it in; they don’t like it if all the datatypes have changed, even if the semantic content is still the same. Tools have two options (from a user expectation point of view; the spec leaves more freedom):

1. Output literals with exactly the same lexical form and datatype as in the input.
2. Output literals with the same datatype and a normalized lexical form.

The second option works for XSD datatypes because they all come with canonicalization algorithms, but there’s no requirement to provide one with an RDF datatype.
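The two output options can be sketched like this. The helper names canonical_lexical and serialize are made up, only integer-like types get a canonical form here, and escaping of special characters in the lexical form is ignored.

```ruby
# Hypothetical helper: canonical lexical form for a few integer types.
# Integer(str, 10) parses an optional sign and leading zeros as decimal;
# to_s then prints the value without them.
def canonical_lexical(lexical, datatype)
  case datatype
  when "xsd:int", "xsd:integer"
    Integer(lexical, 10).to_s
  else
    lexical   # no canonical form known here: keep the input form
  end
end

# normalize: false is option 1 (same lexical form as the input),
# normalize: true is option 2 (same datatype, normalized lexical form).
def serialize(lexical, datatype, normalize: false)
  form = normalize ? canonical_lexical(lexical, datatype) : lexical
  "\"#{form}\"^^#{datatype}"
end

puts serialize("+00026", "xsd:int")
puts serialize("+00026", "xsd:int", normalize: true)
```

Option 1 means the store has to keep the original lexical form around; option 2 only needs the value and the datatype.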

There are some additional pitfalls in this area. These two triples are semantically identical if XSD datatypes are interpreted:

:richard :name "Richard" .
:richard :name "Richard"^^xsd:string .

But these are not identical:

:richard :age "26" .
:richard :age "26"^^xsd:int .

This is a special rule for plain literals (without language tag) and xsd:string.
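The difference between syntactic and value-based equality can be sketched in a few lines of Ruby. This is a toy model, not how Jena does it: the Literal struct and the function names are made up, only three datatypes are handled, range checks (for example 0 to 255 for xsd:unsignedByte) are left out, and an invalid lexical form simply raises an error.

```ruby
# Toy literal: lexical form, datatype name (or nil), language tag (or nil).
Literal = Struct.new(:lexical, :datatype, :lang)

INTEGER_TYPES = ["xsd:int", "xsd:unsignedByte"].freeze

# Map a literal to a Ruby value, or nil if this sketch can't interpret it.
def literal_value(lit)
  if lit.datatype.nil? && lit.lang.nil?
    lit.lexical                  # plain literal without language tag: a string
  elsif lit.datatype == "xsd:string"
    lit.lexical
  elsif INTEGER_TYPES.include?(lit.datatype)
    Integer(lit.lexical, 10)     # "26" and "+00026" both become 26
  end
end

# Value-based equality: both literals must be interpretable and equal.
def semantically_equal?(a, b)
  va = literal_value(a)
  vb = literal_value(b)
  !va.nil? && va == vb
end

a = Literal.new("26", "xsd:int")
b = Literal.new("+00026", "xsd:int")

puts a == b                      # Struct equality is the syntactic comparison
puts semantically_equal?(a, b)   # comparison of the values
puts semantically_equal?(Literal.new("26"), a)
```

The plain "26" maps to a string and the typed one to an integer, which is why the last comparison comes out false while a plain "Richard" and "Richard"^^xsd:string would compare as equal.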

Another pitfall:

:knows rdfs:range :Person .
:richard :knows :paolo .

implies (under RDFS semantics):

:paolo rdf:type :Person .

But this:

:age rdfs:range xsd:int .
:richard :age "26" .

does not imply

:richard :age "26"^^xsd:int .

In fact, the plain “26” is an error (a datatype clash); you have to specify the datatype. In other words: rdfs:range implies rdf:type for the object of a triple (just as rdfs:domain does for the subject), but it never assigns a datatype to a literal.
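Here is a toy version of the type inference, for illustration only. It is not a real RDFS reasoner: triples are plain arrays of strings, literals are written with a leading double quote, and the function name infer_types is made up. Under RDFS, rdfs:domain gives a type to the subject of a triple and rdfs:range gives a type to the object.

```ruby
# Toy RDFS type inference over [s, p, o] arrays of strings.
def infer_types(triples)
  domains = {}
  ranges = {}
  triples.each do |s, p, o|
    domains[s] = o if p == "rdfs:domain"
    ranges[s] = o if p == "rdfs:range"
  end

  inferred = []
  triples.each do |s, p, o|
    # rdfs:domain types the subject of the triple.
    inferred << [s, "rdf:type", domains[p]] if domains.key?(p)
    # rdfs:range types the object. Literals are skipped: they can't be
    # subjects of a triple, and they never get a datatype assigned.
    inferred << [o, "rdf:type", ranges[p]] if ranges.key?(p) && !o.start_with?('"')
  end
  inferred.uniq
end

triples = [
  [":knows", "rdfs:range", ":Person"],
  [":richard", ":knows", ":paolo"],
  [":age", "rdfs:range", "xsd:int"],
  [":richard", ":age", "\"26\""],
]

infer_types(triples).each { |t| puts t.join(" ") + " ." }
```

Nothing is inferred for the :age triple; the plain literal stays a plain literal, and a datatype-aware checker would have to report it as a clash.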

Posted in General, Semantic Web | Comments Off

Andy on JavaCC vs. AntLR

Posted on July 26, 2005 by Richard Cyganiak

For future reference … Andy Seaborne on jena-dev:

JavaCC creates a self contained set of classes to compile – one reason I use it for RDQL and SPARQL – Antlr for example needs its own runtime and that creates problems with different versions (to run Jena with Groovy, just omit antlr.jar from Jena).

Posted in General | Comments Off

Wow! WordPress 1.5 rocks.

Posted on July 24, 2005 by Richard Cyganiak

I just upgraded to the latest WordPress and I’m im…uh…pressed. The new default template is so much better than the old one, which as of today is still in use on my German blog. It’s true: Presentation matters.

The upgrade was surprisingly painless — just delete all files except wp-config.php from the server, upload the new files, run wp-admin/upgrade.php and you’re done. This page on the WordPress wiki was helpful.

As of today, I’ll stop using categories. Categorizing new posts adds too much cognitive overhead, and I didn’t see much payoff.

Posted in General | Comments Off
