RDF and semantic equality

This is a braindump about an issue that came up repeatedly in recent discussions among Jena developers.

In basic RDF, these two triples are semantically different, that is, they don’t mean the same thing:

:richard :age "26"^^xsd:int .
:richard :age "+00026"^^xsd:int .

That’s because basic RDF uses syntactic equality for literals. Two literals are equal if and only if they have the same lexical form, language tag, and datatype. But this is rarely what applications want: most would read both literals above as the integer 26.
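To make the syntactic rule concrete, here is a minimal Python sketch. The `Literal` record is a hypothetical stand-in for illustration, not Jena's API; it simply compares the three components field by field.

```python
from dataclasses import dataclass
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class Literal:
    """Hypothetical literal record (illustration only, not Jena's API)."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None

a = Literal("26", XSD + "int")
b = Literal("+00026", XSD + "int")

# The generated __eq__ compares lexical form, datatype and language tag,
# which is exactly syntactic equality.
print(a == b)                           # False
print(a == Literal("26", XSD + "int"))  # True
```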

Enter datatype semantics. This part of RDF semantics defines how typed literals map to values, and how the built-in XSD types fit into RDF.

With datatype semantics, each of these pairs is semantically identical:

  • "foo" and "foo"^^xsd:string
  • "1"^^xsd:int and "01"^^xsd:int
  • "1"^^xsd:int and "1"^^xsd:unsignedByte
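A rough sketch of value-based comparison in Python, covering only the datatypes used in the pairs above. The helper names `value_of` and `same_value` and the `INT_RANGES` table are assumptions made for this example; a real implementation would need the full XSD type hierarchy.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative subset of integer-derived XSD types and their value ranges.
INT_RANGES = {
    XSD + "int": (-2**31, 2**31 - 1),
    XSD + "unsignedByte": (0, 255),
}

def value_of(lexical, datatype=None):
    """Map a literal to a (kind, value) pair (hypothetical helper)."""
    if datatype is None or datatype == XSD + "string":
        # Plain literals without language tag and xsd:string share a value space.
        return ("string", lexical)
    if datatype in INT_RANGES:
        if not re.fullmatch(r"[+-]?[0-9]+", lexical):
            raise ValueError("ill-formed integer literal: %r" % lexical)
        n = int(lexical)
        lo, hi = INT_RANGES[datatype]
        if not lo <= n <= hi:
            raise ValueError("%d is outside the value space of %s" % (n, datatype))
        return ("integer", n)
    # Unknown datatype: fall back to syntactic identity.
    return (datatype, lexical)

def same_value(a, b):
    """Compare two (lexical, datatype) pairs by value."""
    return value_of(*a) == value_of(*b)

print(same_value(("foo", None), ("foo", XSD + "string")))           # True
print(same_value(("1", XSD + "int"), ("01", XSD + "int")))          # True
print(same_value(("1", XSD + "int"), ("1", XSD + "unsignedByte")))  # True
```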

Also, this is legal, because the value 1 lies in the value space of xsd:unsignedByte:

    :foo rdfs:range xsd:unsignedByte .
    :x :foo "1"^^xsd:int .

In implementation, there are two issues: equality testing and roundtripping.

Equality testing is nontrivial because of all sorts of corner cases, rounding issues and so forth. There’s a section in the Best Practices WG’s XML Schema Datatypes in RDF and OWL document on this.

Roundtripping: Users want to get stuff out of the system in the same way they put it in; they don’t like it if all the datatypes have changed, even if the semantic content is still the same. Tools have two options (from a user expectation point of view; the spec leaves more freedom):

1. Output literals with exactly the same lexical form and datatype as in the input.
2. Output literals with the same datatype and a normalized lexical form.

The second option works for XSD datatypes because they come with canonical lexical forms, but a custom RDF datatype is not required to define one.
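The second option could look roughly like this in Python. This is only a sketch: `normalize`, `canonical_int` and the `CANONICALIZERS` table are names invented for the example, and only xsd:int is covered. Datatypes without a known canonicalizer fall back to the first option and pass through unchanged.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

def canonical_int(lexical):
    """Canonical form for integer types: no '+' sign, no leading zeros."""
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError("ill-formed integer literal: %r" % lexical)
    return str(int(lexical))

# Hypothetical registry; a real tool would cover many more datatypes.
CANONICALIZERS = {XSD + "int": canonical_int}

def normalize(lexical, datatype):
    """Keep the datatype; canonicalize the lexical form when we know how."""
    canonicalize = CANONICALIZERS.get(datatype)
    if canonicalize is None:
        return (lexical, datatype)  # no canonical form known: output as given
    return (canonicalize(lexical), datatype)

print(normalize("+00026", XSD + "int")[0])  # 26
```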

There are some additional pitfalls in this area. These two triples are semantically identical if XSD datatypes are interpreted:

:richard :name "Richard" .
:richard :name "Richard"^^xsd:string .

But these are not identical:

:richard :age "26" .
:richard :age "26"^^xsd:int .

This is a special rule that covers only plain literals (without language tag) and xsd:string. A plain literal is never equal to a literal of any other datatype.

Another pitfall:

:knows rdfs:range :Person .
:richard :knows :paolo .

implies (under RDFS semantics):

:paolo rdf:type :Person .

But this:

:age rdfs:range xsd:int .
:richard :age "26" .

does not imply

:richard :age "26"^^xsd:int .

In fact, the plain “26” is an error (a datatype clash); you have to specify the datatype. In other words: rdfs:range implies rdf:type, but it never changes a literal’s datatype.
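A small Python sketch of that check, under the assumption that a tool knows which datatype is declared for a property's values. The function `in_value_space` and the `INT_RANGES` table are made up for this example and handle only two integer types. The point is that the check accepts or rejects a literal but never coerces it.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative subset of integer-derived XSD types and their value ranges.
INT_RANGES = {
    XSD + "int": (-2**31, 2**31 - 1),
    XSD + "unsignedByte": (0, 255),
}

def in_value_space(lexical, datatype, declared):
    """True if the literal denotes a value of the declared datatype."""
    if datatype not in INT_RANGES or declared not in INT_RANGES:
        # Plain literals and types this sketch does not know: no coercion.
        return False
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        return False
    n = int(lexical)
    lo, hi = INT_RANGES[datatype]
    declared_lo, declared_hi = INT_RANGES[declared]
    return lo <= n <= hi and declared_lo <= n <= declared_hi

print(in_value_space("26", None, XSD + "int"))                 # False: datatype clash
print(in_value_space("26", XSD + "int", XSD + "int"))          # True
print(in_value_space("1", XSD + "int", XSD + "unsignedByte"))  # True
```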
