[juc] Bastian Quilitz – Federated queries with SPARQL

People have been talking about federated semantic web queries for a while. Here’s a working prototype …

Bastian is an intern at HP Labs. The idea is to answer one SPARQL query using data from multiple SPARQL endpoints. The individual endpoints have to describe their capabilities with a simple service description. The federation engine can then plan how to split up the query, execute the parts on the individual stores, and recombine the results.
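To make the "recombine the results" step concrete, here is a minimal sketch of joining partial results from two endpoints. It is not Bastian's code (which isn't public yet). I'm assuming each endpoint returns a list of variable bindings as plain dicts, and the names `compatible`, `join`, and the `ex:` and `mailto:` example values are made up for illustration.

```python
def compatible(a, b):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(b[var] == value for var, value in a.items() if var in b)


def join(left, right):
    """Nested-loop join of two lists of variable bindings."""
    return [{**l, **r} for l in left for r in right if compatible(l, r)]


# Hypothetical partial results, as if returned by two different endpoints.
names = [
    {"person": "ex:alice", "name": "Alice"},
    {"person": "ex:bob", "name": "Bob"},
]
mboxes = [
    {"person": "ex:alice", "mbox": "mailto:alice@example.org"},
]

merged = join(names, mboxes)
```

Only the bindings that agree on `person` survive the join, so `merged` contains a single row for `ex:alice`. A real engine would presumably use something smarter than a nested loop, but the idea is the same.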

Service descriptions include:

  • the service endpoint URL
  • information on what kind of queries the service can answer with good performance, based on predicates, e.g. “This service can answer queries about foaf:name and foaf:mbox.”
  • a selectivity function
  • whether the endpoint provides definitive information.
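As a rough illustration of what such a description might carry, here is a sketch in Python. The actual description format wasn't shown in detail, so the class name, the field names, the example URLs, and the choice to model the selectivity function as a per-predicate score (higher means more selective) are all my assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ServiceDescription:
    # All field names are hypothetical; they mirror the four items listed above.
    endpoint_url: str
    predicates: Set[str]  # predicates the service answers well
    selectivity: Dict[str, float] = field(default_factory=dict)  # assumed: score per predicate
    definitive: bool = False  # stored only; how the engine uses it isn't covered here


def services_for(predicate: str, services: List[ServiceDescription]) -> List[ServiceDescription]:
    """Pick the services that claim to answer queries about a predicate."""
    return [s for s in services if predicate in s.predicates]


# Made-up endpoints for illustration.
people = ServiceDescription(
    endpoint_url="http://example.org/people/sparql",
    predicates={"foaf:name", "foaf:mbox"},
    selectivity={"foaf:name": 0.9, "foaf:mbox": 0.95},
)
papers = ServiceDescription(
    endpoint_url="http://example.org/papers/sparql",
    predicates={"dc:title", "dc:creator"},
)

candidates = services_for("foaf:name", [people, papers])
```

Here `candidates` contains only the `people` service, because it is the only one that lists foaf:name among its predicates.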

Query plans are optimized using a cost function based mostly on selectivity. (E.g. foaf:gender has low selectivity and foaf:name has high selectivity, and evaluating the high-selectivity parts first is better because it keeps intermediate results small. I wonder whether triple counts would make a good factor in cost calculations.)
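A toy version of "high selectivity first" could look like the sketch below. The real cost function is surely more involved; this just sorts triple patterns by a per-predicate score. The scores, the `order_patterns` name, and the convention that unknown predicates get the lowest score are assumptions of mine.

```python
# Triple patterns as (subject, predicate, object) tuples.
patterns = [
    ("?p", "foaf:gender", '"female"'),
    ("?p", "foaf:name", '"Alice"'),
]

# Assumed scores: higher means more selective (fewer expected matches).
selectivity = {"foaf:gender": 0.1, "foaf:name": 0.9}


def order_patterns(patterns, selectivity, default=0.0):
    """Order patterns so the most selective ones are evaluated first."""
    return sorted(
        patterns,
        key=lambda tp: selectivity.get(tp[1], default),
        reverse=True,
    )


plan = order_patterns(patterns, selectivity)
```

With these scores, the foaf:name pattern comes first in `plan` and the foaf:gender pattern second.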

At the moment, the service descriptions must be provided by the party who sets up the federated server. They are also responsible for determining the selectivities. (I think that service endpoints should be able to provide a description of their own capabilities. The service knows its own data and is in a good position to describe what it can and cannot answer with good performance. The service is also able to calculate selectivities on its own.)

The code is not public at the moment. Bastian says he intends to publish it when it’s more polished. (Update: He says he will publish it soon.) (Update: Here it is.)
