cygri’s notes on web data

Invisible language

Posted on September 27, 2006 by Richard Cyganiak

I have a lot of trouble writing about Ruby, because I find there’s nothing to say. […] Ruby seems so self-explanatory to me. It makes it almost boring; you try to focus on Ruby and you wind up talking about some problem domain instead of the language. I think that’s the goal of all programming languages, but so far Ruby’s one of the few to succeed at it so well.

Posted in General | Comments Off

[bxmlt] Photos and Web 2.0 Infotag

Posted on September 27, 2006 by Richard Cyganiak

Magnus is blogging today’s Web 2.0 info session from here onward (in German).

A bunch of photos: Set 1, Set 2.

Posted in General | Comments Off

[bxmlt] Eric Prud’hommeaux – SAWSDL

Posted on September 27, 2006 by Richard Cyganiak

Eric shows off SAWSDL. I didn’t know about it before. It defines mappings from Web Services to RDF. The basic idea is to start with a WSDL description of the web service. Then add a few attributes that describe how bits of content in the XML messages map to an equivalent RDF description of the message contents. It uses SPARQL-like fragments and XPath. The fun is that the mapping works in both directions – from webservice results to RDF, and from RDF messages to webservice requests.
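
To illustrate the two-way mapping, here is a minimal Python sketch. It is not SAWSDL's actual annotation syntax: the `MAPPING` table, the example.org URIs and the `lift`/`lower` helper names are all assumptions made up for this example. It only shows how one declarative mapping can turn an XML message into triples and back again.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: child element name -> RDF predicate URI.
# Real SAWSDL attaches such mappings to the WSDL/schema; this table is a stand-in.
MAPPING = {
    "name": "http://example.org/vocab#name",
    "price": "http://example.org/vocab#price",
}


def lift(xml_text, subject):
    """XML message -> set of (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for path, predicate in MAPPING.items():
        value = root.findtext(path)  # simple XPath-like lookup
        if value is not None:
            triples.add((subject, predicate, value))
    return triples


def lower(triples, root_tag):
    """Triples -> XML message, using the same mapping in reverse."""
    by_predicate = {p: o for _, p, o in triples}
    root = ET.Element(root_tag)
    for path, predicate in MAPPING.items():
        if predicate in by_predicate:
            ET.SubElement(root, path).text = by_predicate[predicate]
    return ET.tostring(root, encoding="unicode")


response = "<item><name>Widget</name><price>9.99</price></item>"
triples = lift(response, "http://example.org/item/1")
request = lower(triples, "item")
```

Because both directions read the same table, lifting a message and lowering the result reproduces the original XML in this toy case.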

Posted in General | Comments Off

[bxmlt] Ivan Herman – Questions (and Answers) on the Semantic Web

Posted on September 27, 2006 by Richard Cyganiak

Ivan is here today for the W3C Tag (W3C Day) and gives a keynote to the general conference audience (Slides). If you read this through Planet RDF you won’t find much new in this talk, but I think it is very appropriate for the audience of XML and Web folks here.

Is the Semantic Web AI on the Web? No. “Beware the Hype!” The Semantic Web doesn’t intend to be the Future of Human Knowledge. It is infrastructure for building software that acts as “intelligent secretaries”, but only in the sense of reducing drudge work. Database integration might be the most important application of the Semantic Web.

What is the Semantic Web, then? It is the Web of Data.

What is the relationship to AI? Some SW technologies have benefited from AI research, and the SW has brought new concerns and use cases to AI.

What is RDF? It’s about creating relationships among resources on the Web and interchanging them. It’s pretty much like the hyperlinks on the traditional Web, except there is no “current” document, there is no user-interface action for “clicking” a link, and links are typed.

But isn’t RDF simply an (ugly) XML application? No. RDF is really a graph. RDF/XML is just one way to write it down. Think in terms of graphs, the rest is syntactic sugar. Yes, RDF/XML is ugly because it was developed in the “prehistory” of XML, and now there’s too much legacy code to change it. If you prefer, use Turtle.
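
The "think in graphs" point can be sketched in a few lines of Python. This is an illustration only: the example.org URIs are made up, and the reader and writer below handle just a simplified N-Triples-like subset, not the full syntax. The graph is a plain set of typed links, and a serialization is merely one way to write that set down.

```python
# Assumption: URIs under example.org are made up for illustration.
EX = "http://example.org/"

# The graph itself: a set of (subject, predicate, object) triples.
# Objects that start with a double quote are literals, everything else is a URI.
graph = {
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "name", '"Alice"'),
}


def to_ntriples(triples):
    """Write the graph down, one triple per line."""
    lines = []
    for s, p, o in sorted(triples):
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def parse_ntriples(text):
    """Read the simplified syntax back into a set of triples."""
    triples = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        body = line[:-1].rstrip()  # drop the trailing " ."
        s, p, o = body.split(" ", 2)
        if o.startswith("<"):
            o = o[1:-1]  # strip the angle brackets from a URI object
        triples.add((s[1:-1], p[1:-1], o))
    return triples


text = to_ntriples(graph)
round_tripped = parse_ntriples(text)
```

Writing the graph out and reading it back gives the same set of triples, which is the sense in which the syntax is secondary to the graph.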

What about ontologies and rules? They are a “glue” to help with data integration. Ontology and rule processors can automatically find out that different concepts in graphs are actually the same thing.
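
A rough Python sketch of the "glue" idea, under stated assumptions: the `shop:` and `lib:` identifiers are invented, `smush` is a hypothetical helper name, and the hand-rolled rewriting below is far simpler than what a real ontology or rule processor does. Two sources describe the same thing under different names, and a single "same as" statement is enough to join them.

```python
SAME_AS = "owl:sameAs"  # the OWL property for "these two names denote the same thing"


def merge(*graphs):
    """Merging RDF graphs is just set union."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def smush(graph):
    """Rewrite all names linked by sameAs to one canonical name."""
    canon = {}

    def find(x):
        # Follow alias links until we reach the canonical name.
        while x in canon:
            x = canon[x]
        return x

    for s, p, o in graph:
        if p == SAME_AS:
            a, b = find(s), find(o)
            if a != b:
                # Pick the alphabetically smaller name as canonical.
                canon[max(a, b)] = min(a, b)

    # Values that have no alias (including literals) pass through unchanged.
    return {(find(s), p, find(o)) for s, p, o in graph if p != SAME_AS}


shop = {("shop:item42", "shop:price", "12.50")}
library = {("lib:isbn-123", "lib:title", "Some Book")}
glue = {("shop:item42", SAME_AS, "lib:isbn-123")}

integrated = smush(merge(shop, library, glue))
```

After the rewrite, the price and the title hang off the same node, so a query for one resource sees data from both sources.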

Is all this surprising? No, we do this kind of data integration all the time on the Web, using our brains. This is just about adding some bits to the Web that are needed to allow machines to do a part of it.

“One has to learn formal logic to understand and use the Semantic Web.” No, it doesn’t have to be very complex. A little glue can take you quite far. There is an “onion” with RDF in the middle, then RDF Schema, OWL Lite, OWL DL, OWL Full. An application may choose the complexity it wants. Compare to SQL, where the formal semantics is very complex, but 95% of SQL users never looked at it. Developing an ontology may require more knowledge, but that’s for a small percentage of users.

Isn’t this research only? Does this have any industrial relevance? There’s a lot of activity in health care, life sciences, digital libraries, defense. There are lots of tools now, so lack of good tools is no longer an excuse. Remember the original Web started at CERN …

Ivan pointed to lots of areas of current development throughout the talk: GRDDL, RDFa, Database-to-RDF mapping (he mentions our work), SPARQL.

Posted in General, Semantic Web | 2 Comments

[bxmlt] Uwe Krüger – NPBibSearch, an ontology of NP-complete problems

Posted on September 26, 2006 by Richard Cyganiak

From the problem of the NP-completeness of ontologies, we now move to an ontology of NP-complete problems.

The famous “P = NP?” is one of the big unsolved questions of computer science, with a large body of existing work. The talk is about formalizing the domain of NP-complete problems.

The ontology has classes like “Decision Problem”, “Complexity Class” and “Algorithm”. The ontology was populated with 350 problems from a standard textbook on NP-completeness. Instances are annotated with further bibliographical references from some 1000 papers. Everything was entered by hand. (I think.)

Why do this? To assist bibliographical searches. There’s a web interface. (I think it’s supposed to be here, currently down.) In addition to the usual full text search, users can navigate the domain space. The web interface includes a cool widget that shows the current item’s relation to neighbouring items and looks quite usable – a good domain-specific search interface.

Implementation: Jena, servlets, Google/Yahoo API for full text search.

(This is an example of the “80% of Semantic Web projects” Heiner Stuckenschmidt was referring to earlier today. I think the value in this project is the aggregated knowledge collected by a number of experts over years. This would become really interesting if it were possible to link this with knowledge bases in neighbouring domains maintained by other groups of experts.)

Posted in General, Semantic Web | 1 Comment

[bxmlt] Nicole Natho – mArachna, an OWL-based mathematical knowledge base

Posted on September 26, 2006 by Richard Cyganiak

In this short talk, Nicole describes a system that extracts knowledge from mathematical texts in order to build an intelligent encyclopedia.

In mathematical texts, there are sections with a fairly regular structure: conditions, conclusions, properties. “Let x be … Y is a Z iff …” These can be extracted with text analysis.

The goal is an ontology of mathematical texts, not mathematics.

They use the TRALE system, a grammar analyser for the German language. The implementation uses Jena. They used to output topic maps, but changed to OWL.

Q: (Rainer Eckstein) So how much effort is this?

A: A lot. We had to build a lexicon for the TRALE parser first, that took about a year.

Q: Implementation?

A: Convert input texts to TEI, then a lot of Java, some Python scripts. TRALE is Prolog.

Posted in General, Semantic Web | Comments Off

[bxmlt] Heiner Stuckenschmidt – What’s Wrong with the Semantic Web?

Posted on September 26, 2006 by Richard Cyganiak

I’m at Berliner XML Tage, an annual German-language XML and Semantic Web conference in Berlin.

The keynote by Heiner Stuckenschmidt of Universität Mannheim features a provocative title. After a recap of the 2001 SciAm paper and a short journey up and down the Semantic Web technology stack, it gets more interesting later on.

Typical Semantic Web applications of today work something like this: Information is extracted from existing data, like web pages. An ontology is built with Protégé. Some inference is done over everything, and some views generated (“Semantic Portals”). Heiner estimates that this describes 80% of current Semantic Web projects.

But that’s not “living in the Real Web”, Heiner says. We want distributed information, a P2P-like architecture, and have to deal with the possibility of failure, inconsistency, incompleteness and heterogeneity of data.

Don’t do big monolithic ontologies. Ontologies should be modular.

Don’t do big monolithic servers. Things should happen distributed, e.g. based on ontology structure, that is, put related information on the same server but not everything on a single one, or leave the data and ontologies “out there” in the Web.

Don’t do big monolithic reasoning. Gathering all the data in one place to do inference is not optimal. Inference engines should sit on every node in the distributed system. (He points to C-OWL, Distributed Description Logics and the DRAGO system as possible approaches to distributed inference.)

Distributed Ontologies: Heiner thinks that OWL works well for the old scenario where ontologies are built by individual persons or organizations, but is not very good for working with distributed ontologies. There are many situations where “A is almost subClassOf B”, but in OWL, a single outlier prevents us from relating the two classes. A possible way out would be probabilistic logic.
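
The "almost subClassOf" situation is easy to show with plain Python sets. This is only an instance-counting sketch with invented example data and a hypothetical `subclass_degree` helper. It is not a probabilistic logic and not something presented in the talk, but it shows why a strict yes/no test loses information that a graded measure keeps.

```python
def is_subclass(a, b):
    """Strict reading: every instance of a must also be an instance of b."""
    return a <= b


def subclass_degree(a, b):
    """Graded reading: fraction of a's instances that are also in b."""
    if not a:
        return 1.0  # an empty class is trivially contained in anything
    return len(a & b) / len(a)


# Invented example data: one outlier (the penguin) breaks the strict relation.
birds = {"sparrow", "eagle", "robin", "penguin"}
flying_things = {"sparrow", "eagle", "robin", "bat"}

strict = is_subclass(birds, flying_things)
degree = subclass_degree(birds, flying_things)
```

Here the strict test fails because of one instance, while the graded measure reports that three of the four birds fit.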

So, what’s wrong with the Semantic Web? Too little attention has been paid to the specific needs of a distributed environment. The Semantic Web is a part of the World Wide Web.

Q: You talked about distributed ontologies. What about distributed RDF knowledge bases?

A: They exist; the need for distribution was recognized earlier in this area. There are many systems that do centralized reasoning over distributed data.

Q: Shouldn’t we concentrate more on RDF data and schema than OWL?

A: Maybe – but I’m a logician, not a database guy.

Q: (Klaus Schild) Scalability? OWL is NP-complete.

A: Distribution helps. Certain combinations of operators are deadly, but if the two operators happen to end up on different nodes, things can be much faster. But in general it’s a problem.

Q: But if you take the Web seriously, you need sub-linear complexity.

A: You can’t have that with RDF and OWL.

Posted in General, Semantic Web | 3 Comments

Tim Bray on Ruby syntax

Posted on August 27, 2006 by Richard Cyganiak

Tim Bray has been learning Ruby for a while and is currently summarizing his experiences as the Ruby Ape Diaries. The latest installment – Surface Phenomena – is particularly interesting as it deals with the area where Ruby really shines: its friendly syntax and high readability.

Posted in General | Comments Off

Italy photos

Posted on August 22, 2006 by Richard Cyganiak

I finally finished uploading some photos from my recent trip to Italy. It was a visit to the Semedia group in Ancona, and a few days in Rome, all during the hottest week of the year.

This was my first time in Italy. Needless to say, I had a seriously good time.

Posted in General | Comments Off

Cats and productivity

Posted on August 21, 2006 by Richard Cyganiak

Scott Adams:

As soon as I pick up a drawing implement, [my cat] systematically goes around my office chewing and scratching one item after another until she finds something that will make me stop work and pet her. I don’t want her to learn what will bother me most, so I try to trick her. I act all worked up when she eats a yellow sticky note while remaining nonchalant as she goes all mongoose on my monitor cable. We’ve been doing this routine for years, which is why my office looks like a Hezbollah safe house and all of my best ideas are inside my cat.

Posted in General | Comments Off
