cygri’s notes on web data

Postal Experiments

Posted on January 22, 2005 by Richard Cyganiak

Some jokers are testing the limits of the US Postal Service by mailing all sorts of bizarre things, such as helium-filled balloons, sufficiently stamped bricks, $20 bills and old deer legs:

Deer tibia: Our mailing specialist received many strange looks from both postal clerks and members of the public in line when he picked it up at the station, 9 days. The clerk put on rubber gloves before handling the bone, inquired if our researcher were a “cultist,” and commented that mail must be wrapped.

(Via Boingboing)

Posted in German/Deutsch | Comments Off

How to turn Windows into Linux

Posted on January 17, 2005 by Richard Cyganiak

Grrrr. Not funny.

Posted in German/Deutsch | Comments Off

Namespaces and vocabulary partitioning

Posted on January 13, 2005 by Richard Cyganiak

Continuing a thread about views in triple stores.

Leigh Dodds pointed out the need for something like SQL’s views in RDF stores, and suggested vocabulary namespaces as a partitioning mechanism:

The […] subset may be created by filtering out the classes and properties extracted from the database based on their namespaces. For example I might have a triple store containing a mixture of public/private data, with the latter in a separate namespace and I want to pull out just the public aspects for returning from a web service.

I replied, somewhat off-hand:

I think this is not such a good idea. Namespaces are just that, namespaces. Don’t overload them with access control.

Leigh disagrees:

I’m not sure how he’s using the term “access control” here, that’s not what I was suggesting. And namespaces are intended to partition vocabularies, so not using them as a way to ignore data that’s not of interest seems bizarre to me.

First, Leigh, I do agree with you about the worth of views, especially to protect applications from schema changes.

Yes, namespaces are for partitioning vocabularies. But along what kind of boundaries? I think that the best boundaries are semantic. Classes and properties about persons go into one vocabulary. Classes and properties about computer software into another.

Your suggestion is to use your application’s publishing policy as a boundary. Stuff intended for public use goes into one vocabulary, the private backend data into another.

As long as this distinction somewhat coincides with a semantic boundary, I can see no fault with this approach. But otherwise, there are a number of downsides:

Changing your publishing policy gets harder. If you decide to make some formerly private data public, you have to move properties between vocabularies, a rather expensive move.
Having more than two views, with some overlap, becomes difficult.
You reduce the reusability of your vocabulary. Other parties might want to re-use your terms, but not your publishing policy.

In SQL, views can be created, changed and removed without affecting the underlying database schema and data. That’s their whole point. With the namespace-as-view approach, changing a view means changing the schema (vocab term URIs) and the data (triples using the term).
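A minimal Python sketch of that difference, assuming triples are plain tuples. The example.org namespaces, the property names and the helper functions are all made up for illustration and have nothing to do with Leigh’s actual setup:

```python
# Hypothetical vocabularies and data; triples are (subject, predicate, object).
PUB = "http://example.org/public#"
PRIV = "http://example.org/private#"

triples = [
    ("ex:alice", PUB + "name", "Alice"),
    ("ex:alice", PRIV + "salary", "50000"),
    ("ex:alice", PRIV + "email", "alice@example.org"),
]


def namespace_view(data, namespace):
    """Namespace-as-view: keep triples whose predicate lives in `namespace`."""
    return [t for t in data if t[1].startswith(namespace)]


def property_view(data, allowed):
    """View kept apart from the vocabulary: keep triples whose predicate is listed."""
    return [t for t in data if t[1] in allowed]


def move_term(data, old, new):
    """Rename a vocabulary term, rewriting every triple that uses it."""
    return [(s, new if p == old else p, o) for s, p, o in data]


# Policy change: the email address becomes public.

# Namespace approach: the term URI changes, so schema and data are rewritten.
migrated = move_term(triples, PRIV + "email", PUB + "email")
public_by_namespace = namespace_view(migrated, PUB)

# Separate view definition: only the view changes, the data stays as it was.
public_view = {PUB + "name", PRIV + "email"}
public_by_view = property_view(triples, public_view)
```

Both approaches end up exposing the same two statements, but only the first one had to touch the stored triples to get there.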

Of course, I don’t know anything about the use case that prompted Leigh’s original post, and everything I say might or might not apply to the specific circumstances. I’m just voicing a general design opinion here. So as long as it gets the job done …

Posted in General, Semantic Web | Comments Off

Donald Rumsfeld

Posted on January 9, 2005 by Richard Cyganiak

Think what you will of US Secretary of Defense Donald Rumsfeld, but his list of wisdom on government and business is good.

A few highlights:

Learn to say “I don’t know.” If used when appropriate, it will be often.

If you are not criticized, you may not be doing much.

Don’t be a bottleneck. […] Force responsibility down and out. Find problem areas, add structure, and delegate. The pressure is to do the reverse. Resist it.

Include others. As former Sen. Pat Moynihan (D., N.Y.) said, “Stubborn opposition to proposals often has no other basis than the complaining question, ‘Why wasn’t I consulted?’”

That which you require be reported on to you will improve, if you are selective. How you fashion your reporting system announces your priorities and sets the institution’s priorities.

“First law of holes: If you get in one, stop digging.”–Anonymous

Perspective–Maurice Chevalier’s response when asked how it felt to reach 80: “Pretty good, considering the alternative.”

“If a problem cannot be solved, enlarge it.”–Dwight D. Eisenhower

[Rule no. 158] If you develop rules, never have more than 10.

(Via 43 Folders)

Posted in German/Deutsch | 1 Comment

Folksonomies succeed where the Semantic Web fails

Posted on January 8, 2005 by Richard Cyganiak

The folksonomy meme has been bouncing around the blogosphere for a couple of weeks. The idea is to categorize information (such as bookmarks or photos) by user-defined keywords, often called tags. This is especially powerful when combined with a social/collaborative component, such as in the popular web apps del.icio.us and Flickr.

The old-school alternatives are the taxonomy and the controlled vocabulary. Here, an analyst will examine the information space and come up with good categories. Changing the categories later is either impossible or expensive. Examples are the Yahoo! directory and anything ever done in the library sciences.

Folksonomies are messy, emergent, bottom-up, cheap and conceptually simple. Taxonomies are clean, well-designed, top-down, expensive and conceptually complex.

I hazard a guess that most of the Semantic Web crowd is, like me, firmly in the ‘well-designed metadata’ camp. Our vocabularies and ontologies are designed by experts, then handed down to the users. We are not at ease with the idea of users creating their own categorization schemes. If we’ve learned anything from experience, it is that the average user is unable to get a subclass relationship right. A bunch of sloppily assigned tags will not be useful for inferencing.

Clay Shirky has something to say (via BoingBoing):

This is something the ‘well-designed metadata’ crowd has never understood — just because it’s better to have well-designed metadata along one axis does not mean that it is better along all axes, and the axis of cost, in particular, will trump any other advantage as it grows larger. And the cost of tagging large systems rigorously is crippling, so fantasies of using controlled metadata in environments like Flickr are really fantasies of users suddenly deciding to become disciples of information architecture. …

Any comparison of the advantages of folksonomies vs. other, more rigorous forms of categorization that doesn’t consider the cost to create, maintain, use and enforce the added rigor will miss the actual factors affecting the spread of folksonomies. Where the internet is concerned, betting against ease of use, conceptual simplicity, and maximal user participation, has always been a bad idea.

He’s right.

Betting on the Semantic Web is betting against ease of use, conceptual simplicity, and maximal user participation. And I don’t see how ontologies and the RDF data model stand even the slightest chance in this particular area.

This is something we’ll have to find an answer to.

Posted in General, Semantic Web | 2 Comments

Views in triple stores

Posted on January 8, 2005 by Richard Cyganiak

Leigh Dodds wants views in triple stores:

Views are an important feature of relational databases, providing a way to abstract over complex queries, subset data to just the minimum required for a given task, as well as providing a point around which a schema can be refactored without having to (immediately) change the applications that use it.

Leigh proposes two approaches:

The first is to apply a “window” on the graph, and only extract the data that’s within a certain distance of my origin.

This is implemented easily, but doesn’t work well. If you need a query depth of 3 or 4 (which is likely), you will pull in lots of stuff that you don’t need.
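To make the problem concrete, here is a rough Python sketch of such a window, assuming triples are plain tuples and that edges are followed in both directions. The function name and the sample data are hypothetical:

```python
from collections import deque


def window(triples, origin, depth):
    """Return all triples reachable within `depth` hops of `origin`.

    Edges are followed in both directions. Literals are treated as
    ordinary nodes, which is good enough for a sketch.
    """
    # Index every triple under both of its end points.
    by_node = {}
    for triple in triples:
        s, _, o = triple
        by_node.setdefault(s, []).append((triple, o))
        by_node.setdefault(o, []).append((triple, s))

    seen = {origin}
    result = set()
    queue = deque([(origin, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # the window ends here, do not expand further
        for triple, neighbour in by_node.get(node, []):
            result.add(triple)
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return result


# Hypothetical sample graph.
graph = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:carol", "ex:worksFor", "ex:acme"),
    ("ex:dave", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:berlin"),
]

near = window(graph, "ex:alice", 1)  # only the statements about Alice herself
far = window(graph, "ex:alice", 4)   # the whole sample graph
```

At depth 4 the window around Alice already contains Dave’s employer and the location of a company she has no direct connection to, which is exactly the kind of stuff you don’t need.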

The second subset may be created by filtering out the classes and properties extracted from the database based on their namespaces. For example I might have a triple store containing a mixture of public/private data, with the latter in a separate namespace and I want to pull out just the public aspects for returning from a web service.

I think this is not such a good idea. Namespaces are just that, namespaces. Don’t overload them with access control. This scheme also doesn’t work if you want to filter by data (e.g. by publication date) instead of by schema.

I think that SPARQL’s proposed CONSTRUCT feature could do a nice job here. Basically, it’s like an RDQL query. But instead of returning a “result table”, the results will be stuffed into an additional set of graph patterns to form a new RDF graph. This constructed graph can have a structure very different from the original data, and can use different vocabulary. Very simple example:

CONSTRUCT   ( ?x foaf:name ?name )
WHERE       ( ?x vcard:FN ?name )

For the refactoring scenario, you would change the query pattern, but leave the construct pattern unchanged.
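The refactoring scenario can be imitated in a few lines of plain Python. This is only a sketch of the idea, not a SPARQL implementation: it handles a single triple pattern, and the `construct` helper, the sample data and the `ex:fullName` property are invented for the example:

```python
def construct(triples, where, template):
    """For each triple matching `where`, emit `template` with its variables bound.

    Terms starting with "?" are variables; everything else must match exactly.
    """
    out = []
    for triple in triples:
        bindings = {}
        for pattern_term, value in zip(where, triple):
            if pattern_term.startswith("?"):
                # A variable used twice must bind to the same value.
                if bindings.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                break
        else:
            out.append(tuple(bindings.get(term, term) for term in template))
    return out


# What applications see: always foaf:name.
template = ("?x", "foaf:name", "?name")

# Before the refactoring the store uses vcard:FN.
old_data = [
    ("ex:alice", "vcard:FN", "Alice"),
    ("ex:alice", "vcard:EMAIL", "alice@example.org"),
]
old_view = construct(old_data, ("?x", "vcard:FN", "?name"), template)

# After the refactoring the store uses a (hypothetical) ex:fullName property.
# Only the query pattern changes; the construct pattern stays the same.
new_data = [("ex:alice", "ex:fullName", "Alice")]
new_view = construct(new_data, ("?x", "ex:fullName", "?name"), template)
```

Both views come out as the same single `foaf:name` statement, so an application reading the view would not notice the schema change.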

CONSTRUCT, SOURCE and optional patterns are the SPARQL features I’m looking forward to the most.

Posted in General, Semantic Web | 3 Comments

Gecyberschaft, Geblogschaft

Posted on January 7, 2005 by Richard Cyganiak

Susan Crawford blogs a talk by Mary Ann Allison, who has a theory about the evolution of human coexistence.

Here’s her idea: sociologist Ferdinand Tonnies described village society before the Industrial Revolution (gemeinschaft) and urban society afterwards (gesellschaft) … and she thinks we’re at a big punctuation point prompted by the information revolution. The new society is gecyberschaft.

Gecyberschaft? Ouch. I take a fairly relaxed view of the creeping Anglicization of my beloved mother tongue, but seeing German loanwords mangled like this in the opposite direction does hurt.

The idea behind the word seems to make some sense to me. Status in the Gemeinschaft comes from birth and descent. Status in the Gesellschaft is earned through work. Status in the Ge…uh…cyberschaft comes from the esteem of others.

Accordingly, I would like to propose the word Gewertschaft (from Wert, “value”) for it instead. Or Geschätzschaft (from schätzen, “to esteem”)?

Geblogschaft is of course another obvious candidate, and I warmly recommend it, at least to the German-speaking blogosphere, as a replacement for that very word.

(via Jeff Jarvis)

Update: Martin Stabe takes a more serious look at the topic. The comments on Jeff’s post contain further, more or less serious suggestions (Gebyteschaft, Gememeschaft, Gegoogleschaft, Beachtungsgemeinschaft). Mary Ann Allison herself also chimes in there:

As for the word gecyberschaft… well, it engages people (including on this blog and at the conference) and often makes them smile–two good things from my POV. I checked with a couple of German sociologists (not a statistically valid sample, of course) who found it interesting and amusing before I started using it.

Posted in German/Deutsch | 1 Comment

Deprecate RDF/XML!

Posted on January 7, 2005 by Richard Cyganiak

Phil Dawes suggests deprecating RDF/XML. His plan of action:

Deprecate RDF/XML as the default serialisation of RDF. Make it clear that it is tricky to write by hand (i.e. by putting this note in the W3C literature), and that if people want human-oriented xml interchange, they should use xml.

Develop tools to make it easy to specify a mapping between an xml dialect and RDF triples. For important web xml protocols (atom, rss2) specify some default xml to rdf triple mappings.

Promote turtle/n3 as the default human-oriented syntax

Reposition RDF as an information integration and knowledge management technology. It really excels at this, more so than any competing technologies (IMHO).

Promote one of the other triple-based xml serialisations for embedding rdf directly in XML documents. (e.g. trix or rxr)

I couldn’t agree more. RDF/XML’s goal was to be just like XML with a few rdf:this and rdf:that sprinkled in. In retrospect, it’s obvious that this idea was misguided. RDF/XML is bloated, causes unnecessary confusion and clouds the real point of RDF (the data model, not the serialization).

RDF won’t catch on until either RDF/XML is completely hidden behind widespread APIs or a practical alternative to it is in widespread use. TriX is my favourite alternative, if only by association with its developers.
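The second item in Phil’s plan, a mapping from an XML dialect to triples, might look roughly like this in Python. This is a sketch under assumptions: the entry is a simplified, namespace-free stand-in for a feed entry rather than real Atom, and the mapping table and function name are made up. Only the two Dublin Core property URIs are real:

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical feed entry (no XML namespaces).
ENTRY = """
<entry>
  <id>http://example.org/posts/1</id>
  <title>Deprecate RDF/XML!</title>
  <author>Richard Cyganiak</author>
  <draft>no</draft>
</entry>
"""

# Hypothetical mapping: child element name -> RDF property URI.
MAPPING = {
    "title": "http://purl.org/dc/elements/1.1/title",
    "author": "http://purl.org/dc/elements/1.1/creator",
}


def xml_to_triples(xml_text, mapping, subject_tag="id"):
    """Turn one XML element into triples, using `subject_tag` as the subject."""
    root = ET.fromstring(xml_text.strip())
    subject = root.findtext(subject_tag)
    triples = []
    for child in root:
        prop = mapping.get(child.tag)
        if prop is not None:  # unmapped elements are simply ignored
            triples.append((subject, prop, child.text))
    return triples


entry_triples = xml_to_triples(ENTRY, MAPPING)
```

People keep writing and reading the XML they already know, and the triples fall out of the mapping instead of being hand-written in RDF/XML.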

Posted in General, Semantic Web | 2 Comments

Gotta Get My Stuff Done

Posted on January 2, 2005 by Richard Cyganiak

A QuickTime animation (via 43 Folders). I tell you, that is my life put on film.

Posted in German/Deutsch | Comments Off

Project Management Checklists

Posted on December 5, 2004 by Richard Cyganiak

For my bookmarks: Alan Green has a set of project management checklists for software projects. Project start, project end, weekly, new team member and so on. What they contain is common sense, but useful for making sure that no important aspect falls through the cracks under stress.

(via 43 Folders)

Posted in German/Deutsch | Comments Off
