cygri’s notes on web data

Javascript vector graphics

Posted on November 19, 2005 by Richard Cyganiak

Go here and click the monkey. Only works in a recent Firefox build or Safari.

Everything is drawn and animated with a few lines of Javascript. (No Flash. No SVG.) I had no idea that this was even possible.

One more nail in the coffin of the desktop application.

Posted in General | 1 Comment

I want one of these

Posted on October 31, 2005 by Richard Cyganiak

This Wired article reads like something straight from a William Gibson novel: Michael Chorost lost his ability to hear in 2001. A cochlear implant, a small computer installed in his skull, allows him to understand human voices again, but is not sophisticated enough to decode complex sounds. Now, Michael is on a quest for an upgrade to the processor’s software that lets him hear his favourite piece of music again.

The implant […] was a computer. Which meant that, at least in theory, its effectiveness was limited only by the ingenuity of software engineers. As researchers learn more about how the ear works, they continually revise cochlear implant software. Users await new releases with all the anticipation of Apple zealots lining up for the latest Mac OS.

This is a true story. Like Gibson said: The future is already here. It’s just not evenly distributed yet.

I can’t wait to be able to tweak my own brain implants. “How come you’ve never upgraded that crappy OEM version of your neural software? Real men compile their own kernel. Or at least buy something that Just Works!”

(via Mind Hacks)

Posted in General | Comments Off

The Warthog Jump

Posted on October 22, 2005 by Richard Cyganiak

Funny Halo in-game video: The Warthog Jump. What happens when you blow up a car sitting on top of a pile of grenades? Great choice of music, too.

(via BoingBoing)

Posted in General | 37 Comments

Sparklines for the SourceForge stats feeds

Posted on October 22, 2005 by Richard Cyganiak

Sparklines are small word-sized graphics that communicate lots of numbers in a small space. Here’s an example:

Jena downloads are usually around 80 per day during the week, with slightly fewer on the weekends. After the 2.3 release on 12 Oct, they briefly rose to about 200 for a few days, then stabilized again at 110.
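To give a feel for the idea, here is a minimal text-only sketch in Python. It is not the PHP library mentioned in this post; it just maps numbers to Unicode block characters. The function name `sparkline` and the download figures are made up for illustration, loosely shaped like the numbers described above.

```python
# Eight block characters, from lowest to highest
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each number to a block character, scaled to the range of the series."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale against; draw it at the lowest level
        return BARS[0] * len(values)
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)


# Hypothetical daily download counts (not real SourceForge data)
downloads = [80, 82, 79, 60, 58, 81, 200, 190, 150, 110, 112, 109]
print(sparkline(downloads))
```

Scaling to the series' own minimum and maximum keeps the graphic word-sized, at the cost of hiding the absolute numbers, which is why a sparkline usually sits next to a sentence that states them.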

I’ve added some sparklines to the SourceForge project activity RSS feeds. I did this using James Byers’ sparkline PHP library, a fantastic little project which was a joy to use after I got it working.

This is what the daily stats look like now (Safari screenshot; the statistics are from the RSSOwl project):

Statistics RSS feed with sparklines

I’m thinking about adding sparklines to StatCVS too. Ideas are welcome.

Posted in General | Tagged StatCVS | Comments Off

What’s next? A great discussion with some of my favourite thinkers

Posted on October 19, 2005 by Richard Cyganiak

Time Magazine has a great discussion with some of my favourite writers and thinkers about trends for the future. Some choice quotes:

Malcolm Gladwell:

One of the most striking things in observing the evolution of American society is the rise of travel. If I had to name a single thing that has transformed our life, I would say the rise of JetBlue and Southwest Airlines. They have allowed us all to construct new geographical identities for ourselves. Many working people today travel who never could have in the past, for meetings and conferences and all kinds of things, and this is creating another identity for them.

Esther Dyson:

The Internet is like alcohol in some sense. It accentuates what you would do anyway. If you want to be a loner, you can be more alone. If you want to connect, it makes it easier to connect.

Tim O’Reilly:

The generation now growing up is going to expect access to information in a way us fuddy-duddies don’t take for granted. Some say the Net will lead to a radical democratization–power to the people–but I don’t think so. When you harness collective intelligence and the power of blogging, it doesn’t mean power to individuals. It means power to the people best able to aggregate those individuals.

David Brooks:

As the information age matures, you’re getting social stratification based on education. […] People at the top of the income scale pass down the skills one needs to thrive in this economy to their kids who get into Harvard–where the median student comes from a family making $150,000 a year–and they go on to an affluent suburb. And they pass it down, so you get really good public high schools, and people there are more likely to marry people like themselves.

Moby:

[C]ultural production always goes hand in hand with technological development. Like with the records I make. I wouldn’t have been able to make them 20 years ago. It would have cost half a million dollars to make a record instead of $20,000. Now it’s just me with a laptop.

Malcolm Gladwell again:

We will have more debates and disputes, like the one over creationism. When you’re having 100 arguments at once, no one of them matters the way it used to. It’s important not to use a 19th century moral lens to evaluate the kind of debates we’re going to have in the 21st century. We have to accept that the general noise level will increase, but that doesn’t matter. You can be a creationist at night and go to work in the morning as a pediatrician and save lives.

(via BoingBoing)

Posted in General | Comments Off

SPARQL wishlist

Posted on October 18, 2005 by Richard Cyganiak

Some stuff that isn’t in SPARQL, but should be. I’ve two use cases in mind: building AJAX applications on top of a SPARQL store, and enabling discovery of the stuff that is in a SPARQL store you don’t know yet.

Expressions in SELECT: I want to be able to say SELECT DISTINCT LANG(?object) or SELECT DISTINCT DATATYPE(?object). This would also make extension functions much more attractive. (Update: Ideally, expressions should also be allowed in CONSTRUCT blocks. This has many interesting applications if used with extension functions.)

More functions: NAMESPACE returns the namespace part of the URI. LOCALNAME returns the local part of the URI. CONCAT concatenates literals. SUBSTR returns part of a literal. MATCH extracts bits of a literal according to a regular expression. IF would select a return value based on a boolean expression. I could go on. These can (and probably will) be done as extension functions. As I said, to make extension functions truly useful, they’d have to be allowed in the SELECT clause.

CONSTRUCT *: The motivation here is extracting subgraphs from an RDF graph. I think this is hugely important, and I don’t understand why this is not allowed. I heard something about issues with blank nodes coming from multiple named graphs, but I’m sure this could be worked around.

COUNT: I understand there are semantic issues with this, but it would be super handy in some situations. Currently, if you want to know the number of triples stored in a datastore, you have to fetch them all. Does this suck or what?
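To make the NAMESPACE and LOCALNAME wishes a bit more concrete, here is a rough Python sketch of how they might behave. The rule of splitting after the last "#" or "/" is my assumption, and the helper names are hypothetical; nothing like this is defined by SPARQL itself.

```python
def split_uri(uri):
    """Split a URI after its last '#' or '/' (an assumed rule, not a standard one)."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


def namespace(uri):
    """The part of the URI up to and including the last '#' or '/'."""
    return split_uri(uri)[0]


def localname(uri):
    """The part of the URI after the last '#' or '/'."""
    return split_uri(uri)[1]


# Example property URIs, as they might come back from a store you don't know yet
uris = [
    "http://xmlns.com/foaf/0.1/name",
    "http://xmlns.com/foaf/0.1/mbox",
    "http://purl.org/dc/elements/1.1/publisher",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
]

# Rough client-side analogue of a wished-for SELECT DISTINCT NAMESPACE(?p)
distinct_namespaces = sorted({namespace(u) for u in uris})
print(distinct_namespaces)
```

Doing this on the client means fetching every URI first, which is exactly the overhead that expressions in the SELECT clause would avoid.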

Anyway, SPARQL is great and I’m really looking forward to seeing it supported in a wide variety of RDF tools.

Posted in General, Semantic Web | Comments Off

SemWeb student job in Berlin

Posted on October 18, 2005 by Richard Cyganiak

Student job offer (in German)

The job is in the Semantic Web for Pathology project. I’ve previously written about it (in German).

Posted in General, Semantic Web | 1 Comment

Can we measure performance by analyzing CVS repositories?

Posted on October 17, 2005 by Richard Cyganiak

A couple of days ago, I came across this fine paper by Keir Mierle et al. from U of Toronto:

Mining Student CVS Repositories for Performance Indicators

They examined the CVS repositories that their students used for assignments, extracted all sorts of numbers by analysing access patterns and code metrics, and correlated the numbers with the grades the students got on the assignments.

Basically, they didn’t find any good indicators. The best indicator they found is raw lines of code. Unsurprisingly, hard-working students got better grades. The correlation isn’t very strong though.

The second-best indicator: Did the students put spaces after commas in their source code? Students who write foo(a, b, c) tend to get better grades than students who write foo(a,b,c).
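A measurement like this is easy to sketch. The following Python function is my own guess at how one might compute it, not the method used in the paper; the name `comma_space_ratio` is hypothetical, and it naively counts commas inside string literals and comments too.

```python
def comma_space_ratio(source):
    """Fraction of commas followed by a space, or None if there is nothing to count."""
    total = spaced = 0
    for i, ch in enumerate(source):
        if ch != ",":
            continue
        # Treat a comma at the end of the text like a comma at the end of a line
        nxt = source[i + 1] if i + 1 < len(source) else "\n"
        if nxt == "\n":
            # A trailing comma says nothing about spacing habits, so skip it
            continue
        total += 1
        spaced += nxt == " "
    return spaced / total if total else None


print(comma_space_ratio("foo(a, b, c)"))
print(comma_space_ratio("foo(a,b,c)"))
```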

The authors conclude:

Although version control repositories contain a wealth of detailed information both in the transaction histories and in the actual files modified by the users, we were unable to find any measurements in the hundreds we examined which accurately predicted student performance as measured by final course grades; certainly no predictor stronger than simple lines-of-code-written was found.

These results directly challenge the conventional wisdom that a repository contains easily extractable predictive information about external performance measures. In fact, our results suggest that aspects such as student work habits, and even code quality, have little bearing on the student’s performance.

This doesn’t bode well for tools like StatCVS. We’re certainly generating interesting information, but this paper provides some evidence that we can’t measure performance.

(via Joel Spolsky)

Posted in General | Tagged StatCVS | Comments Off

Uncovering a plan to destroy the world

Posted on October 17, 2005 by Richard Cyganiak

Here’s what I dreamed last night. (True!)

This is a scene from a James Bond movie. One of the old ones, starring, maybe, Roger Moore as 007 and Gert Fröbe as the Goldfinger-like villain. This is towards the middle of the movie. Mr Villain has invited our charismatic top spy to his luxury boat and just welcomes him on board. The two well-mannered cosmopolitan gentlemen exchange pleasantries. We know, of course, that Mr Bond has a hidden agenda: He wants to find out how Mr V. plans to destroy the world.

Roger Moore as 007

Gert Fröbe as Goldfinger

Mr V., however, is preoccupied: His henchmen, a disorganized group of misfits and dimwits (but well-dressed), don’t seem quite up to the job of running business aboard the ship. There’s supposed to be dinner, but no one knows where and if it’s ready. Mr V. is annoyed.

Here, my point of view changes from an uninvolved observer to that of one of the henchmen. While the boss and his English guest continue to plod along the script, I’m hurrying through the belly of the ship, trying to find out who’s in charge and what I’m supposed to do. No one seems to know though. Just like me, everybody has just been recruited. It’s quite a diverse bunch — I meet some English, Spaniards, Italians, French, Germans, and a handful of eastern Europeans.

Word has it that there will be a team meeting in the ship’s spacious bar area. Slowly people start to arrive. All are twenty- or thirtysomethings, some women. I actually know a few folks: they are fellow Semantic Web hackers I’ve met over the years. As we and the new people introduce each other, which is not easy as not everybody speaks English, it turns out that all of us are youngish researchers, developers and students who have bet their careers on the success of the Semantic Web.

Unfortunately, before I or Mr Bond can figure out why a supervillain would hire a ragtag band of ex-RDF hackers, the story drifts away on a tangent about the newspaper Berliner Zeitung defending the freedom of the press by reporting on an effort of the German Secret Service to suppress the publication of unflattering photographs of the new German chancellor Angela Merkel.

Weird.

Posted in General, Semantic Web | Comments Off

TriG syntax issues

Posted on October 16, 2005 by Richard Cyganiak

TriG is a Turtle-based serialization syntax for RDF datasets. The only implementation I know of is in NG4J. An example:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

{
    <http://example.org/bob.rdf> dc:publisher "Bob" . 
    <http://example.org/alice.rdf> dc:publisher "Alice" .
}

<http://example.org/bob.rdf> {
    [] foaf:name "Bob"; foaf:mbox <mailto:bob@example.org> .
}
 
<http://example.org/alice.rdf> {
    [] foaf:name "Alice"; foaf:mbox <mailto:alice@example.org> .
}

This is an RDF dataset. The default graph contains metadata about the two named graphs. TriG is quite useful to write down small examples like this.
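For readers who think in data structures rather than syntax, an RDF dataset can be pictured roughly like this in Python. This is only a sketch loosely modeled on the example: the dict layout, the use of `None` as the key for the default graph, the blank node labels and the helper `described_graphs` are all my own assumptions, not anything defined by TriG or NG4J.

```python
# Key used in this sketch for the unnamed default graph (an assumption)
DEFAULT_GRAPH = None

# Each graph is a set of (subject, predicate, object) triples;
# prefixed names and blank node labels are kept as plain strings for brevity
dataset = {
    DEFAULT_GRAPH: {
        ("http://example.org/bob.rdf", "dc:publisher", "Bob"),
        ("http://example.org/alice.rdf", "dc:publisher", "Alice"),
    },
    "http://example.org/bob.rdf": {
        ("_:b1", "foaf:name", "Bob"),
        ("_:b1", "foaf:mbox", "mailto:bob@example.org"),
    },
    "http://example.org/alice.rdf": {
        ("_:b2", "foaf:name", "Alice"),
        ("_:b2", "foaf:mbox", "mailto:alice@example.org"),
    },
}


def described_graphs(ds):
    """Names of the graphs that the default graph makes statements about."""
    subjects = {s for s, _, _ in ds[DEFAULT_GRAPH]}
    return sorted(name for name in ds if name is not DEFAULT_GRAPH and name in subjects)


print(described_graphs(dataset))
```

The metadata only connects to a named graph when the subject URI in the default graph is exactly the graph's name, so a one-character difference between the two silently breaks the link.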

The point of this post is not to promote this nice syntax, but to act as a reminder about some minor syntactical issues with the current TriG spec. I’ve culled them from several email threads. This post summarizes my position, and I should probably prod Chris (who maintains the spec) to do something about it at some point in the future.

Namespace declarations

TriG currently allows namespace declarations (the @prefix lines) before, after and between graphs, but not within. I argue that namespaces should be allowed within graphs. This would make any valid Turtle file a valid TriG graph and vice versa, which is a Good Thing for several reasons (simpler spec, easier implementation, easier authoring).

Graph naming

TriG originally didn’t allow unnamed graphs, but after the RDF dataset was adopted for SPARQL, TriG was changed to allow a default graph.

Chris has gone on record saying that these syntactic variations should be allowed for named graphs:

<uri> { ...triples... }
<uri> :- { ...triples... }

And this is for the default graph:

{ ...triples... }

Currently, the spec also allows this:

:- { ...triples... }

This is a bug.
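The intended rules can be written down as a small check. This Python sketch is in no way a TriG parser: it only looks at the text up to and including the opening curly bracket, it only understands full URIs in angle brackets (no prefixed names), and the function name `classify_graph_header` is hypothetical.

```python
import re

# <uri> { ...  or  <uri> :- { ...
NAMED = re.compile(r"^<[^<>\s]*>\s*(:-\s*)?\{$")
# A bare opening bracket starts the default graph
DEFAULT = re.compile(r"^\{$")


def classify_graph_header(header):
    """Return 'named', 'default' or 'invalid' for the text up to and including '{'."""
    header = header.strip()
    if NAMED.match(header):
        return "named"
    if DEFAULT.match(header):
        return "default"
    # Anything else, including ':- {' without a URI, is rejected
    return "invalid"


for header in ["<http://example.org/bob.rdf> {", "<http://example.org/bob.rdf> :- {", "{", ":- {"]:
    print(header, "->", classify_graph_header(header))
```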

N3 compatibility

There was some push from Jeremy Carroll to maintain compatibility with Notation 3’s syntax for formulae, which also uses curly brackets to denote graph-like structures, so we discussed adding some more options for named graphs:

<uri> { ...triples... } .
<uri> :- { ...triples... } .
<uri> = { ...triples... }
<uri> = { ...triples... } .

and for the default graph:

{ ...triples... } .

But of all these, only two are actually syntactically valid N3:

<uri> :- { ...triples... } .
<uri> = { ...triples... } .

So, if we want to align the spec more closely to N3, then we should allow only these two, and maybe this one:

<uri> = { ...triples... }

because of symmetry.

(N3’s non-RDF (non-Turtle) features like formulae are quite obscure, and I’m not really convinced that compatibility with them is worth the additional complexity.)

Posted in General, Semantic Web | Comments Off
