Blank nodes considered harmful

Well, they are not always harmful. But most of the time. I’ll get to that in a minute.

On the semantic-web@w3.org list, W3C’s Sandro Hawke has a lucid and concise summary of the problems with blank nodes in RDF. It’s worth quoting in full:

I agree that *software* should not change blank nodes to nodes with a
URI label. But, when practical, *people* probably should, as they are
authoring.

In general, blank nodes are a convenience for the content provider and a
burden on the content consumer. Higher quality data feeds use fewer
blank nodes, or none. Instead, they have a clear concept of identity
and service for every entity in their data.

If someone in the middle tries to convert (Skolemize) blank nodes, it’s
a large burden on them. Specifically, they should provide web service
for those new URIs, and if they get updated data from their sources,
they’re going to have a very hard [perhaps impossible] time
understanding what really changed.

Does this mean blank nodes are evil? Not always. Sometimes they are tolerable, sometimes they are a necessary last resort, and sometimes they are good enough. But they are never good.

They are fine for transient data that’s not meant to be stored.
They can be the only viable option if a changeable upstream data source doesn’t provide identifiers that persist across requests/updates.
They can be tolerable for unimportant auxiliary resources that don’t correspond to a meaningful entity in the domain of interest (e.g., some n-ary relations) and are not worth the hassle of maintaining a stable URI.

In all other cases, blank nodes should be avoided. Sandro is right: publishing RDF with blank nodes puts a burden on the consumer. Especially if the data might change in the future.

The higher the percentage of blank nodes in a dataset, the less useful it is.

About me

Links

Recent Posts

Archives