There’s been a lot of discussion recently around HTML5’s microdata proposal and how it relates to the W3C’s earlier RDFa standard, which is currently being updated for HTML5. Microdata solves many of RDFa’s use cases in a much simpler way, but there are other use cases it cannot address. This is because microdata assumes a world with very few vocabularies, or even just a single one; mixing vocabularies on a single item is rather difficult. Jeni Tennison has an excellent statement of the problem, along with a proposed solution.
In this post I put forward another proposal for addressing at least part of the problem.
The problem: microdata is limited to a single itemtype per element.
Why is this a problem? Because it makes mixing vocabularies really hard. If I mark up an address with schema.org’s PostalAddress, then I can’t easily add markup for microdata’s built-in vCard vocabulary; I’d have to repeat content in order to use both vocabularies. This design benefits the Google-backed schema.org; more focused special-purpose vocabularies, or open alternatives to schema.org with a transparent development process, will have a hard time competing.
An example. Let’s assume I have this address and want to mark it up with microdata:
<div> <span>26 Dun Aengus</span>, <span>Galway</span>, <span>Ireland</span>. </div>
Then here’s how I would do it with schema.org terms:
<div itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="streetAddress">26 Dun Aengus</span>,
  <span itemprop="addressLocality">Galway</span>,
  <span itemprop="addressCountry">Ireland</span>.
</div>
And here with vCard terms:
<div itemscope itemtype="http://microformats.org/profile/hcard">
  <span itemprop="street-address">26 Dun Aengus</span>,
  <span itemprop="locality">Galway</span>,
  <span itemprop="country-name">Ireland</span>.
</div>
It is easy to see why combining both versions into a single one is difficult. Microdata uses short property names like itemprop="street-address". If an element had multiple itemtypes, it would be impossible to tell which itemtype the street-address property belongs to. Assuming that it belongs to both types would be dangerous: a property could exist in both vocabularies but with different meanings. The restriction to a single type prevents such ambiguity.
Multiple itemtypes without ambiguity: Here’s the proposal. I’ll start by creating an item that has all the properties from both versions. I’m omitting the itemtypes for now to avoid ambiguity:
<div itemscope>
  <span itemprop="streetAddress street-address">26 Dun Aengus</span>,
  <span itemprop="addressLocality locality">Galway</span>,
  <span itemprop="addressCountry country-name">Ireland</span>.
</div>
Without an itemtype, this generates an untyped item with six properties:
- itemtype: none
- property: streetAddress = 26 Dun Aengus
- property: street-address = 26 Dun Aengus
- property: addressLocality = Galway
- property: locality = Galway
- property: addressCountry = Ireland
- property: country-name = Ireland
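To make the six-property result concrete, here is a minimal Python sketch of how a consumer might read that markup, using only the standard library’s html.parser. It is a simplification, not the full microdata parsing algorithm: it assumes a single flat item with no nesting or itemref, and the class name FlatItemParser is made up for illustration. The point it shows is that each whitespace-separated token in itemprop yields its own property carrying the same value.

```python
from html.parser import HTMLParser

# The untyped item from above, with both vocabularies' names in each itemprop.
SNIPPET = (
    '<div itemscope>'
    '<span itemprop="streetAddress street-address">26 Dun Aengus</span>, '
    '<span itemprop="addressLocality locality">Galway</span>, '
    '<span itemprop="addressCountry country-name">Ireland</span>. '
    '</div>'
)


class FlatItemParser(HTMLParser):
    """Collects (name, value) pairs for a single, flat microdata item.

    Simplifications (not the real microdata algorithm): one itemscope, no
    nested items, no itemref, no markup nested inside property elements.
    An element's value is its text content; meta uses its content attribute.
    """

    def __init__(self):
        super().__init__()
        self.properties = []  # (name, value) pairs in document order
        self._names = None    # itemprop tokens of the element being read
        self._tag = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # itemprop is a whitespace-separated list: each token is one property.
        names = (attrs.get("itemprop") or "").split()
        if not names:
            return
        if tag == "meta":
            for name in names:
                self.properties.append((name, attrs.get("content") or ""))
        else:
            self._names, self._tag, self._text = names, tag, []

    def handle_data(self, data):
        if self._names is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._names is not None and tag == self._tag:
            value = "".join(self._text)
            for name in self._names:
                self.properties.append((name, value))
            self._names = None


parser = FlatItemParser()
parser.feed(SNIPPET)
parser.close()
for name, value in parser.properties:
    print(f"{name} = {value}")
```

Run on the snippet, it prints the same six name/value pairs as the list above, in document order.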
The altitem property. Microdata would get a new built-in property called altitem. Let’s add an element with this property to the untyped item:
<meta itemprop="altitem" content="http://schema.org/PostalAddress streetAddress addressLocality addressCountry">
What’s going on here? The idea is that altitem takes a whitespace-separated list. When added to an item, it creates a new “alternate item” whose itemtype is the first token of the list. The rest of the list should be short property names; any properties with these names are copied from the original item to the new item. So we’d end up with a second item besides the typeless original item. This second item has:
- itemtype: http://schema.org/PostalAddress
- property: streetAddress = 26 Dun Aengus
- property: addressLocality = Galway
- property: addressCountry = Ireland
This is exactly the same as the original schema.org item from above. Creating the vCard item takes just another altitem property:
<meta itemprop="altitem" content="http://microformats.org/profile/hcard street-address locality country-name">
This gives us:
- itemtype: http://microformats.org/profile/hcard
- property: street-address = 26 Dun Aengus
- property: locality = Galway
- property: country-name = Ireland
So now we’d have three items in total: the original untyped item, and the two typed alternate items.
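The altitem expansion can be sketched in a few lines of Python. This is a hypothetical model, not an existing microdata API: items are plain Item objects holding (name, value) pairs, and Item and alternate_items are illustrative names. Feeding it the untyped item with both altitem declarations yields the three items described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ALTITEM = "altitem"  # the built-in property name proposed in this post


@dataclass
class Item:
    itemtype: Optional[str]
    properties: List[Tuple[str, str]] = field(default_factory=list)


def alternate_items(item: Item) -> List[Item]:
    """Create one alternate item per altitem value found on `item`.

    Each altitem value is whitespace-separated: the first token becomes the
    alternate item's itemtype, the remaining tokens name the properties to
    copy. Copied properties keep the original item's order (the post leaves
    the ordering question open).
    """
    alternates = []
    for name, value in item.properties:
        if name != ALTITEM:
            continue
        tokens = value.split()
        if not tokens:
            continue  # assumption: an empty altitem value is simply ignored
        new_type, wanted = tokens[0], set(tokens[1:])
        copied = [(n, v) for n, v in item.properties if n in wanted]
        alternates.append(Item(new_type, copied))
    return alternates


original = Item(None, [
    ("streetAddress", "26 Dun Aengus"), ("street-address", "26 Dun Aengus"),
    ("addressLocality", "Galway"), ("locality", "Galway"),
    ("addressCountry", "Ireland"), ("country-name", "Ireland"),
    (ALTITEM, "http://schema.org/PostalAddress "
              "streetAddress addressLocality addressCountry"),
    (ALTITEM, "http://microformats.org/profile/hcard "
              "street-address locality country-name"),
])

items = [original] + alternate_items(original)
for it in items:
    print(it.itemtype, [n for n, _ in it.properties])
print(len(items))  # 3
```

Because the alternates are ordinary items, anything that already consumes microdata items could treat them like any other typed item.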
What’s nice about this:
- It doesn’t require any new syntax, just a new property.
- Multiple types generate multiple items, which are visible in the microdata API just like normal items.
- It plays well with itemref, so the altitem declaration doesn’t have to be repeated if I have several postal addresses on the page.
- It plays well with a copy-and-paste style of web development: “If you want to use myVocab together with another vocab, just paste this snippet into your item and add the appropriate itemprops…”
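To illustrate the itemref point, here is a rough Python sketch. It assumes a hypothetical element with id="addr-vocabs" that holds both altitem declarations, which each address pulls in through itemref; the id, the helper names item_properties and alternates, and the second address are all made up, and property ordering is simplified.

```python
# Hypothetical markup being modelled: one shared declaration, several items.
#
#   <div id="addr-vocabs">
#     <meta itemprop="altitem" content="http://schema.org/PostalAddress streetAddress addressLocality addressCountry">
#     <meta itemprop="altitem" content="http://microformats.org/profile/hcard street-address locality country-name">
#   </div>
#   <div itemscope itemref="addr-vocabs"> ...first address... </div>
#   <div itemscope itemref="addr-vocabs"> ...second address... </div>

# Properties contributed by each referenced element, keyed by its id.
SHARED = {
    "addr-vocabs": [
        ("altitem", "http://schema.org/PostalAddress "
                    "streetAddress addressLocality addressCountry"),
        ("altitem", "http://microformats.org/profile/hcard "
                    "street-address locality country-name"),
    ],
}


def item_properties(own, itemref, shared):
    """Return an item's own properties plus those of each element in itemref.

    Simplification: real microdata sorts all properties into tree order;
    here the referenced properties are just appended.
    """
    props = list(own)
    for ref_id in itemref.split():
        props.extend(shared.get(ref_id, []))
    return props


def alternates(props):
    """One (itemtype, properties) pair per non-empty altitem value."""
    result = []
    for name, value in props:
        tokens = value.split() if name == "altitem" else []
        if tokens:
            new_type, wanted = tokens[0], set(tokens[1:])
            result.append((new_type, [(n, v) for n, v in props if n in wanted]))
    return result


# Two trimmed-down addresses; the second one is a made-up example.
first = [("addressLocality", "Galway"), ("locality", "Galway")]
second = [("addressLocality", "Dublin"), ("locality", "Dublin")]
for own in (first, second):
    for itemtype, copied in alternates(item_properties(own, "addr-vocabs", SHARED)):
        print(itemtype, copied)
```

Each address only carries its own content and the itemref; both alternate items for both addresses come from the single shared declaration.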
Issues. Quite a few details would still have to be worked out:
- What happens to properties with full URL names? I guess they should always be copied to all items.
- What happens to itemid? I guess all items should receive the same itemid from the original item.
- In microdata, itemtype is inherited by nested sub-items. I’m not sure how this should work if altitem is present.
- Properties within a microdata item are ordered; it’s an open question whether the order in altitem or the order in the original item should take precedence when alternate items are generated.
- Would it be worth having a dedicated microdata attribute for this?
- Would microdata clients actually implement this? There is a risk that too many implementers would take shortcuts, implementing only the basic case and ignoring altitem.
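The tentative answers to the first two issues could be sketched like this in Python, extending the same kind of hypothetical model: properties whose names are full URLs are copied into every alternate item, and each alternate item receives the original item’s itemid. Treating any name containing a colon as a full URL is a heuristic, leaning on microdata not allowing colons in short property names; the example itemid and URL-named property are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    itemtype: Optional[str]
    itemid: Optional[str] = None
    properties: List[Tuple[str, str]] = field(default_factory=list)


def is_full_url(name: str) -> bool:
    # Heuristic: short property names may not contain ":", so treat any
    # name containing one as a full URL.
    return ":" in name


def alternate_items(item: Item) -> List[Item]:
    """altitem expansion plus two tentative rules: full-URL properties go to
    every alternate item, and every alternate item inherits the original
    item's itemid."""
    result = []
    for name, value in item.properties:
        tokens = value.split() if name == "altitem" else []
        if not tokens:
            continue
        new_type, wanted = tokens[0], set(tokens[1:])
        copied = [(n, v) for n, v in item.properties
                  if n in wanted or is_full_url(n)]
        result.append(Item(new_type, item.itemid, copied))
    return result


# Hypothetical example: the itemid and the full-URL property are made up.
original = Item(None, "http://example.org/addresses/1", [
    ("streetAddress", "26 Dun Aengus"),
    ("locality", "Galway"),
    ("http://example.org/vocab#note", "hypothetical full-URL property"),
    ("altitem", "http://schema.org/PostalAddress streetAddress"),
])
for alt in alternate_items(original):
    print(alt.itemtype, alt.itemid, alt.properties)
```

The remaining issues (nested sub-items and property order) are left out, since the post doesn’t suggest an answer for them.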
Summary: This post shows how multiple itemtypes could be supported in microdata without introducing new syntax, without making the common case of a single vocabulary more complex for authors, and without fundamentally changing the data model.
Thanks for this interesting proposal. Properties derived from different vocabs are grouped via altitems, neat! The thing is, I don’t think it solves the exceptional issue you mentioned (if full URIs are not used in itemprops):
“there could be cases where a property exists in both vocabularies but with different meaning.”
Does it?
@Xi: In that case, the author can only use the clashing property from one vocabulary. It is not ambiguous, but the author has to choose.
Hi, Richard,
Point taken. In that unusual but possible scenario, it would probably be better if publishers could declare and use a local alias for clashing properties in @content (e.g., content="http://microformats.org/profile/hcard street-address: sa locality country-name"), with the alias mapped to the full URI when an RDF/microdata parser is applied. This may, however, be too complicated.
There’s a trade-off between power and complexity. Semantic clashes between property names do occur, but they are rare, and I’m not sure it’s worth worrying much about.