Fruitful RDF vocabularies are like cherries not bananas

The SEMIC style guide for semantic engineers is a very interesting document that is currently in review. It is part of a set of actions with the aim:

to promote semantic interoperability amongst the EU Member States, with the objective of fostering the use of standards by, for example, offering guidelines and expert advice on semantic interoperability for public administrations.

I am interested in the re-usability of RDF vocabularies: what is it that makes it easy or hard to take terms from an existing vocabulary and use them when creating a schema for describing something? That seems to me to be important for interoperability in a world where we know that there is going to be more than one standard for any domain, and where we also know that no domain is entirely isolated from its neighbours, which will have their own native standards. We need data at an atomic level that can persist through expressions conformant with different standards, and that is easiest if the different standards share terms where possible.

This idea of reusing terms from vocabularies is core to the idea of application profiles, and to the way that we conceive Dublin Core and LRMI terms being used, and indeed the way they are being used in the IEEE P2881 Learning Metadata work. One well-known example of an application profile that reuses terms in this way is the Data Catalog Vocabulary (DCAT), which uses terms from Dublin Core, ODRL, FOAF and PROV, plus a few terms created specifically for DCAT, to describe entities and relationships according to its own conceptual domain model.

The SEMIC guidelines have lots of interesting things to say about reuse, including rules on what you can and cannot do when taking terms from existing vocabularies in various ways, including reuse “as-is”, reuse with “terminological adaptations” and reuse with “semantic adaptations”. I read these with special interest as I have written similar guidelines for Credential Engine’s Borrowed Terms policy. I am pleased to say we came to the same conclusion (phew). In discussion with the SEMIC authors that conclusion was described as “don’t cherry-pick”, that is: when you borrow or reuse a term from a vocabulary you must comply with everything that the (let’s use the O word here) ontology in which it was defined says about it. That’s not just the textual definition of the term but all that is entailed by statements about domain, range, relationships with other properties and so on. If the original ontology defines, directly or indirectly, “familyName” as being the name of a real person, then don’t use it for fictional characters.
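
To make “all that is entailed” a little more concrete, here is a minimal sketch in plain Python of the RDFS domain rule: if a property has an rdfs:domain, anything that uses that property is inferred to be an instance of the domain class. The `ex:` terms and the tiny set of triples are hypothetical, invented for this illustration and not taken from any real vocabulary; a real project would use an RDF library and reasoner rather than tuples.

```python
# Minimal sketch of RDFS domain entailment, using plain tuples as triples.
# All ex: names are hypothetical, made up for this illustration.

RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# A hypothetical ontology that ties familyName to real people.
ontology = {
    ("ex:familyName", RDFS_DOMAIN, "ex:RealPerson"),
}

# Our data borrows ex:familyName to describe a fictional character.
data = {
    ("ex:sherlock", RDF_TYPE, "ex:FictionalCharacter"),
    ("ex:sherlock", "ex:familyName", "Holmes"),
}


def entail_domains(ontology, data):
    """Return the rdf:type triples implied by rdfs:domain statements."""
    domains = {(p, c) for (p, pred, c) in ontology if pred == RDFS_DOMAIN}
    return {
        (s, RDF_TYPE, c)
        for (s, p, _o) in data
        for (prop, c) in domains
        if p == prop
    }


inferred = entail_domains(ontology, data)
print(sorted(inferred))
```

Under these assumptions the borrowed term quietly asserts that the fictional character is also an `ex:RealPerson`, even though nobody wrote that statement down. That is the baggage that comes along with the term.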


“The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”

Joe Armstrong, Coders at Work

I want to cherry-pick. I am looking for cherries, not bananas! I am looking for “lightweight” ontologies, so if you want your terms to be widely reused please create them as such (and this is also in the SEMIC guidelines). Define terms in such a way that they are free from too much baggage: you can always add that later, if you need it, but others can be left to add their own. One example of this is the way in which many Dublin Core Terms are defined without an rdfs:domain declaration. That is what allowed DCAT and others to use them with any class that fits their own domain model. Where ranges are declared for Dublin Core Terms they are deliberately broadly defined. The approach taken by schema.org in some ways goes further by not using rdfs:domain and rdfs:range but instead defining domainIncludes and rangeIncludes where the value is “a class that constitutes (one of) the expected type(s)” (my emphasis): by being non-exclusive, domainIncludes and rangeIncludes provide a hint at what is expected without saying that you cannot use anything else.
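
The difference between a domain declaration and a domainIncludes hint can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the `ex:` properties and classes are hypothetical, the `schema:` prefix on domainIncludes is assumed, and the “check” function is just one way a consumer might choose to treat the hint, not something either vocabulary prescribes.

```python
# Sketch: rdfs:domain licenses an inference; domainIncludes is only a hint.
# Property and class names under ex: are hypothetical.

RDF_TYPE = "rdf:type"

schema = {
    ("ex:strictName", "rdfs:domain", "ex:Person"),
    ("ex:looseName", "schema:domainIncludes", "ex:Person"),
    ("ex:looseName", "schema:domainIncludes", "ex:Organization"),
}

data = {
    ("ex:rover", RDF_TYPE, "ex:Dog"),
    ("ex:rover", "ex:strictName", "Rover"),
    ("ex:rover", "ex:looseName", "Rover"),
}


def inferred_types(schema, data):
    """Apply only rdfs:domain; domainIncludes carries no RDFS entailment."""
    return {
        (s, RDF_TYPE, c)
        for (s, p, _o) in data
        for (prop, pred, c) in schema
        if pred == "rdfs:domain" and prop == p
    }


def unexpected_uses(schema, data):
    """List (subject, property) pairs that fall outside the domainIncludes hints."""
    types = {}
    for s, p, o in data:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    hints = {}
    for prop, pred, c in schema:
        if pred == "schema:domainIncludes":
            hints.setdefault(prop, set()).add(c)
    return sorted(
        (s, p)
        for (s, p, _o) in data
        if p in hints and not (types.get(s, set()) & hints[p])
    )


print(sorted(inferred_types(schema, data)))
print(unexpected_uses(schema, data))
```

In this sketch, using `ex:strictName` on a dog makes the dog a person as far as an RDFS reasoner is concerned, whereas using `ex:looseName` on the same dog entails nothing: at most a consumer might flag it as an unexpected use and carry on.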

If you’re interested in this line of thinking, then read the special section of Volume 41, Issue 4 of the Bulletin of the Association for Information Science and Technology on Linked Data and the Charm of Weak Semantics edited by Tom Baker and Stuart Sutton (2015).