Tag Archives: RDF

Fruitful RDF vocabularies are like cherries not bananas

The SEMIC style guide for semantic engineers is a very interesting document that is currently in review. It is part of a set of actions with the aim:

to promote semantic interoperability amongst the EU Member States, with the objective of fostering the use of standards by, for example, offering guidelines and expert advice on semantic interoperability for public administrations.

I am interested in the re-usability of RDF vocabularies: what is it that makes it easy or hard to take terms from an existing vocabulary and use them when creating a schema for describing something. That seems to me to be important for interoperability in a world where we know that there is going to be more than one standard for any domain and where we also know that no domain is entirely isolated from its neighbours which will have their own native standards. We need data at an atomic level that can persist through expressions conformant with different standards, and that is easiest if the different standards share terms where possible.

This idea of reusing terms from vocabularies is core to the idea of application profiles, and to the way that we conceive of Dublin Core and LRMI terms being used, and indeed the way they are being used in the IEEE P2881 Learning Metadata work. One well-known example of an application profile that reuses terms from Dublin Core is the Data Catalog Vocabulary (DCAT), which uses terms from Dublin Core, ODRL, FOAF, PROV, and a few terms created specifically for DCAT to describe entities and relationships according to its own conceptual domain model.
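To make that concrete, here is a small illustrative snippet of the sort of description DCAT enables (the dataset and its values are made up): DCAT supplies the class and a few of its own properties, while most of the descriptive work is done by borrowed terms.

```turtle
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# A made-up dataset described with DCAT's own class
# and (mostly) borrowed properties.
<http://example.org/dataset/1> a dcat:Dataset ;
    dcterms:title "School locations" ;                  # Dublin Core
    dcterms:publisher [ a foaf:Agent ;                  # Dublin Core + FOAF
                        foaf:name "Example Council" ] ;
    dcat:keyword "education", "geospatial" .            # minted for DCAT
```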

The SEMIC guidelines have lots of interesting things to say about reuse, including rules on what you can and cannot do when taking terms from existing vocabularies in various ways, including reuse “as-is”, reuse with “terminological adaptations” and reuse with “semantic adaptations”. I read these with special interest as I have written similar guidelines for Credential Engine’s Borrowed Terms policy. I am pleased to say we came to the same conclusion (phew). In discussion with the SEMIC authors that conclusion was described as “don’t cherry-pick”, that is: when you borrow or reuse a term from a vocabulary you must comply with everything that the ontology (let’s use the O word here) in which it was defined says about it. That’s not just the textual definition of the term, but everything that is entailed by statements about domain, range, relationships with other properties and so on. If the original ontology defines, directly or indirectly, “familyName” as being the name of a real person, then don’t use it for fictional characters.
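A minimal sketch of the kind of entailment that comes along for the ride (the ex: ontology and terms here are hypothetical):

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ontology/> .

# A hypothetical ontology that ties the property to real people:
ex:RealPerson a rdfs:Class .
ex:familyName a rdf:Property ;
    rdfs:domain ex:RealPerson .

# Borrowing the term for a fictional character:
ex:sherlock ex:familyName "Holmes" .

# Under RDFS entailment the domain declaration drags this along too,
# whether you meant it or not:
#   ex:sherlock rdf:type ex:RealPerson .
```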

Oof.

“The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”

Joe Armstrong, Coders at Work

I want to cherry-pick. I am looking for cherries, not bananas! I am looking for “lightweight” ontologies, so if you want your terms to be widely reused please create them as such (and this is also in the SEMIC guidelines). Define terms in such a way that they are free from too much baggage: you can always add that later, if you need it, but others can be left to add their own. One example of this is the way in which many Dublin Core Terms are defined without an rdfs:domain declaration. That is what allowed DCAT and others to use them with any class that fits their own domain model. Where ranges are declared for Dublin Core Terms, they are deliberately broadly defined. The approach taken by schema.org in some ways goes further: instead of rdfs:domain and rdfs:range it defines domainIncludes and rangeIncludes, where the value is “a class that constitutes (one of) the expected type(s)” (my emphasis). By being non-exclusive, domainIncludes and rangeIncludes provide a hint at what is expected without saying that you cannot use anything else.
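The difference matters because rdfs:domain licenses inferences about anything that uses the property, whereas domainIncludes does not. A minimal sketch, with hypothetical ex: terms:

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <https://schema.org/> .
@prefix ex:     <http://example.org/ontology/> .

# A hard semantic commitment: every subject of ex:strictTitle
# is entailed to be an ex:Document, whatever else you say about it.
ex:strictTitle rdfs:domain ex:Document .

# A hint: schema:domainIncludes lists an expected type without
# excluding use of the property with other classes.
ex:lightTitle schema:domainIncludes ex:Document .
```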

If you’re interested in this line of thinking, then read the special section of Volume 41, Issue 4 of the Bulletin of the Association for Information Science and Technology on Linked Data and the Charm of Weak Semantics, edited by Tom Baker and Stuart Sutton (2015).


SHACL, when two wrongs make a right

I have been working with SHACL for a few months in connexion with validating RDF instance data against the requirements of application profiles. There’s a great validation tool, created as part of the JoinUp Interoperability Test Bed, that lets you upload your SHACL rules and a data instance and tests the latter against the former. But be aware: some errors can lead to the instance data successfully passing the tests. This isn’t an error with the tool, just a case of blind logic: the program does what you tell it to, regardless of whether that’s what you want it to do.
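One hypothetical way this can happen (the ex: terms below are made up): a typo in a shape’s target means no nodes are ever selected for validation, so the report cheerfully says the data conforms.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

# Intended rule: every ex:Person must have exactly one ex:name.
# The typo in sh:targetClass (ex:Peson) means the shape selects
# no focus nodes at all...
ex:PersonShape a sh:NodeShape ;
    sh:targetClass ex:Peson ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .

# ...so this nameless person is never checked and the data "conforms".
ex:alice a ex:Person .
```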
Continue reading

When RDF breaks records

In talking to people about modelling metadata I’ve picked up on a distinction, mentioned by Stuart Sutton, between entity-based modelling, typified by RDF and graphs, and record-based structures, typified by XML; however, I don’t think making this distinction alone is sufficient to explain the difference, let alone why it matters. I don’t want to get into the pros and cons of either approach here, just give a couple of examples of where something that works in a monolithic, hierarchical record falls apart when the properties and relationships for each entity are described separately and those descriptions are put into a graph. These are especially relevant when people familiar with XML or JSON start using JSON-LD. One of the great things about JSON-LD is that you can use instance data as if it were JSON, without really paying much regard to the “LD” part; that’s not true when designing specs, because design choices that would be fine in a JSON record will not work in a linked data graph.
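One simple illustration of the kind of mismatch (a sketch with made-up ex: terms, not necessarily one of the examples in the full post): in a record, the order of repeated values can carry meaning, but in an RDF graph repeated property values are just an unordered set unless the order is modelled explicitly.

```turtle
@prefix ex: <http://example.org/> .

# In a JSON or XML record, repeating an element keeps its order:
#   "author": ["Alice", "Bob"]
# In a graph the equivalent statements are an unordered set of
# triples; nothing records that Alice comes first.
ex:paper ex:author "Alice", "Bob" .

# Keeping the order means modelling it explicitly, e.g. as an rdf:List:
ex:paper ex:authorList ( "Alice" "Bob" ) .
```

Continue reading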