I have been working with SHACL for a few months in connexion with validating RDF instance data against the requirements of application profiles. There’s a great validation tool, created as part of the JoinUp Interoperability Test Bed, that lets you upload your SHACL rules and a data instance and tests the latter against the former. But be aware: some errors can lead to the instance data successfully passing the tests. This isn’t an error with the tool, just a case of blind logic: the program does what you tell it to, regardless of whether that’s what you want it to do.
The rules
Here’s a really minimal set of SHACL rules for describing a book:
@base <http://example.org/shapes#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

<Book_Shape> a sh:NodeShape ;
    sh:property <bookTitle_Shape> ;
    sh:targetClass sdo:Book .

<bookTitle_Shape> a sh:PropertyShape ;
    sh:path dct:title ;
    sh:datatype rdf:langString ;
    sh:nodeKind sh:Literal ;
    sh:minCount 1 ;
    sh:severity sh:Violation .
Essentially it says that the description of anything typed as a schema.org Book should have a title provided as a langString using the Dublin Core title property. Here’s some instance data:
@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .

:42 a sdo:Book ;
    dct:title "Life the Universe and Everything."@en .
Throw those into the SHACL validator and you get:
Result: SUCCESS
Errors: 0
Warnings: 0
Messages: 0
Which (I think) is what you should get.
So wrong it’s right
But what about this instance data:
@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <http://schema.org/> .

:42 a sdo:Book ;
    dct:title "Life the Universe and Everything." .
For this the validator also returns
Result: SUCCESS
Errors: 0
Warnings: 0
Messages: 0
Did you spot the difference? Well, for one thing the title has no language tag, so it’s not a langString.
How about:
@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <http://schema.org/> .

:42 a sdo:Book .
which has no title at all, but still you’ll get the “success, no errors” result. That cannot be right, can it?
Well, yes, it is right. You see there is another error in those last two examples. In the SHACL and the first example, the URI for Schema.org is given as
@prefix sdo: <https://schema.org/> .
In the second and third examples it is
@prefix sdo: <http://schema.org/> .
That’s the sort of subtle and common mistake I would like to pick up, but in fact it stops the validator from picking up any mistakes. That’s because the SHACL rules apply to anything typed as a https://schema.org/Book and in the second and third examples (where the prefix is http, not https) there isn’t anything typed as such. No rules apply, no rules are broken: zero errors — success!
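The logic can be reproduced in a few lines of Python. This is only a toy sketch of how a target class selects the nodes to be checked, not how the validator is implemented: triples are plain tuples of IRI strings, only the "must have a dct:title" rule is modelled, and the function names are made up for illustration.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
TARGET_CLASS = "https://schema.org/Book"  # the class named in the shapes


def focus_nodes(triples, target_class):
    """Subjects typed as target_class: the only nodes the rules apply to."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == target_class}


def count_violations(triples, target_class=TARGET_CLASS):
    """Count focus nodes without a dct:title; other nodes are never checked."""
    nodes = focus_nodes(triples, target_class)
    titled = {s for s, p, o in triples if p == DCT_TITLE}
    return len(nodes - titled)


book = "http://example.org/books#42"
https_no_title = [(book, RDF_TYPE, "https://schema.org/Book")]
http_no_title = [(book, RDF_TYPE, "http://schema.org/Book")]

print(count_violations(https_no_title))  # 1: the book is a focus node and has no title
print(count_violations(http_no_title))   # 0: no focus nodes, so nothing to violate
```

With the http prefix the set of focus nodes is empty, so the count of violations is zero whatever else is wrong with the data.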
What to do?
I’m not really sure. I see this type of problem quite a lot (especially if you generalize “this type of problem” to mean the test doesn’t have quite the logic I thought it did). I suppose lesson one was always to test shapes with invalid data to make sure they work as expected. That’s a lot of tests.
Arising from that is to write SHACL rules to check for everything: the errors above would be picked up if I had checked that there is always at least one entity with the expected type for Books. There’s a recipe for this on the SHACL wiki.
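The same guard can also be run outside SHACL, before validation. What follows is a minimal sketch and not the wiki recipe: it assumes the data is available as tuples of IRI strings and that you have listed the target classes from your shapes by hand; the function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def unmatched_targets(data_triples, target_classes):
    """Return the target classes that have no instance in the data."""
    classes_in_data = {o for s, p, o in data_triples if p == RDF_TYPE}
    return sorted(set(target_classes) - classes_in_data)


targets = ["https://schema.org/Book"]  # taken from sh:targetClass in the shapes
data = [("http://example.org/books#42", RDF_TYPE, "http://schema.org/Book")]

for cls in unmatched_targets(data, targets):
    print(f"No instances of {cls}: its rules will never be applied")
```

An empty result does not mean the data is valid, only that each shape has something to test.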
Generalizing on the idea that simple typos can mean data not being tested because the term identifier doesn’t match any term in the schema you’re using, it’s worth checking that all term identifiers in the instance data and the SHACL are actually in the schema. This will pick up when sdo:Organisation is used instead of sdo:Organization. SHACL won’t do this, but it’s easy enough to write a Python script that does.
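As a rough idea of what such a script might look like, here is a sketch using only the standard library. The list of known terms is a tiny hand-made sample standing in for the full vocabulary, which you would load from the schema you are using; the terms to check are assumed to have been extracted from the data and shapes already (for example with an RDF library), and the function name is made up.

```python
import difflib

# Hypothetical sample of known terms; in practice load the whole vocabulary.
KNOWN_TERMS = {
    "https://schema.org/Book",
    "https://schema.org/Organization",
    "https://schema.org/Person",
    "http://purl.org/dc/terms/title",
}


def unknown_terms(used_terms, known_terms=KNOWN_TERMS):
    """Map each term not found in known_terms to its closest known spelling, if any."""
    report = {}
    for term in sorted(set(used_terms) - set(known_terms)):
        report[term] = difflib.get_close_matches(
            term, sorted(known_terms), n=1, cutoff=0.8
        )
    return report


used = [
    "https://schema.org/Organisation",  # British spelling, not in the schema
    "http://schema.org/Book",           # http instead of https
    "http://purl.org/dc/terms/title",   # fine
]
for term, suggestions in unknown_terms(used).items():
    print(term, "is not a known term; did you mean", suggestions)
```

Because the comparison is on the full IRI, the same check flags a mismatched namespace as well as a misspelt term name.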
Holger Knublauch made a very good point about this post on Twitter:
Well, that is a generic problem; it came up for me when integrating data models. The specific problem is that http://schema.org properties are not declared the same as those under https://, despite being informally declared equivalent (https://schema.org/docs/faq.html#19), and the main issue (https://github.com/schemaorg/schemaorg/issues/2516) is still open. It looks like they want to release the equivalence relations, and in any case they released all the examples in https (https://github.com/schemaorg/schemaorg/issues/2597), so we should use https.
Hi Emidio, leaving aside the general question of whether schema.org should declare equivalences between http and https versions (which may help with several things), I don’t think it would help here. I don’t think the SHACL validator looks at the RDF Schema when doing its stuff. It doesn’t blink if you have a typo in a term name that means you’re using a term that doesn’t exist. I could have used the example of the class being https://schema.org/boo above (there’s no end to the mistakes I’ve made).
Indeed, it doesn’t look at the RDF Schema but just at the namespace, which has to be consistent in the shapes and in the instances.