
SHACL, when two wrongs make a right

I have been working with SHACL for a few months in connection with validating RDF instance data against the requirements of application profiles. There’s a great validation tool, created as part of the JoinUp Interoperability Test Bed, that lets you upload your SHACL rules and a data instance and then tests the latter against the former. But be aware: some errors can lead to the instance data successfully passing the tests. This isn’t an error in the tool, just a case of blind logic: the program does what you tell it to, regardless of whether that’s what you want it to do.

The rules

Here’s a really minimal set of SHACL rules for describing a book:

@base <http://example.org/shapes#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .

<Book_Shape> a sh:NodeShape ;
    sh:property
        <bookTitle_Shape> ;
    sh:targetClass sdo:Book .

<bookTitle_Shape> a sh:PropertyShape ;
    sh:path dct:title ;
    sh:datatype rdf:langString ;
    sh:nodeKind sh:Literal ;
    sh:minCount 1 ;
    sh:severity sh:Violation .

Essentially, it says that the description of anything typed as a schema.org Book (sdo:Book) should have a title, provided as a langString using the Dublin Core title property. Here’s some instance data:

@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <https://schema.org/> .

:42 a sdo:Book;
  dct:title "Life the Universe and Everything."@en .

Throw those into the SHACL validator and you get:

Result: SUCCESS
Errors: 0
Warnings: 0
Messages: 0

Which (I think) is what you should get.

So wrong it’s right

But what about this instance data:

@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <http://schema.org/> .

:42 a sdo:Book;
  dct:title "Life the Universe and Everything." .

For this the validator also returns

Result: SUCCESS
Errors: 0
Warnings: 0
Messages: 0

Did you spot the difference? Well, for one thing, the title has no language tag, so it’s not a langString.

How about:

@prefix : <http://example.org/books#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sdo: <http://schema.org/> .

:42 a sdo:Book .

which has no title at all, but still you’ll get the “success, no errors” result. That cannot be right, can it?

Well, yes, it is right. You see there is another error in those last two examples. In the SHACL and the first example, the URI for Schema.org is given as

@prefix sdo: <https://schema.org/> .

In the second and third examples it is

@prefix sdo: <http://schema.org/> .

That’s the sort of subtle and common mistake I would like to pick up, but in fact it stops the validator from picking up any mistakes. That’s because the SHACL rules apply to anything typed as a https://schema.org/Book and in the second and third examples (where the prefix is http, not https) there isn’t anything typed as such. No rules apply, no rules are broken: zero errors — success!
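
To make that concrete, here is a minimal sketch in plain Python of how a class target picks the nodes to validate. It is a toy model, not how any real validator is implemented: it assumes the data has already been parsed into (subject, predicate, object) tuples of full IRI strings, the function names are made up for illustration, and it ignores the rdfs:subClassOf handling that sh:targetClass also involves.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"


def focus_nodes(triples, target_class):
    """Subjects typed as target_class, compared by exact IRI string."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == target_class}


def missing_title(triples, target_class):
    """Focus nodes with no dct:title: a toy stand-in for sh:minCount 1."""
    titled = {s for s, p, o in triples if p == DCT_TITLE}
    return focus_nodes(triples, target_class) - titled


target = "https://schema.org/Book"  # what the shapes say (https)

# The third example: typed with the http namespace, no title.
third_example = [
    ("http://example.org/books#42", RDF_TYPE, "http://schema.org/Book"),
]
# The same data with the namespace corrected to https.
corrected = [
    ("http://example.org/books#42", RDF_TYPE, "https://schema.org/Book"),
]

print(focus_nodes(third_example, target))    # set(): nothing to validate
print(missing_title(third_example, target))  # set(): so nothing fails
print(missing_title(corrected, target))      # {'http://example.org/books#42'}
```

With the http namespace the set of nodes to check is empty, so the missing title is never looked at; once the namespace matches, the same data is reported.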

What to do?

I’m not really sure. I see this type of problem quite a lot (especially if you generalize “this type of problem” to mean the test doesn’t have quite the logic I thought it did). I suppose lesson one was always to test shapes with invalid data to make sure they work as expected. That’s a lot of tests.

Arising from that is lesson two: write SHACL rules to check for everything. The errors above would have been picked up if I had checked that there is always an entity with the expected type for books; there’s a recipe for this on the SHACL wiki.
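
The wiki recipe itself is not reproduced here. As a rough stand-in for the same idea outside SHACL, the sketch below reports any class you expected to find that has no instances at all in the data. It assumes the data is already available as (subject, predicate, object) tuples of full IRI strings, and the function name is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def classes_without_instances(triples, expected_classes):
    """Return the expected class IRIs that no subject in triples is typed as."""
    used_types = {o for _, p, o in triples if p == RDF_TYPE}
    return sorted(set(expected_classes) - used_types)


# The second and third examples type the book with the http namespace.
data = [("http://example.org/books#42", RDF_TYPE, "http://schema.org/Book")]

empty_targets = classes_without_instances(data, ["https://schema.org/Book"])
if empty_targets:
    print("No instances found for:", empty_targets)
```

Run before (or alongside) validation, a check like this turns a “success” with nothing validated into something you get told about.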

Generalizing from the idea that simple typos can mean data not being tested because the term identifier doesn’t match any term in the schema you’re using, it’s worth checking that all term identifiers in the instance data and the SHACL are actually in the schema. This will pick up when sdo:Organisation is used instead of sdo:Organization. SHACL won’t do this, but it’s easy enough to write a Python script that does.
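
A minimal sketch of what such a script might look like, using only the Python standard library. The assumptions are mine: KNOWN_TERMS is a tiny hand-written sample standing in for the full list of terms from the schema, the identifiers in use are assumed to have been extracted already as full IRIs (for example with an RDF parser), and the function name is made up.

```python
import difflib

# Hypothetical sample only; a real check would load every term in the schema.
KNOWN_TERMS = {
    "https://schema.org/Book",
    "https://schema.org/Organization",
    "http://purl.org/dc/terms/title",
}


def unknown_terms(used_iris, known=KNOWN_TERMS):
    """Map each IRI that is not a known term to its closest known term, if any."""
    report = {}
    for iri in sorted(set(used_iris) - set(known)):
        # cutoff=0.8 is an arbitrary similarity threshold for suggestions
        report[iri] = difflib.get_close_matches(iri, sorted(known), n=1, cutoff=0.8)
    return report


# Identifiers as they might be collected from instance data and shapes.
used = [
    "http://schema.org/Book",           # http instead of https
    "https://schema.org/Organisation",  # s instead of z
    "http://purl.org/dc/terms/title",   # fine
]

for iri, suggestions in unknown_terms(used).items():
    print(iri, "is not in the schema; closest known term:", suggestions)
```

Suggesting the nearest known term is a convenience rather than a necessity: the important part is the set difference, which flags both the misspelt class and the wrong namespace scheme.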