I’ve described a couple of short “toy” examples as proof of concept of turning a Dublin Core Application Profile (DC TAP) into SHACL in order to validate instance data: the SHACL Person Example and a Simple Book Example. Now it is time to see how the approach fares against a real-world example. I chose the EU joinup Data Catalog Application Profile (DCAT AP) because Karen Coyle had an interest in DCAT, it is well documented (PDF) with a GitHub repo that has SHACL files, there is an Interoperability Test Bed validator for it (albeit a version behind) and I found a few test instances with known errors (again a little dated). I also found the acronym soup of DCAT AP DC TAP irresistible.
You can see the extended TAP as Google Sheets. The tap tab is the DCTAP; the other tabs are information about the application profile, namespace prefixes and shapes that are needed by tap2shacl. You can also see the CSV export files, test instances and generated SHACL for DCATAPDCTAP in my TAPExamples GitHub repo. The actual TAP is maybe a little ragged around the edges. Partly I got bored, partly I wasn’t sure how far to go: for example, should every reference to a Catalog be a description that conforms to all the DCAT AP rules for a catalog, or is it sufficient to just have an IRI in the instance data? At the other extreme, what to do about properties where the only requirement was that an entity of a certain type be referenced: should the SHACL demand the type be explicitly declared, or is the intent that an IRI in the instance data is enough and the type may be inferred?
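To make that last choice concrete, here is a minimal sketch of the two readings as SHACL property shapes. It is not tap2shacl code: the function name `reference_constraint` and its `require_type` flag are made up for illustration, and the prefixes are assumed to be declared elsewhere. The looser reading asks only for an IRI (`sh:nodeKind sh:IRI`); the stricter one adds `sh:class`, which a SHACL validator checks against type statements actually present in the data, since it does not do general inference for you.

```python
def reference_constraint(path, cls, require_type):
    """Build a Turtle property shape for a property that references an entity.

    Hypothetical helper for illustration only; not part of tap2shacl.
    """
    # Both readings agree that the value must be an IRI.
    lines = [f"    sh:path {path}", "    sh:nodeKind sh:IRI"]
    if require_type:
        # Stricter reading: the referenced node must be typed in the data.
        lines.append(f"    sh:class {cls}")
    return "[\n" + " ;\n".join(lines) + "\n]"


# Looser reading: any IRI will do as the publisher.
loose = reference_constraint("dct:publisher", "foaf:Agent", require_type=False)

# Stricter reading: the publisher must be declared as a foaf:Agent.
strict = reference_constraint("dct:publisher", "foaf:Agent", require_type=True)

print(loose)
print(strict)
```

Which of the two is right depends on what the profile intends, which is exactly the question the TAP leaves open.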
I had to add a little functionality where I wanted a shape to be used against two targets, objects of a property and instances of a class. This actually prompted a fairly significant rewrite of how shape information is handled, and I have thought of further extensions.
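As a rough illustration of what “two targets” means in the output, the sketch below writes one node shape with both a class target and an objects-of target, using the standard `sh:targetClass` and `sh:targetObjectsOf` properties. The helper `shape_with_targets` and the shape name `:DatasetShape` are invented for this example and are not how tap2shacl is structured internally; prefixes are assumed to be declared elsewhere.

```python
def shape_with_targets(shape_iri, target_classes=(), target_objects_of=()):
    """Return Turtle for one node shape that applies to several targets.

    Hypothetical helper for illustration only; not part of tap2shacl.
    """
    lines = [f"{shape_iri} a sh:NodeShape"]
    for cls in target_classes:
        # Apply the shape to every instance of this class.
        lines.append(f"    sh:targetClass {cls}")
    for prop in target_objects_of:
        # Apply the shape to every node that is a value of this property.
        lines.append(f"    sh:targetObjectsOf {prop}")
    return " ;\n".join(lines) + " .\n"


turtle = shape_with_targets(
    ":DatasetShape",  # made-up shape name
    target_classes=["dcat:Dataset"],
    target_objects_of=["dcat:dataset"],
)
print(turtle)
```

With both targets declared, a dataset is checked whether it is typed as `dcat:Dataset` or merely pointed to by a catalog’s `dcat:dataset` property.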
It works in that valid SHACL is produced. The SHACL is more verbose than the hand-crafted SHACL produced by the DCAT AP project, but I think that is to be expected from a general-purpose conversion script. It also works in that when I run the test instances through the ITB SHACL validator with my SHACL and through the DCAT AP specific validator, they both flag the same errors. My SHACL actually raises more error messages, but that is a result of it being more verbose: sometimes it gives two errors for one problem (one that the value of a predicate does not match the expected shape, the second about how the shape is not matched). The important thing is that the same fixes work on both.
I’m quite happy with this. There may be some requirements in the DCAT AP that I haven’t checked for, but this will do for now. Next I want to work out how best to require that a specific SKOS concept scheme be used.
The only niggle that I have is that the TAP sheet in Google docs isn’t as easily human readable as the corresponding tables in the DCAT AP documentation. There’s a balance we’re working on between keeping the necessary precision and technical correctness on the one hand and human friendliness on the other, and it is hard to strike.