XCRI-CAP (eXchanging Course Related Information, Course Advertising Profile) is the UK standard for course marketing information in Higher Education. It is compatible with the European Standard Metadata for Learning Opportunities. The W3C schema course extension community group has developed terms for describing educational courses that are now part of schema.org. Here I look at translating the data from an XCRI-CAP XML feed to schema.org JSON-LD.
The aim here is to illustrate the extent to which the two specifications are interoperable. Also, mapping a functioning specification for advertising courses to schema.org terms will give an indication of what might be lacking from the latter, or help define a subset of schema.org that could be used as an application profile for course advertising. Finally, there is a Python script that might be the start of a useful tool for people who have XCRI-CAP data and want to use schema.org to describe those courses.
Before going any further I should clear up one potential point of misunderstanding. In the UK ‘course’ is often used to describe a programme of study at University or College level lasting from one to five years, leading to an award such as a Diploma, Degree, Masters etc. These ‘courses’ are also called programmes, and roughly translate to what in the US can be called a Course of Study. They typically comprise several modules, also often called courses (sorry, we made up this language as we went along). XCRI-CAP is primarily used to describe these long courses / programmes of study, because in the UK that is what institutions typically advertise to potential students. However, XCRI-CAP can also be used to describe short courses. My sense from the development of the schema course extension is that many people had short courses in mind (e.g. MOOCs); however, it is also applicable to long courses / programmes of study. So, in short, for this discussion, if it is a “sequence of events and/or creative works that aims to build the knowledge, competence or ability of learners” then I’ll call it a course, however long or short it is.
The anatomy of an XCRI-CAP XML feed in schema.org terms
To help show how XCRI-CAP maps to schema.org terms I took a model XCRI XML feed prepared by Alan Paull and gutted it of most content (this is an example of the Post Graduate XCRI format): [edit – I made all the empty XML tags self closed so it is easier to copy and paste this into an XML editor. Thank you Tavis Reddick]
<?xml version="1.0" encoding="UTF-8"?>
<!--
  Author: Alan Paull, APS Ltd, email@example.com
  Created: 25 June 2014; modified: 21 May 2015
  This is a generic XCRI-CAP 1.2 example file produced to illustrate the postgraduate format adopted by Prospects, including material that would be expected to be relevant for other aggregators.
  It uses the coursedataprogramme.xsd schema.
  It uses revisions to the schemas to include specific refinements for postgraduate data vocabularies.
  Modified by Phil Barker <http://people.pjjk.net/phil> to show just starting tags and hints to content
-->
<catalog xmlns="http://xcri.org/profiles/1.2/catalog"
         xmlns:xcriTerms="http://xcri.org/profiles/1.2/catalog/terms"
         xmlns:credit="http://purl.org/net/cm"
         xmlns:dcterms="http://purl.org/dc/terms/"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:mlo="http://purl.org/net/mlo"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos"
         xmlns:xhtml="http://www.w3.org/1999/xhtml"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xmlns:courseDataProgramme="http://xcri.co.uk">
  <dc:contributor />
  <dc:description />
  <provider>
    <!-- list of all the university's departments that can 'own' courses. -->
    <mlo:hasPart />
    <mlo:hasPart />
    <dc:description />
    <dc:identifier />
    <dc:identifier xsi:type="courseDataProgramme:ukprn" /><!-- numerical id -->
    <dc:title />
    <mlo:url />
    <course>
      <!-- isPartOf must match exactly with a hasPart entry -->
      <mlo:isPartOf />
      <!-- Note XHTML markup (concise markup version) -->
      <dc:description>
        <xhtml:div />
      </dc:description>
      <!-- 'specialFeature' is in the XCRI-CAP 1.2 Terms schema (verbose markup version) -->
      <dc:description xsi:type="xcriTerms:specialFeature">
        <div xmlns="http://www.w3.org/1999/xhtml" />
      </dc:description>
      <dc:identifier /><!--url-->
      <dc:identifier xsi:type="courseDataProgramme:internalID" /><!--alpha-numeric id-->
      <dc:subject xsi:type="courseDataProgramme:JACS3" identifier="N200" /><!--name of subject-->
      <dc:subject /><!--name of subject-->
      <dc:title />
      <!-- Course type codes specific to PG(T) -->
      <dc:type xsi:type="courseDataProgramme:courseTypeGeneral" courseDataProgramme:identifier="PG" /><!--label for code-->
      <dc:type xsi:type="mlo:RTCourseTypeFlag" mlo:RT-identifier="T" /><!--label for code-->
      <mlo:url />
      <abstract />
      <applicationProcedure href="http://www.poppleton.ac.uk/postgraduate/courses/how-to-apply/" />
      <mlo:assessment />
      <learningOutcome />
      <mlo:objective />
      <mlo:prerequisite />
      <regulations href="www.poppleton.ac.uk/regulations"/>
      <mlo:qualification>
        <dc:identifier /><!--alpha-numeric id-->
        <dc:title />
        <abbr />
        <dc:description />
        <dcterms:educationLevel />
        <mlo:url />
        <awardedBy />
      </mlo:qualification>
      <mlo:credit>
        <credit:scheme />
        <credit:level /><!--code-->
        <credit:value /><!--value-->
      </mlo:credit>
      <presentation>
        <dc:identifier /><!--url-->
        <dc:title />
        <mlo:start dtf="2015-09-01" /><!--text equiv-->
        <end dtf="2017-07-01" /><!--text equiv-->
        <mlo:duration interval="P2Y" /><!--text equiv-->
        <applyFrom dtf="2014-09" /><!--text equiv-->
        <applyUntil dtf="2015-09" /><!--text equiv-->
        <applyTo /><!--url-->
        <studyMode identifier="PT" /><!--label-->
        <!-- Note: in the absence of attendanceMode, consumers can assume that it is Campus, so the attendanceMode can be omitted -->
        <attendanceMode identifier="CM" /><!--label-->
        <attendancePattern identifier="DT" /><!--label-->
        <mlo:languageOfInstruction /><!--iso 639-2 code-->
        <languageOfAssessment /><!--iso 639-2 code-->
        <mlo:places /><!--iso 639-2 code-->
        <mlo:cost /><!--free text description-->
        <!-- Note: in the absence of venue, consumers can assume that it is as per the main provider element, so the venue can be omitted -->
        <venue>
          <provider>
            <dc:identifier /><!--label-->
            <dc:title />
            <mlo:location>
              <mlo:town />
              <mlo:postcode />
              <mlo:address />
              <mlo:address />
              <mlo:phone />
              <mlo:email />
            </mlo:location>
          </provider>
        </venue>
      </presentation>
    </course>
    <mlo:location>
      <mlo:town />
      <mlo:postcode />
      <mlo:address />
      <mlo:address />
      <!-- international convention also acceptable: +44 (0) 800 666 9999 -->
      <mlo:phone />
      <mlo:fax />
      <mlo:email />
    </mlo:location>
  </provider>
</catalog>
Working through this from the top (root) down (up?):
In XCRI, catalog is the root element for a list of courses; the Google Developer guidance for describing course lists suggests that schema.org/ItemList is a good equivalent. The catalog element has an @generated attribute, which is the date on which the catalog content was generated; it also has sub-elements for description and contributor. I haven’t implemented these yet, but they could be translated if the schema.org course list is double typed as an ItemList and a CreativeWork. In schema.org the relationship between the course list / catalog and the Courses is provided by the itemListElement property of the ItemList. This expects a value which is a ListItem, and so we need to double type the course entities in schema.org as a ListItem and Course.
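A minimal sketch of this double typing, assuming each course has already been turned into a plain Python dict. The function name `catalog_to_item_list` and the sample course are hypothetical, and the `position` property is an addition that ListItem allows but the mapping above does not require.

```python
import json


def catalog_to_item_list(courses):
    """Wrap Course dicts in a schema.org ItemList.

    Each course is double typed as ListItem and Course so that it is an
    acceptable value for itemListElement.
    """
    item_list = {
        "@context": "http://schema.org/",
        "@type": "ItemList",
        "itemListElement": [],
    }
    for position, course in enumerate(courses, start=1):
        element = dict(course)  # copy, so the caller's dict is left untouched
        element["@type"] = ["ListItem", "Course"]
        element["position"] = position
        item_list["itemListElement"].append(element)
    return item_list


# Invented sample data, for illustration only.
example_list = catalog_to_item_list([{"name": "MSc Example Studies"}])
print(json.dumps(example_list, indent=2))
```

Translating the catalog's description and contributor would mean adding CreativeWork to the list's own `@type`, which this sketch does not attempt.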
In XCRI XML the relationship between courses and the organizations that provide them is expressed by nesting a course element inside a provider element. In schema.org we use the provider property that Course inherits from CreativeWork. The items of information about the provider (description, title, parts, location as a postal address, identifier and url) mostly have obvious counterparts in schema.org/Organization (description, name, subOrganization (not implemented), address as a PostalAddress, and url). Identifiers take a bit of thought, see below.
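The provider mapping could be sketched roughly as follows, using only the standard library's xml.etree.ElementTree and plain dicts rather than rdflib. The sample fragment and its values are invented, the helper names are hypothetical, and putting mlo:phone and mlo:email on the PostalAddress is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "mlo": "http://purl.org/net/mlo",
}

# Hypothetical provider fragment; all values are invented for illustration.
SAMPLE_PROVIDER = """
<provider xmlns="http://xcri.org/profiles/1.2/catalog"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:mlo="http://purl.org/net/mlo">
  <dc:description>A fictional university.</dc:description>
  <dc:title>University of Poppleton</dc:title>
  <mlo:url>http://www.poppleton.ac.uk/</mlo:url>
  <mlo:location>
    <mlo:town>Poppleton</mlo:town>
    <mlo:postcode>PP1 1PP</mlo:postcode>
    <mlo:address>1 University Road</mlo:address>
  </mlo:location>
</provider>
"""


def child_text(element, path):
    """Return the stripped text of the first direct child matching path, or None."""
    child = element.find(path, NS)
    if child is None or not (child.text or "").strip():
        return None
    return child.text.strip()


def provider_to_organization(provider):
    """Map an XCRI provider element to a schema.org Organization dict."""
    organization = {"@type": "Organization"}
    for prop, path in (("name", "dc:title"),
                       ("description", "dc:description"),
                       ("url", "mlo:url")):
        value = child_text(provider, path)
        if value:
            organization[prop] = value

    # find() only looks at direct children, so a venue's nested
    # mlo:location is not picked up here.
    location = provider.find("mlo:location", NS)
    if location is not None:
        address = {"@type": "PostalAddress"}
        street = [a.text.strip() for a in location.findall("mlo:address", NS)
                  if (a.text or "").strip()]
        if street:
            address["streetAddress"] = ", ".join(street)
        for prop, path in (("addressLocality", "mlo:town"),
                           ("postalCode", "mlo:postcode"),
                           ("telephone", "mlo:phone"),
                           ("email", "mlo:email")):
            value = child_text(location, path)
            if value:
                address[prop] = value
        organization["address"] = address
    return organization


organization = provider_to_organization(ET.fromstring(SAMPLE_PROVIDER))
```

A Course dict would then carry this Organization as the value of its provider property.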
XCRI makes the distinction between Course as a thing which may be offered at different times and places, and Presentation as an offering or instantiation of a course. This is the same as the distinction between schema.org/Course and schema.org/CourseInstance. So the course elements in XCRI map directly to schema.org/Course entities.
The sub-elements of xcri:course that map clearly to schema.org properties of Course are: title (maps to name), url, abstract (maps to description), subject (maps to about and, when an identifier from a suitable framework is specified, an educational subject alignment) and mlo:prerequisite (maps to coursePrerequisites). Identifiers take a bit of thought, see below, but if the identifier was not an http URI I took a punt at it being the schema.org/courseCode (I am especially sure of this if its xsi:type was internalID).
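As a hedged illustration of these course-level mappings, here is a short sketch using xml.etree.ElementTree and plain dicts. The sample course and its values are invented, `course_to_schema` is a hypothetical name, and the educational subject alignment is left out to keep the example short.

```python
import xml.etree.ElementTree as ET

NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "dc": "http://purl.org/dc/elements/1.1/",
    "mlo": "http://purl.org/net/mlo",
}

# Hypothetical course fragment; all values are invented for illustration.
SAMPLE_COURSE = """
<course xmlns="http://xcri.org/profiles/1.2/catalog"
        xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:mlo="http://purl.org/net/mlo"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:courseDataProgramme="http://xcri.co.uk">
  <dc:identifier>http://www.poppleton.ac.uk/courses/msc-example</dc:identifier>
  <dc:identifier xsi:type="courseDataProgramme:internalID">PG123</dc:identifier>
  <dc:subject>Management studies</dc:subject>
  <dc:title>MSc Example Studies</dc:title>
  <mlo:url>http://www.poppleton.ac.uk/courses/msc-example</mlo:url>
  <abstract>A two year part-time course.</abstract>
  <mlo:prerequisite>A first degree.</mlo:prerequisite>
</course>
"""


def course_to_schema(course):
    """Map the clearly matching sub-elements of an XCRI course to a Course dict."""
    result = {"@type": "Course"}
    for prop, path in (("name", "dc:title"),
                       ("url", "mlo:url"),
                       ("description", "xcri:abstract"),
                       ("coursePrerequisites", "mlo:prerequisite")):
        element = course.find(path, NS)
        if element is not None and (element.text or "").strip():
            result[prop] = element.text.strip()

    subjects = [s.text.strip() for s in course.findall("dc:subject", NS)
                if (s.text or "").strip()]
    if subjects:
        result["about"] = subjects

    # Take the first identifier that is not an http URI as the course code.
    for identifier in course.findall("dc:identifier", NS):
        value = (identifier.text or "").strip()
        if value and not value.startswith("http"):
            result["courseCode"] = value
            break
    return result


course = course_to_schema(ET.fromstring(SAMPLE_COURSE))
```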
As well as abstract there were other descriptions in the XCRI feed, formatted in XHTML giving marketing information. These I passed over, but (stripped of the formatting) they could be used as descriptions, especially if the abstract is absent.
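Stripping the formatting from such a description could look something like the sketch below, which gathers the text nodes with `itertext()` and collapses runs of whitespace. The sample markup is invented and `plain_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML description; the content is invented for illustration.
SAMPLE_DESCRIPTION = """<dc:description xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <xhtml:div><xhtml:p>Study <xhtml:em>part-time</xhtml:em> over two years.</xhtml:p>
  <xhtml:p>Taught on campus.</xhtml:p></xhtml:div>
</dc:description>"""


def plain_text(element):
    """Join all text inside element and collapse whitespace to single spaces."""
    return " ".join("".join(element.itertext()).split())


description = plain_text(ET.fromstring(SAMPLE_DESCRIPTION))
print(description)  # Study part-time over two years. Taught on campus.
```

One caveat: block elements with no whitespace between them in the source would have their text run together, so real feeds may need a little more care.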
XCRI elements that I haven’t mapped yet are: isPartOf, dc:type, applicationProcedure, mlo:assessment, mlo:objective, regulations, mlo:qualification and mlo:credit. The last two of these are known gaps in schema.org’s ability to describe courses; there might be mappings to some LRMI properties for some of the others in some circumstances. For example, if the dc:type is PG (Postgraduate), then this could be an alignment to some educational level.
Additionally, we use the provider property of schema.org/Course to link to the provider, a relationship that is conveyed in XCRI XML by nesting, as mentioned above.
The xcri:presentation maps to schema.org/CourseInstance, which is linked to from the Course by the hasCourseInstance property.
Elements of presentation that map directly to properties of CourseInstance are: title (maps to name), mlo:start (maps to startDate), end (maps to endDate), mlo:duration (maps to duration), and studyMode, attendanceMode and attendancePattern (all mapped to courseMode). The venue element maps to CourseInstance’s location property, though the provider’s identifier turns up here in a way which requires a bit of thought, see below.
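A rough sketch of the presentation mapping, again with xml.etree.ElementTree and plain dicts. The sample values are invented and `presentation_to_instance` is a hypothetical name. Taking dates and durations from the dtf and interval attributes, and using the text labels rather than the codes for courseMode, are assumptions of this sketch. The venue is not handled.

```python
import xml.etree.ElementTree as ET

NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "dc": "http://purl.org/dc/elements/1.1/",
    "mlo": "http://purl.org/net/mlo",
}

# Hypothetical presentation fragment; all values are invented for illustration.
SAMPLE_PRESENTATION = """
<presentation xmlns="http://xcri.org/profiles/1.2/catalog"
              xmlns:dc="http://purl.org/dc/elements/1.1/"
              xmlns:mlo="http://purl.org/net/mlo">
  <dc:title>September 2015 intake</dc:title>
  <mlo:start dtf="2015-09-01">1 September 2015</mlo:start>
  <end dtf="2017-07-01">1 July 2017</end>
  <mlo:duration interval="P2Y">2 years</mlo:duration>
  <studyMode identifier="PT">Part time</studyMode>
  <attendanceMode identifier="CM">Campus</attendanceMode>
  <attendancePattern identifier="DT">Daytime</attendancePattern>
</presentation>
"""


def presentation_to_instance(presentation):
    """Map an XCRI presentation element to a schema.org CourseInstance dict."""
    instance = {"@type": "CourseInstance"}

    title = presentation.find("dc:title", NS)
    if title is not None and (title.text or "").strip():
        instance["name"] = title.text.strip()

    # Machine-readable values live in attributes; the element text is a label.
    for prop, path, attribute in (("startDate", "mlo:start", "dtf"),
                                  ("endDate", "xcri:end", "dtf"),
                                  ("duration", "mlo:duration", "interval")):
        element = presentation.find(path, NS)
        if element is not None and element.get(attribute):
            instance[prop] = element.get(attribute)

    modes = []
    for path in ("xcri:studyMode", "xcri:attendanceMode", "xcri:attendancePattern"):
        element = presentation.find(path, NS)
        if element is not None and (element.text or "").strip():
            modes.append(element.text.strip())
    if modes:
        instance["courseMode"] = modes
    return instance


instance = presentation_to_instance(ET.fromstring(SAMPLE_PRESENTATION))
```

The resulting dict would be attached to its Course through the hasCourseInstance property.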
A number of other elements (namely cost, applyFrom, applyUntil and applyTo) can all be mapped to properties of a schema.org/Offer. mlo:cost maps to a description of a PriceSpecification (the costs for UKHE degrees are usually more complex than can be given with a single number/currency pair); the others map to availabilityStarts, availabilityEnds and availableAtOrFrom respectively. This Offer is linked to from the CourseInstance’s offers property.
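The Offer could be assembled along the following lines. This is a sketch only: the sample values are invented, `presentation_to_offer` is a hypothetical name, and `availableAtOrFrom` is the property name as published on schema.org. Passing the applyTo URL straight through as its value follows the mapping described above, although schema.org expects a Place there.

```python
import xml.etree.ElementTree as ET

NS = {
    "xcri": "http://xcri.org/profiles/1.2/catalog",
    "mlo": "http://purl.org/net/mlo",
}

# Hypothetical presentation fragment; all values are invented for illustration.
SAMPLE_OFFER_SOURCE = """
<presentation xmlns="http://xcri.org/profiles/1.2/catalog"
              xmlns:mlo="http://purl.org/net/mlo">
  <applyFrom dtf="2014-09">September 2014</applyFrom>
  <applyUntil dtf="2015-09">September 2015</applyUntil>
  <applyTo>http://www.poppleton.ac.uk/apply</applyTo>
  <mlo:cost>Fees vary by residency status.</mlo:cost>
</presentation>
"""


def presentation_to_offer(presentation):
    """Map the application and cost elements of a presentation to an Offer dict."""
    offer = {"@type": "Offer"}
    for prop, path in (("availabilityStarts", "xcri:applyFrom"),
                       ("availabilityEnds", "xcri:applyUntil")):
        element = presentation.find(path, NS)
        if element is not None and element.get("dtf"):
            offer[prop] = element.get("dtf")

    apply_to = presentation.find("xcri:applyTo", NS)
    if apply_to is not None and (apply_to.text or "").strip():
        offer["availableAtOrFrom"] = apply_to.text.strip()

    # The cost is free text, so it becomes a description rather than a price.
    cost = presentation.find("mlo:cost", NS)
    if cost is not None and (cost.text or "").strip():
        offer["priceSpecification"] = {
            "@type": "PriceSpecification",
            "description": cost.text.strip(),
        }
    return offer


offer = presentation_to_offer(ET.fromstring(SAMPLE_OFFER_SOURCE))
course_instance = {"@type": "CourseInstance", "offers": offer}
```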
There are multiple identifiers in various formats in the XCRI XML input, and various required identifiers in the schema.org graph of the course information. As discussed above, some of the dc:identifiers provided were short alphanumeric codes, and were used as, for example, the value for schema.org/courseCode, or to identify an educational subject in an Alignment Object. There is also the mlo:url element, which I used for the schema.org/url property.
What I skipped over several times is that, as well as the mlo:url, similar (or identical) http URIs were used as values for dc:identifier. Also, as well as the schema.org url property, for linked data we need an identifier for the entities we are describing (the @id keyword in JSON-LD), preferably an http URI. So, I decided to experiment with using the dc:identifiers in XCRI XML as @id identifiers for the JSON-LD. This has an advantage over just using an arbitrary random identifier in that for larger data sets there is a chance of reducing repetition in the serialization of the graph. For example, with luck many courses will share the same location, and so this could appear as a properly identified entity in the graph to which many CourseInstance location properties link. I have experimented with different orders of preference for what to use for the @id (e.g. 1. dc:identifier beginning with http, 2. mlo:url, 3. dc:identifier with text value). I immediately hit a snag with this, because the same http URI was being used for different things in the XCRI example, e.g. for course and presentation, or for institution and venue, and it troubled me that the URI I was using was actually the identifier for an institutional web page. So to disambiguate I appended #<SchemaType> to the URI, e.g. http://example.org/course1#Course, http://example.org/course1#CourseInstance.
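One way to express that order of preference and the #<SchemaType> suffix is sketched below. The function name `choose_id` and its signature are hypothetical, and appending the suffix to a plain text identifier as well (the last-resort case) is an assumption of this sketch.

```python
def choose_id(identifiers, url, schema_type):
    """Pick an @id for an entity and disambiguate it by schema.org type.

    Order of preference: 1. a dc:identifier beginning with http,
    2. the mlo:url, 3. a dc:identifier with a plain text value.
    Returns None if there is nothing to build an identifier from.
    """
    candidates = [i for i in identifiers if i.startswith("http")]
    if url:
        candidates.append(url)
    candidates.extend(i for i in identifiers if not i.startswith("http"))
    if not candidates:
        return None
    return f"{candidates[0]}#{schema_type}"


# The same source URI yields distinct identifiers for different entities.
course_id = choose_id(["PG123", "http://example.org/course1"], None, "Course")
instance_id = choose_id(["http://example.org/course1"], None, "CourseInstance")
print(course_id)    # http://example.org/course1#Course
print(instance_id)  # http://example.org/course1#CourseInstance
```

This does not check whether the chosen URI already contains a fragment, which a more careful version would need to do.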
This is probably going to take more thinking about in the future.
Most of the above mapping is implemented (or with luck, soon will be) in a Python script using xml.etree and rdflib. You’re welcome to take a look at it on GitHub, but please bear in mind that it is pretty much untested on any input other than the example file given above. It is certainly not production-level code, so don’t use it as such.
The output in JSON-LD is pretty unreadable, so perhaps the most interesting way to view it is through the Google Structured Data Testing Tool. Ignore the errors and warnings, they arise from requirements for various Google products, not problems with the schema.org data.
- Broadly speaking, it works. With some exceptions that didn’t surprise me, the XCRI-CAP data about a course can be represented as schema.org linked data.
- Credit and Qualifications seem to me to be the biggest gaps, relating to existing use cases from the schema course extension community group.
- Likewise there is a gap around how to represent aims and objectives in schema.org, which might be related to work on competencies.
- In several places there was coded information in the XCRI (e.g. UK Provider ID, course mode codes) which isn’t easy to represent in schema.org. But this issue is being worked on.
I’ll tidy up the code a bit, and I also want to test it more extensively. I’m also pondering putting the resulting JSON-LD into a graph database to test how well it can be queried. This would be a great test of whether the schema course extension project really did meet its use cases.
Do drop me a line if you have any ideas, or if you have any XCRI feeds (or similar data in another format) I could play with.