XCRI-CAP (eXchanging Course Related Information, Course Advertising Profile) is the UK standard for course marketing information in Higher Education. It is compatible with the European Standard Metadata for Learning Opportunities. The Schema Course Extension W3C Community Group has developed terms for describing educational courses that are now part of schema.org. Here I look at translating the data from an XCRI-CAP XML feed to schema.org JSON-LD.
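A minimal sketch of such a translation follows. It assumes a simplified, un-namespaced XML fragment loosely modelled on XCRI-CAP's catalog > provider > course > presentation nesting. Real XCRI-CAP feeds use namespaces (XCRI-CAP, Dublin Core, MLO) and carry far more elements. The mapping (identifier to courseCode, each presentation to a CourseInstance linked by hasCourseInstance, studyMode to courseMode) is an illustrative choice, not an official crosswalk, and the function names and sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment shaped like XCRI-CAP
# (catalog > provider > course > presentation). Real feeds use XML
# namespaces (XCRI-CAP, Dublin Core, MLO), which this sketch ignores.
SAMPLE_FEED = """
<catalog>
  <provider>
    <title>Example University</title>
    <course>
      <identifier>EX101</identifier>
      <title>Example Course</title>
      <description>An invented course used for illustration.</description>
      <presentation>
        <start>2016-09-12</start>
        <end>2016-12-16</end>
        <studyMode>Full time</studyMode>
      </presentation>
    </course>
  </provider>
</catalog>
"""


def text_of(elem, tag):
    """Return the stripped text of the first child named `tag`, or None."""
    child = elem.find(tag)
    if child is None or child.text is None:
        return None
    return child.text.strip()


def course_to_jsonld(course, provider_name):
    """Map one <course> element to a schema.org Course dictionary."""
    instances = [
        {
            "@type": "CourseInstance",
            "startDate": text_of(pres, "start"),
            "endDate": text_of(pres, "end"),
            "courseMode": text_of(pres, "studyMode"),
        }
        for pres in course.findall("presentation")
    ]
    return {
        "@context": "http://schema.org",
        "@type": "Course",
        "name": text_of(course, "title"),
        "courseCode": text_of(course, "identifier"),
        "description": text_of(course, "description"),
        "provider": {"@type": "Organization", "name": provider_name},
        "hasCourseInstance": instances,
    }


def feed_to_jsonld(xml_text):
    """Convert every course in the feed, keeping track of its provider."""
    root = ET.fromstring(xml_text.strip())
    courses = []
    for provider in root.findall("provider"):
        provider_name = text_of(provider, "title")
        for course in provider.findall("course"):
            courses.append(course_to_jsonld(course, provider_name))
    return courses


if __name__ == "__main__":
    print(json.dumps(feed_to_jsonld(SAMPLE_FEED), indent=2))
```

Each resulting Course dictionary could then be serialised and embedded in a web page inside a `<script type="application/ld+json">` element.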
This progress update on the work to extend schema.org to support the discovery of any type of educational course is cross-posted from the Schema Course Extension W3C Community Group. If you are interested in this work please head over there.
What aspects of a course can we now describe?
As a result of work so far addressing the use cases that we outlined, we now have answers to the following questions about how to describe courses using schema.org:
- How to define something as being about a course or being about an instance of that course
- How to mark up the identifier used by providers to identify their courses
- How to identify a course by provider and name
- How to mark up the subject of a course
- How to identify the location where a course is offered
- How to identify the start and end of an instance of a course, and the times of events that are part of it
- Related to start and end dates of a course, how to specify the duration and amount of time typically required to complete a course
- How to identify the organizations providing and offering courses (and how these two roles may differ).
- How to identify the teacher / instructor of a course (who may or may not be the creator of the course).
- How to mark up the mode of study or delivery.
- How to identify a course which is a prerequisite of the course being described or to link to or describe other prerequisites.
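On durations: schema.org expects ISO 8601 duration strings. As a minimal sketch, assuming a CourseInstance whose start and end dates are known, its `duration` (a property it inherits from Event) could be derived from those dates, while `timeRequired` (a CreativeWork property) describes the study effort typically needed. The helper names, the sample values and the choice to prefer whole weeks are assumptions for illustration.

```python
from datetime import date


def elapsed_duration(start: date, end: date) -> str:
    """ISO 8601 duration for the calendar span from start to end.

    Whole weeks are written as PnW, any other span as PnD.
    """
    days = (end - start).days
    if days < 0:
        raise ValueError("end date is before start date")
    if days > 0 and days % 7 == 0:
        return f"P{days // 7}W"
    return f"P{days}D"


def effort_duration(hours: int) -> str:
    """ISO 8601 duration for an amount of study effort, e.g. PT150H."""
    return f"PT{hours}H"


# Hypothetical values for a single course instance.
course_instance = {
    "@type": "CourseInstance",
    "startDate": "2016-09-12",
    "endDate": "2016-12-12",
    "duration": elapsed_duration(date(2016, 9, 12), date(2016, 12, 12)),
}
course = {
    "@type": "Course",
    "timeRequired": effort_duration(150),
    "hasCourseInstance": course_instance,
}
print(course["timeRequired"], course_instance["duration"])  # PT150H P13W
```

Keeping the two apart matters because a 13-week course instance says nothing, by itself, about how many hours of study a learner should expect.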
As with anything in schema.org, many of the answers proposed are not the final word on all the detail required in every case, but they form a solid basis that I think will be adequate in many instances.
What new properties are we proposing?
In short, remarkably few. Many aspects of a course can be described in the same way as for other creative works or events. However, we did find that we needed to create two new types, Course and CourseInstance, to distinguish whether a description relates to a course that could be offered at various times or to a specific offering or section of that course. We also found the need for three new properties for Course (courseCode, coursePrerequisites and hasCourseInstance) and two new properties for CourseInstance (courseMode and instructor).
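To make this concrete, here is a hypothetical course description written as a Python dictionary ready for JSON-LD serialisation, using the two new types and five new properties listed above. The `misplaced` helper is an illustrative check of my own, not part of any schema.org tooling: it flags a proposed property used on a type other than the one it was proposed for, such as courseMode on a Course. All values are invented.

```python
import json

# Properties proposed for each new type, as listed above.
PROPOSED = {
    "Course": {"courseCode", "coursePrerequisites", "hasCourseInstance"},
    "CourseInstance": {"courseMode", "instructor"},
}

# Hypothetical course; all values are invented.
course = {
    "@context": "http://schema.org",
    "@type": "Course",
    "name": "Introduction to Linked Data",
    "courseCode": "EX101",
    "coursePrerequisites": "Basic familiarity with HTML",
    "hasCourseInstance": [
        {
            "@type": "CourseInstance",
            "courseMode": "part-time",
            "instructor": {"@type": "Person", "name": "A. N. Example"},
            "startDate": "2016-09-12",
        },
        {"@type": "CourseInstance", "courseMode": "online"},
    ],
}


def misplaced(node):
    """List (type, property) pairs where a proposed property sits on a type
    other than the one it was proposed for, searching nested objects too."""
    node_type = node.get("@type")
    problems = []
    for key, value in node.items():
        for other_type, props in PROPOSED.items():
            if other_type != node_type and key in props:
                problems.append((node_type, key))
        # Recurse into nested objects, including lists of them.
        for child in value if isinstance(value, list) else [value]:
            if isinstance(child, dict):
                problems.extend(misplaced(child))
    return problems


print(json.dumps(course, indent=2))
print(misplaced(course))  # []
```

The design keeps stable, course-level facts on the Course and everything that varies per offering on each CourseInstance, joined by hasCourseInstance.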
There are others under discussion, but I highlight these as proposed because they are being put forward for inclusion in the next release of the schema.org core vocabulary.
I am chair of the Schema Course Extension W3C Community Group, which aims to develop an extension for schema.org concerning the discovery of any type of educational course. This progress update is cross-posted from there.
If the forming-storming-norming-performing model of group development still has any currency, then I am pretty sure that February was the “storming” phase. There was a lot of discussion, much of it around the modelling of the basic entities for describing courses and how they relate to core types in schema.org (the “Modelling Course and CourseOffering” and “Course, a new dawn?” threads). I am pleased to say that the discussion did its job, and we achieved some sort of consensus (norming) around modelling courses in two parts:
Course, a subtype of CreativeWork: A description of an educational course which may be offered as distinct instances at different times and places, or through different media or modes of study. An educational course is a sequence of one or more educational events and/or creative works which aims to build knowledge, competence or ability of learners.
CourseInstance, a subtype of Event: An instance of a Course offered at a specific time and place or through specific media or mode of study or to a specific section of students.
hasCourseInstance, a property of Course with expected range CourseInstance: An offering of the course at a specific time and place or through specific media or mode of study or to a specific section of students.
(see Modelling Course and CourseInstance on the group wiki)
This modelling, especially the subtyping from existing schema.org types, allows us to meet many of the requirements arising from the use cases quite simply. For example, the cost of a course instance can be provided using the offers property of schema.org/Event.
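As a sketch of that example, assuming invented prices, dates and places: each CourseInstance below carries an Offer through offers, exactly as an Event would, so a consumer can, for instance, pick the cheapest offering. The helper functions are hypothetical, they assume all offers share one currency, and real data may carry several offers or omit the price altogether.

```python
from decimal import Decimal

# Hypothetical course with two instances; offers, location and startDate are
# ordinary schema.org/Event properties inherited by CourseInstance.
course = {
    "@type": "Course",
    "name": "Example Course",
    "hasCourseInstance": [
        {
            "@type": "CourseInstance",
            "startDate": "2016-09-12",
            "location": {"@type": "Place", "name": "Example Campus"},
            "offers": {"@type": "Offer", "price": "250.00", "priceCurrency": "GBP"},
        },
        {
            "@type": "CourseInstance",
            "startDate": "2017-01-09",
            "courseMode": "online",
            "offers": [{"@type": "Offer", "price": "180.00", "priceCurrency": "GBP"}],
        },
    ],
}


def lowest_price(instance):
    """Return the lowest price among an instance's offers, or None if unpriced."""
    offers = instance.get("offers", [])
    if isinstance(offers, dict):  # a single Offer may appear without a list
        offers = [offers]
    prices = [Decimal(str(o["price"])) for o in offers if "price" in o]
    return min(prices) if prices else None


def cheapest_instance(course):
    """Pick the priced CourseInstance with the lowest price (currency is not compared)."""
    priced = [i for i in course["hasCourseInstance"] if lowest_price(i) is not None]
    return min(priced, key=lowest_price) if priced else None


print(cheapest_instance(course)["startDate"])  # 2017-01-09
```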
The wiki is working to a reasonable extent as a place to record the outcomes of the discussion. Working from the outline use cases page you can see which requirements have pages, and those pages that exist point to the relevant discussion threads in the mailing list and, where we have got this far, describe the current solution. The wiki is also the place to find examples for testing whether the proposed solution can be used to mark up real course information.
The next phase of the work should see us performing, working through the requirements from the use cases and showing how they can be met. I think we should focus first on those that look easy to do with existing properties of schema.org/Event and schema.org/CreativeWork.
UPDATE: there is a new W3C community group, schema course extend, set up to progress these ideas. Please join if you are interested.
This is essentially an invite to get involved with building a schema extension for educational courses, by way of a description of work so far. If you want to reply, it has been sent as an email to the schema.org mailing list.
About a year ago there was a flurry of discussion about wanting to mark up descriptions of courses with schema.org. Vicky Tardiff-Holland produced a proposal which we discussed in LRMI and elsewhere, as a result of which various suggestions and comments were added to that proposal.
I also led some work in LRMI around scope, use cases, requirements and existing data, which I hope will lead to validating or refining the proposal with example data that could be used to demonstrate that it meets the use cases.
I am up for another push on courses. I share the doc I was working on in the hope that it is good starting point. It’s a bit long, so here is an overview of what it contains:
- scope: concerning discovery of any type of educational course (online/offline, long/short, scheduled/on-demand). An educational course is defined as “some sequence of events and/or creative works which aims to build knowledge, competence or ability of learners”. (Out of scope: information about students and their progression etc.; information needed internally for course management rather than discovery.)
- comparators: a review of some established ways of sharing similar data
- use cases
- requirements arising from the use cases
- mapping to some existing examples. I used hypothes.is to annotate existing web pages that describe different types of course, e.g. from Coursera or a University, tagging the requirement that the data was relevant to. Here’s an example of a page as tagged (click on a yellow highlight to show the relevant requirement as a comment with a tag)
hypothes.is aggregates the selected information for each tag, to give a list of the information relevant to each requirement, for example cost.
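Aggregating by tag is essentially a group-by. A minimal local sketch follows, assuming annotation records shaped loosely like hypothes.is annotation JSON: a `uri`, a list of `tags`, and a `target` whose `TextQuoteSelector` holds the highlighted text. The records and example.org URLs are invented, the field layout is an assumption, and the sketch does not call the hypothes.is API.

```python
from collections import defaultdict

# Invented annotations; field names are an assumption modelled on hypothes.is JSON.
annotations = [
    {
        "uri": "https://example.org/course-a",
        "tags": ["cost"],
        "target": [{"selector": [{"type": "TextQuoteSelector", "exact": "Fee: 250 GBP"}]}],
    },
    {
        "uri": "https://example.org/course-b",
        "tags": ["cost", "duration"],
        "target": [{"selector": [{"type": "TextQuoteSelector", "exact": "Free, 6 weeks"}]}],
    },
]


def quoted_text(annotation):
    """Return the highlighted text of an annotation, or None if there is none."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact")
    return None


def aggregate_by_tag(annotations):
    """Group (page, highlighted text) pairs under each requirement tag."""
    by_tag = defaultdict(list)
    for annotation in annotations:
        quote = quoted_text(annotation)
        if quote is None:
            continue
        for tag in annotation.get("tags", []):
            by_tag[tag].append((annotation["uri"], quote))
    return dict(by_tag)


for tag, items in aggregate_by_tag(annotations).items():
    print(tag, items)
```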
I think the next step would be to review the use cases and requirements in light of some of the observations from the mapping, and to look again at the proposal to see how it reflects the data available/required. But first I want to try to get more people involved, see whether anyone has a better idea for how to progress, or if anyone wants to check the work so far and help move it forward.
Finally, I’m aware the docs and discussions so far around schema for courses are a scattered set of scraps and drafts. If there is enough interest it would be really useful to have it in one place.
Alasdair Gray and I have had Anna Grant working with us for the last 12 weeks on an Equate Scotland Technology Placement project looking at how we can represent course information as linked data. As I wrote at the beginning of the project, for me this was of interest in relation to work on the use of schema.org to describe courses; for the department as a whole it relates to how we can access course-related information internally to view information such as the articulation of related learning outcomes from courses at different stages of a programme, and how data could be published and linked to datasets from other organisations such as accrediting bodies or funders. We avoided any student-related data such as enrolments and grades. The objectives for Anna’s work were ambitious: survey existing HE open data and ontologies in use; design an ontology that we can use; develop an interface we can use to create and publish our course data. Anna made great progress on all three fronts. Most of what follows is lifted from her report.
(Aside: at HW we run 4-year programmes in computer science which are composed of courses; I know many other institutions run 3/4-year courses which are composed of modules. Talking more generally, “course” is usefully ambiguous, covering both levels of granularity; “programme” and “module” seem unambiguous.)
A few universities have already embarked on similar projects, notably the Open University, Oxford University and Southampton University in the UK, and Muenster and the American University of Beirut elsewhere. Southampton was one of the first universities to take the open linked data approach, and as such developed its own bespoke ontology. Oxford has predominantly used the XCRI ontology (see below for information on the standard education ontologies mentioned here) to represent data; additionally they have used MLO, dcterms, SKOS and a few resource types that they have defined in their own ontology. The Open University has the richest data available; the approach it took was to use many ontologies. Muenster developed the TEACH ontology, and the American University of Beirut used the CourseWare and AIISO ontologies.
AIISO (Academic Institution Internal Structure Ontology) is an excellent ontology for what it is designed for, but, as its name says, it aims to describe the structure of an institution and doesn’t offer much in the way of specific properties of a course. TEACH is a better fit in terms of having the kind of properties that we wished to use to describe a course, but it doesn’t give any kind of representation of the provider of the course. CourseWare is a simple ontology with only four classes and many properties with Course as the domain; the trouble with this ontology is that it is closely related to the Aktors ontology, which is no longer defined anywhere online.
XCRI and MLO are designed for the advertising of courses, and as such they miss out some of the features of a course that would be represented in internal course descriptions, such as assessment method and learning outcomes. Neither of these ontologies shows the difference between a programme and a module. ECIM is an extension of MLO which provides a common format for representing credits awarded for completion of a learning opportunity.
CEDS (Common Education Data Standards) is an American ontology which provides a shared vocabulary for educational data from preschool right up to adult education. The benefit of this is that data can be compared and exchanged in a consistent way. It has data domains for assessment, learning standards, learning resources, and authentication and authorisation. Additionally it provides domains for different stages of education, e.g. post-secondary education. CEDS is ambitious in that it represents all levels of education, and as such it is a very complex and detailed ontology.
XCRI, MLO (+ ECIM) and CEDS can be grouped together in that they differentiate between a course specification and a course instance, offering or section. The specification covers the parts of a course that remain consistent from one presentation to the next, whereas the instance defines those aspects of a course that vary between presentations, for example location or start date. The advantage of this is that less data needs updating between years or offerings.
An initial draft of a Heriot-Watt schema applying all the available ontologies was made. It was a mess; however, it became apparent that MLO was the predominant ontology. So we chose to use MLO where possible and other ontologies where required. This iteration resulted in a course instance being both an MLO learning opportunity instance and a TEACH course, in order to be able to use all the properties required. Even using this mix of ontologies we still needed to mint our own terms. This approach was a bit complex, and TEACH does not seem to be widely used, so we decided to use MLO alone and extend it to fit our data, in a similar way to that already started by ECIM.
The final draft is shown below. Key: green = MLO, purple = MLO extension, blue = ECIM / previous alteration to MLO, yellow = generic ontologies such as Dublin Core and SKOS. In brief, we used subtypes of MLO Learning Opportunities to describe both programmes and modules. The distinction between information at the course specification level and information at the course instance level was made on the basis of whether changing the information required committee approval. So things that can be changed year on year without approval, such as location, course leader and other teaching staff, are associated with the course instance; things that are more stable and require approval, such as syllabus, learning outcomes and assessment methods, are at the course specification level.
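The committee-approval rule for splitting information between the two levels can be sketched as a simple partition. This is a hypothetical illustration, not the project's code: the field names are invented labels rather than MLO or MLO-extension property names, and any field the rule does not cover is returned separately for a person to decide.

```python
# Fields whose change needs committee approval: course specification level.
SPECIFICATION_FIELDS = {"syllabus", "learning_outcomes", "assessment_methods"}
# Fields that can change year on year without approval: course instance level.
INSTANCE_FIELDS = {"location", "course_leader", "teaching_staff"}


def split_record(record):
    """Partition a flat course record into specification, instance and unclassified parts."""
    spec, instance, unclassified = {}, {}, {}
    for field, value in record.items():
        if field in SPECIFICATION_FIELDS:
            spec[field] = value
        elif field in INSTANCE_FIELDS:
            instance[field] = value
        else:
            unclassified[field] = value
    return spec, instance, unclassified


# Hypothetical record for one module presentation.
record = {
    "title": "Example Module",
    "syllabus": "Topics A, B and C",
    "learning_outcomes": ["Explain A", "Apply B"],
    "assessment_methods": ["Coursework", "Exam"],
    "location": "Main campus",
    "course_leader": "Dr Example",
    "teaching_staff": ["Dr Example", "Ms Sample"],
}

spec, instance, unclassified = split_record(record)

# A new year's presentation only needs a new instance part; the spec is reused.
next_year = {**instance, "course_leader": "Dr Other"}
print(sorted(spec), sorted(instance), sorted(unclassified))
```

This is where the smaller update burden comes from: rolling a course forward a year touches only the instance part.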
We also created some instance data for Computer Science courses at Heriot-Watt. For this we used Semantic MediaWiki (with the Semantic bundle). Semantic Forms were used for inputting course information, and the input from the forms is then shown as a wiki page. Categories in MediaWiki are akin to classes; properties are used to link one page to another and also to relate the subject of the page to its associated literals. An input form has the properties built in, such that each field in the form has a property related to it. Essentially, the item described by the form becomes the subject of the stored triple, the property associated with a field within the form forms the predicate, and the input to the field forms the object. A field can be set such that multiple values can be entered if separated by commas, and in this case a triple will be formed for each value. I think there is a useful piece of work that could be done comparing various tools for creating linked data (e.g. Semantic MediaWiki, Callimachus, Marmotta) and evaluating whether other approaches (e.g. WordPress extensions) may improve on them. If you know of anything along such lines please let me know.
We have a little more work to do in representing the ontology in Protégé and creating more instance data; watch this space for updates and a more detailed description than the image above. We would also like to evaluate the ontology more fully against potential use cases and against other institutions’ data.
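A toy model of how a form submission turns into triples is sketched here. It assumes a submission is just a mapping from each field's property name to the raw text entered. The page the form describes is the subject, each field's property is the predicate, and each value entered is an object, with comma-separated input in a multi-value field producing one triple per value. This is not Semantic MediaWiki's implementation, and the page and property names are hypothetical.

```python
def form_to_triples(page, submission, multi_valued):
    """Turn one form submission into (subject, predicate, object) triples.

    page: name of the wiki page the form describes (the subject)
    submission: mapping from each field's property to the raw text entered
    multi_valued: properties whose fields accept comma-separated values
    """
    triples = []
    for prop, raw in submission.items():
        values = raw.split(",") if prop in multi_valued else [raw]
        for value in values:
            value = value.strip()
            if value:  # skip empty entries such as a trailing comma
                triples.append((page, prop, value))
    return triples


triples = form_to_triples(
    "Example Module",
    {
        "Has course leader": "Dr Example",
        "Has learning outcome": "Explain A, Apply B",
    },
    multi_valued={"Has learning outcome"},
)
for t in triples:
    print(t)
```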
Anna has finished her work here now and returns to Edinburgh Napier University to finish her Master’s project. Alasdair and I think she has done a really impressive job, not least considering she had no previous experience with RDF and semantic technologies. We’ve also found her a pleasure to work with and would like to thank her for her efforts on this project.