A short project on linking course data

Alasdair Gray and I have had Anna Grant working with us for the last 12 weeks on an Equate Scotland Technology Placement project looking at how we can represent course information as linked data. As I wrote at the beginning of the project, for me this was of interest in relation to work on the use of schema.org to describe courses; for the department as a whole it relates to how we can access course-related information internally to view information such as the articulation of related learning outcomes from courses at different stages of a programme, and how data could be published and linked to datasets from other organisations such as accrediting bodies or funders. We avoided any student-related data such as enrolments and grades. The objectives for Anna’s work were ambitious: survey existing HE open data and ontologies in use; design an ontology that we can use; develop an interface we can use to create and publish our course data. Anna made great progress on all three fronts. Most of what follows is lifted from her report.

(Aside: at HW we run 4-year programmes in computer science which are composed of courses; I know many other institutions run 3- or 4-year courses which are composed of modules. Speaking more generally, “course” is usefully ambiguous, covering both levels of granularity, whereas “programme” and “module” seem unambiguous.)

A few universities have already embarked on similar projects, notably the Open University, Oxford University and Southampton University in the UK, and Muenster and the American University of Beirut elsewhere. Southampton was one of the first universities to take the open linked data approach, and as such it developed its own bespoke ontology. Oxford has predominantly used the XCRI ontology (see below for information on the standard education ontologies mentioned here) to represent data; additionally, it has used MLO, dcterms, SKOS and a few resource types defined in its own ontology. The Open University has the richest data available; its approach was to use many ontologies. Muenster developed the TEACH ontology, and the American University of Beirut used the CourseWare and AIISO ontologies.

The ontologies reviewed were: AIISO, TEACH, CourseWare, XCRI, MLO, ECIM and CEDS. A live working draft of the summary / review for these is available for comment as a Google Doc.

AIISO (Academic Institution Internal Structure Ontology) is an excellent ontology for what it is designed for, but, as its name says, it aims to describe the structure of an institution and does not offer much in the way of specific properties of a course. TEACH is a better fit in terms of having the kind of properties that we wished to use to describe a course; however, it does not represent the provider of the course at all. CourseWare is a simple ontology with only four classes and many properties with Course as the domain; the trouble with this ontology is that it is closely related to the Aktors ontology, which is no longer defined anywhere online.

XCRI and MLO are designed for advertising courses, and as such they omit some features of a course that would be represented in internal course descriptions, such as assessment method and learning outcomes. Neither of these ontologies distinguishes between a programme and a module. ECIM is an extension of MLO which provides a common format for representing the credits awarded for completing a learning opportunity.

CEDS (Common Education Data Standards) is an American ontology which provides a shared vocabulary for educational data from preschool right up to adult education. Its benefit is that data can be compared and exchanged in a consistent way. It has data domains for assessment, learning standards, learning resources, and authentication and authorisation, and it also provides domains for different stages of education, e.g. post-secondary education. CEDS is ambitious in that it represents all levels of education, and as such it is a very complex and detailed ontology.

XCRI, MLO (+ ECIM) and CEDS can be grouped together in that they differentiate between a course specification and a course instance, offering or section. The specification covers the parts of a course that remain consistent from one presentation to the next, whereas the instance defines those aspects of a course that vary between presentations, for example location or start date. The advantage of this is that less data needs updating between years or offerings.
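
A minimal Python sketch of the specification/instance split, under our own assumptions: the class and field names below are hypothetical and are not terms from XCRI, MLO or CEDS. It shows why less data needs updating: a new offering reuses the existing spec object and only adds a small instance record.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CourseSpec:
    """Parts of a course that stay the same from one presentation to the next."""
    code: str
    title: str
    learning_outcomes: tuple[str, ...]
    assessment_methods: tuple[str, ...]


@dataclass(frozen=True)
class CourseInstance:
    """Aspects that vary between presentations; each instance points to its spec."""
    spec: CourseSpec
    location: str
    start: date


def next_offering(previous: CourseInstance, location: str, start: date) -> CourseInstance:
    # Only the instance-level data is new; the spec is reused unchanged.
    return CourseInstance(spec=previous.spec, location=location, start=start)


# Hypothetical course code, locations and dates.
spec = CourseSpec("XX101", "Example course", ("Explain RDF",), ("Exam",))
first = CourseInstance(spec, "Edinburgh", date(2015, 9, 14))
second = next_offering(first, "Dubai", date(2016, 9, 12))
print(second.spec is first.spec)  # True
```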

We first drafted a Heriot-Watt schema applying all the available ontologies. It was a mess, but it became apparent that MLO was the predominant ontology, so we chose to use MLO where possible and other ontologies only where required. This iteration resulted in a course instance becoming both an MLO learning opportunity instance and a TEACH course, so that all the required properties could be used. Even with this mix of ontologies we still needed to mint our own terms. Because this approach was rather complex, and TEACH does not seem to be widely used, we decided to use MLO alone and extend it to fit our data, in a similar way to the extension already started by ECIM.

The final draft is shown below. Key: green = MLO, purple = MLO extension, blue = ECIM / previous alteration to MLO, yellow = generic ontologies such as Dublin Core and SKOS. In brief, we used subtypes of MLO Learning Opportunities to describe both programmes and modules. Whether information sits at the course specification level or the course instance level was decided by whether changing it requires committee approval. So things that can be changed year on year without approval, such as location, course leader and other teaching staff, are associated with the course instance; things that are more stable and require approval, such as syllabus, learning outcomes and assessment methods, are at the course specification level.
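
The approval rule in the paragraph above can be written down as a simple lookup. This is a hypothetical sketch: the field names are illustrative labels for the examples given there, not properties from our MLO extension.

```python
# Hypothetical field names illustrating the rule: anything whose change
# needs committee approval belongs on the course specification.
SPEC_FIELDS = {"syllabus", "learning_outcomes", "assessment_methods"}
INSTANCE_FIELDS = {"location", "course_leader", "teaching_staff"}


def level_for(field_name: str) -> str:
    """Return the level (specification or instance) a field belongs to."""
    if field_name in SPEC_FIELDS:
        return "specification"
    if field_name in INSTANCE_FIELDS:
        return "instance"
    raise KeyError(f"unclassified field: {field_name}")


def needs_approval(changed_fields: set[str]) -> bool:
    """A change needs approval if it touches any specification-level field."""
    # Classify every field first, so an unknown field always raises KeyError.
    levels = {level_for(name) for name in changed_fields}
    return "specification" in levels


print(needs_approval({"location", "course_leader"}))  # False
print(needs_approval({"location", "syllabus"}))       # True
```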


We also created some instance data for Computer Science courses at Heriot-Watt. For this we used Semantic MediaWiki (with the Semantic bundle). Semantic Forms were used for inputting course information, and the input from the forms is then shown as a wiki page. Categories in MediaWiki are akin to classes; properties are used to link one page to another and also to relate the subject of the page to its associated literals. Each field in an input form has a property associated with it. Essentially, the item described by the form becomes the subject of the stored triple, the property associated with a field becomes the predicate, and the input to the field becomes the object. A field can be set so that multiple values can be entered, separated by commas; in this case a triple is formed for each value. I think there is a useful piece of work that could be done comparing various tools for creating linked data (e.g. Semantic MediaWiki, Callimachus, Marmotta) and evaluating whether other approaches (e.g. WordPress extensions) may improve on them. If you know of anything along such lines please let me know.
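
The form-to-triple mapping described above can be sketched as follows. This is not Semantic MediaWiki's implementation; it is a hypothetical Python illustration of the idea, and the page and property names are made up.

```python
def form_to_triples(page: str, form: dict[str, str],
                    multi_valued: set[str]) -> list[tuple[str, str, str]]:
    """Turn one submitted form into (subject, predicate, object) triples.

    The page the form describes is the subject, the property bound to each
    field is the predicate, and whatever was typed into the field is the
    object. Fields named in multi_valued are split on commas, giving one
    triple per value.
    """
    triples = []
    for prop, raw in form.items():
        values = raw.split(",") if prop in multi_valued else [raw]
        for value in values:
            value = value.strip()
            if value:  # ignore blanks, e.g. from a trailing comma
                triples.append((page, prop, value))
    return triples


for t in form_to_triples(
        "Example Course",
        {"Has leader": "A. Lecturer",
         "Has learning outcome": "Model data in RDF, Query with SPARQL"},
        multi_valued={"Has learning outcome"}):
    print(t)
```

This prints three triples: one for the leader and one for each of the two comma-separated learning outcomes.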

We have a little more work to do in representing the ontology in Protégé and creating more instance data; watch this space for updates and a more detailed description than the image above. We would also like to evaluate the ontology more fully against potential use cases and against other institutions’ data.

Anna has finished her work here now and returns to Edinburgh Napier University to finish her Master’s project. Alasdair and I think she has done a really impressive job, not least considering she had no previous experience with RDF and semantic technologies. We’ve also found her a pleasure to work with and would like to thank her for her efforts on this project.

10 thoughts on “A short project on linking course data”

  1. I question your use of inSemester. The semester is a pretty arbitrary concept which, along with ‘term’ and ‘academic year’, is used in different ways by schools, universities and other learning providers.

    Although they may have very clear start & end dates within an institution, semesters & terms don’t matter much beyond the institution. Internally they have an identity separate from their dates, e.g. 2015-16/semester2. They may or may not be linked to the academic years of the organisation, and of the education system that the organisation resides in. While the UK & Australia both have a 2016-17 academic year, the actual start & end dates may be wildly different.

    I can’t come up with a great name for it, but a NamedEducationTimespan would be the superset of term, semester and year. These would be defined by an organisation, and could have sub-timespans and also be more specific versions of another timespan, e.g. the UK academic year runs from the end of summer one year to the start of summer the next, more or less, but the University of Reading academic year has a clearly defined start & end date.
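
One possible reading of the NamedEducationTimespan idea, sketched in Python. The class shape, the `part_of` and `specialises` attribute names and the example dates are all assumptions for illustration; spans without fixed dates (like a national academic year) are simply skipped when matching a calendar date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class NamedEducationTimespan:
    name: str                       # e.g. "2015-16/semester2"
    defined_by: str                 # organisation that defines the span
    start: Optional[date] = None    # None where there is no fixed date
    end: Optional[date] = None
    part_of: Optional["NamedEducationTimespan"] = None      # this span is a sub-timespan of
    specialises: Optional["NamedEducationTimespan"] = None  # more specific version of


def broader(span: NamedEducationTimespan) -> list[str]:
    """Names of all spans reachable via part_of or specialises links."""
    seen, names, queue = set(), [], [span]
    while queue:
        current = queue.pop(0)
        for parent in (current.part_of, current.specialises):
            if parent is not None and id(parent) not in seen:
                seen.add(id(parent))
                names.append(parent.name)
                queue.append(parent)
    return names


def spans_containing(day: date, spans: list[NamedEducationTimespan]) -> list[str]:
    """Names of spans with known dates that include the given day."""
    return [s.name for s in spans
            if s.start is not None and s.end is not None and s.start <= day <= s.end]


# Hypothetical example data, not real institutional dates.
uk_year = NamedEducationTimespan("UK academic year 2016-17", "UK education system")
uni_year = NamedEducationTimespan("Example University 2016-17", "Example University",
                                  date(2016, 9, 1), date(2017, 8, 31), specialises=uk_year)
semester2 = NamedEducationTimespan("2016-17/semester2", "Example University",
                                   date(2017, 1, 9), date(2017, 5, 26), part_of=uni_year)
print(broader(semester2))
print(spans_containing(date(2017, 3, 1), [uk_year, uni_year, semester2]))
```

The semester keeps its own identity (its name) while still being linked upwards to the university year and, through that, to the national academic year.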

    I’ve done some initial work around this here: http://academic-session.data.ac.uk/ but clearly more could be done.

    1. Thanks Chris, that’s a fair point. The terminology reflects what we use at Heriot-Watt in, for example, course documentation and student handbooks, but even internally some of the issues you mention arise when we offer courses at international campuses or through partner organisations. It probably requires a little more reflection on why that information is required: the obvious case of “when will this happen” can be met by a start date and end date; more subtle reasons, I think, are related to simplifying the relation between course-related activities such as exam diets and board meetings, which is important for our local use of the data but not so interesting to others.

      1. My early investigations into this led me to want to start adding

        At Southampton, different groups can use as many as four jargon terms for the same concept. Part of this comes from using a US-made database, which muddled the names further, and occasional attempts to standardise have met with the usual academic intellectual judo rolls against impositions of power.

      2. Another thing that might be out of scope is that we bundle our modules (aka courses) and programme themes (aka programmes) into sets by subject, with someone overseeing the entire set. Students doing a degree will *mostly* study modules within the same subject, but many subjects take a COMP(uter science) or MATH(s) module or two. It’s possible, even probable, that this is beyond the scope, but I still think it would be useful to define a standard way to group things together so that two organisations could have more or less compatible data. It may be relevant if we choose to document the course leader or committee in the linked open data.

  2. Some other quick thoughts…

    You should allow for more relationships between people and the course instance. We have a moderator role which is a 5% loading to moderate the module.

    Accrediting organisation is usually by academic year or date. A course can lose its accreditation but still have been accredited at earlier points. I’d suggest considering moving or adding isAccreditedBy onto the programme instance.

    We run the same module for a number of years, and can run it more than once in an academic year. Each year we run it, it’s likely to have updated content, but the general course ID serves for prerequisites. I would say, using your model, that each module has a new learning spec each year, but these are still part of a series of similar modules with the same code, which can all be treated as the same for the purposes of prerequisites but not for some other purposes.

    It seems odd to have a course start but no end.

    You need to add “Labs” to your assessment methods. I suspect subjects like medicine & archaeology may need some additional ones. Labs tend to be small and frequent. For some, just attending satisfactorily is all that’s required to gain the credit.

    On some degrees (sociology?) students gain a fraction of their credits by taking part in the trials run by other students. This doesn’t fit into your model (yet).

    We record key resources needed by each module:
    – Core textbook
    – Background textbook
    – Journals
    – Other library support required
    – Staff requirements (including teaching assistants and demonstrators)
    – Teaching space, layout and equipment required
    – Laboratory space and equipment required
    – Computer requirements
    – Software requirements
    – On-line resources
    – Other resource requirements
    Plus ISBN & Software version (applicable to some options only)

    We also capture breakdowns of hours students are expected to spend in each of: Lecture, Seminar, Tutorial, Computer Lab, Specialist Lab, Project supervision, Fieldwork, Demonstration or Examples Session and the maximum group size. I don’t know quite how this is used, it’s something to do with KIS.

    Please understand that this brainstorm is intended to help. You’ve already done something very useful. I would strongly advise that some easy-to-understand examples, and maybe something that reads the RDF and reports what’s in it in human-readable form, would go a long way to help people implement. I suspect the code we use for http://opd.data.ac.uk/checker might be useful, and is of course all githubbed.

    1. “Please understand that this brainstorm is intended to help.” That is very clear, Chris, and just what I asked for, just what I wanted. There’s a lot to think about, so please forgive some cherry picking in my replies. All your comments are appreciated.

      “Accrediting organisation is usually by academic year or date.” Yes, we gave that some thought at the end of the project but didn’t have time to work it through. One approach would be to create a class for Accreditation, instances of which would have a start and end date.
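
A minimal sketch of that Accreditation class, assuming a hypothetical record with an optional end date (None meaning the accreditation is still current); the programme and body names are made up.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Accreditation:
    programme: str
    accredited_by: str
    start: date
    end: Optional[date] = None  # None: accreditation is still current


def accredited_on(programme: str, day: date, records: list[Accreditation]) -> list[str]:
    """Bodies that accredited the programme on the given day."""
    return [a.accredited_by for a in records
            if a.programme == programme
            and a.start <= day
            and (a.end is None or day <= a.end)]


# Hypothetical programme, bodies and dates.
records = [
    Accreditation("BSc Example", "Body A", date(2010, 9, 1), date(2015, 8, 31)),
    Accreditation("BSc Example", "Body B", date(2013, 9, 1)),
]
print(accredited_on("BSc Example", date(2014, 1, 1), records))
print(accredited_on("BSc Example", date(2016, 1, 1), records))
```

Because a lapsed accreditation keeps its end date rather than being deleted, the programme still shows as accredited by Body A for dates before it lapsed.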

      “It seems odd to have a course start but no end.” Err, yes. It does seem odd.

      “We record key resources needed by each module.” Fouad Zablith has done something similar at the American University of Beirut, see http://www.www2015.it/documents/proceedings/companion/p711.pdf . We didn’t quite get that far, but we have it in mind.

  3. I think that I have some concerns around the use of “spec” and “instance”. My concern is related to a long running programme or course which will change “spec” many times and will have zero or more instances of a given spec. (zero is possible if not enough students sign up…)

    I would suggest that there is a logical Programme Series & Course Series where the spec has altered but the “brand” of the course remains. We still teach BSc Computer Science but it ain’t the same spec as what I studied in ’94-97! This would also mean you could consider adding supersedes/supersededBy to course and programme specs. The challenge is adding richness like this without making it too much hassle to do the basics.

    Also, exams may have dates. Coursework may have a set date and a hand-in-by date.

    It’s possible that there’s some more to do around awards. Awards are a join between an organisation, a person, an award type and possibly a subject (Economics) and grade (third-class honours). The organisation giving the award is not automatically the same as the learning opportunity provider. For example, A-Levels are taught by schools, but the actual awards come from a small set of A-Level awarding organisations.

    You have pre-requisite but not co-requisite. In some rare cases you may have options where you can take neither, X, or X and Y, but you can’t take Y without X even though they are taught side-by-side.

    Also pre-requisites are tricksy irksome beasts, which may have boolean logic, and every requirement has an implicit “or equivalent” for transfer students etc. This may be worth just ignoring as it’s a rathole. (Looks like a rabbit hole, but leads to a ratsnest…)
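
A sketch of what boolean prerequisites and the “Y needs X alongside it” co-requisite case might look like, using hypothetical AllOf/AnyOf classes and made-up course codes. The implicit “or equivalent” is not modelled; one option would be to add manually waived requirements to the set of passed courses before checking.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Course:
    code: str


@dataclass(frozen=True)
class AllOf:
    parts: tuple  # every part must be satisfied


@dataclass(frozen=True)
class AnyOf:
    parts: tuple  # at least one part must be satisfied


Requirement = Union[Course, AllOf, AnyOf]


def satisfied(req: Requirement, passed: set[str]) -> bool:
    """Evaluate a nested AND/OR prerequisite against the courses passed."""
    if isinstance(req, Course):
        return req.code in passed
    if isinstance(req, AllOf):
        return all(satisfied(p, passed) for p in req.parts)
    if isinstance(req, AnyOf):
        return any(satisfied(p, passed) for p in req.parts)
    raise TypeError(f"not a requirement: {req!r}")


def valid_selection(selected: set[str], needs_alongside: dict[str, str]) -> bool:
    """Each entry Y -> X means Y may only be taken if X is taken too."""
    return all(x in selected for y, x in needs_alongside.items() if y in selected)


# Hypothetical requirement: MATH1 and one of COMP1/COMP2.
req = AllOf((Course("MATH1"), AnyOf((Course("COMP1"), Course("COMP2")))))
print(satisfied(req, {"MATH1", "COMP2"}))   # True
print(satisfied(req, {"COMP1", "COMP2"}))   # False
print(valid_selection({"Y"}, {"Y": "X"}))    # False
```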

    1. “I have some concerns around the use of “spec” and “instance”. My concern is related to a long running programme or course which will change “spec” many times”: there should indeed be a dct:replaces / dct:isReplacedBy relationship to link course specs which change. The alternative was to add another level above course spec to represent the thing which keeps the same code even though the spec changes.
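
A sketch of how dct:replaces links could be followed to find the latest spec, or to treat all versions of a course as one for prerequisite purposes. The spec URIs are hypothetical, and the code assumes a linear chain: each spec replaces at most one other, is replaced by at most one, and there are no cycles.

```python
DCT_REPLACES = "http://purl.org/dc/terms/replaces"


def index_replaces(triples):
    """Index dct:replaces triples in both directions (linear chain assumed)."""
    replaces, replaced_by = {}, {}
    for subject, predicate, obj in triples:
        if predicate == DCT_REPLACES:
            replaces[subject] = obj
            replaced_by[obj] = subject
    return replaces, replaced_by


def latest(spec, replaced_by):
    """Follow the inverse (isReplacedBy) links to the newest spec."""
    while spec in replaced_by:
        spec = replaced_by[spec]
    return spec


def series(spec, replaces, replaced_by):
    """Every spec in the chain, newest first."""
    chain, current = [], latest(spec, replaced_by)
    while current is not None:
        chain.append(current)
        current = replaces.get(current)
    return chain


def same_series(a, b, replaced_by):
    # e.g. accept any version of a course as meeting a prerequisite
    return latest(a, replaced_by) == latest(b, replaced_by)


# Hypothetical spec URIs.
s14, s15, s16 = (f"http://example.org/spec/XX101-{y}" for y in (2014, 2015, 2016))
replaces, replaced_by = index_replaces([(s16, DCT_REPLACES, s15),
                                        (s15, DCT_REPLACES, s14)])
print(latest(s14, replaced_by))  # http://example.org/spec/XX101-2016
```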

      “You have pre-requisite but not co-requisite” we call co-requisite courses “synoptic” so it is there but obscure.

      “Also pre-requisites are tricksy irksome beasts.” Yup. The pre-requisites for a course should be learning objectives, but everyone uses some course they provide as a shorthand for that. It’s a case where reality on the ground should change for both educational and modelling reasons.

  4. We build some semantics into the course/programme hasPart relationship.

    Some modules may be compulsory for a part of a programme.

    Some modules are optional (options get complicated, but if it ain’t compulsory it’s some-kind-of optional)

    Modules may be “core”, which means they must be passed to pass the part. Modules can on rare occasions be “core”+“optional”, e.g. if you had to take one of two maths modules, but must pass the one you take.

    Modules may be “specialised” for a part/programme. This means that although the main database might treat them as normal options, the programme leader designates these as the featured modules for the course. This happens on a few of our courses, e.g. http://www.ecs.soton.ac.uk/programmes/h680-meng-electronic-engineering-photonics#modules (year 3)

    Modules may be “recommended” for a part/programme. This is a slightly more relaxed version of the specialised flag. We allow our students on some programmes to select their optional modules from both their programme and a range of 30 general interest modules, but it’s important for students to see which are more relevant to their primary degree. Adding the “recommended” flag helps people navigate between the 40 options and the 5-10 that are key for that part.
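
A hypothetical sketch of hasPart memberships carrying these flags. It only checks the compulsory and core constraints described above; credit totals and “take one of” groups are deliberately left out, and the module codes are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Membership:
    """One module's place in a part of a programme (hypothetical flags)."""
    module: str
    compulsory: bool = False   # must be taken; otherwise some kind of optional
    core: bool = False         # if taken, must be passed to pass the part
    specialised: bool = False  # featured by the programme leader
    recommended: bool = False  # a more relaxed version of specialised


def part_passed(part: list[Membership], results: dict[str, bool]) -> bool:
    """results maps each module the student took to whether it was passed."""
    for m in part:
        if m.compulsory and m.module not in results:
            return False  # a compulsory module was not taken
        if m.core and results.get(m.module) is False:
            return False  # a core module was taken but failed
    return True


def highlighted(part: list[Membership]) -> list[str]:
    """Modules to show first when students browse their options."""
    return [m.module for m in part if m.specialised or m.recommended]


# Hypothetical part: two core + optional maths modules (the "take one of"
# rule itself is not enforced here), a compulsory core project, and options.
part = [
    Membership("MATH-A", core=True),
    Membership("MATH-B", core=True),
    Membership("PROJECT", compulsory=True, core=True),
    Membership("OPT-1", recommended=True),
    Membership("OPT-2"),
]
print(part_passed(part, {"PROJECT": True, "MATH-A": True}))   # True
print(part_passed(part, {"PROJECT": True, "MATH-A": False}))  # False
```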

    I also notice your model assumes that a programme has parts. This is true for a university, but you should keep in mind the degenerate case of a simple training course where the programme has/is a single course with an exam and award.

    1. Yes, I think a lot of this would go into the stage specification, which I think is currently a bit under-defined. It was introduced because the level of the course did not fully describe the academic stage at which it could be taken by students but was a little neglected after that. There should at least be optional and mandatory varieties of hasPart.
