Global micro-credential mapping project report

A project I worked on with Credential Engine has just had its (first?) report published: Global Micro-Credential Schema Mapping: A Vital Step Towards Interoperability and Mobility.

This project was suggested by Credential Engine’s CTDL Advisory Group, and ran from January to June this year. That was slightly longer than its initial three-month estimate, but we covered more than we initially expected. The intended benefits were outlined by the CTDL Advisory Group, and centre on making sure that micro-credentials issued in one jurisdiction are understandable in others, even when different data specifications have to be used to comply with the local technical and political requirements and practices of the places where they are issued. The envisaged end result is that individuals can have their achievements recognized globally.

We used the Data Ecosystem Schema Mapping (DESM) tool to map elements from various specifications and standards related to micro-credentials, such as CTDL, Open Badges, the versions of Open Badges used by a commercial badge issuer in Canada and Australia, W3C Verifiable Credentials and the European Learning Model. More information on those specs, and on who I mean by “we”, is in the report.

The results are available on Credential Engine’s DESM site, where you can see the degree of semantic alignment between these schemas, and there are some reflections on the results in the report.

New resources explaining the Data Ecosystem Schema Mapping tool

The Data Ecosystem Schema Mapping (DESM) tool is one of the projects that I am working on for the US Chamber of Commerce Foundation’s T3 Innovation Network. DESM is a specialized tool for creating, editing, maintaining and viewing crosswalks between data models; these crosswalks are based on the degree of semantic alignment between terms in the different schemas. Colleagues on the project have produced two one-page fliers about DESM that have just been published: one explaining what DESM is and how it works, the other providing guidance on mapping projects that use DESM.
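To make the idea of a crosswalk graded by semantic alignment a little more concrete, here is a minimal Python sketch. It is not how DESM works internally: the schema prefixes (`badge:`, `cred:`), the term names and the alignment labels (borrowed from SKOS mapping relations such as `exactMatch` and `broadMatch`) are hypothetical stand-ins. It shows one plausible use of the alignment degree: copying values across only where the alignment is close, and reporting what could not be carried over.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical alignment labels, loosely modelled on SKOS mapping relations;
# these are NOT DESM's actual alignment vocabulary.
TRANSLATABLE = {"exactMatch", "closeMatch"}


@dataclass(frozen=True)
class Alignment:
    source_term: str  # term in the source schema
    target_term: str  # term in the target schema
    relation: str     # degree of semantic alignment between the two terms


def translate(record: dict, crosswalk: list[Alignment]) -> tuple[dict, list[str]]:
    """Copy values across only where the alignment is close enough.

    Returns the translated record and the source terms that were left behind
    because no sufficiently close alignment exists.
    """
    by_source = {a.source_term: a for a in crosswalk}
    out, dropped = {}, []
    for term, value in record.items():
        alignment = by_source.get(term)
        if alignment is not None and alignment.relation in TRANSLATABLE:
            out[alignment.target_term] = value
        else:
            dropped.append(term)
    return out, dropped


# Illustrative crosswalk between two hypothetical credential schemas.
crosswalk = [
    Alignment("badge:name", "cred:name", "exactMatch"),
    Alignment("badge:criteria", "cred:requirements", "closeMatch"),
    Alignment("badge:tags", "cred:subject", "broadMatch"),
]
record = {
    "badge:name": "Basket Weaving 101",
    "badge:criteria": "Weave a basket",
    "badge:tags": "crafts",
}
translated, dropped = translate(record, crosswalk)
print(translated)  # {'cred:name': 'Basket Weaving 101', 'cred:requirements': 'Weave a basket'}
print(dropped)     # ['badge:tags']
```

Where to draw the line between "close enough to translate" and "not" is a design decision; a broad match might still be useful for search even if it is not safe for data conversion.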

Watch this space for more about our use of DESM in both T3 and Credential Engine projects.

Credential Engine publishes rubrics

A few months back I helped Credential Engine extend the Credential Transparency Description Language (CTDL) to cover rubrics. They have now published the first batch of rubrics in the Credential Finder, for viewing on the web and as linked data.

[Image: rubric information as displayed by the Credential Finder, edited to remove the middle of a long page. At the top, the rubric title and the name of the organization that created it; below that, descriptive text alongside information about the creator of the rubric and what it was created for; then a table of the rubric criteria (rows) against levels of attainment (columns), with each cell describing the expected performance for that criterion and level.]

Rubrics are of course useful when marking assessments, but transparency of rubrics is also important in describing educational attainment: if you don’t know what criteria were used in assessing a skill, then you don’t know whether an assertion of some level of proficiency in that skill is sufficient for the task you have in mind. This matters to anyone learning (or thinking of learning) a new skill or applying for a job, and to employers looking to hire someone.

From early last year Credential Engine ran a task group for rubrics which, as task groups do for any major update to CTDL, looked at use cases, existing practice, data models(*) and how they all related to what was already in CTDL before proposing new terms for the description of rubrics. (* Incidentally, as part of this Stuart Sutton used the Data Ecosystem Schema Mapping (DESM) tool to create a mapping of existing rubric standards, available from the Credential Engine DESM page: select Rubrics.) The outcome was the ability to describe rubrics in detail, including the criteria they use and the levels expected against those criteria. You can also relate rubrics (and their criteria) to credentials, assessments, learning opportunities, tasks, jobs, occupations and industries, and provide information about who created the rubric and what for. This is described in the relevant section of the CTDL Handbooks.
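As a rough illustration of the shape of the data (a grid of criteria against levels of attainment, with each cell describing the expected performance), here is a hypothetical Python model. The class, field and method names are mine, not CTDL terms; the CTDL Handbook documents the actual classes and properties.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rubric:
    """Hypothetical in-memory model of a rubric: criteria (rows) x levels (columns)."""
    name: str
    creator: str
    levels: list[str]  # levels of attainment, in order
    # criterion -> {level -> description of the expected performance}
    criteria: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_criterion(self, criterion: str, descriptions: dict[str, str]) -> None:
        # Every cell in the criterion's row must be filled in.
        missing = [lvl for lvl in self.levels if lvl not in descriptions]
        if missing:
            raise ValueError(f"{criterion!r} has no description for {missing}")
        self.criteria[criterion] = dict(descriptions)

    def expected(self, criterion: str, level: str) -> str:
        """Look up the expected performance for one criterion-level cell."""
        return self.criteria[criterion][level]


rubric = Rubric(
    name="Basket weaving",
    creator="Example College",  # hypothetical organization
    levels=["Developing", "Proficient", "Expert"],
)
rubric.add_criterion(
    "Weave is even",
    {
        "Developing": "Gaps are visible in places",
        "Proficient": "Weave is even across most of the basket",
        "Expert": "Weave is even throughout, including at the rim",
    },
)
print(rubric.expected("Weave is even", "Proficient"))
```

The point of the structure is that an assertion such as "Proficient at basket weaving" can be traced back to a specific, published description of what "Proficient" meant for each criterion.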

And now the first 25 rubrics are in the registry. You can access them through the Credential Finder, which, as well as the descriptive information for each rubric as a whole, has all the details of the criteria used. I hope this will aid discovery and reuse of the best rubrics, and that their availability as linked data (warning: raw JSON-LD file, for applications and coders) will bring clarity to assertions made in credentials and job requirements. In future, maybe learning outcome descriptions, credentials and job adverts will be able to be more precise about what is meant by “ability to weave baskets”.

Some recent Work (Expression, Manifestation, Item)

I blame John. He got me interested in FRBR, and long ago he helped me with a slightly mad attempt at FRBRizing Learning Resources. Of course FRBR is for Bibliographic Records, isn’t it? And according to several people I respect it doesn’t work (though other people whom I respect equally say that it does). Personally I always struggled with the expression/manifestation distinction for many types of resource, and always wanted it to play more nicely with the resource/representation approach of the WWW Architecture. But I did keep coming back to it when trying to explain the need to be clear about what exactly was being described in RDF, for example. If you’ve heard me go off on one about Romeo and Juliet, and the play-on-the-stage vs the play-on-the-page, or the difference between novels and books, then you’ll know what I mean. So that’s why I got involved in the DCMI openWEMI working group. I certainly didn’t contribute any expertise on WEMI that wasn’t already covered, but I hope I helped with some of the RDF stuff, and I’ve certainly learnt a lot. And now:

Dublin Core announces openWEMI for community review

openWEMI is an RDF vocabulary based on the concepts of Work, Expression, Manifestation, and Item (WEMI) that were first introduced in the Functional Requirements for Bibliographic Records (FRBR) document produced by a working group of the International Federation of Library Associations (IFLA). That work and subsequent versions form the theoretical basis for library catalog metadata.

This DCMI work product defines a minimally constrained set of classes and properties that can be used in a variety of contexts. Unlike the IFLA work, openWEMI elements are purposely defined without reference to library catalog functions….
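To illustrate the four WEMI levels with the Romeo and Juliet example, here is a sketch in plain Python using subject-predicate-object tuples. The `ex:` class and property names are hypothetical and are not the terms defined by openWEMI. The point is only that a physical copy, an edition, the text and the play are four different things, each of which can be described separately, and that the play-on-the-stage and the play-on-the-page are two expressions of one work.

```python
# Hypothetical "ex:" terms for illustration only; these are NOT the class and
# property names defined by openWEMI.
triples = [
    ("ex:RomeoAndJuliet", "rdf:type", "ex:Work"),
    # Two expressions of the same work: the play-on-the-page and the play-on-the-stage.
    ("ex:RJ-Text", "rdf:type", "ex:Expression"),
    ("ex:RJ-Text", "ex:expresses", "ex:RomeoAndJuliet"),
    ("ex:RJ-StageProduction", "rdf:type", "ex:Expression"),
    ("ex:RJ-StageProduction", "ex:expresses", "ex:RomeoAndJuliet"),
    # A published edition of the text, and one physical copy of that edition.
    ("ex:RJ-PaperbackEdition", "rdf:type", "ex:Manifestation"),
    ("ex:RJ-PaperbackEdition", "ex:manifests", "ex:RJ-Text"),
    ("ex:MyCopy", "rdf:type", "ex:Item"),
    ("ex:MyCopy", "ex:exemplifies", "ex:RJ-PaperbackEdition"),
]

UPWARD = {"ex:exemplifies", "ex:manifests", "ex:expresses"}


def up(node):
    """Return the resource one WEMI level above node, or None at the Work level."""
    for s, p, o in triples:
        if s == node and p in UPWARD:
            return o
    return None


def path_to_work(node):
    """Follow the upward links from any resource to the Work it belongs to."""
    path = [node]
    while (parent := up(path[-1])) is not None:
        path.append(parent)
    return path


def level(node):
    """Return the WEMI class a resource is typed as, if any."""
    return next((o for s, p, o in triples if s == node and p == "rdf:type"), None)


for n in path_to_work("ex:MyCopy"):
    print(level(n), n)
```

Describing the copy, the edition, the text and the play as separate resources is what lets each statement say exactly what it is about: who printed something belongs to the edition, who wrote it belongs to the work.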

See the news item on the Dublin Core website for more information about how you can comment on the work.

Kudos to Karen Coyle for leading us through this work, and thanks to all the other working group members.

PESC Transcript in JSON-LD

I was (virtually) at the PESC 2023 Data Summit recently, presenting on a panel about Re-Engineering the “All-Or-Nothing” Academic Transcript to Reveal Its Unequivocal Value. This post is based on that presentation. It sets out the journey of a working group that I have been on, taking an XML record-based approach to detailing a student’s progress through education and turning it into JSON-LD. My input has been to focus on the Linked Data / RDF standards aspects of that. I should note that initially the project was about a transition to JSON; the emphasis on Linked Data, and the model we’re adopting, emerged from our discussions about what we wanted to achieve and what various models enabled. We didn’t start with this solution in mind and try to jemmy it in. I think that’s important.

What we started with

We started with the PESC transcript in XML, and translated this to JSON. XML gives you records with nested elements whose meaning depends on their parent and sibling elements. The image below is incomplete and very simplified, and I have taken some liberties to make it presentable, but it gets the idea across.

[Image: nested boxes of information, like a form. The outer box is College Transcript. Within that are boxes for Transmission data and Student, each containing further boxes.]

Presented like this you can see how the nested structure is kind of reassuringly similar to a printed transcript. Take note of how in the bottom right the information under “course” mixes information that is true of the course that everyone took (Course Number, Name) and information that is specific to the person whose details are provided nested in the same Student element. In reality the Course Number and Name have nothing to do with the Student.

JSON can be similar, but for Linked Data (as in JSON-LD) you no longer have a “record” paradigm: you have sets of statements about different things, and the statements describe the characteristics of those things and the relationships between them.
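A minimal sketch of that shift from record to statements, in Python, assuming a heavily simplified record. The field names and the predicates (`name`, `hasAchievement`, `forCourse`, `grade`) are hypothetical, not actual PESC elements. The course number and name become statements about a course, while the grade becomes a statement about the student’s achievement, which in turn links to the course.

```python
# A heavily simplified transcript record; the field names are hypothetical,
# not actual PESC element names.
record = {
    "Student": {
        "Name": "A. Learner",
        "Courses": [
            {"CourseNumber": "BW101", "CourseName": "Basket Weaving", "Grade": "A"},
            {"CourseNumber": "BW201", "CourseName": "Advanced Basket Weaving", "Grade": "B"},
        ],
    }
}


def to_statements(record):
    """Flatten the nested record into (subject, predicate, object) statements."""
    student = record["Student"]
    statements = [("_:student", "name", student["Name"])]
    for i, entry in enumerate(student["Courses"], start=1):
        course = f"course:{entry['CourseNumber']}"
        achievement = f"_:achievement{i}"
        # True of the course for everyone who took it, so it describes the course.
        statements.append((course, "courseNumber", entry["CourseNumber"]))
        statements.append((course, "name", entry["CourseName"]))
        # True only of this person, so it describes their achievement.
        statements.append(("_:student", "hasAchievement", achievement))
        statements.append((achievement, "forCourse", course))
        statements.append((achievement, "grade", entry["Grade"]))
    return statements


for statement in to_statements(record):
    print(statement)
```

Once the course has its own identifier, nothing about it needs to live inside the student’s part of the data.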

The first cut

We took a first trip through the data model, focusing on what the different things being described were and how they related to each other. The image below illustrates (again, very simply) what we came up with.

This usefully shows that there are some things that “belong” to the transcript, some things that are about a Person and what they have done, and some things that are about an Organization (e.g. a College) and what they offer (Courses).

But when we looked at real-world examples, it was actually a bit of a mess: the relationships were more about where you find things on a printed transcript than about what makes sense as relationships between the things being described. Look how you go from AcademicSummary to AcademicSession to AcademicAchievement to Course.

Where we are now

We started thinking of a transcript as a record of a person’s various types of achievements in programs/courses/sessions/etc offered by an organization. That looks like this.

It looks a little less simplified, but it’s showing two achievements.

See how there is a split between the personal, private information (yellow and blue boxes on the middle left) and the corporate, public data (pink boxes on the right and at the top) typically found in a course catalogue or similar: the type of information that could be made available as linked data through the Credential Engine Registry (other registries could exist, but Credential Engine pay me).

What if there were no need to generate the corporate data just for the transcript, because it could be reused, either by repeating it in the transcript data or just by linking to it?
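A sketch of the two options, assuming hypothetical placeholder IRIs and a Python dict standing in for public course data published in a registry. The transcript holds only links to courses, and a hypothetical helper embeds (repeats) the catalogue data when a self-contained copy is wanted.

```python
# Hypothetical public course catalogue, standing in for corporate data
# published in a registry; the IRIs and values are placeholders.
CATALOGUE = {
    "https://registry.example/course/BW101": {"name": "Basket Weaving", "credits": 10},
}

# The transcript itself only links to the course.
transcript = {
    "subject": "_:student",
    "achievements": [
        {"course": "https://registry.example/course/BW101", "grade": "A"},
    ],
}


def embed_course_data(transcript, catalogue):
    """Return a copy of the transcript with the linked course data repeated inline."""
    achievements = []
    for achievement in transcript["achievements"]:
        iri = achievement["course"]
        # Keep the link (as "id") and add whatever the catalogue says about it.
        course = {"id": iri, **catalogue.get(iri, {})}
        achievements.append({**achievement, "course": course})
    return {**transcript, "achievements": achievements}


expanded = embed_course_data(transcript, CATALOGUE)
print(expanded["achievements"][0]["course"])
```

Either way the course description is maintained once, by the organization that offers the course, rather than regenerated for every transcript.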

One final thought on the data structure. The heart of this view of the transcript is a series of assertions, issued by an organization, that a person has achieved something. These are represented by the Person-Achievement-College boxes running diagonally from bottom left to top right. This is the same core as a W3C Verifiable Credential, and the transcript has the same structure as a Verifiable Presentation. What if the data structure of the transcript were the same as that of a Verifiable Presentation? That is the approach taken by other similar standards, such as the European Learning Model and 1EdTech’s Comprehensive Learner Record. Having a common data model (even without going the whole way into including signed VCs in the transcript) will surely be helpful. If it is compatible with having a transcript made up of VCs, then so much the better, but we shall continue to follow the use cases to find a solution, not the other way round.
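To show what “the same structure as a Verifiable Presentation” might look like, here is a Python sketch that builds a JSON-LD-style transcript. The top-level keys (`@context`, `type`, `issuer`, `issuanceDate`, `credentialSubject`, `holder`, `verifiableCredential`) come from the W3C Verifiable Credentials Data Model 1.1. The `achievement` content and all identifiers are hypothetical; a real implementation would need a context that defines `achievement`, and without a proof none of this is signed.

```python
import json

# Keys follow the W3C VC Data Model 1.1; the "achievement" content and every
# identifier are hypothetical placeholders. No proof is added, so nothing
# here is actually signed or verifiable.
VC_CONTEXT = "https://www.w3.org/2018/credentials/v1"


def achievement_assertion(issuer, person, achievement, issued):
    """One assertion: an organization says a person has achieved something."""
    return {
        "@context": [VC_CONTEXT],
        "type": ["VerifiableCredential"],
        "issuer": issuer,
        "issuanceDate": issued,
        "credentialSubject": {"id": person, "achievement": achievement},
    }


def transcript(person, assertions):
    """A transcript as a presentation bundling one assertion per achievement."""
    return {
        "@context": [VC_CONTEXT],
        "type": ["VerifiablePresentation"],
        "holder": person,
        "verifiableCredential": list(assertions),
    }


college = "did:example:college"
student = "did:example:student"
t = transcript(student, [
    achievement_assertion(college, student, {"course": "BW101", "grade": "A"}, "2023-06-30T00:00:00Z"),
    achievement_assertion(college, student, {"course": "BW201", "grade": "B"}, "2023-12-15T00:00:00Z"),
])
print(json.dumps(t, indent=2))
```

Each element of `verifiableCredential` is a Person-Achievement-College assertion; signing them would be a further step, not a change of structure.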

Fruitful RDF vocabularies are like cherries not bananas

The SEMIC style guide for semantic engineers is a very interesting document that is currently in review. It is part of a set of actions with the aim:

to promote semantic interoperability amongst the EU Member States, with the objective of fostering the use of standards by, for example, offering guidelines and expert advice on semantic interoperability for public administrations.

I am interested in the re-usability of RDF vocabularies: what is it that makes it easy or hard to take terms from an existing vocabulary and use them when creating a schema for describing something? That seems to me to be important for interoperability in a world where we know that there is going to be more than one standard for any domain, and where we also know that no domain is entirely isolated from its neighbours, which will have their own native standards. We need data at an atomic level that can persist through expressions conformant with different standards, and that is easiest if the different standards share terms where possible.

This idea of reusing terms from vocabularies is core to the idea of application profiles, to the way that we conceive of Dublin Core and LRMI terms being used, and indeed to the way they are being used in the IEEE P2881 Learning Metadata work. One well-known example of an application profile that reuses terms from Dublin Core is the Data Catalog Vocabulary (DCAT), which uses terms from Dublin Core, ODRL, FOAF, PROV, and a few terms created specifically for DCAT, to describe entities and relationships according to its own conceptual domain model.
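As a small illustration of what reuse means in practice, here is a Python sketch of an application profile as a list of compact IRIs drawn from several vocabularies. The namespace URIs are the published ones for those vocabularies, but the term list is only an illustrative selection of the kind of terms DCAT reuses, not DCAT’s actual term list.

```python
from collections import Counter

# Published namespace URIs for the vocabularies mentioned above.
NAMESPACES = {
    "dcterms": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "prov": "http://www.w3.org/ns/prov#",
    "odrl": "http://www.w3.org/ns/odrl/2/",
}

# An illustrative selection of terms of the kind DCAT reuses, not its full list.
profile_terms = [
    "dcterms:title",
    "dcterms:publisher",
    "dcat:distribution",
    "dcat:landingPage",
    "foaf:page",
    "prov:wasGeneratedBy",
    "odrl:hasPolicy",
]


def expand_curie(curie):
    """Turn prefix:localName into a full IRI; the IRI is what makes it the same term."""
    prefix, local = curie.split(":", 1)
    return NAMESPACES[prefix] + local


def terms_by_source(terms):
    """Count how many profile terms come from each vocabulary."""
    return Counter(term.split(":", 1)[0] for term in terms)


for term in profile_terms:
    print(expand_curie(term))
print(terms_by_source(profile_terms))
```

Because every term resolves to the IRI its original vocabulary defined, data using the profile shares those terms with any other data that uses them.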

The SEMIC guidelines have lots of interesting things to say about reuse, including rules on what you can and cannot do when taking terms from existing vocabularies in various ways, including reuse “as-is”, reuse with “terminological adaptations” and reuse with “semantic adaptations”. I read these with special interest as I have written similar guidelines for Credential Engine’s Borrowed Terms policy. I am pleased to say we came to the same conclusion (phew). In discussion with the SEMIC authors that conclusion was described as “don’t cherry-pick”, that is: when you borrow or reuse a term from a vocabulary you must comply with everything that the (let’s use the O word here) ontology in which it was defined says about it. That’s not just the textual definition of the term but everything entailed by statements about domain, range, relationships with other properties and so on. If the original ontology defines, directly or indirectly, “familyName” as being the name of a real person, then don’t use it for fictional characters.

Oof.

“The problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.”

Joe Armstrong, Coders at Work

I want to cherry-pick. I am looking for cherries, not bananas! I am looking for “lightweight” ontologies, so if you want your terms to be widely reused please create them as such (this is also in the SEMIC guidelines). Define terms in such a way that they are free from too much baggage: you can always add more later if you need it, or leave others to add their own. One example of this is the way in which many Dublin Core Terms are defined without an rdfs:domain declaration. That is what allowed DCAT and others to use them with any class that fits their own domain model. Where ranges are declared for Dublin Core Terms they are deliberately broad. The approach taken by schema.org in some ways goes further, by not using rdfs:domain and rdfs:range but instead defining domainIncludes and rangeIncludes, where the value is “a class that constitutes (one of) the expected type(s)” (my emphasis): by being non-exclusive, domainIncludes and rangeIncludes provide a hint at what is expected without saying that you cannot use anything else.
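To show why a domain declaration is baggage, here is a sketch in plain Python. It applies the RDFS domain entailment rule (if property p has rdfs:domain C, then any subject of p is an instance of C) and contrasts it with treating a schema.org-style domainIncludes as a hint. The `ex:` terms are hypothetical, echoing the familyName example: the strict domain turns a fictional character into a real person, whereas the hint only produces a note.

```python
RDF_TYPE = "rdf:type"

# Hypothetical vocabulary: ex:familyName has a strict rdfs:domain, while
# ex:surname only has a schema.org-style domainIncludes hint.
schema = [
    ("ex:familyName", "rdfs:domain", "ex:RealPerson"),
    ("ex:surname", "schema:domainIncludes", "ex:Person"),
]

data = [
    ("ex:RomeoMontague", RDF_TYPE, "ex:FictionalCharacter"),
    ("ex:RomeoMontague", "ex:familyName", "Montague"),
    ("ex:RomeoMontague", "ex:surname", "Montague"),
]


def rdfs_domain_entailments(schema, data):
    """RDFS domain rule: (p rdfs:domain C) and (s p o) entail (s rdf:type C)."""
    domains = [(p, c) for p, rel, c in schema if rel == "rdfs:domain"]
    inferred = {(s, RDF_TYPE, c) for s, p, _ in data for prop, c in domains if p == prop}
    return inferred - set(data)


def domain_hints(schema, data):
    """domainIncludes is only a hint: note unexpected usage, infer nothing."""
    expected = {}
    for p, rel, c in schema:
        if rel == "schema:domainIncludes":
            expected.setdefault(p, set()).add(c)
    types = {}
    for s, p, o in data:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    return [
        f"{s} uses {p} but is not typed as any of {sorted(expected[p])}"
        for s, p, _ in data
        if p in expected and not types.get(s, set()) & expected[p]
    ]


print(rdfs_domain_entailments(schema, data))
print(domain_hints(schema, data))
```

The first function is the gorilla that comes with the banana: reuse ex:familyName and you also assert that Romeo is a real person. The second leaves the decision with whoever reuses the term.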

If you’re interested in this line of thinking, then read the special section of Volume 41, Issue 4 of the Bulletin of the Association for Information Science and Technology on Linked Data and the Charm of Weak Semantics edited by Tom Baker and Stuart Sutton (2015).

What am I doing here? 3. Tabular Application Profiles

The third thing (of four) that I want to include in my review of what projects I am working on is the Dublin Core standard for Tabular Application Profiles. It’s another of my favourites, a volunteer effort under the DCMI Application Profiles Working Group.

Application profiles are a powerful concept, central to how RDF can be used to create new data models without creating entirely new standards. The DCTAP draft says this about them:

An application profile defines metadata usage for a specific application. Profiles are often created as texts that are intended for a human audience. These texts generally employ tables to list the elements of the profile and related rules for metadata creation and validation. Such a document is particularly useful in helping a community reach agreement on its needs and desired solutions. To be usable for a specific function these decisions then need to be translated to computer code, which may not be a straightforward task.

About the work

We have defined an (extensible) set of twelve elements that can be used as headers in a table defining an application profile, a primer showing how they can be used, and along the way we wrote a framework for talking about metadata and application profiles. We have also worked on various implementations and are set to create a cookbook showing how DC TAP can be used in real world applications. The primer is the best starting point for understanding the output as a whole.
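For a feel of what a TAP looks like to a program, here is a Python sketch that reads a tiny, hypothetical profile written as CSV using some of the DCTAP column headers (shapeID, propertyID, propertyLabel, mandatory, repeatable, valueNodeType, valueDataType). It assumes the convention, as I understand it from the primer, that rows with a blank shapeID belong to the most recently named shape; check the primer for the full set of elements and rules.

```python
import csv
import io
from collections import defaultdict

# A tiny, hypothetical application profile using some DCTAP column headers.
TAP_CSV = """shapeID,propertyID,propertyLabel,mandatory,repeatable,valueNodeType,valueDataType
BookShape,dct:title,Title,TRUE,FALSE,literal,xsd:string
,dct:creator,Author,FALSE,TRUE,iri,
,dct:date,Date,FALSE,FALSE,literal,xsd:date
"""


def read_tap(text):
    """Group statement templates by shape; a blank shapeID continues the previous shape."""
    shapes = defaultdict(list)
    current = None
    for row in csv.DictReader(io.StringIO(text)):
        current = row["shapeID"] or current
        # Keep only the columns that have a value for this statement template.
        shapes[current].append({k: v for k, v in row.items() if k != "shapeID" and v})
    return dict(shapes)


for shape, statements in read_tap(TAP_CSV).items():
    print(shape)
    for statement in statements:
        print("  ", statement)
```

The table stays readable for people agreeing on a profile, while each row is already a record that code can act on.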

The framework for talking about metadata came about because we were struggling to be clear when we used terms like property or entity. Does “property” refer to something in the application profile, or in the base standard, or as used in some metadata instance, or does it refer to a property of some thing in the world? In short, we decided that the things being described have characteristics and relationships to each other, which are asserted in RDF metadata using statements that have predicates in them; those predicates reference properties that are part of a pre-defined vocabulary; and an application profile defines templates for how the property is used in statements to create descriptions. There is a similar string of suggestions for talking about entities, classes and shapes, as well as some comments on what we found too confusing and so avoid talking about. With a little care you can use terms that are both familiar in context and unambiguous.

About my role

This really is a team effort, expertly led by Karen Coyle, and I just try to help. I will put my hand up as the literal-minded pedant who needed a framework to make sure we all understood each other. Otherwise I have been treating this as a side project that gives me an excuse to do some Python programming: I have documented my TAP2SHACL and related scripts on this blog; they focus on taking a DCMI TAP and expressing it as SHACL that can be used to validate data instances. I have been using this on some other projects that I am involved in, notably the work with PESC looking at how they might move to JSON-LD.
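In the same spirit, here is a hypothetical, much stripped-down sketch of the TAP-to-SHACL idea in Python. It is not the TAP2SHACL code documented on this blog: it handles only a few DCTAP elements and writes Turtle as plain text. The SHACL terms (sh:NodeShape, sh:targetClass, sh:property, sh:path, sh:minCount, sh:maxCount, sh:nodeKind, sh:datatype) are real; the `ex:` shape and class names are placeholders.

```python
# A hypothetical, stripped-down sketch; not the author's TAP2SHACL scripts.
PREFIXES = """@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex: <https://example.org/> .
"""

# DCTAP valueNodeType values mapped to SHACL node kinds.
NODE_KINDS = {"iri": "sh:IRI", "literal": "sh:Literal", "bnode": "sh:BlankNode"}


def statement_to_property_shape(st):
    """Turn one TAP statement template into an inline SHACL property shape."""
    parts = [f"sh:path {st['propertyID']}"]
    if st.get("mandatory", "").lower() == "true":
        parts.append("sh:minCount 1")
    if st.get("repeatable", "").lower() == "false":
        parts.append("sh:maxCount 1")
    kind = NODE_KINDS.get(st.get("valueNodeType", "").lower())
    if kind:
        parts.append(f"sh:nodeKind {kind}")
    if st.get("valueDataType"):
        parts.append(f"sh:datatype {st['valueDataType']}")
    return "sh:property [ " + " ; ".join(parts) + " ]"


def tap_to_shacl(shape_id, statements, target_class=None):
    """Build a Turtle node shape for one TAP shape and its statement templates."""
    pairs = ["a sh:NodeShape"]
    if target_class:
        pairs.append(f"sh:targetClass {target_class}")
    pairs.extend(statement_to_property_shape(st) for st in statements)
    return PREFIXES + "\n" + shape_id + " " + " ;\n    ".join(pairs) + " ."


statements = [
    {"propertyID": "dct:title", "mandatory": "TRUE", "repeatable": "FALSE",
     "valueNodeType": "literal", "valueDataType": "xsd:string"},
    {"propertyID": "dct:creator", "mandatory": "FALSE", "repeatable": "TRUE",
     "valueNodeType": "iri"},
]
print(tap_to_shacl("ex:BookShape", statements, target_class="ex:Book"))
```

A blank mandatory or repeatable cell adds no constraint here, which keeps the generated shape no stricter than the profile says.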

What am I doing here? 2. Open Competencies Network

I am continuing my January review of the projects that I am working on with this post about my work on the Open Competencies Network (OCN). OCN is a part of the T3 Network of Networks, which is an initiative of the US Chamber of Commerce Foundation aiming to explore “emerging technologies and standards in the talent marketplace to create more equitable and effective learning and career pathways.” Not surprisingly the Open Competencies Network (OCN) focuses on Competencies, but we understand that term broadly, including any “assertions of academic, professional, occupational, vocational and life goals, outcomes … for example knowledge, skills and abilities, capabilities, habits of mind, or habits of practice” (see the OCN competency explainer for more). I see competencies understood in this way as the link between my interests in learning, education, credentials and the world of employment and other activities. This builds on previous projects around Talent Marketplace Signalling, which I also did for the US Chamber of Commerce Foundation.

About the work

The OCN has two working groups: Advancing Open Competencies (AOC), which deals with outreach, community building, policy and governance issues, and the Technical Advisory Workgroup. My focus is on the latter. We have a couple of major technical projects, the Competency Explorer and the Data Ecosystem Standards Mapping (DESM) Tool, both of which probably deserve their own post at some time, but in brief:

Competency Explorer aims to make competency frameworks readily available to humans and machines by developing a membership trust network of open registries each holding one or more competency frameworks and enabling search and retrieval of those frameworks and their competencies from any registry node in the network.
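The “search from any registry node” idea can be sketched as a breadth-first walk over a network of registries. This is a hypothetical Python illustration, not the Competency Explorer’s actual design or API: each node holds some frameworks and knows its trusted peers, and a query made at any node visits every reachable node once, so the same search gives the same results wherever it starts.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class RegistryNode:
    """Hypothetical registry node holding competency frameworks."""
    name: str  # assumed unique within the network
    frameworks: dict[str, list[str]]  # framework title -> competency statements
    peers: list[RegistryNode] = field(default_factory=list, repr=False, compare=False)

    def local_search(self, text: str) -> list[tuple[str, str, str]]:
        """Find competencies in this node's own frameworks that mention the text."""
        needle = text.lower()
        return [
            (self.name, framework, competency)
            for framework, competencies in self.frameworks.items()
            for competency in competencies
            if needle in competency.lower()
        ]

    def network_search(self, text: str) -> list[tuple[str, str, str]]:
        """Breadth-first search over this node and all trusted peers, visiting each node once."""
        seen: set[str] = set()
        queue = deque([self])
        results = []
        while queue:
            node = queue.popleft()
            if node.name in seen:
                continue
            seen.add(node.name)
            results.extend(node.local_search(text))
            queue.extend(node.peers)
        return results


a = RegistryNode("Registry A", {"Crafts": ["Weave a basket", "Dye natural fibres"]})
b = RegistryNode("Registry B", {"Data skills": ["Write SPARQL queries"], "Repair": ["Repair a woven basket"]})
a.peers.append(b)
b.peers.append(a)
print(a.network_search("basket"))
print(b.network_search("basket"))
```

Tracking visited nodes matters because trust relationships are typically mutual, so the network graph has cycles.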

DESM was developed to support data standards organizations (and the platforms and products that use those standards) in mapping, aligning and harmonizing data standards to promote data interoperability for the talent marketplace (and beyond). DESM allows data to move from a system or product using one data standard to another system or product that uses a different data standard.

Both of these projects deal with heterogeneous metadata, working around the theme of interoperability between metadata standards.

About my role

My friend and former colleague Sheila once described our work as “going to meetings and typing things”, which pretty much sums up the OCN work. The purpose is to contribute to the development of the projects, both of which were initiated by Stuart Sutton, whose shoes I am trying to fill in OCN.

For the Competency Explorer I have helped turn community-gathered use cases into features that can be implemented to enhance the Explorer, and am currently one of the leads of an agile, feature-driven development project with software developers at Learning Tapestry to implement as many of these features as possible and figure out what it would take to implement the others. I’m also working with data providers and Learning Tapestry to develop technical support around providing data for the Competency Explorer.

For DESM I helped develop the internal data schema used to represent the mapping between data standards, and am currently helping to support people who are using the tool to map a variety of standards in a pilot, or closed beta-testing. This has been a fascinating exercise in seeing a project through from a data model on paper, through working with programmers implementing it, to working with people as they try to use the tool developed from it.

What am I doing here? 1. Credential Engine

January seems like a good time to review the work that I am doing at the moment. Rather than try to cover all of it in one post, I’ll take it one project at a time. This one is about my work with Credential Engine, a US-based not-for-profit that aims to provide information about the educational and occupational credentials on offer and the things related to them. The aim is to empower learners with the information they need to make decisions about their educational options.

(Note, I think of educational / occupational credential as synonymous with qualification, and tend to use credential as a shorthand for that.)

About the work

I provide consultancy to Credential Engine on their RDF vocabulary, CTDL, the Credential Transparency Description Language. I’ve been associated with CTDL for longer than Credential Engine has been around: I was on the technical advisory committee for the precursor project, CTI, the Credential Transparency Initiative, seven years ago.

[Aside, fun fact: the first job I had in learning technology was in another CTI, the Computers in Teaching Initiative. Yes this was a factor in my agreeing to serve on the advisory committee.]

The CTDL is key to Credential Engine’s mission to make credentials more transparent by providing more information about how to obtain them: for example, who offers them, what competencies (knowledge, skills, attributes) are necessary to earn them, how these competencies are assessed, what opportunities are available to learn them, and what the likely outcomes of possessing the credential are (in terms of things such as employability and salary). As such, CTDL describes a lot more than just the bare details of a credential; it goes far beyond that into organizations, competencies, learning opportunities and outcomes data. In fact, by CTDL we actually mean three related vocabularies:

  • CTDL itself, which covers the core of credentials, courses, pathways and organizations;
  • CTDL-ASN, an extension of the vocabulary for competency frameworks and related entities developed for the Achievement Standards Network;
  • QDATA, for quantitative data about outcomes.

As well as the bare vocabulary definitions we also provide a Handbook with sections for each of the vocabularies, covering how the terms are designed to be used to meet various use cases.

About my role

My first contract with Credential Engine was to set up and lead the EOCred W3C Community Group to improve and extend Schema.org’s capability to describe educational and occupational credentials. CTDL was created following the same model as schema.org, and Credential Engine were keen to keep the two languages harmonized. The outcome of that project was the schema.org EducationalOccupationalCredential class and related terms, and some documentation from the working group about how to address the use cases we identified.

More recently I have been working more closely with the core Credential Registry team on developing CTDL. They have well-established policies and procedures for updates, which include setting up open working groups to “socialize” major proposals. While I have been working with them we have produced major updates to address Credit Transfer, Education to Work relationships, Educational Pathways, Scheduling Patterns and Approval Lists, as well as many minor tweaks on an almost monthly basis. Coming soon: assessment rubrics.

One of the things that I really appreciate about the Credential Engine work is that it gives me the freedom (or lets me take the liberty) to explore new RDF-related technologies that might benefit CTDL. The best example of this is how I have been able to build some working knowledge of SHACL as part of our thinking on how we express the CTDL data model and application profiles of it in such a way that data can be validated. This has helped me justify my (otherwise unfunded) contribution to the Dublin Core Tabular Application Profile work. Other examples, which come from wanting to make sure CTDL is used as widely as possible, include contributing to the W3C Verifiable Credentials for Education community group, PESC’s work on transcripts, and ETF training events on linked data.

Best of all, Credential Engine have a great team, it’s a pleasure to work with them.