I was (virtually) at the PESC 2023 Data Summit recently, presenting on a panel about Re-Engineering the “All-Or-Nothing” Academic Transcript to Reveal Its Unequivocal Value. This post is based on that presentation. It sets out the journey of a working group that I have been on, taking an XML record-based approach to detailing a student’s progress through education and turning it into JSON-LD. My input has been to focus on the Linked Data / RDF standards aspects of that. I should note that initially the project was about a transition to JSON. The emphasis on Linked Data, and the model we’re adopting, dropped out of our discussions about what we wanted to achieve and what various models enabled. We didn’t start with this solution in mind and try to jemmy it in. I think that’s important.
What we started with
We started with the PESC transcript in XML, and translated this to JSON. XML gives you records with nested elements with meaning that depends on their parent & sibling elements. The image below is incomplete and very simplified and I have taken some liberties to make it presentable, but it gets the idea across.
Presented like this you can see how the nested structure is kind of reassuringly similar to a printed transcript. Take note of how in the bottom right the information under “course” mixes information that is true of the course that everyone took (Course Number, Name) and information that is specific to the person whose details are provided nested in the same Student element. In reality the Course Number and Name have nothing to do with the Student.
JSON can be similar, but with Linked Data (as in JSON-LD) you no longer have a “record” paradigm. Instead you have sets of statements about different things, and the statements describe the characteristics of those things and the relationships between them.
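To make the difference concrete, here is a minimal sketch in Python. The record layout, the property names (achievedBy, inCourse and so on) and the blank-node style identifiers are all made up for illustration; they are not taken from the PESC schema or from the working group's model.

```python
# A hypothetical, much simplified record in the nested style:
# the course details only exist inside the student's record.
record = {
    "student": {
        "name": "Alice Example",
        "course": {
            "courseNumber": "BIO101",
            "courseName": "Introductory Biology",
            "grade": "A",
        },
    }
}


def to_statements(record):
    """Split the nested record into (subject, property, value) statements.

    Each statement is about exactly one thing: the person, the course,
    or the person's achievement in that course.
    """
    student = record["student"]
    course = student["course"]
    # Illustrative identifiers for the three things being described.
    person_id, course_id, achievement_id = "_:person1", "_:course1", "_:achievement1"
    return [
        # True of the person.
        (person_id, "name", student["name"]),
        # True of the course, whoever took it.
        (course_id, "courseNumber", course["courseNumber"]),
        (course_id, "name", course["courseName"]),
        # True only of this person's achievement in this course.
        (achievement_id, "achievedBy", person_id),
        (achievement_id, "inCourse", course_id),
        (achievement_id, "grade", course["grade"]),
    ]


statements = to_statements(record)
for statement in statements:
    print(statement)
```

The point of the split is that the statements about the course no longer mention the student at all, so they could come from somewhere else entirely.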
The first cut
We took a first trip through the data model, focusing on what the different things being described were and how they related to each other. The image below illustrates a (again very simple) example of what we came up with.
This usefully shows that there are some things that “belong” to the transcript, some things that are about a Person and what they have done, and some things that are about an Organization (e.g. a College) and what they offer (Courses).
But when you looked at real-world examples, it was actually a bit of a mess: the relationships were more about where you find things on a printed transcript than what makes sense as relationships between the things being described. Look how you go from Academic Summary to Academic Session to Academic Achievement to Course.
Where we are now
We started thinking of a transcript as a record of a person’s various types of achievements in programs/courses/sessions/etc offered by an organization. That looks like this.
It looks a little less simplified, but it’s showing two achievements.
See how there is a split between the personal private information (yellow and blue boxes on the middle left) and the corporate public data (pink boxes on right and at the top) typically found in a Course Catalogue or similar. This is the type of information that could be made available as linked data through the Credential Engine Registry (other registries could exist, but Credential Engine pay me).
What if there were no need to generate the corporate data just for the transcript because it could be reused (either by repeating it in the transcript data or just linking to it)?
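As a rough sketch of those two reuse options, the Python below models a catalogue entry that exists independently of any transcript, an achievement that only links to it, and a helper that repeats the catalogue data inline. The URL, the property names and the embed_course helper are hypothetical, chosen only to show the idea.

```python
import copy

# Hypothetical public catalogue data, keyed by the course's identifier.
# In practice this might be published by the college or held in a registry.
COURSE_ID = "https://college.example.org/courses/BIO101"
catalogue = {
    COURSE_ID: {
        "id": COURSE_ID,
        "type": "Course",
        "name": "Introductory Biology",
        "courseNumber": "BIO101",
    }
}

# Option 1: the transcript just links to the course by its identifier.
achievement_linked = {
    "type": "Achievement",
    "grade": "A",
    "course": {"id": COURSE_ID},
}


def embed_course(achievement, catalogue):
    """Option 2: return a copy with the catalogue description repeated inline."""
    result = copy.deepcopy(achievement)
    course_id = result["course"]["id"]
    # Keep the bare link if the catalogue has no entry for this identifier.
    result["course"] = copy.deepcopy(catalogue.get(course_id, result["course"]))
    return result


achievement_embedded = embed_course(achievement_linked, catalogue)
print(achievement_embedded["course"]["name"])
```

Either way the course description is written once, by whoever owns the catalogue, and the transcript only adds what is specific to the person.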
One final thought on the data structure. The heart of this view of the transcript is a series of assertions that an organization issues saying that a person has achieved something. These are represented by the Person-Achievement-College boxes running diagonally from bottom left to top right. This is the same core as a W3C Verifiable Credential, and the transcript is the same structure as a Verifiable Presentation. What if the data structure of the transcript were the same as that of a Verifiable Presentation? That is the approach taken by other similar standards, such as the European Learner Model and 1EdTech’s Comprehensive Learner Record. Having a common data model (even without going the whole way into including signed VCs in the transcript) will surely be helpful. If it is compatible with having a transcript made up of VCs, then so much the better, but we shall continue to follow the use cases to find a solution, not the other way round.
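For readers who have not met the Verifiable Credentials data model, here is a minimal unsigned sketch of that shape in Python. The issuer, credentialSubject, holder and verifiableCredential properties and the context URL come from the W3C data model; the achievement property, the identifiers and the two helper functions are assumptions for illustration and are not the working group's actual model.

```python
import json

# Base context of the W3C Verifiable Credentials Data Model v2.0.
VC_CONTEXT = "https://www.w3.org/ns/credentials/v2"


def make_assertion(issuer_id, person_id, achievement):
    """One organization asserting that one person achieved one thing."""
    return {
        "@context": [VC_CONTEXT],
        "type": ["VerifiableCredential"],
        "issuer": issuer_id,
        "credentialSubject": {
            "id": person_id,
            # "achievement" is a hypothetical term; real JSON-LD would need
            # an extra context that defines it.
            "achievement": achievement,
        },
    }


def make_transcript(person_id, assertions):
    """The transcript as a presentation that bundles the assertions."""
    return {
        "@context": [VC_CONTEXT],
        "type": ["VerifiablePresentation"],
        "holder": person_id,
        "verifiableCredential": assertions,
    }


college = "https://college.example.org"
person = "https://college.example.org/people/12345"
transcript = make_transcript(
    person,
    [
        make_assertion(college, person, {"course": {"id": college + "/courses/BIO101"}, "grade": "A"}),
        make_assertion(college, person, {"course": {"id": college + "/courses/CHE101"}, "grade": "B"}),
    ],
)
print(json.dumps(transcript, indent=2))
```

Nothing here is signed, so these are not yet verifiable in any cryptographic sense; the sketch only shows that the Person-Achievement-College assertion and the transcript that collects them fit the same outline.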