
On Semantics and the Joint Academic Coding System

Lorna and I recently contributed to a study on possible reforms to JACS, part of a larger piece of work on Redesigning the HE data landscape. JACS, the Joint Academic Coding System, is maintained by HESA (the Higher Education Statistics Agency) and UCAS (the Universities and Colleges Admissions Service) as a means of classifying UK University courses by subject; it is also used by a number of other organisations to classify other resources, for example teaching and learning resources. The study to which we were contributing our thoughts had already identified a problem with different people using JACS in different ways, which prompted the first part of this post. We were keen to promote technical changes to the way that JACS is managed that would make it easier for other people to use (and might incidentally help solve some of the problems in developing JACS further for HESA and UCAS), which are outlined in the second part.

There’s nothing new here, I’m posting these thoughts here just so that they don’t get completely lost.

Subjects and disciplines in JACS

One of the issues identified with the use of JACS is that “although ostensibly believing themselves to be using a single system of classification, stakeholders are actually applying JACS for a variety of different purposes” including Universities who “often try to align JACS codes to their cost centres rather than adopting a strictly subject-based approach”. The cost centres in question are academic schools or departments, which are discipline-based. This is problematic for the use of JACS to monitor which subjects are being learnt, since the same subject may be taught in several departments. A good example of this is statistics, which is taught across many fields from mathematics through to the social sciences, but there are many other examples: languages taught in mediaeval studies and business translation courses, elements of computing taught in electronic engineering and computer science, and so on. One approach would be to ignore the discipline dimension and say the subject is the same regardless of the different disciplinary slants taken, that is to say statistics taught to mathematicians is the same as statistics taught to physicists is the same as statistics taught to social scientists. This may be true at a very superficial level, but obviously the relevance of theoretical versus practical elements will vary between those disciplines, as will the nature of the data to be analysed (typically a physicist will design an experiment to control each variable independently so as not to have to deal with multivariate data; this is often not possible in the social sciences, so multivariate analysis is far more important there). When it comes to teaching and learning resources, something aimed at one discipline is likely to contain examples or use an approach not suited to others.

Perhaps more important is that academics identify with a discipline as something more than a collection of subjects taught. It encapsulates a way of thinking, a framework for deciding which problems are worth studying, and a way of approaching those problems. A discipline is a community, and an academic who has grown up in a community will likely have acquired that community’s view of the subjects important to it. This should be taken into account when designing a coding scheme that is to be used by academics: any classification that appears to place the topic they teach under someone else’s discipline will be resisted as misrepresenting what is actually being taught, indeed as a threat to the integrity of the discipline.

More objectively, the case for different disciplinary slants on a problem space being important is demonstrated by the importance of multidisciplinary approaches to solving many problems. Both the reductionist approach of physics and the holistic approach of humanities and social sciences have their strengths, and it would be a shame if the distinction were lost.

The ideal coding scheme would be able to represent both the subject learnt and the discipline context in which it was learnt.

JACS and 5* data

Tim Berners-Lee suggested a 5 star deployment scheme for open data on the world wide web:
* make your stuff available on the Web (whatever format) under an open licence
** make it available as structured data (e.g., Excel instead of image scan of a table)
*** use non-proprietary formats (e.g., CSV instead of Excel)
**** use URIs to denote things, so that people can point at your stuff
***** link your data to other data to provide context

Currently JACS fails to meet the open licence requirement for even 1-star data, but that seems to be a simple omission of a licensing statement expressing the evident intention that JACS should be freely available for others to use. It is important that this is fixed, but aside from this, JACS operates at about the 3-star level. Assigning URIs to JACS subjects and providing useful information when someone accesses those URIs would allow JACS to be part of the web of linked open data. The benefits of linking data over the web include:

  • The identifiers are globally unique and unambiguous; they can be used in any system without fear of conflicting with other identifiers.
  • The subjects can be referenced globally: by humans from websites and emails, and by computer systems from data feeds and web applications.
  • The subjects can be used with semantic web approaches, such as RDF, for representing ontologies.
  • These allow relationships such as subject hierarchies, and relationships with other concepts (e.g. academic discipline), to be represented independently of the coding scheme used. An example of this is SKOS; see below.

In practical terms, implementing this would mean:

  • Devising a URI scheme. This could be as simple as appending the JACS codes to a suitable base URI: for example, H713 could become http://id.jacs.ac.uk/H713.
  • Setting up a web service to provide suitable information. Anyone connecting to such a URI would be redirected to information matching the parameters of their request: a simple web browser would request an HTML page and so be redirected to http://id.jacs.ac.uk/H713.html, while web applications would request data in a machine-readable form such as XML, RDF or JSON. (A sketch of such a request follows this list.)
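To make the content negotiation concrete, here is a minimal sketch in Python (using the requests library) of how a web application might ask the service for machine-readable data. The id.jacs.ac.uk service is the hypothetical one suggested above, so this is illustrative rather than something you can run today.

import requests

# Ask the (hypothetical) JACS URI service for RDF rather than HTML by
# setting the Accept header; the service would redirect the client to
# the matching representation of the H713 concept.
response = requests.get(
    "http://id.jacs.ac.uk/H713",
    headers={"Accept": "application/rdf+xml"},
)
print(response.url)   # the representation we were redirected to
print(response.text)  # the data describing H713

A browser making the same request with Accept: text/html would be redirected to the HTML page instead; that is all content negotiation amounts to.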

The main overhead is in setting up, maintaining and managing the data provided by the web service, but Southampton University have already set one up for their own use. (The only problem with the Southampton service–and I believe Oxford have done something similar–is a lack of authority, i.e. it isn’t clear to other users whether the data from this service is authoritative, up to date, used under a correct licence, and sustainable.)

JACS and SKOS

SKOS (Simple Knowledge Organization System) is a semantic web application of RDF which provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary. It allows for the description of a concept and the expression of relationships between pairs of concepts. But first the concept must be identified as such, with a URI. For example:
jacs:H713 rdf:type skos:Concept
In this example jacs: is shorthand for the JACS base URI, http://id.jacs.ac.uk/ as suggested above; rdf: and skos: are shorthand for the base URIs for RDF and SKOS. This triple says “The thing identified by http://id.jacs.ac.uk/H713 is a resource of type (as defined by RDF) concept (as defined by SKOS)”.

Other assertions can be made about the resource, e.g. the preferred label to be used for it and a scope note:
jacs:H713 skos:prefLabel “Production Processes”
jacs:H713 skos:scopeNote “The study of the principles of engineering as they apply to efficient application of production-line technology.”

Assuming the other JACS codes have been suitably identified, relationships between them can be described:
jacs:H713 skos:broader jacs:H710
jacs:H713 skos:related jacs:H830

Once JACS is on the semantic web, relationships between the JACS subjects and things in other schemas can also be represented:
http://example.org/123 dct:subject jacs:H713
(The resource identified by the URI http://example.org/123 is about the subject identified by jacs:H713).
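To pull the examples in this section together, here is a minimal sketch using Python’s rdflib that builds the small graph described above and prints it as Turtle. The http://id.jacs.ac.uk/ base URI is still the hypothetical one suggested earlier.

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, SKOS

JACS = Namespace("http://id.jacs.ac.uk/")  # hypothetical base URI

g = Graph()
g.bind("jacs", JACS)
g.bind("skos", SKOS)
g.bind("dct", DCTERMS)

# H713 as a SKOS concept, with its preferred label and scope note
g.add((JACS.H713, RDF.type, SKOS.Concept))
g.add((JACS.H713, SKOS.prefLabel, Literal("Production Processes")))
g.add((JACS.H713, SKOS.scopeNote, Literal(
    "The study of the principles of engineering as they apply to "
    "efficient application of production-line technology.")))

# Relationships to other (suitably identified) JACS concepts
g.add((JACS.H713, SKOS.broader, JACS.H710))
g.add((JACS.H713, SKOS.related, JACS.H830))

# A resource elsewhere on the web declaring H713 as its subject
g.add((URIRef("http://example.org/123"), DCTERMS.subject, JACS.H713))

print(g.serialize(format="turtle"))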

Book now available. Into the Wild – Technology for Open Educational Resources

Into the Wild (Book cover)

With great pleasure and more relief I can now announce the availability of Into the wild – technology for open educational resources, a book of our reflections on the technology involved in three years of the UK OER Programmes.

From the blurb:

Between 2009 and 2012 the Higher Education Funding Council for England funded a series of programmes to encourage higher education institutions in the UK to release existing educational content as Open Educational Resources. The HEFCE-funded UK OER Programme was run and managed by the JISC and the Higher Education Academy. The JISC CETIS “OER Technology Support Project” provided support for technical innovation across this programme. This book synthesises and reflects on the approaches taken and lessons learnt across the Programme and by the Support Project.

This book is not intended as a beginners’ guide or a technical manual; instead it is an expert synthesis of the key technical issues arising from a national publicly-funded programme. It is intended for people working with technology to support the creation, management, dissemination and tracking of open educational resources, and particularly those who design digital infrastructure and services at institutional and national level.

You may remember Lorna writing back in August that Amber Thomas, Martin Hawksey, Lorna and I had written 90% of this book together in a Book Sprint. Well, the last 10% and the publication turned into a bit of a marathon-relay, something I might write about some time, but now the book is available in a variety of formats:

  • If you want a glossy-covered paperback, you can order it print-on-demand from Lulu (£3.36); if you’re not so fussed about the glossy cover and binding, there is a print-quality PDF you can print yourself.
  • If you have an ePub reader, there is a free download of an epub2 file.
  • If you have a Kindle, you can download the .mobi file and transfer it; or, if you prefer the convenience of Amazon’s distribution over Whispernet, you can buy it from them (77p; they don’t seem to distribute for free unless you agree to give them exclusive rights for all electronic formats).
  • Finally, if you prefer your ebook reading as PDFs, there is one of those too.

All varieties are free or at the minimum cost for the distribution channel used; the content is CC-BY licensed and editable versions are available if you wish to remix and fix what we’ve done.

Available via the Cetis publications site.

Rounding up the JLeRN experiment

[cross posted from the JLeRN blog]

We have reached the end of the JLeRN experiment, at least the end of the current JISC funding for Mimas to set up a node on the Learning Registry and examine its potential. One part of the process of rounding up the ideas and outputs generated through the experiment was a meeting, held on 22nd October, of those who had engaged with it. This post provides pointers to two sets of resources associated with that meeting: first a quick round-up (mostly from Sarah Currier) of posts describing what the people who attended had been doing with the Learning Registry, and then the blog posts people wrote after the event to summarise what had been discussed.

The Story So Far? by David Kay
A summary of some of the “headline ‘findings'” of a series of conversations that David has been having in an attempt to pin down the nature of the Learning Registry and its potential.

Understanding and using the Learning Registry: Pgogy Tools for searching and submitting by Pat Lockley.
Pat has been very involved in the Learning Registry from the start. This blog post gives you access to all four of his open source tools that work with the Learning Registry, set up to work with our JLeRN node. They are very easy to install (two of them plug very easily into Chrome) and try out, plus Pat has made some brief videos demonstrating how they work. The tools use the Learning Registry to enhance Google (and other) searching, and support easy submission of metadata and paradata to a node. I’ve added in a sample search you can use with the Chrome tools that should show you how it works pretty quickly.

Taster: A soon-to-be released ENGrich Learning Registry Case Study for JLeRN by the ENGrich Project.
The ENGrich project are working on a case study on why and how they have implemented a Learning Registry node to enhance access to high-quality visual resources (images, videos, Flash animations, etc.) as OERs for engineering education at university level. Their work has involved gathering paradata from academics and students; this taster gives you an overview. A really interesting use case. Please pass this one on to anyone working with engineering resources too!

Taster: some ideas for a use case on paradata and accessibility opportunities by Terry McAndrew
Terry, an accessibility expert from JISC TechDis, came to our first Hackday and got us thinking about how the Learning Registry model for capturing and sharing paradata might help people share information about how accessible resources are. We commissioned him to write up a use case for this; look here to see his initial thoughts, and add any of your own.

How widely useful is the Learning Registry?: A draft report on the broader context by David Kay
The JLeRN team have been keeping half an eye from the start on the potential affordances the Learning Registry might offer the information landscape outwith educational technology: what about library circulation data, activity data and activity streams, linked data, the Semantic Web, research data management? And what if we are missing a trick; maybe there are already solutions in other communities? So we commissioned a Broader Context Report from David Kay at Sero Consulting. This is his first draft; we’re looking for feedback, questions and ideas.

Summaries/reflections from after the meeting

Registrying by Pat Lockley
Pat’s summary of his presentation on how the Learning Registry affects the interactions between developers, service managers and users.

Experimenting with the Learning Registry by Amber Thomas
A round-up of the meeting as a whole, pulling out a couple of the significant issues raised: the extent to which the Learning Registry is a network, and the way some really tricky problems have been pushed out of scope…but are still there. Some really useful comments on this post as well, notably from Steve Midgley on increasing adoption of the Learning Registry in the US.

JLeRN Experiment Final Meeting by Lorna M Campbell
Another summary of the meeting, covering the uses made of the Learning Registry by projects represented there, some subject areas where using the Learning Registry to store information about curriculum mapping may prove useful, and questions from Owen Stephens about alternative approaches to the Learning Registry.

At the end of the JLeRN experiment by Phil Barker
My summary of the meeting, covering the issues of whether the Learning Registry is a network or just isolated nodes (-not much of a network, yet), whether it works as software (-seems to) and why use it and not some alternative (-it’s too early to tell).

At the end of the JLeRN experiment

The JLeRN experiment was a toe dipped in the Learning Registry, a trial of a different approach to sharing information about learning resources and how they are used, one that focusses on getting the information out there rather than worrying over the schemas and formats in which the information is conveyed. That experiment (JLeRN, not the Learning Registry as a whole) is drawing to a close, so we had a meeting earlier this week to review what had been done, what had been learnt, and what was left to do and learn.

Sarah Currier had arranged for projects that had worked with JLeRN to blog about what they had done before the meeting; here’s the email with a summary of them. If you haven’t come across JLeRN before you might want to look through them before reading on. What I want to do here is describe my own understanding of where the Learning Registry is and report some of the issues about it raised at the meeting.

The Learning Registry: Nodes or a network?

The Learning Registry as a network, from a presentation by Dan Rehak and others. © Copyright 2011 US Advanced Distributed Learning Initiative: CC-BY-3.0.

From the outset the Learning Registry was conceived as a network: the software created would run as nodes that connect together to share data about resources. Some of the details have been put on the back burner since those early descriptions; for example, the ideas of communities and gateway nodes haven’t been much developed.

The community map on the Learning Registry website shows three nodes (the red pins), including the JLeRN node; Steve Midgley told us via email “There are a few development nodes out there that we know of: Agilix, Illinois Dept of Commerce and California Dept of Ed. To my knowledge there are no production nodes beyond the ones we currently run. Several companies have expressed interest in taking over our production nodes including Dell, Cisco and Amazon.” To that tally I can add the EngRich node at Liverpool. Steve adds that the only network he knows of is the LR public network. Now, I’m not sure about the other nodes, but I do know that the JLeRN and EngRich nodes haven’t interacted with the public network in any meaningful way (yet).

So I think we have to say that, to date, there isn’t really much to prove the concept of the Learning Registry as a network. There are, however, some developments in the works that I think will change that, for example the Learning Registry Index; see below.

Services
The other aspect of the development of the Learning Registry to set against the vision shown in the diagram above is that of services being built to interact with the data in the nodes (shown as squares in the diagram). This is crucial since the Learning Registry is no more than plumbing to shift data around; it does nothing with that data that would interest a teacher or learner. It is left to others to develop services that meet user needs. Pat Lockley summed this up quite nicely in his presentation, showing how the Learning Registry was targeted at developers and promoted relationships between developers, service managers and users more than was the case with traditional repository software:

“I think the major point of my slides was to suggest the learning registry is a “developer’s repository” – not that you need a developer to use it, more that you develop services around a node. Also, I feel there is a greater role for the developer in the ecosystems around a node than around a repository – the services on offer, and the scope of services you create seem richer – partially as any data can be stored.”

Well, there are some services for getting data in: the OAI-PMH to Learning Registry Publish Utility, Pat’s RSS importer, Ramanathan, and his Google Analytics data importer, Pliny. Also, at least two projects–Scott Wilson’s SPAWS and Liverpool University’s EngRich–have involved the submission of data to Learning Registry nodes as part of the services they created.
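To give a flavour of what putting data in involves, here is a rough sketch in Python of publishing one paradata document to a node. The envelope fields follow my reading of the 0.23 version of the Learning Registry resource data specification, and the node address and payload values are made up, so treat the details as illustrative rather than definitive.

import requests

NODE = "http://example-node.mimas.ac.uk"  # made-up node address

# A Learning Registry envelope; resource_data is deliberately free-form,
# here an activity-stream-style paradata statement (values invented).
envelope = {
    "doc_type": "resource_data",
    "doc_version": "0.23.0",
    "resource_data_type": "paradata",
    "active": True,
    "resource_locator": "http://example.org/some-oer",
    "payload_placement": "inline",
    "resource_data": {
        "activity": {
            "actor": {"objectType": "student"},
            "verb": {"action": "rated", "measure": {"scaleMax": 5, "value": 4}},
            "object": "http://example.org/some-oer",
        }
    },
}

# The publish API accepts a list of documents in a single POST
response = requests.post(NODE + "/publish", json={"documents": [envelope]})
print(response.json())  # per-document OK/error statuses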

But putting data in meets a service manager’s needs; it’s no good in itself since it doesn’t meet any user needs. There are a few user-oriented services built off data in the Learning Registry. Pat showed us a couple of Chrome plugins, demos here and here. These are great as proofs of concept, and really important as such: they help show non-technical people what the Learning Registry is for. But there then follows some expectation management while you explain the limitations of the demonstrators. Other projects have embedded means of getting data out of Learning Registry nodes into their project outputs; for example, EngRich have an iLike widget for the Liverpool student portal that shows which resources students on specific courses have recommended, based on data in their Learning Registry node.

Steve Midgley provided us with some very promising information: “the Gates foundation is funding several groups to build index and search services on top of Learning Registry (called Learning Registry Index) and that will require running nodes of some kind.”

Does it work?
One message that I picked up during the meeting and elsewhere is that the Learning Registry, as software, works. The people who set up nodes seem to have done so quickly, the people who used the APIs didn’t report problems in doing so. That’s a good place to be starting.

At a deeper level, I guess we need to wait until there are more services built off the data in the Learning Registry to find out whether it works as a concept. Some known problems have been deliberately pushed out of scope in the development of the Learning Registry; a key one is not specifying the formats and schemas for the data that goes in. This is good if you are submitting data, but unless some level of agreement is reached it places the onus for making sense of the data on the people creating the services that use it. So far, the extent to which this (reaching agreement, or making sense of arbitrary data) is possible in the context of the Learning Registry is untested.

Other questions remain over how the Learning Registry will function as a network: for example, how will duplicate and complementary records be dealt with when many people might be providing information about the same resource?

Why use it?
Owen Stephens and David Kay were at the meeting asking some very pertinent questions. Neither is particularly caught up in the education technology world; both have more of a background in information systems for libraries, where of course there are different approaches to solving similar problems. So why use the Learning Registry rather than raw CouchDB, or some other schemaless NoSQL document store (e.g. MongoDB, which is popular for research data management), or free-text indexing and search software such as Lucene/Solr, or RDF triple stores, or just a traditional relational database with SQL? To some extent the aim at the moment is to try to answer some of those questions: we won’t know if we don’t try it. But it’s valid to ask how far we have got towards answering them, and here is my appraisal.

RDF?
Schemaless sharing of data still appeals to me because I don’t think we know what schema we want to use to share some of the interesting information about the use of resources for teaching and learning. I think the RDF approach will influence the data that is submitted; for example, there is interest in using the Learning Registry to store LRMI-style metadata. LRMI is adding properties to schema.org so that educational characteristics of resources can be described, and schema.org is only a step or two away from semantic web approaches such as RDF. But some influences of RDF we don’t want: there is a tendency at times for RDF approaches to fixate on ontologies, and that would stall us. For example, in LRMI it is possible to say that a resource “aligns” with some point in an educational framework, i.e. it is useful for teaching some topic in a standard curriculum, or for assessing some skill required by a competency framework. That’s really useful, but the vocabulary for the nature of the alignment has had to be left open (“teaches” and “assesses” are two suggested terms; others are that the resource has a certain “text complexity” or requires a “reading level” or other “educational level”) because the understanding of what education is about varies so much over the world and between settings that agreement on a closed ontology seems unattainable. Still, you could use RDF if you didn’t specify an ontology, and if you could make sense of the RDF without one.
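As an illustration of that open alignment vocabulary, an LRMI-style description might look something like the following (written as a Python dict standing in for the JSON; the property names come from the LRMI proposal to schema.org, while the resource, framework and URLs are invented).

# A resource described with LRMI-style properties (illustrative values).
resource = {
    "@type": "CreativeWork",
    "name": "Introduction to multivariate analysis",
    "url": "http://example.org/resources/multivariate",
    "educationalAlignment": {
        "@type": "AlignmentObject",
        "alignmentType": "teaches",  # the deliberately open term
        "educationalFramework": "Example National Curriculum",
        "targetName": "Statistics: multivariate methods",
        "targetUrl": "http://example.org/curriculum/stats/multivariate",
    },
}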

Another weakness of RDF in this context, as I understand it, is its ability to deal with subjective opinions. As soon as a teacher or learner sees an assertion that resource X is good for teaching topic Y (to continue the example used above) they should be asking “says who?”. Engineering students at Liverpool are more interested in what other engineering students find useful, especially those at Liverpool, than they are in the opinions of physics students. Yes, you can have named graphs in RDF and provide information about who asserted which triples, but that goes beyond what is usual, whereas it is built in from the start in the Learning Registry concept of paradata.

All of that is somewhat conjectural though, because as yet there is little in the Learning Registry that is not metadata that could be expressed in some standard schema such as LOM XML or DC RDF.

Other schemaless data stores
Why not just use CouchDB without the Learning Registry API, or MongoDB, or Lucene? All of these would make sense for single-instance data stores, which is pretty much what we have now with single, more-or-less isolated nodes rather than a network. And, yes, I am sure that some way of sharing data between them could be worked up if that is what you wanted. So again, any advantages of the Learning Registry are still putative at this stage.
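For comparison, storing a similar document in raw CouchDB is a single call against its HTTP API; here is a minimal sketch in Python assuming a local CouchDB with a database called paradata already created.

import requests

# POST a free-form JSON document straight into a CouchDB database;
# CouchDB assigns the document id and revision itself.
doc = {"activity": {"actor": {"objectType": "student"},
                    "verb": {"action": "rated"},
                    "object": "http://example.org/some-oer"}}
response = requests.post("http://localhost:5984/paradata", json=doc)
print(response.json())  # e.g. {"ok": true, "id": "...", "rev": "..."}

What you lose, of course, is the publish API’s envelope conventions and any prospect of node-to-node distribution; that thin layer is essentially what the Learning Registry adds on top.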

One advantage of the Learning Registry is that, as I mentioned above, it does seem to work: it seems to come out of the package as a functional way of storing and sharing data that is tailored to education. So as an introduction to NoSQL databases it’s not a bad place for the education community to start.

In summary
In a post about the end of the JLeRN project David Kay quoted Simon Schama on his not being sure whether the French Revolution was over. I’ll quote what Chairman Mao supposedly said when asked what he thought of the French Revolution: “it’s too early to tell”. The things to look out for are a functioning network of nodes and user-facing services being delivered from data in those nodes. Then we can ask whether that data could be shared in any other way. For the time being, I think the main achievement of JLeRN and the UK’s involvement in the Learning Registry is that it has started people thinking about alternatives to relational databases, and they have taken first steps into working with these. Too often, I think, data has been squeezed into a relational database where the only benefit of doing so is that it is what the developer happens to be familiar with. If all you have is a hammer then you can have real problems dealing with screws.

[updated to correct an attribution error as to who was comparing JLeRN to the French revolution]

A lesson in tagging for UKOER

We’ve been encouraging projects in the HE Academy / JISC OER programmes to use platforms that help get the resources out onto the open web and into the places where people look, rather than expecting people to come to them. YouTube is one such place. However, we also wanted to be able to find all the outputs from the various projects wherever they had been put, without relying on a central registry, so one of the technical recommendations for the programme was that resources be tagged UKOER.

So, I went to YouTube and searched for UKOER, and this was the top hit. Well, it’s a lesson in tagging I suppose. I don’t think it invalidates the approach, we never expected 100% fidelity and this seems to be a one-off among the first 100 or so of the 500+ results. And it’s great to see results from Chemistry.FM and CoreMaterials topping 10,000 views.

Text and Data Mining workshop, London 21 Oct 2011

There were two themes running through this workshop organised by the Strategic Content Alliance: technical potential and legal barriers. An important piece of background is the Hargreaves report.

The potential of text and data mining is probably well understood in technical circles, and was well articulated by John McNaught of NaCTeM. Briefly, the potential lies in the extraction of new knowledge from old, through the ability to surface implicit knowledge and show semantic relationships. This is something that could not be done by humans, not even crowds, because of the volume of information involved. Full-text access is crucial: John cited a finding that only 7% of the subject information extracted from research papers was mentioned in the abstract. There was a strong emphasis, from for example Jeff Lynn of the Coalition for a Digital Economy and Philip Ditchfield of GSK, on the need for business and enterprise to be able to realise this potential if they are to remain competitive.

While these speakers touched on the legal barriers, it was Naomi Korn who gave them a full airing. They start in the process of publishing (or before), when publishers acquire copyright, or a licence to publish with enough restrictions to be equivalent. The problem is that the first step of text mining is to make a copy of the work in a suitable format. Even works licensed under CC-by, the most liberal open access licence academic authors are likely to use, require attribution. Naomi spoke of attribution stacking, a problem John had also mentioned: when a result is found by mining thousands of papers, do you have to attribute all of them? This sort of problem occurs at every step of the text mining process. In UK law there are no copyright exceptions that can apply: text mining is not covered by fair dealing (though it is covered by fair use in the US, and by similar exceptions in Norwegian and Japanese law, but nowhere else), and the exceptions for transient copies (such as those in a computer’s memory when reading online) only apply if the copy has no intrinsic value.

The Hargreaves report seeks to redress this situation. Copyright and other IP law is meant to promote innovation, not stifle it, and copyright is meant to cover creative expressions, not the sort of raw factual information that data mining processes. Ben White of the British Library suggested an extension of fair dealing to permit data mining of legally obtained publications. The important thing is that, as Parliament acts on the Hargreaves review, people who understand text mining and care about legal issues make sure that any legislation is sufficient to allow innovation; otherwise innovators will have to move to jurisdictions like the US, Japan and Norway where the legal barriers are lower (I’ll call them ‘text havens’).

Thanks to JISC and the SCA for organising this event; there’s obviously plenty more for them to do.