Identifiers for UK OER “works”

Would it be useful and feasible to have a single identifier to link together all the instances of a learning resource? To be more specific, consider a lecture that has been videoed. The video is available on YouTube, in a national repository, and from the website of the institution where it was delivered (also bear in mind that there might be an audio-only recording and written transcripts of the same lecture). Should there be an HTTP URI that relates (indirectly) to all of these versions of the same lecture?

I’m asking because we are trying to define the technical and metadata requirements for materials produced through the HEFCE/JISC/HE Academy OER funding. I’m suggesting a short URI that can be used to refer to any version of a work; as I envisage it, this would resolve to a list of all the known versions. I think this is important for accessibility (in its widest sense), sustainability, and for collating information about a resource that may be available all over the place. The problems centre on the difficulty of understanding the concepts involved, and are especially acute because the implementation cannot be done just by the people who currently understand the issues involved.

You can see something like this on Amazon.co.uk, where the page for one edition of a book contains information about other editions:

[Image: Information about other editions of a book as displayed on Amazon.co.uk]

Why should we do this? Clearly each version of the lecture mentioned above would have its own URL by which it could be referenced, but where there is an advantage in adopting a variety of delivery strategies there is also an advantage in exposing this variety. Someone may work where access to YouTube is blocked: if they learn about the lecture but only have the YouTube URL, they cannot get to it, and they may never know that other versions exist. Similarly, exposing a link to a transcript of the audio part of the lecture could be important for accessibility.

Conversely, when referring to an educational resource, for example when writing a review of it, often it is the educational content that is important, not the medium. Let’s assume that the YouTube, repository and institutional website versions all have the same educational benefits: why should a review prioritise one version over the others by citing the URL for that one? Or do we want always to have to cite all three (or four, or five, if the audio-only recording and transcripts are included)? Especially when there is a chance that any one of the URLs will turn out not to be as persistent as the reviewer would hope. This is analogous to citing the ISBN for a hardcover edition of a book only to find it goes out of print and is replaced by a softcover edition from another publisher. You sometimes see this on Amazon as well (“this review is from [some other edition]”).

If reviews and usage information in general do cite a single identifier regardless of the version they use then it becomes easier to aggregate this information and apply it (where appropriate) to all the versions. Remember that for an OER it will be a challenge just to keep track of where all the different versions of a resource are located.

So is it feasible? What are the problems? Well, firstly, this sort of thing is difficult for most people to understand. It’s not obvious why you would want to do this, what the “thing” being identified is, or how the different versions of a resource relate to it. Analysing the problem tends to confirm that it is complex rather than simplify it; that’s why I haven’t mentioned FRBR works (as discussed, for example, in this post) until now. To try to keep it simple I’m only suggesting an identifier for the work, not the full FRBR chain.

The next problem would be how to expose the work-level identifier to users of any specific item. It could be embedded in the resource itself as text: “when citing this resource please use the URI http://… which relates to all available versions, unless you wish to refer to a specific feature of this version”. Or something; like I said, it’s a difficult concept to explain succinctly. (If that’s an active hypertext link it would help auditing by allowing a Google search to show public instances of the resource.) Or it could be used as a tag on the host site; that would be useful, but not as good.

There are workflow issues here: the identifier has to be applied to the resource by the resource creator, and it needs to be assigned before the resource is published. Pretty much the same as with an ISBN, but the problem is that the people using ISBNs have a pretty good idea of what they are for (mostly). The identifier also needs to be short if it is to be usable by humans. That’s not in itself a problem: something like http://oer.ac.uk/AAAA, where each A can be any character [a-z][A-Z][0-9], would give over 14 million IDs (62^4 = 14,776,336), which would be enough. To avoid clashes, though, they probably need to be generated and assigned centrally, which adds to the workflow problem.
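As a rough illustration of central assignment, here is a minimal Python sketch that mints four-character identifiers from [a-z][A-Z][0-9] and refuses to hand out the same one twice. The function names and the in-memory set of issued IDs are assumptions made for the example; a real service would need persistent storage and some agreed policy on who may request identifiers.

```python
import secrets
import string

# 26 lowercase + 26 uppercase + 10 digits = 62 possible characters.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
ID_LENGTH = 4
BASE_URI = "http://oer.ac.uk/"  # the base URI suggested in the post


def id_space_size(length=ID_LENGTH):
    """Number of distinct identifiers of the given length."""
    return len(ALPHABET) ** length


def mint_work_uri(issued):
    """Return a new work URI whose ID is not yet in `issued` (a set).

    `issued` stands in for the central registry's record of assigned IDs
    (hypothetical: kept in memory here purely for illustration).
    """
    if len(issued) >= id_space_size():
        raise RuntimeError("identifier space exhausted")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(ID_LENGTH))
        if candidate not in issued:
            issued.add(candidate)
            return BASE_URI + candidate
```

Because the clash check happens in one place, two projects can never be given the same identifier, which is the reason for suggesting central assignment in the first place.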

Finally of course, something needs to be listening at http://oer.ac.uk/ and serving the lists of known versions. Technically that’s not difficult, but it relies on someone telling the service where these versions are. Maybe it’s not unreasonable to ask people who get money this way to tell the funders what they produced with it, but I have a fear of registries: it always seems hard to populate them unless there is some over-riding benefit. I guess this could be a facet of the JorumOpen service.
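To show how little is needed on the technical side, here is a minimal Python sketch of the lookup behind such a service: a registry that maps a work ID to the locations people have told it about. The class and method names, and the example.org URLs in the usage note, are invented for illustration; nothing here describes an existing service.

```python
from dataclasses import dataclass, field


@dataclass
class WorkRecord:
    """One work and the locations of its known versions."""
    title: str
    versions: dict = field(default_factory=dict)  # URL -> free-text note


class Registry:
    """In-memory stand-in for the service listening at the base URI."""

    def __init__(self):
        self._works = {}

    def register_work(self, work_id, title):
        self._works[work_id] = WorkRecord(title)

    def add_version(self, work_id, url, note=""):
        # The service only knows about versions that someone reports.
        self._works[work_id].versions[url] = note

    def resolve(self, work_id):
        """Return (url, note) pairs for a work, or None if the ID is unknown."""
        record = self._works.get(work_id)
        if record is None:
            return None
        return list(record.versions.items())
```

For example, after `registry.register_work("a1B2", "Example lecture")` and `registry.add_version("a1B2", "http://video.example.org/lecture", "video")`, calling `registry.resolve("a1B2")` returns the list of reported locations. The hard part, as noted above, is getting people to make the `add_version` calls.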

So, I started with a question: “Would it be useful and feasible to have a single identifier to link together all the instances of a learning resource?” Answers below, please.

9 thoughts on “Identifiers for UK OER “works””

  1. Useful? Yes, very.
    Feasible? Technically yes, but that does not ensure usage. Look at how slow the DOI has been to penetrate users of “traditional” media – if you asked most academics, they wouldn’t have a clue what it is. So I’m in favour of this idea, but there would be a long road ahead, even after the inevitable wrangles over the architecture had been resolved.

  2. Hi Phil

    I think there are also other aspects to consider, such as whether links should be provided to equivalent alternative resources, such as an alternative to a drag and drop exercise (which is not accessible to people with visual impairments) which may be presented in an alternative format, but which also has an equivalent learning objective. The IMS AccessForAll Specification (AccMD) describes the metadata required for various types of resources: primary resource (e.g. the initial resource, perhaps a YouTube video of a lecture); supplementary resource (e.g. captions for the YouTube video); non-supplementary (e.g. transcript of the YouTube video complete with text descriptions of any actions carried out or screenshots shown).

    The IMS AccMD Specification suggests that URLs are used to describe locations of alternative resources but doesn’t say how this should be done.

    TechDis were looking at an Accessibility Passport scheme whereby each resource would carry with it a document stating its accessibility (or inaccessibility) and suggestions for alternative resources. It was suggested that a wiki could be kept and updated by people responsible for creating (or using) learning resources, stating accessibility issues, workarounds and URLs to other equivalent or alternative resources.

    Whilst I seem to be concentrating on the accessibility side, it’s only because a lot of discussion and thought has gone into this issue of keeping identifiers in one place for accessibility purposes. However, I’m sure it would be just as valuable for everyone, although the logistics and practicalities of doing this may be difficult to overcome.

  3. Hi Phil,

    Very interesting post and it’s a resounding YES from me.

    As a former CETIS Accessibility SIG Assistant, and current User Support Assistant Librarian, I’m really keen on people being able to access resources in the format most suitable for them. I have actually spent some of the last week looking into the issue of providing access to inaccessible printed and online resources for a visually impaired student, and having a simple way of identifying several alternative formats of a given resource would make the job of both librarians and learners much much easier.

    I think it’s absolutely crucial that people with visual impairments, hearing impairments, mobility issues, dyslexia, dyspraxia, Asperger spectrum conditions and a range of other specific needs are able to a) have access to resources at all, and b) have access to resources in the way that is most suitable for them. I’m increasingly short-sighted myself, and finding certain resources such as my television less accessible (without my glasses), so I’m hugely in favour of increasing the number of ways to access resources, and an easy way of finding these different options.

    I don’t think this is just an accessibility issue either; as champions of accessibility always point out, making resources available in different formats is also a winner in terms of general usability. A learner may enjoy listening to a podcast or watching a video on a computer with a sound card and headphones, but want the text version on a computer without headphones, or actually find that a version with lots of visual content is easier to understand.

    There is also a learning benefit in making different formats of a resource available, in that people may even find it enhances their learning and retention to actually explore and access the resource in more than one format, as you tend to understand and remember different things depending on whether you are watching, listening, reading or engaging with a resource.

    Vashti (User Support Assistant, Bangor University Library)

  4. I think maybe I’m saying a variation of what Sharon said, but what is the real requirement here?

    Suppose I have a simple document-based learning resource, which exists in three revisions (I’m avoiding saying “version”, because it tends to be used loosely for several different relationship types) in which the content differs, with English and Welsh translations of each revision, and each available in HTML and in PDF formats, with copies made available in two different collections – “from two different repositories”, if we must 🙂

    Is it sufficient to know that document X (collection A’s copy of the PDF of the Welsh translation of revision 1) and document Y (collection B’s copy of the HTML of the English translation of revision 3) are related indirectly to a single Work? Or do I need to know more precisely whether the content of document X and document Y is based on the same revision or on two different revisions? Or if I have X and I want the English translation of that same content (or, as per Sharon’s case, an “equivalent alternative” based on some other set of attributes related to my personal preferences), how do I obtain it?

    I know, I know, I need to take off my FRBR-shaped spectacles once in a while! 🙂 But I hope you get the idea. What problem(s) are we trying to solve?

    (Also, whatever model we adopt, we need to be clear about what the model actually is, and particularly whether it is really FRBR, or whether it is something not-FRBR, but a bit like FRBR maybe with fewer entity types. And if it’s the latter – which I don’t rule out, not at all! – that model has to be defined clearly, and we have to be very careful not to confuse it with FRBR)

  5. Pete’s point about what problem we’re trying to solve is probably decisive. As far as I can see (and I’m getting as short-sighted as Vashti 😉) that means two problems:

    1. How can the funder easily keep track of all the content that was funded

    2. How can people easily find and cite all the different versions of a work (for accessibility reasons)

    Since the amount of control of the funder over the content creation workflow is fairly minimal, perhaps the point at which a project tells JISC of the existence of a new piece of content is where the identity issue can be addressed. For example: project x has created course Y. It exists as a course in their own Moodle VLE, and as a back-up archive on a Moodle community site. Project x then wants to deposit a ‘canonical’ version in IMS Content Package format with JISC. As part of the deposit process, a work level URI is created (http://oer.ac.uk/XY ), and URLs of derivatives solicited (“you indicate that there are no other versions of this resource. I don’t believe you [back]”). The work URI could point to a page that either embeds or links to the canonical version, and also contains typed links to the other known derivations.

    It’s not ideal, since there’s no way to enforce link-backs from derivatives to the work URI, but I guess we could ameliorate that with some support and advice.

  6. Thank you for all your replies, both here and by email.

    I think Pete is right about the problems with the word “version”. I’m not sure that “revision” is any better, since to me that implies making a change to the content. How about “rendering” as in “a translation, interpretation, reproduction or representation”?

    A couple of people have emailed me about the relationship between this idea and OAI-ORE. I guess there is a relationship in as much as one scenario for ORE is that of showing the different formats in which a journal paper is available. This information is often provided by the “start page” for documents in repositories (see the motivating example in the OAI-ORE Primer). You could see the approach that I’m suggesting as enabling the presentation of similar start pages but for resources copies of which are held all over the web. Importantly it provides an authoritative canonical URI for this “aggregation” and a registry for such URIs, which wouldn’t otherwise be available in a distributed environment.

    I agree with Wilbert about the problems being solved. There might also be some relevance to enhancing persistence of access, since if any one copy is lost the identifier will still resolve to information about other renderings. Of course that depends on a commitment to maintain the service at http://oer.ac.uk, but at least that commitment is the responsibility of the funders of the HEFCE OER initiative not some third party. (I suppose that even if http://oer.ac.uk is lost then the identifier would still be there as a unique tag exposed to the web so you could use a search engine to find renderings.)

    On the modelling aspects that Pete raises, I think that the identifier that I am proposing is for the FRBR Work, and it would resolve to a list of the locations of FRBR Items, optionally with a description of what is distinctive about each item (e.g. “Welsh translation, pdf”). I would limit the items to direct “vertical” derivatives of the work, not those that come from work-work relationships etc. I hope that would be sufficient to be useful, whereas I think that using the entire FRBR model would not be feasible. (It might be worth trying something a little more elegant: allowing some work-work relationships might help mitigate the tendency to list problem sets used in a lecture alongside videos of the lecture as renderings of the same work; I don’t know quite what would be required to help with the accessibility point Sharon and Vashti raise.)

    I half agree with Wilbert about the workflow. I think what he outlines is likely the first time around, but it is not optimal since, as he says, projects would be unlikely to put in the link-backs. So perhaps when what Wilbert describes happens, the project should be offered some URIs to use with any other resources they happen to be developing that they might want to tell JISC about in the future, and given guidance on how to use them. Certainly Alan is right that achieving usage would be difficult. Andy Powell summed it up nicely when writing about identifiers for people: “the real (and very significant) hurdle to be overcome here is convincing people to think about solving a problem they don’t even know they have using a solution that they probably don’t find very intuitive!”

  7. There are a huge range of issues and problems here and one key is to simplify and solve the issues that are most easily expressible and solvable. So I’ll just throw one attempted clarification into this pot.

    Pete asks in his Jan 22nd post (as I understand it) about adaptations (or revisions) of adaptations: if we have a resource X and we produce an adaptation or revision of it, say Y, and then produce an adaptation or revision of Y, say Z, how can we structure the relation of all of these? This is a problem we spent several years debating in the Access For All (IMS AccMD, then ISO 24751 etc.) work. There seems to be a problem in trying to adopt a symmetrical solution for many reasons – the chief one being that doing so clashes with the notion of Metadata being contextually dependent (which in practice it is). Having written the words “adaptation” and “revision” above, I was conscious while writing them that they are not the same thing at all – and that is the source of much confusion and difficulty. Let me explain.

    If we wish to adopt a symmetrical solution where everything is a first class object but there is only ONE record for each object then we immediately lose the possibility of a locally contextual view that is different from that. On the other hand, if we accept locally contextual views (as I believe we must) then in adopting such a view we impose a non-symmetry – we accept that some objects are in a relationship to others that is not equivalent.

    We solved this in our accessibility work by adopting several principles which I paraphrase as follows:

    1. We have first class objects (original resources) and adaptations of them. These adaptations may be equivalent or derivative but the relationship as we describe it in this record is asymmetrical.

    2. We do not describe adaptations of adaptations – that doesn’t mean they cannot occur but not in the same description. Therefore any relating structure is only one level deep – no recursion or circularity or non-definition.

    3. Multiple descriptions are allowed – something that is an adaptation for one original resource could itself be an original resource in another description.

    4. The record that relates the resource to adaptations (or revisions of it) has two parts – it describes the resource and it has identifiers of adaptations for it – but that list of adaptations should be regarded as caching memory – it will often need to be updated by a search, and often the identifiers will be broken (particularly so when they identify objects outside of this repository).

    As I see it – the central point of difficulty is attempting to impose a symmetrical structure on what isn’t a symmetrical use case – Metadata is of use to communities and each will define its own, and in accepting that we must accept asymmetry and collections of resources associated asymmetrically and possibly differently for different communities. This is a very serious problem because it seems to say something about how organisations handle their boundaries with the outside world.

    In the repository revisions may be equivalent but to the outside world I think they are not. If X -> Y -> Z and the relationship is one of derivation or educational equivalence then we would support only Metadata saying X -> Y and X -> Z in a description that a community has, and Y -> Z would be outlawed in that description (Z would be related back to the real original). Of course we could have a completely separate description (for a separate context or community or even micro-example) where Y -> Z and the same objects are involved. This way we get both sense and context but we must accept asymmetry (there are original objects and in a sense sub-objects in any one description).
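    The X -> Y -> Z rule can be illustrated with a small Python sketch. It is only a reading of the rule as stated in this comment, not the AccMD data model: the function name and the list-of-pairs input are assumptions, and it assumes each resource is derived from at most one other within a single description.

```python
def flatten_to_originals(derivations):
    """Relate every adaptation directly to its real original.

    `derivations` is a list of (source, derived) pairs, e.g. [("X", "Y")].
    Returns a dict mapping each original resource to its adaptations,
    one level deep, so a chain X -> Y -> Z yields X -> Y and X -> Z.
    """
    parent = {derived: source for source, derived in derivations}
    description = {}
    for resource in parent:
        original = parent[resource]
        seen = {resource}
        # Walk up the chain until we reach a resource with no source.
        while original in parent:
            if original in seen:
                raise ValueError("circular derivation involving %r" % original)
            seen.add(original)
            original = parent[original]
        description.setdefault(original, []).append(resource)
    return description
```

    A separate description, for a different community, would simply be a separate call with its own list of pairs, in which Y could appear as an original.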

    I hope this provides some useful info – it’s very difficult to express without writing a very long essay on it, but I am very happy to discuss it further with anyone – but please could we move the discussion onto a list (or echo it to one)?

    andy

  8. There’s an interesting discussion of “Resource Oriented Architecture” in the book RESTful Web Services, which discusses the relationship between URIs and resources, how to point to the latest version or older revisions, how to provide metadata in the response to link to related resources (links and connectedness) and so on.
    The uniform HTTP interface gives the requester information about resource availability, whether it has been moved or replaced and so on.
    It is straightforward enough to provide a webservice that creates (and returns) a unique URL when a resource is uploaded (we do this for images, although we use incrementing integers rather than GUIDs so they are not *guaranteed* unique).
    I think it is important to consider that resources may be automatically generated on the fly, including certain more accessible versions.
    The way our prototype image service works is that there is an archetypal URI:

    http://www.example.com/assets/images/image/1007

    which represents an image. If conversion is available, you should be able to add a path segment (or file extension) thus:

    http://www.example.com/assets/images/image/1007.gif

    to get a GIF-formatted version, and add a command like:

    http://www.example.com/assets/images/image/1007/thumbnail/120;120

    to get a thumbnail resized to the supplied dimensions, which doesn’t have to have existed before (we have got this part working).

    There is no magic here, and you have to supply the ICT infrastructure and plan, design and implement the operations of your service, the URI architecture that accesses it and so forth, and what parameters (and data types) to accept.

    You can imagine some kind of URI architecture like:

    http://www.example.com/assets/documents/document/1007.pdf/2009-01-28?bodytext=18pt

    which represents the 28 January 2009 revision of a document (internal ID of 1007) to be retrieved in PDF format with a body text size of 18 points.
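    To make that imagined convention concrete, here is a short Python sketch that splits such a URI into ID, format, revision date and rendering options. It is not the code behind our prototype; the function name and the returned dictionary keys are made up for the example, and it only handles the exact path layout shown above.

```python
from datetime import date
from urllib.parse import urlsplit, parse_qs


def parse_document_uri(uri):
    """Split a URI of the imagined form
    /assets/documents/document/<id>.<format>/<YYYY-MM-DD>?<options>
    into its parts. Raises ValueError if the path does not match.
    """
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 5 or segments[:3] != ["assets", "documents", "document"]:
        raise ValueError("unexpected path layout: %r" % parts.path)
    # "1007.pdf" -> internal ID "1007" and requested format "pdf".
    doc_id, _, fmt = segments[3].partition(".")
    revision = date.fromisoformat(segments[4])
    # Keep the first value of each query parameter, e.g. bodytext=18pt.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "id": int(doc_id),
        "format": fmt or None,
        "revision": revision,
        "options": options,
    }
```

    A service would still have to decide what to do with each part (for instance, whether an unknown format is an error or falls back to a default), which is where the real design work lies.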

    At the end of the day, you could come up with a stable URI convention that could be reused in any web domain, which is independent of underlying technology. However, the maintenance of these services becomes the next challenge.
