A short update on resource tracking

In our reflections on technical aspects of phase 2 of the UKOER programme, we said that we didn’t understand why projects aren’t worrying more about tracking the use and reuse of the OERs they have released. The reason is that if you don’t know how much your resources are used, you will not be in a good position to sustain your project after JISC have stopped funding it. For example, how can you justify the effort and cost of clearing resources for release under a Creative Commons licence unless you can show that people want their own copies of the resources you release, rather than just viewing the copy you have on your own server? Here is a quick update on projects related to resource tracking.

TrackOER
Under the OER Rapid Innovation programme JISC have funded the TrackOER project. It was known from the outset that the project would start slowly, but in the last couple of weeks it has gathered some momentum. The nub of the problem they are looking at is that

when an OER is taken from its host or origin server, in order to be used and reused the origin institution and the community generally lose track of it.

Building on work by Scott Leslie, their prospective solution is a web bug (or beacon): an image embedded in the resource but hosted by whoever is collecting the statistics (let’s say the OER publisher). The image is normally invisible, though TrackOER may use the Creative Commons licence badge instead. So long as the image is not removed, whenever the resource is loaded a request for that image will be sent to the publisher’s server, and that request can be logged. Additional information can be captured by appending ?key1=value1&key2=value2… to the src URL of the img element in the resource; anything after the ? is recorded in the server logs but does not affect the image that is served. For example, you could encode an identifier for the OER like this:
<img src="http://example.com/tracker.png?oerID=1234">
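On the publisher’s side there is not much to it. The following is a minimal sketch in Python, not TrackOER’s code: it assumes the beacon lives at the example address used above, and the function and class names are made up for illustration. It builds the src URL for a resource, serves a 1x1 transparent GIF whatever query string is attached, and recovers the key/value pairs from the request path that ends up in the log.

```python
import base64
from http.server import BaseHTTPRequestHandler
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of the tracking image on the publisher's server.
TRACKER_BASE = "http://example.com/tracker.png"

# A 1x1 transparent GIF: the image itself carries no information.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def beacon_url(**params: str) -> str:
    """Build the src URL for the tracking image, e.g. beacon_url(oerID="1234")."""
    return f"{TRACKER_BASE}?{urlencode(params)}"


def parse_beacon_request(request_path: str) -> dict:
    """Recover key/value pairs from a logged path such as '/tracker.png?oerID=1234'."""
    query = urlsplit(request_path).query
    # parse_qs maps each key to a list of values; keep the first of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


class BeaconHandler(BaseHTTPRequestHandler):
    """Serve the same pixel whatever query string is attached to the request."""

    def do_GET(self) -> None:
        params = parse_beacon_request(self.path)
        # Record what the query string told us, alongside the normal request log line.
        self.log_message("beacon hit %s", params)
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        # Ask caches to revalidate, so repeat views are more likely to reach the server.
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        self.wfile.write(PIXEL)
```

To try it, something like HTTPServer(("", 8000), BeaconHandler).serve_forever() would start a local server; BaseHTTPRequestHandler writes each request line to stderr by default, which stands in for the server log here.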

TrackOER are investigating the use of Google Analytics and its open source alternative Piwik (possibly both with and without JavaScript) for the actual tracking. One of their challenges is that both normally assume that the person doing the tracking knows where the resource is, i.e. that it will be where they put it, whereas with OERs one of the things most worth knowing is whether anyone has made a copy of your resource somewhere else. If you use JavaScript, however, you have access to this information (the address of the page in which the resource is being viewed) and can write it into the tracking image URL. Another challenge with using Creative Commons licence badges instead of an invisible tracking bug is that there are several different badge images, so you are tracking with several images rather than just one. TrackOER have modified Piwik to allow for the use of multiple alternative images.
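Both challenges can be dealt with when the log is analysed. The Python sketch below is an illustration under stated assumptions, not TrackOER’s Piwik modification: it assumes a small piece of JavaScript in the resource has added the address of the embedding page to the badge URL as a hypothetical page parameter, and that the publisher keeps a list of its own hosts and of the badge image paths it serves (all paths and hosts shown are made up).

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical: every licence badge the publisher serves counts as a tracker.
BADGE_PATHS = {
    "/badges/by/88x31.png",
    "/badges/by-sa/88x31.png",
    "/badges/by-nc/80x15.png",
}
# Hypothetical: hosts where the publisher itself put its resources.
HOME_HOSTS = {"oer.example.ac.uk"}

# Assumed browser side (JavaScript, shown here only as a comment):
#   img.src = badge + "?oerID=1234&page=" + encodeURIComponent(document.location.href);


def classify_hit(request_path: str) -> Optional[dict]:
    """Describe a logged badge request, or return None if it is not a badge."""
    parts = urlsplit(request_path)
    if parts.path not in BADGE_PATHS:
        return None
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    page = params.get("page")  # only present if the script ran
    if page is None:
        # Viewed somewhere, but without JavaScript we cannot tell where.
        location = "unknown"
    elif urlsplit(page).hostname in HOME_HOSTS:
        location = "home"
    else:
        # The page embedding the badge is not one of ours: a copy elsewhere.
        location = "copy"
    return {
        "badge": parts.path,
        "oerID": params.get("oerID"),
        "page": page,
        "location": location,
    }
```

Treating a set of badge paths as one tracker is what lets several alternative images report into the same statistics; the resource identifier, not the image, says which OER was viewed.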

As an aside, TrackOER have also come across a service called Stipple, about which they say:

using Stipple to track OER across the web in the same way as the TrackOER script is perfectly feasible. It might even be easy. You could get richer analytics as well as access to promotional tools.

OER tracking at Creative Commons
Creative Commons have posted three ideas for tracking OERs, two of which use a mechanism they call refback and one of which provides an API to data they acquire as a result of people linking to their licences and using images of licence badges served from their servers. In all cases it is a priority to avoid anything that smacks of DRM or excessive and covert surveillance. That is understandable: Creative Commons as an organisation is a third party between resource user and owner, and cannot do anything that would risk losing the trust of either.

Refback tracking involves putting a link in the resource being tracked to the site doing the tracking (in the two variants this is either the publisher or Creative Commons, i.e. independent and distributed, or hosted and centralised). If a curious user follows that link (and the assumption is that occasionally someone will), the tracking site will log the request for the page the link points to. Included in the logged information is the “referrer”, i.e. the URL of the page on which the user clicked the link. An application on the tracking site will work through this referrer log and fetch the page at any URL it does not recognise to ascertain (e.g. from the attribution metadata) whether it is a copy of a resource that it is tracking.
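The application that works through the referrer log might look something like this minimal Python sketch. It is not Creative Commons’ code; it assumes the Apache/NCSA combined log format, a hypothetical set of pages the tracker already knows, and that copies carry the CC REL attribution markup produced by the Creative Commons licence chooser (an element with rel="cc:attributionURL"). Fetching pages is passed in as a function so the sketch runs without network access; in practice it might wrap urllib.request.urlopen.

```python
import re
from html.parser import HTMLParser
from typing import Callable, Dict, Iterable, List, Set

# Combined Log Format: ... "GET /path HTTP/1.1" status bytes "referrer" "user-agent"
LOG_LINE = re.compile(r'"[A-Z]+ \S+ [^"]*" \d{3} \S+ "(?P<referrer>[^"]*)"')


class AttributionFinder(HTMLParser):
    """Collect the href of every element marked rel="cc:attributionURL" (CC REL)."""

    def __init__(self) -> None:
        super().__init__()
        self.attribution_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").split()
        href = attributes.get("href")
        if "cc:attributionURL" in rels and href:
            self.attribution_urls.append(href)


def unknown_referrers(log_lines: Iterable[str], known_pages: Set[str]) -> Set[str]:
    """Referrer URLs in the log that the tracking site does not already know about."""
    found = set()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match:
            referrer = match.group("referrer")
            # "-" means the browser sent no referrer.
            if referrer not in ("", "-") and referrer not in known_pages:
                found.add(referrer)
    return found


def find_copies(
    log_lines: Iterable[str],
    known_pages: Set[str],
    tracked: Set[str],
    fetch: Callable[[str], str],
) -> Dict[str, str]:
    """Map each unrecognised referrer to the tracked resource it attributes, if any."""
    copies = {}
    for url in sorted(unknown_referrers(log_lines, known_pages)):
        finder = AttributionFinder()
        finder.feed(fetch(url))
        finder.close()
        for attribution in finder.attribution_urls:
            if attribution in tracked:
                copies[url] = attribution
                break
    return copies
```

Adding each checked page to known_pages after a run would stop the same referrer being fetched again every time the log is processed.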

The third approach involves Creative Commons logging the referrers for requests for their licence badges, and then looking at the attribution metadata on the web page in which the badge was embedded to build up a graph of pages that are reuses of others. This information would be hosted on Creative Commons servers and made available to others via an API.
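How such a graph might be assembled and queried can be sketched in a few lines of Python. This assumes each logged badge request has already been reduced to a pair (page the badge was embedded in, attribution URL found on that page); the class and method names are hypothetical and do not describe any actual Creative Commons API.

```python
import json
from collections import defaultdict, deque
from typing import Iterable, List, Tuple


class ReuseGraph:
    """Directed graph from an original page to the pages that attribute it."""

    def __init__(self) -> None:
        self._reuses = defaultdict(set)

    def add_observations(self, pairs: Iterable[Tuple[str, str]]) -> None:
        """Each pair is (page the badge was embedded in, attribution URL on that page)."""
        for page, attribution_url in pairs:
            # A page that attributes itself is an original, not a reuse.
            if page != attribution_url:
                self._reuses[attribution_url].add(page)

    def reuses_of(self, url: str) -> List[str]:
        """Direct reuses of url, sorted for stable output."""
        return sorted(self._reuses.get(url, set()))

    def all_reuses(self, url: str) -> List[str]:
        """Reuses of reuses as well, found by breadth-first traversal."""
        seen, queue = set(), deque([url])
        while queue:
            current = queue.popleft()
            for page in self._reuses.get(current, set()):
                if page != url and page not in seen:
                    seen.add(page)
                    queue.append(page)
        return sorted(seen)

    def to_json(self, url: str) -> str:
        """The sort of response a hypothetical API endpoint might return for url."""
        return json.dumps({"resource": url, "reuses": self.reuses_of(url)})
```

Storing only direct links keeps each observation simple to record, and chains of reuse are worked out on request; the seen set also stops the traversal looping if two pages happen to attribute each other.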

Sharing and learning

Phil Barker's work
