Tag Archives: resource discovery

Finding OERs

Background: Chris McMahon is the Delores project director. He has a great deal of experience in the management and presentation of information for design engineering and in selecting and using online learning materials, but this project is his introduction to the world of OER. His initial exploration of OER-specific resource discovery has left him questioning whether aggregating and searching metadata provided by OER producers is the right approach, as opposed to customising a generic search engine to be specific to known OER sites. Chris writes:

–quote–

My initial reaction from attempting to find material in the OER repositories and collections is that the descriptions of the available material are not particularly helpful in searching and finding resources. For example, I tried to find material on “gear design” in OER Commons. The 30 resources returned for my search were as follows:

Eight audio files from UC Berkeley. All were potentially relevant, but little real indication of content was given in the descriptions. I would have to listen to each file to find out whether it is relevant. Only the title of the audio file indicates that it might be useful (each file has the same abstract, which describes the whole course–not the particular audio file).

The next eight resources were not relevant but were included because the word gear appears out of the context of gear design (e.g. landing gear, protective gear) somewhere in the descriptions.

The next resource, MIT Open Courseware “Elements of Mechanical Design”, is very relevant, but the reference expands to 17 sets of lecture notes, of which only 2 are relevant. The abstract is only a very high-level description of the whole course and gives no indication of the breadth and relevance of the underlying materials.

The next four resources are not relevant.

The next resource, MIT Open Courseware “Marine Power and Propulsion”, expands to 45 separate lecture documents, of which 2/3 are relevant. Again, the abstract is only a very high-level description and gives no indication of the breadth and relevance of the underlying materials.

The next resource is a repeat of the MIT OCW “Elements of Mechanical Design”, but from an earlier year.

The next seven resources are not relevant but the descriptions contain words for which gear and design are stems.

In summary – the descriptions are whole course descriptions, not descriptions of the lecture/topic material within the courses. The descriptions (and presumably the RSS feeds) use the same format for single audio files and complete courses.

By contrast, using “gear design site:ocw.mit.edu” as a search in Google gave very relevant material in the first page of the (327) results. Using the “filetype:pdf” qualifier was even better, as it pulled up the lecture notes. Using the MIT OCW search facility was pretty good also.

What would be really useful would be to have a good search facility that allowed search within known OER repositories – a sort of “Google OER”.

–unquote–

Since talking this through with Chris I have resolved to make a better effort at publicising work that my colleague Lisa Scott has done on Google Custom Search Engines. However, there are other implications for the project: in the static collection, how do we select and provide descriptions at the fine level of granularity that Chris wants while also keeping the valuable information of the original course context of the resource? Will the quality of the syndicated metadata be good enough for the Bayesian filtering to work? Can we supplement this by using information from the course/resource webpage? What use can we make of customised Google searches? (We know that the Triton project is also interested in this last point.)

Google custom search for UKOER

It has become very clear to me over the last week or so that I haven’t done enough to publicise some work done over the summer by my colleague Lisa Scott (Lisa Rogers, as she then was) on showing how you can create a Google Custom Search Engine to search for OER materials. In summary, it’s very easy, very useful, but not quite the answer to all your UKOER discovery problems.

A Google Custom Search Engine (Google CSE) allows one to use Google to search only certain selected pages. The pages to be searched can be specified individually or as URL patterns identifying a part of a site or an entire site. Furthermore the search query can be modified by adding terms to those entered by the user.

The custom search engine can be accessed through a search box that can be hosted on Google or embedded in a web page, blog etc. Likewise, the search results page can be presented on Google or embedded in another site. Embedding of both search box and results page utilises JavaScript hosted on the Google site.

The pages to be searched can be specified in one of three ways: directly, by entering the URL patterns via the Google CSE interface; in an XML or TSV (tab-separated values) file which is uploaded to the Google CSE site; or as a feed from an external site. This last option offers powerful possibilities for dynamic or collective creation of Custom Search Engines, especially since Google provide a JavaScript snippet which will use the links on a page as the list of URLs to search. So, for example, a group of people could have access to a wiki on which they list the sites they wish to search, and thus build a CSE for their shared interest, whatever that may be.
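As a sketch of how such an externally hosted list might be produced, the following generates an annotations file from a list of URL patterns. The site list and label name are hypothetical, and the element names follow the CSE annotations format as we understand it, so check Google's current documentation before relying on them:

```python
import xml.etree.ElementTree as ET

def build_annotations(url_patterns, label):
    """Build a CSE annotations document listing the URL patterns to
    search, each tagged with a CSE label. Element and attribute names
    follow the CSE annotations format as we understand it."""
    root = ET.Element("Annotations")
    for pattern in url_patterns:
        ann = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(ann, "Label", name=label)
    return ET.tostring(root, encoding="unicode")

xml_doc = build_annotations(
    ["open.jorum.ac.uk/*", "www.humbox.ac.uk/*"],
    "_cse_ukoer",  # hypothetical label id for illustration
)
print(xml_doc)
```

A script like this could regenerate the annotations file whenever the shared wiki list of sites changes.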

A refinement that is sometimes useful is to label pages or sites that are searched. Labels might refer to sub-topics of the theme of the custom search engine or to some other facet such as resource type. So a custom search engine for engineering OERs might label pages as different branches of engineering {mechanical, electronic, civil, chemical, …} or by the type of resource to be found {presentation, image, movie, simulation, article, …}. In practice, whatever categorisation is chosen for labels, there will often be pages or sites that mix resources from different categories, so use of this feature requires thought about how to handle such cases.

A Google CSE for UKOER
Our example of a simple Google CSE can be found hosted on Google.

This works as a Google search limited to pages at the domains/directories listed below; where the URL pattern doesn’t lead only to content that is UKOER, the term ‘+UKOER’ is added to the search terms entered by the user. The ‘+’ in the added term means that only those pages which contain the term UKOER are returned. This is possible because the programme mandated that all resources should be associated with the tag UKOER. Each site was labelled so that, after searching, the user could limit results to those found on any one site (e.g. just those on Jorum Open) or strand of UKOER. The domains/directories searched are:

* http://open.jorum.ac.uk/
* http://www.vimeo.com/
* http://www.youtube.com/
* http://www.slideshare.net/
* http://www.scribd.com/
* http://www.flickr.com/
* http://repository.leedsmet.ac.uk/main/
* http://openspires.oucs.ox.ac.uk/
* http://unow.nottingham.ac.uk/
* https://open.exeter.ac.uk/repository/
* http://web.anglia.ac.uk/numbers/
* http://www.multimediatrainingvideos.com/
* http://www.cs.york.ac.uk/jbb
* http://www.simshare.org.uk/
* http://fetlar.bham.ac.uk/repository/
* http://open.cumbria.ac.uk/moodle/
* http://skillsforscientists.pbworks.com/
* http://core.materials.ac.uk/search/
* http://www.humbox.ac.uk/

These were chosen as they were known to be used by a number of UKOER projects for disseminating resources. We must stress that these are meant to be illustrative of sites where UKOER resources may be found, they are definitely not intended to be a complete or even a sufficient set of sites.
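To illustrate the query-narrowing idea outside the CSE itself, here is a minimal sketch that builds an ordinary Google query from the user's terms, the '+UKOER' restriction and a couple of the sites above (the function and the site subset are ours; the real CSE applies the restriction server-side):

```python
def ukoer_query(user_terms, sites):
    """Mimic what the CSE does: append '+UKOER' so that only pages
    carrying the programme tag are returned, and restrict the search
    to the listed hosts with OR'd site: operators. Illustrative only;
    the CSE configuration does this without exposing the operators."""
    site_clause = " OR ".join(f"site:{s}" for s in sites)
    return f"{user_terms} +UKOER ({site_clause})"

q = ukoer_query("gear design", ["open.jorum.ac.uk", "www.humbox.ac.uk"])
print(q)
```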

This is the simplest option: the configuration files are hosted on Google and managed through forms on the Google website. Expanding it to cover other web sites requires being given permission to contribute by the original creator and then adding URLs as required.

Reflections
Setting up this search engine was almost trivially easy. Embedding it in a website is also straightforward (Google provides code snippets to cut and paste).

The approach will only be selective for OERs if those resources can be identified through a term or tag added to the user-entered search query, or if they can be selected through a specific URL pattern (including the case where a site consists wholly or predominantly of OERs). This wasn’t always the case.

Importantly, not all expected results appear. This may be because the resources on these sites aren’t tagged as UKOER, or because the pages haven’t been indexed by Google. However, sometimes the omission seems inexplicable. For example, a search for “dental” limited to the Core Materials website on Google yields the expected results, while the equivalent search on the CSE yields no results.

While hosting the configuration files off-Google and editing them as XML files, or modifying them programmatically, allows some interesting refinements of the approach, we found this to be less easy. One difficulty is that the documentation on Google is somewhat fragmented and frequently confusing. Different parts of it seem to have been added by different people at different times, and it was often the case that a “link to more information” about something we were trying to do failed to resolve the difficulty that had been encountered. This was compounded by some unpredictable behaviour, which may have been caused by caching (on serving the configuration files, Google reading them, or Google serving the results) or by delays in updating the indexes for the search engine, and which made testing changes to the configuration files difficult. These difficulties can be overcome, but we were unconvinced that there would be much benefit in this case and so concentrated our effort elsewhere.

Conclusions
If it works for the sites you’re interested in, we recommend the simple Google custom search as a very quick method for providing a search over a subset of resources from across a specified range of hosts. We reserve judgement on the facility for creating dynamic search engines by hosting the configuration files on one’s own server.

An introduction to Delores

At the UKOER phase 2 startup meeting the collection strand projects were asked to provide a short introduction saying what their collection was about, who it was for, where the material was coming from and what technical approach was being used. This is roughly what I said about Delores.

We are building static and dynamic collections of open educational resources for Engineering Design.

Engineering Design is the branch of engineering dealing with the design of all engineered products from clothes pegs to Concorde. It deals principally with creating something that will work in a way that satisfies the design need. The design process consists of a number of phases, starting with floating ideas about how the design need might be met, and ending with delivery of a complete and detailed description of the product to be manufactured, based on sound engineering concepts and principles.

The static collection will use a WordPress blog (not this one) to present resources that have been selected for their match to common elements of engineering design curricula from several UK universities.

Resources will come from UKOER projects, OpenLearn, OCWC, Jorum, Xpert, OCWSearch, OER Commons…anywhere we can find them. We will do one blog post per resource, either embedding the resource in the post or linking out to it (depending on the nature of the resource), and categorise it against topics from the curriculum. WordPress provides web presence, user interface, search + browse, RSS export, and metadata (Dublin Core, OAI-ORE) either out-of-the-box or with a suitable choice of plug-ins and theme.
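For illustration, here is a minimal sketch of the kind of RSS item such a blog post might syndicate: one resource, a link out, a Dublin Core creator, and one category per curriculum topic. The field choices are ours, not the actual output of any particular WordPress plugin:

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

def resource_item(title, link, creator, topics):
    """Build an RSS <item> for one resource post: a link out to the
    resource, a Dublin Core creator, and one <category> element per
    curriculum topic. Illustrative field choices only."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{DC}}}creator").text = creator
    for topic in topics:
        ET.SubElement(item, "category").text = topic
    return item

item = resource_item(
    "Elements of Mechanical Design: gears",  # example values
    "http://example.org/resource",           # placeholder link
    "MIT OCW",
    ["gear design", "machine elements"],
)
rss_fragment = ET.tostring(item, encoding="unicode")
print(rss_fragment)
```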

The dynamic collection is technically much more interesting. We will draw on the same sources of OERs and use the same curriculum, but this time selection and classification will be automatic.

We will use the output of the JISC Bayesian Feed Filter project to aggregate and select resources. Bayesian filtering is the same process as many spam filters use, but we will be using it to recognise resources about design engineering on the basis of information from RSS feeds. Like a spam filter it needs to be trained with items that it is told how to classify: we will be using the resources from the static collection to show it what design engineering resources look like.
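To make the idea concrete, here is a toy naive Bayes classifier of the sort described, trained on a few labelled descriptions and then asked to classify a new one. This is only a sketch of the technique, not the Bayesian Feed Filter project's code:

```python
import math
from collections import Counter

class NaiveBayesFilter:
    """Minimal naive Bayes text classifier, the same idea a spam
    filter uses: train on labelled item descriptions, then score
    new ones against each label."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "other": Counter()}
        self.doc_counts = {"relevant": 0, "other": 0}

    def train(self, text, label):
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab = len(set(self.word_counts["relevant"])
                    | set(self.word_counts["other"]))
        scores = {}
        for label, counts in self.word_counts.items():
            # log prior + log likelihoods with add-one smoothing
            score = math.log(self.doc_counts[label] / total_docs)
            n = sum(counts.values())
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (n + vocab))
            scores[label] = score
        return max(scores, key=scores.get)

nb = NaiveBayesFilter()
nb.train("gear design of spur and helical gears", "relevant")
nb.train("stress analysis of gear teeth in mesh", "relevant")
nb.train("poetry workshop on romantic literature", "other")
label = nb.classify("design of helical gear trains")
print(label)  # → relevant
```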

The Bayesian feed filter will produce an RSS feed of items that it thinks are about design engineering. This will be sent to some software developed at Bath called Waypoint.

Waypoint is an integrated search and retrieval system developed for accessing engineering documents (but generalisable to any document corpus) which has two main elements.

In the first element, the documents are organised against a set of classification schemes; this approach is known as faceted classification. The classification is automatic, carried out using a standard constraint-based classification approach (with carefully selected classification rules) in which pre-coded sets of constraints are used to relate the textual content of each document to the particular topics or themes (characteristically technical and process topics) of interest.
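A toy illustration of constraint-based classification: each rule ties a facet category to a set of required keywords, and a document is assigned a category when its text satisfies the rule. The rules and facet names here are invented, not Waypoint's:

```python
def classify_document(text, rules):
    """Assign facet categories to a document. Each rule is a
    (facet, category, required_keywords) triple; the category is
    assigned when all of that rule's keywords occur in the text."""
    text = text.lower()
    assignments = []
    for facet, category, keywords in rules:
        if all(kw in text for kw in keywords):
            assignments.append((facet, category))
    return assignments

# Hypothetical rules for an engineering-design corpus
rules = [
    ("topic", "gear design", ["gear", "design"]),
    ("topic", "bearings", ["bearing"]),
    ("resource type", "lecture notes", ["lecture"]),
]
tags = classify_document(
    "Lecture notes on gear design and bearing selection", rules)
print(tags)
```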

The second element consists of a browsable user interface which gives the user continuous feedback on how the search is progressing, based on the selections made and keywords used. The user interacts with the system by selecting facet categories of interest. As these are selected, the hierarchical display is dynamically pruned to reflect the user’s selection, indicating which categories may be used to further refine it. This approach is known as Adaptive Content Matching; the effect is to present the user at all times with only that part of the classification structure that will lead to a non-null selection.
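The pruning behaviour can be sketched in a few lines: filter the documents by the current selection, then report only those facet categories that still lead to a non-empty result. This is a simplified model of the idea, not Waypoint's implementation:

```python
def remaining_options(docs, selection):
    """Keep only documents matching every selected (facet, category)
    pair, then report which other facet categories still lead to a
    non-empty result, so the UI can prune everything else."""
    matching = [d for d in docs
                if all(pair in d["tags"] for pair in selection)]
    options = set()
    for d in matching:
        options.update(d["tags"])
    return options - set(selection)

docs = [
    {"id": 1, "tags": {("topic", "gears"), ("type", "notes")}},
    {"id": 2, "tags": {("topic", "gears"), ("type", "video")}},
    {"id": 3, "tags": {("topic", "bearings"), ("type", "notes")}},
]
opts = remaining_options(docs, {("topic", "gears")})
print(opts)
```

After selecting the "gears" topic, only the "notes" and "video" type categories survive; "bearings" is pruned because choosing it would return an empty set.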

Waypoint is particularly suited as an interface for students searching for pre-classified educational resources as it allows exploration of the search space in a rewarding way.

It is very intuitive to use and, because of the faceted classification and ACM approaches, the user is not frustrated by the system merely returning an empty set.

There was some discussion about the relative merits of blogs and wikis as hosts for the static collection. I wouldn’t criticise anyone for choosing to use a wiki for work similar to this, but to my mind the advantages of a WordPress blog are that the category approach gives good navigation and provides sub-collections (with RSS feeds); the developer community is good, with plugins for just about anything you might want to do; and the backup/export/migration features are solid. Wikis, I think, might be better if you want more flexibility in presenting views on parts of your collection (for example, Wikipedia’s portals) rather than a simple list by category, and if you want collaborative editing of content, with changelogs, rollback etc.

Additional technical work for UKOER

CETIS has been funded by JISC to do some additional technical work relevant to the UKOER programme. The work will cover three topics: deposit via RSS feeds, aggregation of OERs, and tracking & analysis of OER use.

Feed deposit
There is a need for services hosting OERs to provide a mechanism for depositors to upload multiple resources with minimal human intervention per resource. One possible way to meet this requirement, already identified by some projects, is “feed deposit”. This approach is inspired by the way in which metadata and content are loaded onto user devices and applications in podcasting. In short, RSS and Atom feeds are capable, in principle, of delivering the metadata required for deposit into a repository, and in addition can either provide a pointer to the content or embed the content itself in the feed. There are a number of issues with this approach that would need to be overcome.
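As a sketch of what feed deposit involves, the following parses a minimal Atom entry and pulls out the fields a repository would need for deposit: title, identifier, description and a pointer to the content. The entry and field mapping are illustrative; real feeds raise many of the issues mentioned above:

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# A minimal Atom entry of the kind feed deposit might consume:
# metadata in the entry, content referenced via an enclosure link.
entry_xml = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Gear design lecture notes</title>
  <id>urn:uuid:example-0001</id>
  <link rel="enclosure" type="application/pdf"
        href="http://example.org/gears.pdf"/>
  <summary>Notes on spur gear geometry.</summary>
</entry>
"""

def extract_deposit(xml_text):
    """Extract the fields a repository would need for deposit.
    A sketch of the idea only; real feeds need far more care."""
    entry = ET.fromstring(xml_text)
    enclosure = entry.find(f"{ATOM}link[@rel='enclosure']")
    return {
        "title": entry.findtext(f"{ATOM}title"),
        "id": entry.findtext(f"{ATOM}id"),
        "summary": entry.findtext(f"{ATOM}summary"),
        "content_url": enclosure.get("href") if enclosure is not None else None,
    }

record = extract_deposit(entry_xml)
print(record)
```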

In this work we will: (1) Identify projects, initiatives, services, etc. that are engaged in relevant work [–if that’s you, please get in touch]. (2) Identify and validate the issues that would arise with respect to feed deposit, starting with those outlined in the Jorum paper linked to above. (3) Identify current approaches used to address these issues, and identify where consensus may be readily achieved.

Aggregation of OERs
There is interest in facilitating a range of options for the provision of aggregations of resources representing the whole or a subset of the UKOER programme output (possibly along with resources from other sources). There have been some developments that implement solutions based on RSS aggregation, e.g. Ensemble and Xpert; and the UKOLN tagometer measures the number of resources on various sites that are tagged as relevant to the UKOER programme.

In this work we will illustrate and report on other approaches, namely (a) Google custom search, (b) query and result aggregation through Yahoo pipes and (c) querying through the host service APIs. We will document the benefits and affordances as well as drawbacks and limitations of each of these approaches. These include the ease with which they may be adopted, and the technical expertise necessary for their development, their dependency on external services (which may still be in beta), their scalability, etc.
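As a taste of approach (b), here is a hand-rolled version of the aggregation a Yahoo Pipe might perform: merge items from several feeds, keep only those tagged UKOER, and de-duplicate on link. Feeds are modelled as plain lists of dicts; a real pipe would fetch and parse RSS first:

```python
def aggregate(feeds, tag="ukoer"):
    """Merge items from several feeds, keeping only items that
    carry the programme tag and dropping duplicate links."""
    seen = set()
    merged = []
    for feed in feeds:
        for item in feed:
            if tag not in {t.lower() for t in item.get("tags", [])}:
                continue
            if item["link"] in seen:
                continue
            seen.add(item["link"])
            merged.append(item)
    return merged

# Hypothetical feed contents for illustration
feed_a = [
    {"title": "Gears 1", "link": "http://a/1", "tags": ["UKOER", "gears"]},
    {"title": "Poetry", "link": "http://a/2", "tags": ["poetry"]},
]
feed_b = [
    {"title": "Gears 1 (copy)", "link": "http://a/1", "tags": ["ukoer"]},
    {"title": "Bearings", "link": "http://b/1", "tags": ["UKOER"]},
]
items = aggregate([feed_a, feed_b])
print([i["title"] for i in items])
```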

Tracking and analysis of OER use
Monitoring, through technical means, the release of resources through various channels, how those resources are used and reused, and the comments and ratings associated with them is highly relevant to evaluating the uptake of OERs. CETIS have already described some of the options for resource tracking that are relevant to the UKOER programme.

In this work we will write and commission case studies to illustrate the use of these methods, and synthesise the results learnt from this use.

Who’s involved in this work
The work will be managed by me, Phil Barker, and Lorna M Campbell.

Lisa J Rogers will be doing most of the work related to feed deposit and aggregation of OERs.

R John Robertson will be doing most of the work relating to tracking and analysis of OER use.

Please do contact us if you’re interested in this work.

Web2 vs iTunesU

There was an interesting discussion last week on the JISC-Repositories email list that kicked off after Les Carr asked

Does anyone have any experience with iTunes U? Our University is thinking of starting a presence on Apple’s iTunes U (the section of the iTunes store that distributes podcasts and video podcasts from higher education institutions). It looks very professional (see for example the OU’s presence at http://projects.kmi.open.ac.uk/itunesu/ ) and there are over 300 institutions who are represented there.

HOWEVER, I can’t shake the feeling that this is a very bad idea, even for lovers of Apple products. My main misgiving is that the content isn’t accessible apart from through the iTunes browser, and hence it is not Googleable and hence it is pretty-much invisible. Why would anyone want to do that? Isn’t it a much better idea to put material on YouTube and use the whole web/web2 infrastructure?

I’d like to summarise the discussion here so that the important points raised get a wider airing; however, it is a feature of high-quality discussions like this one that people learn and change their minds as a result, so please don’t assume that the people quoted below still hold the opinions attributed to them. (For example, invisibility on Google turned out to be far from the case for some resources.) If you would like to see the whole discussion, look in the JISCMAIL archive.

The first answer from several posters was that it is not an either/or decision.

Patricia Killiard:

Cambridge has an iTunesU site. […] the material is normally deposited first with the university Streaming Media Service. It can then be made accessible through a variety of platforms, including YouTube, the university web pages and departmental/faculty sites, and the Streaming Media Service’s own site, as well as iTunesU.

Mike Fraser:

Oxford does both using the same datafeed: an iTunesU presence (which is very popular in terms of downloads and as a success story within the institution); and a local, openly available site serving up the same content.

Jenny Delasalle and David Davis of Warwick and Brian Kelly of UKOLN also highlighted how iTunesU complemented rather than competed with other hosting options, and was discoverable on Google.

Andy Powell, however, pointed out that it was so “Googleable” that a video from Warwick University on iTunesU came higher in the search results for University of Warwick No Paradise without Banks than the same video on Warwick’s own site. (The first result I get is from Warwick, about the event, but doesn’t seem to give access to the video–at least not so easily that I can find it; the second result I get is the copy from iTunes U, on deimos.apple.com. Incidentally, I get nothing for the same search term on Google Videos.) He pointed out that this is “(implicitly) encouraging use of the iTunes U version (and therefore use of iTunes) rather than the lighter-weight ‘web’ version.”

Andy also raised other “softer issues” about which version students will be referred to, which might reinforce one copy rather than another as the copy of choice even if it isn’t the best one for them.

Ideally it would be possible to refer people to a canonical version or a list of available versions (Graham Triggs mentioned Google’s canonical URLs, which might help if Google relax the rules on how they’re applied), but I’m not convinced that’s likely to happen. So there’s a compromise: a variety of platforms for a variety of needs vs. possibly diluting the web presence for any given resource.

And a response from David Davies:

iTunesU is simply an RSS aggregator with a fancy presentation layer.
[…]
iTunesU content is discoverable by Google – should you want to, but as we’ve seen there are easier ways of discovering the same content, it doesn’t generate new URLs for the underlying content, is based upon a principle of reusable content, Apple doesn’t claim exclusivity for published content so is not being evil, and it fits within the accepted definition of web architecture. Perhaps we should simply accept that some people just don’t like it. Maybe because they don’t understand what it is or why an institution would want to use it, or they just have a gut feeling there’s something funny about it. And that’s just fine.

mmm, I don’t know about all these web architecture principles, I just know that I can’t access the only copy I find on Google. But then I admit I do have something of a gut feeling against iTunesU; maybe that’s fine, maybe it’s not; and maybe it’s just something about the example Andy chose: searching Google for University of Warwick slow poetry video gives access to copies at YouTube and Warwick, but no copy on iTunes.

I’m left with the feeling that I need to understand more about how using these services affects the discoverability of resources using Google–which is one of the things I would like to address during the session I’m organising for the CETIS conference in November.