Google custom search for UKOER

It has become very clear to me over the last week or so that I haven’t done enough to publicise some work done over the summer by my colleague Lisa Scott (Lisa Rogers, as she then was) showing how to create a Google Custom Search Engine to search for OER materials. In summary: it’s very easy, very useful, but not quite the answer to all your UKOER discovery problems.

A Google Custom Search Engine (Google CSE) allows one to use Google to search only certain selected pages. The pages to be searched can be specified individually or as URL patterns identifying a part of a site or an entire site. Furthermore, the search query can be modified by adding terms to those entered by the user.
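To make those two mechanisms concrete, here is a minimal Python model of them; it is not how Google implements a CSE. It assumes URL patterns written in a `host/path/*` style with `*` as a wildcard, and the pattern list and function names are illustrative. One function checks whether a result URL falls within any listed pattern; the other appends the engine's fixed terms to whatever the user typed.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Illustrative site patterns in a "host/path/*" style (assumption: '*' is a wildcard).
PATTERNS = [
    "open.jorum.ac.uk/*",
    "www.humbox.ac.uk/*",
    "web.anglia.ac.uk/numbers/*",
]


def in_scope(url: str, patterns: list[str]) -> bool:
    """True if the URL, with its scheme stripped, matches any site pattern."""
    parts = urlsplit(url)
    bare = parts.netloc + parts.path  # e.g. "web.anglia.ac.uk/numbers/intro.html"
    return any(fnmatchcase(bare, p) for p in patterns)


def effective_query(user_query: str, added_terms: list[str]) -> str:
    """Append the engine's fixed terms to whatever the user typed."""
    return " ".join([user_query.strip(), *added_terms]).strip()
```

With `+UKOER` as the added term, a user query of `dental` becomes `dental +UKOER`, and only results for which `in_scope` is true would be shown.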

The custom search engine can be accessed through a search box that can be hosted on Google or embedded in a web page, blog etc. Likewise, the search results page can be presented on Google or embedded in another site. Embedding both the search box and the results page relies on JavaScript hosted on the Google site.

The pages to be searched can be specified in one of three ways: by entering the URL patterns directly via the Google CSE interface; by listing them in an XML or TSV (tab-separated values) file which is uploaded to the Google CSE site; or as a feed from any external site. This last option offers powerful possibilities for dynamic or collective creation of Custom Search Engines, especially since Google provides a JavaScript snippet which will use the links on a page as the list of URLs to search. So, for example, a group of people could have access to a wiki on which they list the sites they wish to search, and thus build a CSE for their shared interest, whatever that may be.
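The linked-CSE idea can be sketched in Python: collect the links on a shared page, turn each into a URL pattern, and write them out as an annotations file. This is a hypothetical sketch. The element names (`Annotations`, `Annotation` with an `about` attribute, `Label` with a `name` attribute) follow Google's CSE annotations format as we understand it, and the label name is a placeholder, so check both against Google's current documentation before uploading anything. Each link is treated as a directory prefix, so `http://www.cs.york.ac.uk/jbb` becomes `www.cs.york.ac.uk/jbb/*`; relative links, such as a wiki's own edit links, are dropped.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, as a linked CSE uses a page's links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def to_pattern(url: str):
    """Map an absolute http(s) link to a 'host/path/*' prefix pattern, else None."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # relative or non-web links (wiki navigation, mailto:) are skipped
    return parts.netloc + parts.path.rstrip("/") + "/*"


def patterns_from_html(html: str) -> list[str]:
    """Extract unique patterns from a page, preserving the order they appear in."""
    collector = LinkCollector()
    collector.feed(html)
    patterns = (to_pattern(link) for link in collector.links)
    return list(dict.fromkeys(p for p in patterns if p))


def annotations_xml(patterns: list[str], label: str) -> str:
    """Serialise patterns as an annotations file (element names assumed, see text)."""
    root = ET.Element("Annotations")
    for pattern in patterns:
        annotation = ET.SubElement(root, "Annotation", about=pattern)
        ET.SubElement(annotation, "Label", name=label)
    return ET.tostring(root, encoding="unicode")
```

Regenerating the file whenever the wiki page changes is what would make such an engine "collective": anyone who can edit the page can change what is searched.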

A refinement that is sometimes useful is to label the pages or sites that are searched. Labels might refer to sub-topics of the theme of the custom search engine or to some other facet such as resource type. So a custom search engine for engineering OERs might label pages by branch of engineering {mechanical, electronic, civil, chemical, …} or by the type of resource to be found {presentation, image, movie, simulation, articles, …}. In practice, whatever the categorisation chosen for labels, there will often be pages or sites that mix resources from different categories, so using this feature requires some thought about how to handle such mixed content.
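One way of handling mixed sites is to let a pattern carry several labels rather than forcing it into one category. The Python sketch below models that, together with what a user's refinement click does to the results. The `example.org` sites and their label sets are hypothetical, and the functions are an illustration of the idea rather than Google's implementation.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical engineering-OER engine: each pattern carries a *set* of labels,
# so a site mixing subjects or resource types is simply tagged with several.
SITE_LABELS = {
    "mech.example.org/*": {"mechanical", "presentation"},
    "civil.example.org/*": {"civil", "simulation", "image"},
    "video.example.org/*": {"movie"},
}


def labels_for(url: str, site_labels: dict[str, set[str]]) -> set[str]:
    """Union of the labels on every pattern the URL matches."""
    parts = urlsplit(url)
    bare = parts.netloc + parts.path
    found: set[str] = set()
    for pattern, labels in site_labels.items():
        if fnmatchcase(bare, pattern):
            found |= labels
    return found


def refine(results: list[str], label: str, site_labels: dict[str, set[str]]) -> list[str]:
    """Keep only results from sites carrying the chosen label (a refinement click)."""
    return [url for url in results if label in labels_for(url, site_labels)]
```

The trade-off is granularity: a page on `civil.example.org` showing a single image still turns up under `simulation`, because the labels describe the site, not the page.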

A Google CSE for UKOER
Our example of a simple Google CSE can be found hosted on Google.

This works as a Google search limited to pages at the domains/directories listed below; where the URL pattern doesn’t lead only to content that is UKOER, the term ‘+UKOER’ is added to the search terms entered by the user. The ‘+’ in the added term means that only those pages which contain the term UKOER are returned. This is possible because the programme mandated that all resources should be associated with the tag UKOER. Each site was labelled so that, after searching, the user could limit results to those found on any one site (e.g. just those on Jorum Open) or strand of UKOER. The domains/directories searched are:

* http://open.jorum.ac.uk/
* http://www.vimeo.com/
* http://www.youtube.com/
* http://www.slideshare.net/
* http://www.scribd.com/
* http://www.flickr.com/
* http://repository.leedsmet.ac.uk/main/
* http://openspires.oucs.ox.ac.uk/
* http://unow.nottingham.ac.uk/
* https://open.exeter.ac.uk/repository/
* http://web.anglia.ac.uk/numbers/
* http://www.multimediatrainingvideos.com/
* http://www.cs.york.ac.uk/jbb
* http://www.simshare.org.uk/
* http://fetlar.bham.ac.uk/repository/
* http://open.cumbria.ac.uk/moodle/
* http://skillsforscientists.pbworks.com/
* http://core.materials.ac.uk/search/
* http://www.humbox.ac.uk/
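The per-site rule can be sketched as follows. This models the logic only, not the CSE setting that implements it. The `wholly_ukoer` flags are illustrative assumptions rather than a record of how our engine was actually configured, `oer.example.ac.uk` is a hypothetical stand-in for a site holding only UKOER content, and the class and function names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    pattern: str         # URL pattern searched, e.g. "open.jorum.ac.uk/*"
    label: str           # refinement label offered after searching
    wholly_ukoer: bool   # ASSUMPTION: is everything under the pattern UKOER?


# Illustrative subset; the flags and the example.ac.uk site are hypothetical.
SITES = [
    Site("open.jorum.ac.uk/*", "jorum-open", False),
    Site("www.slideshare.net/*", "slideshare", False),
    Site("oer.example.ac.uk/*", "example-oer", True),
]


def query_for(site: Site, user_query: str, tag: str = "+UKOER") -> str:
    """Add the programme tag only where the URL pattern alone is not selective."""
    return user_query if site.wholly_ukoer else f"{user_query} {tag}"


def queries_by_label(user_query: str, sites: list[Site]) -> dict[str, str]:
    """Map each site label to the query that would effectively run on that site."""
    return {site.label: query_for(site, user_query) for site in sites}
```

For a search on `dental`, the general-purpose hosts such as SlideShare would effectively be searched for `dental +UKOER`, while a site holding only UKOER material would be searched for plain `dental`.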

These were chosen because they were known to be used by a number of UKOER projects for disseminating resources. We must stress that they are meant to be illustrative of sites where UKOER resources may be found; they are definitely not intended to be a complete or even a sufficient set of sites.

This is the simplest option: the configuration files are hosted on Google and managed through forms on the Google website. Expanding it to cover other websites requires permission to contribute from the original creator, after which URLs can be added as required.

Reflections
Setting up this search engine was almost trivially easy. Embedding it in a website is also straightforward (Google provides code snippets to cut and paste).

The approach will only be selective for OERs if those resources can be identified through a term or tag added to the user-entered search query, or if they can be selected through a specific URL pattern (including the case where a site is wholly or predominantly OERs). This wasn’t always the case.
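The condition above can be written as a small decision procedure. This is a hypothetical helper: it assumes you have hand-checked a few pages from each candidate site, recording whether each is an OER and whether it carries the UKOER tag. It also lists tagless OER pages, since a search that relies on an added term would leave those out.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Each sample entry: (url, is_oer, has_tag), judged by hand for a few pages.
Sample = list[tuple[str, bool, bool]]


def _matches(url: str, pattern: str) -> bool:
    parts = urlsplit(url)
    return fnmatchcase(parts.netloc + parts.path, pattern)


def assess(pattern: str, sample: Sample) -> tuple[str, list[str]]:
    """Classify how (or whether) a URL pattern can be made selective for OERs.

    Returns (strategy, missed): missed lists OER pages lacking the tag, which
    the "pattern + term" strategy would silently leave out of the results.
    """
    in_scope = [row for row in sample if _matches(row[0], pattern)]
    if in_scope and all(is_oer for _, is_oer, _ in in_scope):
        return "pattern only", []
    tag_on_oer = any(is_oer and tagged for _, is_oer, tagged in in_scope)
    tag_on_other = any(tagged and not is_oer for _, is_oer, tagged in in_scope)
    if tag_on_oer and not tag_on_other:
        missed = [url for url, is_oer, tagged in in_scope if is_oer and not tagged]
        return "pattern + term", missed
    return "not selectable", []
```

A sample of a handful of pages cannot prove a site is selective, but it can quickly show when it is not, for instance when non-OER pages on a site also carry the tag.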

Importantly, not all expected results appear. This is possibly because the resources on these sites aren’t tagged as UKOER, or because the pages haven’t been indexed by Google. However, sometimes the omission seems inexplicable. For example, a search for “dental” limited to the Core materials website on Google yields the expected results, but the equivalent search on the CSE yields no results.
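One way to chase down omissions like this is to run the same terms through plain Google with the `site:` operator, with and without the added `+UKOER` term, and compare with the CSE. If plain Google also loses the page once `+UKOER` is added, the page probably isn't tagged; if plain Google finds it but the CSE doesn't, the problem more likely lies with the engine. The helper below only builds those comparison URLs; its name is hypothetical and the diagnosis is a heuristic, not a guarantee.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"


def site_search_url(terms: str, site: str, extra_terms: tuple[str, ...] = ()) -> str:
    """Plain Google web search restricted with the site: operator, for side-by-side checks."""
    query = " ".join([terms, *extra_terms, f"site:{site}"])
    # urlencode percent-escapes ':' and '+' so they survive as literal query text
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"
```

For the Core materials case, the two URLs to compare would be built from `site_search_url("dental", "core.materials.ac.uk")` and the same call with `("+UKOER",)` as the extra terms.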

While hosting the configuration files off Google and editing them as XML files, or modifying them programmatically, allows some interesting refinements of the approach, we found this to be less easy. One difficulty is that the documentation on Google is somewhat fragmented and frequently confusing. Different parts of it seem to have been added by different people at different times, and it was often the case that a “link to more information” about something we were trying to do failed to resolve the difficulty we had encountered. This was compounded by some unpredictable behaviour, which may have been caused by caching (perhaps in serving the configuration files, in Google reading them, or in Google serving the results) or by delays in updating the indexes for the search engine; either way, it made testing changes to the configuration files difficult. These difficulties can be overcome, but we were unconvinced that there would be much benefit in this case and so concentrated our effort elsewhere.

Conclusions
If it works for the sites you’re interested in, we recommend simple Google custom searches as a very quick method of providing a search over a subset of resources from a specified range of hosts. We reserve judgement on the facility for creating dynamic search engines by hosting the configuration files on one’s own server.

3 thoughts on “Google custom search for UKOER”

  1. I’ve spent two years building an academic CSE, at http://www.jurn.org. The spin-offs included a subsidiary CSE search engine for open courseware in audio/video and other academic audio files, called Earworm. There is also a booklet for purchase on exactly how to set up and troubleshoot a linked CSE.

  2. Thanks for this, Phil, and thanks to the CETIS newsletter for highlighting the article else I’d never have seen it (so many RSS feeds, so little time…). I’ve implemented a few Google CSEs in my time, so found it easy enough to implement one at:

    http://www.nottingham.ac.uk/~ntzfr/oer/google_cse_ukoer.php

    to test your methods. One thing I can’t figure, though, is how to add +ukoer to each user search. You write:

    “where the URL pattern doesn’t lead only to content that is UKOER the term ‘+UKOER’ added to the search terms entered by the user”

    How do you do this in the CSE? If I try putting a URL such as:

    http://open.jorum.ac.uk ‘+ukoer’

    or similar, the CSE rejects it (rightly) as not being a valid URL. I can’t see anything in the CSE control panel, or in the source of your CSE. Basic question, I know 🙁
