Initial thoughts on EPUB-WEB (Portable Documents for the Open Web Platform)

In a W3C Unofficial Draft White Paper “Advancing Portable Documents for the Open Web Platform: EPUB-WEB” published 21 Nov 2014, Markus Gylling of IDPF (curators of the EPUB standards) and Ivan Herman of W3C (curators of web standards) have highlighted the potential of a specification that brings EPUB on to the Web. The vision is that this specification, informally known as EPUB-WEB, would make “EPUB a first-class citizen of the Open Web Platform and as a result significantly reduce the complexity of deploying EPUB content into browsers, for online as well as offline consumption”.

EPUB3 is based mostly on web standards, i.e. a collection of HTML5 files with associated bells and whistles (embedded video, audio, SVG, JavaScript, CSS) held in a zip archive with an XML manifest to tell an application what is there and what order to display it in. So at first EPUB-WEB seems straightforward: get rid of the zip archive, use the manifest to point to files anywhere on the web (IMS Content Packaging has allowed a similar route with “logical packages” which allow for both local and remote components). But the draft white paper raises some interesting points.

Firstly, on that manifest, in section 3.1 the authors note that while the zip file + XML manifest is a common pattern:

“W3C’s Web Application Working Group has, in its new charter, the task of defining a general packaging format for the Web to encompass the needs of various applications (like installing Web Applications or downloading data for local processing). It is probably advantageous for EPUB-WEB to adopt this format, thereby being compatible with what Web Browsers would implement anyway. While this general packaging format could hypothetically be compatible with the ZIP+XML manifest format used by EPUB (and also by the Open Document Format [ODF]) the broader requirements of installable applications and other types of content, and efficient incremental transmission over networks, may well imply a different and incompatible packaging format.”
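For readers who have not looked inside an EPUB, the ZIP+XML manifest pattern referred to in that quote can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the toy package built here (the file names `chapter1.xhtml`, `chapter2.xhtml`, `style.css` and the path `OEBPS/package.opf`) is invented for the example, and a real EPUB carries much more (metadata, navigation document, and so on). The function names `build_toy_epub` and `reading_order` are likewise hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# XML namespaces used by the EPUB container file and package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}

# META-INF/container.xml tells a reading system where the package document is.
CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# The package document: a manifest (what is there) and a spine (reading order).
# The items listed are made up for this sketch.
PACKAGE = """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""


def build_toy_epub() -> bytes:
    """Build a minimal, incomplete EPUB-like zip archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("OEBPS/package.opf", PACKAGE)
    return buf.getvalue()


def reading_order(epub_bytes: bytes) -> List[str]:
    """Return the hrefs of the spine items, in the order they should be shown."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        package = ET.fromstring(zf.read(opf_path))
    # Map manifest ids to hrefs, then walk the spine to get the display order.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    return [
        manifest[ref.get("idref")]
        for ref in package.findall("opf:spine/opf:itemref", NS)
    ]


print(reading_order(build_toy_epub()))
```

The point of the sketch is that nothing in the manifest itself depends on the zip archive: the `href` values are just relative references, which is why pointing them at resources on the web looks straightforward at first sight.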

Secondly, there’s a question about how you identify documents (and fragments within documents) reliably when they may be either online or off-line depending on whether the user has decided to “archive” them (and I think archive here includes download onto an ebook reader to take on holiday). “What is the URI of the offline version of the document”. Interestingly there is a link drawn with the W3C Annotation Working Group:

The recently formed W3C Annotation Working Group has a joint deliverable with the W3C Web Application Working Group called “Robust Anchoring”. This deliverable will provide a general framework for anchoring; and, although defined within the framework of annotations, the specification can also be used for other fragment identification use cases. Similarly, the W3C Media Fragments specification [media-frags] may prove useful to address some of the use cases.

And thirdly there is (of course) metadata. EPUB 3 has plenty of places to put your metadata. Most conventional publishing needs for metadata inside the EPUB file are covered by the range of metadata allowed in the manifest. However, there is additional potential for in-line metadata that is “agnostic to online and offline modes” that will “seamlessly support discovery and harvesting by both generic Web search engines, as well as dedicated bibliographic/archival/retailer systems”. The note points to schema.org in all but name:

The adoption of HTML as the vehicle for expressing publication-level metadata (i.e., using RDFa and/or Microdata for metadata like authors or title) would have the added benefits of better I18N support than XML or JSON formats.

And what about application to learning? Taken in conjunction with the Annotation work starting at W3C, the scope for eTextBooks online (or whatever you want to call educational use of EPUB-WEB) seems clear. One area that seems important for educational use but inadequately addressed in the draft white paper is alternative presentations that would make the material remixable and adaptable to meet individual learner needs. There is a little in the draft about presentation control and personalization, but it is rather limited: changing the font size or page layout rather than changing the learning pathway.

eBooks and libraries, the right to eRead? #ebooks14

About once a year I go to some meeting or another on libraries and eBooks. I nearly always come back from it struck by the tension between libraries, as institutions of stability, and the rapid pace at which technology companies are driving forward eBook technology. This year’s event of that type was the Scottish Library and Information Council’s 13th annual eBook conference. The keynote from Gerald Leitner, chair of the European Bureau of Library, Information and Documentation Associations task force on eBooks, was especially interesting to me in introducing the Right to eRead Campaign.

Leitner spoke about the ecosystem around ebooks and libraries and about the uncertainty and instability throughout the system. Can lending libraries compete with commercial lending of eBooks (Amazon Kindle Unlimited, £6 per month for over half a million titles)? Publishers too are threatened and are fighting, as the spat between Amazon and Hachette shows–and note, it’s not publishers who are driving the change to eBooks, it’s technology companies, notably Amazon and Apple. Libraries are at risk of being the collateral damage in this fight. And where do book lovers fit in, those who as well as reading physical books read ebooks on various mobile devices?

Leitner made the point that consumers and libraries very rarely buy eBooks; you buy a limited license that allows you to download a copy and read it under certain restrictions–and no, like most people I have never bothered to read those restrictions, though I am aware of the limit to the devices on which I can read that copy, that I am not allowed to lend it and that Amazon can delete copies remotely (I don’t use Apple products, but I assume they have similar terms). A consequence of this relates to the exhaustion of rights. Under copyright, authors have the right to decide whether and how their work is published, and the publishers may have the right to sell books that contain the authors’ work. But once bought, the book becomes the property of the person who bought it; the publisher’s rights are exhausted, and they can no longer forbid that it be resold or lent. The right to lend and resell is provided by Article 6 of the WIPO Copyright Treaty and the EU Rental and Lending Directive (2006/115/EC). Library lending rights are written into statute and accompanied by remuneration for authors. Ebooks, intangible, licensed and not sold, are classed as services by the EU Information Society Directive (2001/29/EC), and for these there is no exhaustion of rights, no right to resell or lend, and no statutory guarantee that libraries may provide access.

The EBLIDA Right to eRead campaign is about trying to secure a right for libraries to provide access to eBooks. The argument is that without this right to access, information itself becomes privatised at the cost of an informed democracy. The campaign is asking for a statutory exemption within IP law, or mandatory fair licensing that provides libraries with the right to acquire and a right to lend.

Embed innovation or implant potential?

This thought on etextbooks is an overflow from a conversation I was having on Skype with Li and Tore about a workshop aimed at scoping what we would like the etextbooks of the future to look like. We were talking about how the idea of a textbook–its role in teaching and learning and hence (perhaps) its nature–was different in different cultures (Europe, US, Asia) and educational settings (school, higher education), when Tore said something along the lines of “why are we discussing this, shouldn’t we be talking about educational requirements”. Of course we should be talking about educational requirements and how they might be met by technologies such as ebooks, but I think there is more than that. My immediate reply was that by defining an area of interest as “etextbooks” we were implying a continuity with textbooks. I don’t think continuity implies a simple like-for-like replacement because I think the potential for etextbooks is far greater than that for paper textbooks, so moving to etextbooks should radically shift the trajectory of change. But the implication seems to be that etextbooks will pick up where paper textbooks leave off. That, I think, is different from 20 or so years ago when we were talking about how computer based learning (or more recently online courses and technology enhanced learning) marked a step change in how education was delivered. In that case much of the talk was about how technology will radically change education. Even if my characterisation of the two cases as opposing is a bit crude (as it is), it’s worth comparing the two approaches. I’ll do that here, just briefly.

The technology-will-revolutionise-education approach runs the risk of alienating the people who you most need on your side if that revolutionary change is to be an improvement, that is, the teachers and students. I remember we used to talk about technology as a Trojan Horse for introducing pedagogic improvement in HE, something that I stopped doing when I went to a presentation where the speaker pointed out that the Trojan Horse was an act of war in the context of a bloody siege, and perhaps that isn’t the way learning technologists should approach teachers. More importantly, introducing technology probably isn’t the best way to approach improving education. Introducing technology is not straightforward and will take attention away from other matters: whatever the initial intent, it will distract from thinking about teaching and learning. If you want to improve education you should focus on that and probably not do something else that is really difficult in its own right at the same time.

So the start-with-something-familiar approach has an advantage here in that it simply focuses on planting a technology with higher potential into existing practice. The risk is that substitution is seen as all that needs to be done, or that requirements that arise from this objective are over-prioritised. For example, I have seen page-faithful display (i.e. the ability to reproduce on the ebook reader exactly what would be on paper) and page numbers listed as requirements for etextbooks. They may be desirable for marketing purposes, and there are real functional requirements relating to how content is presented and how it may be referenced, but building in these restrictions as requirements would, in my view, be a mistake. Let’s have a strategy where we aim to embed but with a view to enhancing.

A triangle of objectives for etextbook technology; from the bottom: cost, availability, portability, functionality, innovation.
The path forward suggested for the US by the EDUCAUSE/Internet2 etextbook pilot. Start with a basis aimed at increasing adoption and move forward to improvements in functionality and transformation.
Image from Grajek, Susan, Understanding what higher education needs from e-textbooks: an EDUCAUSE/Internet2 pilot (Research Report), EDUCAUSE July 2013.

I think this is the approach suggested by the recent report on the EDUCAUSE/Internet2 pilots, Understanding what higher education needs from e-textbooks, summarised in the image on the right. I must admit that I find this somewhat depressing: I am interested in getting to the peak of that pyramid as quickly as possible, but I would rather get there with teachers and learners than be touting some theoretical improvement that is divorced from real teaching and learning. And of course, it’s important to be thinking from the outset about what functionality and innovation should be built once the technology is in people’s hands.

I am presenting a session at ALT-C 2013 next month, entitled “Into the Mainstream? New developments in eTextBooks”, where I hope to discuss ideas like this.

ebooks 2013

Every year for the past dozen or so years the Department of Information Sciences at UCL has organised a meeting on ebooks. I’ve only been to one of them before, two or three years ago, when the big issues were around what publishers’ DRM requirements for ebooks meant for libraries. I came away from that musing on what the web would look like if it had been designed by publishers and librarians (imagine questions like: “when you lend out our web page, how will you know that the person looking at the screen is a member of your library?”…). So I wasn’t sure what to expect when I decided to go to this year’s meeting. It turned out to be far more interesting than I had hoped. I latched on to three themes of particular interest to me: changing paradigms (what is an ebook?), eTextBooks and discovery.

Changing paradigms

With the earliest printed books, or incunabula, such as the Gutenberg Bible, printers sought to mimic the handwritten manuscripts with which 15th-century scholars were familiar, in much the same way as publishers now seek to replicate printed books as ebooks.

In the first presentation of the day Lorraine Estelle, chief executive of Jisc Collections, focussed on access to electronic resources. Access not lending; resources not ebooks. She highlighted the problems of using yesterday’s language and thinking in this context, like having a “horseless carriage” and buying it hay. [This is my chance to make the analogy between incunabula and ebooks again, see right.] The sort of discussions I recalled from the previous meeting I attended reflect this thinking: publishers wanting a digital copy of a book to be equivalent to the physical book, only lendable to one person at a time and requiring replacement after a certain number of loans.

We need to treat digital content as offering new possibilities and requiring new ways of working. This might be uncomfortable for publishers (some more than others) and there was some discussion about how we cannot assume that all students will naturally see the advantages, especially if they have mostly encountered problematic content that presents little that could not be put on paper but is encumbered with DRM to the point that it is questionable whether they really own the book. But there is potential as well as resistance. Of course there can be more interesting, more interactive content–Will Russell of the Royal Society of Chemistry described how they have been publishing to mobile devices, with tools such as Chem Goggles that will recognise a chemical structure and display information about the chemical. More radically, there can also be new business models: Lorraine suggested institutions could become publishers of their own teaching content, and later in the day Caren Milloy, also of Jisc Collections, and Brian Hole of Ubiquity Press pointed to the possibilities of open access scholarly publishing.

Caren’s work with the OAPEN Library is worth looking through for useful information relating to quality assurance in open monographs, such as notifying readers of updates or errata. Caren also talked about the difficulties in advertising that a free online version of a resource is available when much of the dissemination and discovery ecosystem (you know, Amazon, Google…) is geared around selling stuff, difficulties that work with EDItEUR on the ONIX metadata scheme will hopefully address soon.

Brian described how Ubiquity Press can publish open access ebooks by driving down costs and being transparent about what they charge for. They work from XML source, created overseas, from which they can publish in various formats including print on demand, and explore economies of scale by working with university presses, resulting in a charge to the author (or their funders) of about £150 for a chapter, assuming there is nothing too complex in that chapter.


eTextBooks

All through the day there were mentions of eTextBooks, starting again with Lorraine who highlighted the paperless medic and how his quest to work only with digital resources is complicated by the non-articulation of the numerous systems he has to use. When she said that what he wanted was all his content (ebooks, lecture handouts, his own notes etc.) on the same platform, integrated with knowledge about when and where he had to be for lectures and when he had exams, I really started to wonder how much functionality can you put into an eContent platform before it really becomes a single-person content-oriented VLE. And when you add in the ability to share notes with the social and communication capability of most mobile devices, what then do you have?

A couple of presentations addressed eTextBooks directly, from a commercial point of view. Jenni Evans spoke about Vital Source and Andrejs Alferovs about Kortext, both of which are in the business of working with institutions distributing online textbooks to students. Both seem to have a good grasp of what students want, which I think should provide useful requirements to feed into eTextBook standardization efforts such as eTernity. These include:

  • ability to print
  • offline access
  • availability across multiple devices
  • reliable access under load
  • integration with VLE
  • integration with syllabus/curriculum
  • epub3 interactive content
  • long term access
  • ability for student to highlight/annotate text and share this with chosen friends
  • ability to search text and annotations


Discovery

There was also a theme of resource discovery running through the day, and I have already mentioned in passing that this referenced Google and Amazon, but also social media. Nick Canty spoke about a survey of library use of social media. I thought it interesting that there seemed to be some sophisticated use of the immediacy of Twitter to direct people to more permanent content, e.g. to engagement on Facebook or the library website.

Both Richard Wallis of OCLC and Robert Faber of OUP emphasized that users tend to use Google to search and gave figures for how much of the access to library catalogue pages came direct from Google and other external systems, not from their own catalogue search interface. For example the Bibliothèque nationale de France found that 80% of access to their catalogue pages came directly from web search engines not catalogue searches, and Robert gave similar figures for access to Oxford Journals. The immediate consequence of this is that if most people are trying to find content using external systems then you need to make sure that at least some (as much as possible, in fact) of your content is visible to them–this feeds in to arguments about how open access helps solve discoverability problems. But Richard went further: he spoke about how the metadata describing the resources needs to be in a language that Google/Bing/Yahoo understand, and that language is schema.org. He did a very good job of acknowledging the usefulness of specialist metadata schema for exchanging precise information between libraries or publishers while stressing that, when trying to pass general information to Google:

it’s no use using a language only you speak.

Richard went on to speak about the Google Knowledge Graph and their “things not strings” approach facilitated by linked data. He urged libraries to stop copying text and to start linking, for example not to copy an author name from an authority file but to link to the entry in that file; in Eric Miller’s words, to move from cataloguing to “catalinking”.
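To make the difference between copying and linking a little more concrete, here is a rough sketch of the two styles, written as a small Python function that emits schema.org descriptions as JSON-LD. It is an illustration only, not something shown at the meeting: the function name `book_record`, the book title, the author name and the authority-file URI under `authority.example.org` are all invented for the example. Only the schema.org terms themselves (`Book`, `Person`, `name`, `author`) are real.

```python
import json
from typing import Optional


def book_record(title: str, author_name: str, author_uri: Optional[str] = None) -> dict:
    """Describe a book with schema.org terms, as a JSON-LD style dictionary.

    Without author_uri the author is a copied string ("cataloguing");
    with it, the author is a link to an authority entry ("catalinking").
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
    }
    if author_uri is None:
        # A string: a search engine has to guess which person this is.
        record["author"] = author_name
    else:
        # A thing: the identifier points at the entry in an authority file.
        record["author"] = {"@type": "Person", "@id": author_uri}
    return record


# Hypothetical data, made up for this sketch.
copied = book_record("An Example Monograph", "Smith, Jane")
linked = book_record(
    "An Example Monograph",
    "Smith, Jane",
    author_uri="https://authority.example.org/names/12345",
)

print(json.dumps(copied, indent=2))
print(json.dumps(linked, indent=2))
```

In the first record the author is a string that anyone consuming the data has to match up for themselves; in the second it is a reference that can be followed, which is the “things not strings” idea in miniature.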


So was this really about ebooks? Probably not, and the point was made that over the years the name of the event has variously stressed ebooks and econtent and that over that time what is meant by “ebook” has changed. I must admit that for me there is something about the idea of an [e]book that I prefer over a “content aggregation”, but if we use the term ebook, let’s use it acknowledging that the book of the future will be as different from what we have now as what we have now is from the medieval scroll.

Picture Credit
Scanned image of a page of the Epistle of St Jerome in the Gutenberg Bible taken from Wikipedia. No Copyright.

eTextBooks Europe

I went to a meeting for stakeholders interested in the eTernity (European textbook reusability networking and interoperability) initiative. The hope is that eTernity will be a project of the CEN Workshop on Learning Technologies with the objective of gathering requirements and proposing a framework to provide European input to ongoing work by ISO/IEC JTC 1/SC36, WG6 & WG4 on eTextBooks (which is currently based around Chinese and Korean specifications). Incidentally, as part of the ISO work there is a questionnaire asking for information that will be used to help decide what that standard should include. I would encourage anyone interested to fill it in.

The stakeholders present represented many perspectives from throughout Europe: publishers, publishing industry specification bodies (e.g. IDPF who own EPUB3, and DAISY), national bodies with some sort of remit for educational technology, and elearning specification and standardisation organisations. I gave a short presentation on the OER perspective.

Many issues were raised through the course of the day, including (in no particular order)

  • Interactive and multimedia content in eTextbooks
  • Accessibility of eTextbooks
  • eTextbooks shouldn’t be monolithic and immutable chunks of content; it should be possible to link directly to specific locations or to disaggregate the content
  • The lifecycle of an eTextbook. This goes beyond initial authoring and publishing
  • Quality assurance (of content and pedagogic approach)
  • Alignment with specific curricula
  • Personalization and adaptation to individual needs and requirements
  • The ability to describe the learning pathway embodied in an eTextbook, and vary either the content used on this pathway or to provide different pathways through the same content
  • The ability to describe a range of IPR and licensing arrangements of the whole and of specific components of the eTextbook
  • The ability to interact with learning systems with data flowing in both directions

If you’re thinking that sounds like a list of the educational technology issues that we have been busy with for the last decade or two, then I would agree with you. Furthermore, there is a decade or two’s worth of educational technology specs and standards that address these issues. Of course not all of those specs and standards are necessarily the right ones for now, and there are others that have more traction within digital publishing. EPUB3 was well represented in the meeting (DITA is the other publishing standard mentioned in the eTernity documentation, but no one was at the meeting to talk about that) and it doesn’t seem impossible to meet the educational requirements outlined in the meeting within the general EPUB3 framework. The question is which issues should be prioritised and how should they be addressed.

Of course a technical standard is only an enabler: it doesn’t in itself make any change to teaching and learning; change will only happen if developers create tools and authors create resources that exploit the standard. For various reasons that hasn’t happened with some of the existing specs and standards. A technical standard can facilitate change but there needs to be a will or a necessity to change in the first place. One thing that made me hopeful about this was a point made by Owen White of Pearson: that he did not think of the business he is in as being centred around content creation and publishing but around education and learning, which leads away from the view of eBooks as isolated static aggregations.

For more information keep an eye on the eTernity website.

Jisc Observatory report on Ebooks in Education

The joint CETIS and UKOLN Observatory has just published a report “Preparing for Effective Adoption and Use of Ebooks in Education” written by James Clay. My CETIS colleague Li and I wrote the foreword for this report, which I’ve reproduced here but really you would be better going to the observatory and downloading the whole report.
