Initial thoughts on EPUB-WEB (Portable Documents for the Open Web Platform)

In a W3C Unofficial Draft White Paper “Advancing Portable Documents for the Open Web Platform: EPUB-WEB” published 21 Nov 2014, Markus Gylling of IDPF (curators of the EPUB standards) and Ivan Herman of W3C (curators of web standards) have highlighted the potential of a specification that brings EPUB onto the Web. Informally known as EPUB-WEB, the vision is that this specification would make “EPUB a first-class citizen of the Open Web Platform and as a result significantly reduce the complexity of deploying EPUB content into browsers, for online as well as offline consumption”.

EPUB 3 is based mostly on web standards, i.e. a collection of HTML5 files with associated bells and whistles (embedded video, audio, SVG, JavaScript, CSS) held in a zip archive with an XML manifest to tell an application what is there and what order to display it in. So at first EPUB-WEB seems straightforward: get rid of the zip archive, use the manifest to point to files anywhere on the web (IMS Content Packaging has allowed a similar route with “logical packages” which allow for both local and remote components). But the draft white paper raises some interesting points.

Firstly, on that manifest, in section 3.1 the authors note that while the zip file + XML manifest is a common pattern:

“W3C’s Web Application Working Group has, in its new charter, the task of defining a general packaging format for the Web to encompass the needs of various applications (like installing Web Applications or downloading data for local processing). It is probably advantageous for EPUB-WEB to adopt this format, thereby being compatible with what Web Browsers would implement anyway. While this general packaging format could hypothetically be compatible with the ZIP+XML manifest format used by EPUB (and also by the Open Document Format [ODF]) the broader requirements of installable applications and other types of content, and efficient incremental transmission over networks, may well imply a different and incompatible packaging format.”
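To make the ZIP+XML manifest pattern a little more concrete, here is a minimal Python sketch of an application reading a manifest to work out what to display and in what order. The element names and namespace follow the EPUB 3 package document, but the sample manifest itself, the remote `example.org` URL in the reading order, and the `reading_order` helper are all my own assumptions for illustration. This is not how the white paper or any EPUB-WEB draft defines things.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by EPUB package documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical manifest: one chapter is packaged locally, the other
# is assumed to live somewhere on the web (an illustration only).
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch2" href="https://example.org/book/chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
""".strip()


def reading_order(opf_text):
    """Return (href, 'local' | 'remote') pairs in spine order."""
    root = ET.fromstring(opf_text)
    # Map each manifest item id to the location of its file.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    ordered = []
    for ref in root.findall("opf:spine/opf:itemref", OPF_NS):
        href = hrefs[ref.get("idref")]
        # An http(s) scheme is treated as a remote component.
        scheme = urlparse(href).scheme
        kind = "remote" if scheme in ("http", "https") else "local"
        ordered.append((href, kind))
    return ordered


if __name__ == "__main__":
    for href, kind in reading_order(SAMPLE_OPF):
        print(kind, href)
```

The point of the sketch is only that the manifest is already a list of pointers; whether those pointers may leave the package is exactly the kind of question a new packaging format would have to settle.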

Secondly, there’s a question about how you identify documents (and fragments within documents) reliably when they may be either online or offline depending on whether the user has decided to “archive” them (and I think archive here includes downloading onto an ebook reader to take on holiday): “What is the URI of the offline version of the document”. Interestingly there is a link drawn with the W3C Annotation Working Group:

“The recently formed W3C Annotation Working Group has a joint deliverable with the W3C Web Application Working Group called “Robust Anchoring”. This deliverable will provide a general framework for anchoring; and, although defined within the framework of annotations, the specification can also be used for other fragment identification use cases. Similarly, the W3C Media Fragments specification [media-frags] may prove useful to address some of the use cases.”

And thirdly there is (of course) metadata. EPUB 3 has plenty of places to put your metadata. Most conventional publishing needs for metadata inside the EPUB file are covered by the range of metadata allowed in the manifest. However, there is additional potential for in-line metadata that is “agnostic to online and offline modes” and that will “seamlessly support discovery and harvesting by both generic Web search engines, as well as dedicated bibliographic/archival/retailer systems”. The note points to schema.org in all but name:

“The adoption of HTML as the vehicle for expressing publication-level metadata (i.e., using RDFa and/or Microdata for metadata like authors or title) would have the added benefits of better I18N support than XML or JSON formats.”
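As a rough illustration of what publication-level metadata carried in the HTML itself might look like to a harvester, here is a small Python sketch that pulls `property` values out of RDFa-style markup. The sample markup, the choice of schema.org `Book` properties, and the `PropertyCollector` and `extract_properties` names are my own assumptions, not anything taken from the white paper. It is a naive attribute scraper that ignores nesting and `typeof` scoping, not a conformant RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical fragment of a publication's HTML with in-line metadata.
SAMPLE_HTML = """
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">An Example Title</h1>
  <p>by <span property="author">A. N. Author</span></p>
  <meta property="inLanguage" content="en">
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect property/value pairs from flat, non-nested markup."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None  # property whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly, as on a <meta> element.
            self.found[prop] = attrs["content"]
        else:
            # Value is the element's text, gathered in handle_data.
            self._current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


def extract_properties(html):
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    print(extract_properties(SAMPLE_HTML))
```

Because the metadata travels inside the content documents, the same scraping would work whether the HTML was fetched from the web or read out of a downloaded package, which is the “agnostic to online and offline modes” property the authors are after.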

And what about application to learning? Taken in conjunction with the Annotation work starting at W3C, the scope for eTextBooks online (or whatever you want to call the educational use of EPUB-WEB) seems clear. One area that seems important for educational use, and that seems inadequately addressed in the draft white paper, is alternative presentations that would make the material remixable and adaptable to meet individual learner needs. There is a little in the draft about presentation control and personalization, but it is rather limited: changing the font size or page layout rather than changing the learning pathway.