HTML5 semantic sections with markdown

I’ve written a second extension for Python-Markdown that works with MkDocs. This one lets you put HTML5 semantic sectioning elements into the generated HTML. So, instead of just having <div>s in your generated HTML you can have, for example, an <article> or <chapter> divided into <section>s. Each of these can have an id attribute, and so can be identified, described in metadata (for example using embedded YAML) and linked to as a part of the page. You can find the OCXSect extension on GitHub, and you can read more about its development in some pages generated by MkDocs (which, incidentally, use the extension).

markdown that will generate HTML5 with semantic sectioning elements

What’s it do?

Markers of the form ~~X~~ and ~~\X~~ can be inserted in a markdown document at the beginning and end of what you want to be a section. The letter used determines which type of HTML5 sectioning element is put into the HTML: <chapter>, <article>, <header>, <main>, <footer>, <section>, <nav> and <div> are currently supported (use the initial of the type of section you want). Text after the letter will be used for the id attribute of the section. The choice of characters is limited to ASCII A-Z, a-z, 0-9 and !$-()+ so as to avoid unfortunate after-effects if a non-URL-safe character is used in what will become a fragment identifier in a URL. As a bonus, a textual representation of the structure is generated that can be useful for debugging.
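To make the marker rules concrete, here is a rough sketch of how a single marker line could be recognised. It is an illustration only, not the code from the extension: the regular expressions and the function names parse_start and parse_end are my own assumptions, and so is the choice to silently drop any character outside the allowed set (which is at least consistent with "section 1" becoming the id section1).

```python
import re

# Letter-to-element mapping, taken from the list of supported elements above.
SECTION_TYPES = {
    "C": "chapter", "A": "article", "H": "header", "M": "main",
    "F": "footer", "S": "section", "N": "nav", "D": "div",
}

# Hypothetical patterns (assumptions, not the extension's own):
# start marker is ~~X optional-id-text~~, end marker is ~~\X~~.
START_RE = re.compile(r"^~~([A-Za-z])(?:\s+(.*?))?\s*~~$")
END_RE = re.compile(r"^~~\\([A-Za-z])\s*~~$")

# Anything outside the characters allowed in an id.
DISALLOWED = re.compile(r"[^A-Za-z0-9!$\-()+]")


def parse_start(line):
    """Return (element name, id or None) for a start marker, else None."""
    match = START_RE.match(line.strip())
    if not match:
        return None
    letter = match.group(1).upper()
    if letter not in SECTION_TYPES:
        return None
    # Assumption: disallowed characters (including spaces) are dropped.
    section_id = DISALLOWED.sub("", match.group(2) or "")
    return SECTION_TYPES[letter], (section_id or None)


def parse_end(line):
    """Return the element name closed by an end marker, else None."""
    match = END_RE.match(line.strip())
    if not match:
        return None
    return SECTION_TYPES.get(match.group(1).upper())
```

With this sketch, parse_start("~~S section 1~~") gives ("section", "section1"), and a marker with no text after the letter, such as ~~H~~, gives a header with no id.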

So the markdown

~~C lesson1~~
#Markdown structure test
This is in the header section of a chapter. The chapter has id #lesson1. The header has no id.
~~S section 1~~
#Activity 1
This is in a regular section (id #section1) of a chapter
This is in the footer of the chapter

generates the HTML

<chapter id="lesson1">
<h1>Markdown structure test</h1>
This is in the header section of a chapter. The chapter has id #lesson1. The header has no id.
<section id="section1">
<h1>Activity 1</h1>
This is in a regular section (id #section1) of a chapter
This is in the footer of the chapter

and the following representation of that structure:

|--chapter{'id': 'lesson1'}
    |--section{'id': 'section1'}
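For anyone curious how a representation like this might be produced, here is a minimal sketch that walks an element tree and prints one line per sectioning element. It is not the extension's code: the function name describe_structure and the four-space indent are assumptions based on the output shown above, and it simply lists every sectioning element it finds, with or without an id.

```python
import xml.etree.ElementTree as etree

# Element names treated as sections (from the list of supported markers).
SECTIONING = {"chapter", "article", "header", "main",
              "footer", "section", "nav", "div"}


def describe_structure(element, depth=0):
    """Return one text line per sectioning element found below `element`."""
    lines = []
    for child in element:
        if child.tag in SECTIONING:
            # Indent by nesting depth, then show the tag and its attributes.
            lines.append("    " * depth + "|--" + child.tag + str(dict(child.attrib)))
            lines.extend(describe_structure(child, depth + 1))
        else:
            # Not a section: keep looking inside it at the same depth.
            lines.extend(describe_structure(child, depth))
    return lines


# Hypothetical input standing in for the generated HTML.
root = etree.fromstring(
    '<div><chapter id="lesson1"><h1>Markdown structure test</h1>'
    '<section id="section1"><h1>Activity 1</h1></section></chapter></div>'
)
structure = describe_structure(root)
print("\n".join(structure))
```

The outer <div> here is just a wrapper for the example, so it is the starting point of the walk and is not itself listed.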

What’s it for?

The motivation for this extension, as with the previous one for generating metadata via YAML in markdown, is the K12 OCX project that I have been working on with people at Learning Tapestry. The aim of that project is to describe the structure and intent of curriculum and content material (CCM) for K12 in a way that not only allows the exchange of CCM but also facilitates reuse and repurposing by editing and remixing. So rather than just say “here’s a load of course material, use it as it is” we’re providing information about what the structure is, from course down to activity, what all the pieces are, and metadata to describe the content and role of those pieces. At some point that structuring happens within an HTML page, and so the pieces being described are sections of a page. For this we advocate using HTML5 sectioning elements that indicate the semantics of that section of the HTML, and JSON-LD metadata to describe the pedagogically significant sections. The sectioning part of this is actually quite similar to the ideas around textbook structure and elements I learned about from the Rebus community session run by the open textbook network.

We use MkDocs for documenting the K12OCX spec, and so in the spirit of eating our own dog food I wanted to explore whether the spec could be exemplified to some extent by the documentation.

How’s it work?

Essentially there are two parts to this extension: a Python-Markdown treeprocessor that rearranges the HTML element tree after it has been generated, and a preprocessor that makes sure that the input is what we expect it to be.

The treeprocessor runs through all the nodes in the HTML element tree, and recursively through the children of those nodes, replacing any p elements that indicate the start of a sectioning element (i.e. those that have text such as ~~S~~) with a new section into which subsequent nodes are moved until an element indicating the end of a section is reached.
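The rearrangement of the tree described above can be sketched with nothing more than the standard library's ElementTree. This is a simplified illustration under my own assumptions, not the extension's implementation: the name nest_sections, the regular expressions and the use of a stack of open sections are all hypothetical, and it assumes the markers have already been upper-cased and sit in paragraphs of their own.

```python
import re
import xml.etree.ElementTree as etree

TAGS = {"C": "chapter", "A": "article", "H": "header", "M": "main",
        "F": "footer", "S": "section", "N": "nav", "D": "div"}
START = re.compile(r"^~~([A-Z])(?:\s+(.*?))?\s*~~$")   # e.g. ~~S section 1~~
END = re.compile(r"^~~\\([A-Z])\s*~~$")                # e.g. ~~\S~~
DISALLOWED = re.compile(r"[^A-Za-z0-9!$\-()+]")


def nest_sections(parent):
    """Move the nodes between start and end markers into new section elements."""
    children = list(parent)
    for child in children:
        parent.remove(child)
    stack = [parent]  # stack[-1] is the innermost section currently open
    for child in children:
        # Only a paragraph holding nothing but text can be a marker.
        is_plain_p = child.tag == "p" and len(child) == 0
        text = (child.text or "").strip() if is_plain_p else ""
        start, end = START.match(text), END.match(text)
        if start and start.group(1) in TAGS:
            # Replace the marker paragraph with a new section element.
            section = etree.SubElement(stack[-1], TAGS[start.group(1)])
            section_id = DISALLOWED.sub("", start.group(2) or "")
            if section_id:
                section.set("id", section_id)
            stack.append(section)
        elif end and len(stack) > 1 and TAGS.get(end.group(1)) == stack[-1].tag:
            stack.pop()  # close the innermost section, drop the marker
        else:
            nest_sections(child)      # look for markers nested deeper
            stack[-1].append(child)   # move the node into the open section
    return parent


# Hypothetical element tree standing in for what the markdown parser produces.
root = etree.fromstring(
    "<div><p>~~C lesson1~~</p><h1>Title</h1><p>~~S section 1~~</p>"
    "<p>Body</p><p>~~\\S~~</p><p>After</p><p>~~\\C~~</p><p>Outside</p></div>"
)
html = etree.tostring(nest_sections(root), encoding="unicode")
print(html)
```

In a real extension this logic would live in the run method of a Python-Markdown treeprocessor, which is handed the root element of the document.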

This presupposes that the section start and end markers are in a paragraph by themselves, which will only happen if there is a blank line or block-level element on the lines before and after them in the markdown before processing. The preprocessor does the user the favour of making sure that this is the case, while also making the marker letters upper case, so that the original markers can be case-insensitive.
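A preprocessing step of this kind could look something like the sketch below. Python-Markdown preprocessors receive the document as a list of lines and return a list of lines, so the function has that shape. Everything else is an assumption of mine rather than the extension's code: the name normalise_markers, the regular expression, and the decision to upper-case only the marker letter and leave the id text alone. It also makes no attempt to skip markers that appear inside code blocks.

```python
import re

# Hypothetical pattern for a line holding only a start or end marker.
MARKER = re.compile(r"^\s*~~(\\?)([A-Za-z])(\s.*?)?\s*~~\s*$")


def normalise_markers(lines):
    """Put each marker in its own paragraph and upper-case its letter."""
    out = []
    for line in lines:
        match = MARKER.match(line)
        if not match:
            out.append(line)
            continue
        # Make sure a blank line separates the marker from what precedes it.
        if out and out[-1].strip():
            out.append("")
        backslash, letter, rest = match.group(1), match.group(2), match.group(3) or ""
        out.append("~~" + backslash + letter.upper() + rest + "~~")
        # A blank line after the marker ends its paragraph.
        out.append("")
    return out


source = ["~~c lesson1~~", "#Markdown structure test",
          "~~s section 1~~", "text", "~~\\s~~"]
normalised = normalise_markers(source)
print("\n".join(normalised))
```

An extra blank line is added after every marker even when one is already there; repeated blank lines make no difference to how the markdown is parsed, so the sketch does not bother to collapse them.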

Take a look at my markdown notebook for this work for implementation details, or at GitHub for the code.

Information about how to download, install, test and run the code in MkDocs is also on GitHub. Please test it with caution, and let me know what you think.