Using the WordPress REST API to post a book from Wikisource to Pressbooks with Python

I am using Pressbooks to build an online edition of Southey and Coleridge’s Omniana. I transcribed the text for Volume I on Wikisource. This post is about how I got that text into Pressbooks; copy and paste didn’t appeal, so I thought I would try using the WordPress REST API. You could probably write a PHP plugin that would do this, but I find Python a bit easier for exploratory work, so I used that.

Getting the data from Wikisource is reasonably straightforward. On Wikisource I have transcluded the page transcriptions into a single HTML page for the whole book. This page is relatively easy to parse into the individual articles for posting to Pressbooks, especially as I added <hr /> tags before each article (even the first) and a stop marker at the end.
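A minimal sketch of that splitting step, using only the standard library. It assumes an <hr /> (or <hr>) before every article and a stop marker after the last one; the marker text "STOP" is a hypothetical placeholder, not necessarily what the transcription actually uses.

```python
import re
from typing import List

# Matches <hr>, <hr/> or <hr /> in any case.
HR_TAG = re.compile(r"<hr\s*/?>", re.IGNORECASE)


def split_articles(page_html: str, stop_marker: str = "STOP") -> List[str]:
    """Split a transcluded Wikisource page into article HTML fragments.

    Assumes an <hr /> before every article (including the first) and a
    stop marker after the last one. "STOP" is a hypothetical placeholder.
    """
    # Ignore everything after the stop marker (e.g. page footers).
    end = page_html.find(stop_marker)
    if end != -1:
        page_html = page_html[:end]
    # The first chunk is whatever precedes the first <hr />, so discard it.
    chunks = HR_TAG.split(page_html)[1:]
    return [chunk.strip() for chunk in chunks if chunk.strip()]
```

Splitting on the <hr /> tags, rather than parsing the whole DOM, is crude but works because the markers were put there on purpose.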

In the longer term I want to start indexing the Pressbooks Omniana using Wikidata for linked data. This will let me look at the semantic graph of what Southey and Coleridge were interested in.

First steps with the WordPress API

I’ve not used the WordPress API before, but it is well documented and there is a useful series of articles on envatoTuts+: Introducing the WP REST API.

Put /wp-json onto the end of a WordPress blog URL and you can see its routes and endpoints (e.g. this blog, my Pressbooks/Omniana). (I use the JSON Viewer Chrome extension to make these easier to read.) I found wp-api-python very useful for making requests against these in Python. It’s available via pip as wordpress-api, and I found it required the Python libraries requests, beautifulsoup4, requests-oauthlib and six. It authenticates via OAuth, so on WordPress you need the WordPress REST API – OAuth 1.0a plugin or similar; there’s more than you need to know about how OAuth works on envatoTuts+.

I installed the OAuth1.0a plugin on WordPress multisite and Pressbooks test servers. Network activation seemed to generate errors on both Pressbooks and plain multisite WordPress, so I activated it only for the individual blog/book. Then, under the Users tab on the admin screen, I was able to view and set up applications:

Add Application screen from the OAuth1.0a plugin

Filling out the details and clicking on ‘Save Consumer’ gave me a client key and client secret.
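The examples below paste the key and secret straight into the script, which is fine for a local test server. One alternative, sketched here as an assumption rather than what I did, is to read them from environment variables; the variable names (WP_CONSUMER_KEY and so on) are hypothetical, and the prefix lets you keep separate settings for the WordPress and Pressbooks servers.

```python
import os


def oauth_settings(prefix: str = "WP") -> dict:
    """Collect OAuth settings for the API() constructor from the environment.

    The variable names are hypothetical, e.g. WP_CONSUMER_KEY or
    PB_CONSUMER_KEY. Raises KeyError if one of them is not set.
    """
    return {
        "consumer_key": os.environ[f"{prefix}_CONSUMER_KEY"],
        "consumer_secret": os.environ[f"{prefix}_CONSUMER_SECRET"],
        "wp_user": os.environ[f"{prefix}_USER"],
        "wp_pass": os.environ[f"{prefix}_PASS"],
    }
```

The returned dict can be unpacked into the constructor alongside the other arguments, e.g. API(url=base_url, api="wp-json", version="wp/v2", oauth1a_3leg=True, ..., **oauth_settings()).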

Back in Python I used these to poke around the various API endpoints of my test multisite installation of WordPress, e.g.

from wordpress import API

base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"

# OAuth 1.0a three-legged authentication, using the key and secret
# from the application registered with the OAuth1.0a plugin
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",  # where the access token is cached
    callback="http://wordpress.home.local/test/api-test"
)

print("listing posts")
resource = "posts"
try:
    response = wpapi.get(base_url + api_path + resource)
    for post in response.json():
        # title is an object; its "rendered" field holds the title text
        print(post['id'], post['title']['rendered'])
except Exception as e:
    print("couldn't get posts")
    print(e)

wpapi’s get() and post() return requests Response objects, documented in the requests library documentation. Useful properties and methods of a response r include:

  • r.ok: True if the HTTP status code is below 400
  • r.content: the response body as bytes
  • r.text: the response body as text
  • r.headers: the response headers
  • r.iter_lines(): iterates over the content a line at a time
  • r.json(): the response body parsed from JSON
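A small sketch that puts a few of these together when listing posts. It checks r.ok, reads the X-WP-TotalPages header that WordPress sends with collection responses (results come 10 per page by default, adjustable with the page and per_page query parameters), and pulls out id and title pairs. The function name post_titles is just for illustration.

```python
def post_titles(r):
    """Return ([(id, title), ...], total_pages) from a posts listing response.

    r is the requests Response returned by wpapi.get() on the posts route.
    """
    if not r.ok:  # status code of 400 or above
        raise RuntimeError(f"HTTP {r.status_code}: {r.text[:200]}")
    # WordPress reports how many pages of results there are in this header.
    total_pages = int(r.headers.get("X-WP-TotalPages", 1))
    # Titles come back as {"rendered": "..."} objects, not plain strings.
    titles = [(post["id"], post["title"]["rendered"]) for post in r.json()]
    return titles, total_pages
```

Usage would be along the lines of titles, pages = post_titles(wpapi.get(base_url + api_path + "posts")).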

Posting to WordPress

Following the envatoTuts+ Creating, Updating, and Deleting Data article and translating to Python:

from wordpress import API
base_url = "http://wordpress.home.local/test"
api_path = "/wp-json/wp/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="wp/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds.json",
    callback="http://wordpress.home.local/test/api-test"
)

print("creating new post")
resource = "posts"
title = "86. Glover's Leonidas."
content = """Glover's Leonidas was unduly praised at its first appearance, and more unduly ...
..."""
excerpt = """Glover's Leonidas was unduly praised at its ..."""
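data = {
    "content": content,
    "title": title,
    "excerpt": excerpt,
    "status": "draft",     # don't publish yet
    "categories": [190]    # list of category ids
}
try:
    # POSTing to the posts collection creates a new post
    response = wpapi.post(base_url + api_path + resource, data)
    print(response.json())  # the new post, including its id
except Exception as e:
    print("couldn't post")
    print(e)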

The posts collection resource allows creation and retrieval (POST and GET methods); a specific post resource, posts/(?P<id>[\d]+), allows it to be updated and deleted (PUT, PATCH and DELETE methods).
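A sketch of updating and deleting a single post through that item route. It assumes the client’s put() and delete() methods mirror get() and post() (taking an endpoint URL, plus a data dict for put()); check that against the version of wp-api-python you have installed. The helper names are hypothetical. Passing force=true matters because, without it, WordPress moves posts to the trash rather than deleting them.

```python
def update_post(wpapi, endpoint_base, post_id, changes):
    """Update some fields of one post, e.g. {"status": "publish"}.

    endpoint_base is the full API path, such as base_url + api_path.
    """
    return wpapi.put(f"{endpoint_base}posts/{post_id}", changes)


def delete_post(wpapi, endpoint_base, post_id, force=False):
    """Delete one post; without force=true WordPress only trashes it."""
    url = f"{endpoint_base}posts/{post_id}"
    if force:
        url += "?force=true"
    return wpapi.delete(url)
```

Taking wpapi as a parameter keeps the helpers independent of how the client was configured, which also makes them easy to try against a fake client.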

The keys of the data dict are the fields in the WordPress API schema for that resource; they are also listed as the args of each endpoint, under each route, in the JSON returned by /wp-json.
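To check a data dict before sending it, you can look up those args in the /wp-json index, which is a dict with a "routes" key mapping route paths such as "/wp/v2/posts" to objects with a list of "endpoints", each giving its "methods" and "args". A minimal sketch, assuming you have already fetched and parsed the index (for example with requests.get(base_url + "/wp-json").json(), which normally needs no authentication):

```python
def endpoint_args(index, route, method="POST"):
    """Return the sorted argument names one route accepts for a method.

    index is the parsed JSON from GET <site>/wp-json.
    """
    for endpoint in index["routes"][route]["endpoints"]:
        if method in endpoint["methods"]:
            return sorted(endpoint.get("args", {}))
    return []


def unknown_keys(index, route, data, method="POST"):
    """Keys in data that the endpoint does not list as arguments."""
    allowed = set(endpoint_args(index, route, method))
    return sorted(key for key in data if key not in allowed)
```

The same lookup works for the Pressbooks routes, e.g. "/pressbooks/v2/chapters".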

Posting to Pressbooks

Pressbooks has its own extended set of API routes and endpoints: there are no ‘posts’ resources, but instead front-matter, back-matter, parts and chapters, all under the /pressbooks/v2/ path.

There is some documentation on the Pressbooks site. I’m posting articles as chapters into a Pressbooks site that already has some organised content, so I don’t have to worry about setting that up. Adapting the code above, changing the URL and credentials to those for my local test instance of Pressbooks, and changing the api_path, version and resource name, this posts a test chapter to the content part of my book, as a “numberless” chapter-type:

from pprint import pprint

from wordpress import API

base_url = "http://books.home.local/omniana"
api_path = "/wp-json/pressbooks/v2/"
wpapi = API(
    url=base_url,
    consumer_key="thisismykey",
    consumer_secret="thisismysecret",
    api="wp-json",
    version="pressbooks/v2",
    wp_user="phil",
    wp_pass="thisismypassword",
    oauth1a_3leg=True,
    creds_store="~/.wc-api-creds3.json",  # separate token cache for this site
    callback="http://books.home.local/omniana/api-test"
)
print("creating new chapter")
resource = "chapters"
data = {
    "content": "test",
    "title": "test",
    "status": "publish",
    "chapter-type": 48,  # id of the "numberless" chapter type
    "part": 27           # id of the part the chapter goes into
}
try:
    response = wpapi.post(base_url + api_path + resource, data)
    pprint(response.json())
except Exception as e:
    print("couldn't post")
    print(e)
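Posting the whole book is then a loop over the parsed articles. The sketch below assumes each article has already been turned into a (title, content) pair (title extraction is not shown) and takes the chapters URL and the part and chapter-type ids as parameters; post_chapters is a hypothetical helper, and it defaults to draft status so the results can be reviewed before publishing.

```python
import time


def post_chapters(wpapi, endpoint, articles, part_id, chapter_type_id,
                  status="draft", pause=1.0):
    """Post (title, content) pairs as chapters and return the new chapter ids.

    endpoint is the full chapters URL, e.g. base_url + api_path + "chapters".
    """
    ids = []
    for title, content in articles:
        data = {
            "title": title,
            "content": content,
            "status": status,
            "chapter-type": chapter_type_id,
            "part": part_id,
        }
        response = wpapi.post(endpoint, data)
        if not response.ok:
            # Stop at the first failure rather than carrying on.
            raise RuntimeError(f"{title!r}: HTTP {response.status_code}")
        ids.append(response.json()["id"])
        time.sleep(pause)  # be gentle with the server
    return ids
```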

Finding the ids for the chapter-type and part needs a little detective work. You can, of course, use an API call to GET the parts and list their names and ids, in a similar way to listing the posts in the first example above; or you can just edit the part or chapter-type in the Pressbooks admin interface and inspect the URL. It’s also worth noting that you need a different creds_store for each OAuth provider you connect to.
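A sketch of the API route to those ids: given the JSON list from a GET on the parts (or chapter-type) resource, build a name-to-id map. Parts come back with WordPress-style {"title": {"rendered": ...}} objects; taxonomy terms such as chapter types are assumed to carry a plain "name" field instead. Check the /wp-json index for the exact Pressbooks route names. Rendered titles can contain HTML entities (WordPress turns apostrophes into &#8217;, for instance), so they are unescaped.

```python
import html


def ids_by_name(items):
    """Map names to ids for a JSON list of parts or taxonomy terms."""
    mapping = {}
    for item in items:
        title = item.get("title")
        # Posts-like objects have title.rendered; terms are assumed to have name.
        name = title["rendered"] if isinstance(title, dict) else item.get("name")
        mapping[html.unescape(name)] = item["id"]
    return mapping
```

For example, ids_by_name(wpapi.get(base_url + api_path + "parts").json()) would give the part ids for the book.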

Next Steps

As I said, reading through and parsing the transcluded page transcriptions wasn’t too hard (I put some markers in the transclusion to help). I made some changes to the content before posting it: perhaps the most interesting issue was changing the wiki-style footnotes to Pressbooks style.
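A minimal sketch of one way to do that conversion on the rendered HTML, using regular expressions. It assumes MediaWiki’s usual output, where a reference mark is a <sup class="reference"> linking to #cite_note-N and the note itself is an <li id="cite_note-N"> in an <ol class="references"> list with its text in a <span class="reference-text">; check this against your own page, since the markup varies between versions. Each mark is replaced by the note text wrapped in the Pressbooks [footnote]…[/footnote] shortcode, and the reference list is dropped.

```python
import re

# Assumed MediaWiki markup; adjust the patterns to match your page.
NOTE_RE = re.compile(
    r'<li id="cite_note-([^"]+)">.*?<span class="reference-text">(.*?)</span>\s*</li>',
    re.DOTALL)
REF_RE = re.compile(
    r'<sup[^>]*class="reference"[^>]*>\s*<a href="#cite_note-([^"]+)">.*?</a>\s*</sup>',
    re.DOTALL)
LIST_RE = re.compile(r'<ol class="references">.*?</ol>', re.DOTALL)


def wiki_to_pressbooks_footnotes(html_text):
    """Replace numbered wiki references with inline [footnote] shortcodes."""
    notes = dict(NOTE_RE.findall(html_text))
    # Drop the reference list itself; its text moves inline.
    body = LIST_RE.sub("", html_text)

    def inline(match):
        note_id = match.group(1)
        if note_id not in notes:
            return match.group(0)  # leave unmatched references alone
        return "[footnote]" + notes[note_id].strip() + "[/footnote]"

    return REF_RE.sub(inline, body)
```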

At the time of writing, I have started posting to the live/public instance of Omniana on Pressbooks, but still have to sort out some formatting issues, such as removing line breaks and making sure that the CSS selectors are appropriate for WordPress; that shouldn’t take long to fix.
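One likely reason stray line breaks matter: WordPress’s wpautop filter turns single newlines in post content into <br /> tags when the content is displayed, so hard-wrapped source HTML can come out as ragged lines. A sketch of collapsing those newlines into spaces before posting, leaving any <pre> blocks untouched (an assumption about what should be preserved):

```python
import re

# Capture <pre> blocks so they can be left exactly as they are.
PRE_RE = re.compile(r"(<pre\b.*?</pre>)", re.DOTALL | re.IGNORECASE)


def remove_line_breaks(html_text):
    """Collapse newlines (and surrounding blanks) to single spaces outside <pre>."""
    parts = PRE_RE.split(html_text)
    # With a capturing group, re.split puts the <pre> blocks at odd indices.
    return "".join(
        part if i % 2 else re.sub(r"[ \t]*\n\s*", " ", part)
        for i, part in enumerate(parts)
    )
```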

Then I want to start indexing the articles using Wikidata for linked data.