Step two of the navigation cleanup is to build a pagelist. This can be an onerous process but it is such a key piece of the accessibility puzzle that it’s worth it. You can’t achieve certification for your workflow or for individual titles without a pagelist in place. We’ve done the work to lay down the foundations of a pagelist in InDesign using the PageStaker script. Now let’s take it a step further and actually build it.
Make sure that you’ve done the step of removing character overrides from the HTML. Skipping that step will mean some of what I am going to show you now won’t work.
Building the pagelist is a two-step process: converting the page stakes to the proper page break markup, and then assembling the pagelist in toc.xhtml. I will demo this process and then give you a quick trick to get it done a little more easily.
Step 1. The page stakes in InDesign look something like this at present.
To capture every possible page number, you can use a RegEx wildcard that matches any string of digits and then reuses that string in the replacement. So: search for a digit with backslash-d, match it one or more times with the plus sign, and ask that the match not be greedy by adding a question mark. Wrapping that pattern in parentheses, as in (\d+?), creates a capture group that we can reference in the replace string with a dollar sign and 1. The numeral 1 refers to the first parenthesized group. You can have more than one group in a RegEx search, and they are numbered in order. The replace string looks like this:
<span epub:type="pagebreak" role="doc-pagebreak" id="page$1" title="$1" aria-label=" Page $1. " />
If you do this globally by running it on all the HTML files in whatever software you are using, it will transform those page stakes into the appropriate markup.
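If you would rather script the replacement than run it in an editor, here is a minimal Python sketch of the same idea. The page-stake markup in `STAKE` is a hypothetical stand-in, so swap in whatever your export actually contains. Note that Python writes the back-reference as `\1` where most editors write `$1`.

```python
import re

# Hypothetical page-stake markup: an assumption for illustration only.
# Replace this pattern with what your exported HTML really contains.
STAKE = re.compile(r'<span class="page-stake">(\d+?)</span>')

# Same replacement as the editor version, with \1 in place of $1.
PAGEBREAK = (
    r'<span epub:type="pagebreak" role="doc-pagebreak" '
    r'id="page\1" title="\1" aria-label=" Page \1. " />'
)

def convert_stakes(html: str) -> str:
    """Replace every page stake in one HTML string with page break markup."""
    return STAKE.sub(PAGEBREAK, html)
```

To run it over a whole book, you could read each HTML file, pass its text through `convert_stakes`, and write the result back out.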
One note: this find/replace won’t grab roman numerals.
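One possible way around that limitation is to widen the capture group so it accepts either digits or lowercase roman numeral letters. This is only a sketch: the stake markup is again a hypothetical placeholder, and the character class is deliberately loose, so it would also accept strings that are not valid roman numerals.

```python
import re

# Hypothetical stake markup (an assumption). The group accepts either
# digits or a run of lowercase roman numeral letters such as "xiv".
STAKE_ANY = re.compile(r'<span class="page-stake">(\d+|[ivxlcdm]+)</span>')

PAGEBREAK = (
    r'<span epub:type="pagebreak" role="doc-pagebreak" '
    r'id="page\1" title="\1" aria-label=" Page \1. " />'
)

def convert_all_stakes(html: str) -> str:
    """Convert arabic and lowercase roman page stakes to page break markup."""
    return STAKE_ANY.sub(PAGEBREAK, html)
```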
Step 2: assemble a page list in the navigation document. This part is, well, no point sugar coating it. It’s just not fun. I use a kind of pagelist outline that I modify for every book I work on. This XML file is available for you to download. I have the correct outline:
<nav epub:type="page-list" role="doc-pagelist">
This is a <nav> containing a list. Each page gets its own list item with a link that points to the correct HTML file, followed by a hash symbol and the id of the page break, such as #page12.
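As a rough illustration of that structure, this Python sketch generates the list items for a set of page labels. It assumes every page break lives in a single file (the default name `Book.xhtml` is just an example), that the ids follow the `page` plus number convention, and that the list is an ordered list, which is what EPUB navigation documents use. The function name is made up for this example.

```python
def build_page_list(pages, filename="Book.xhtml"):
    """Return a page-list <nav> with one list item per page label.

    pages: page labels in reading order, e.g. ["i", "ii", "1", "2"].
    filename: the content file holding the page breaks (an assumption).
    """
    items = "\n".join(
        f'    <li><a href="{filename}#page{p}">{p}</a></li>' for p in pages
    )
    return (
        '<nav epub:type="page-list" role="doc-pagelist">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )

print(build_page_list(["i", "ii", "1", "2"]))
```

The href values are relative to the navigation document, so adjust the path if your content files sit in a different folder.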
The trick now is to modify the HTML path to the correct file names. This is the tedious part. If you’ve broken your ebook into 12 chapters plus front and back matter, you need to go through all those files, figure out which page numbers are in them, and then modify the pagelist in the HTML. It’s yucky work.
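If your book is already split into several files, a short script can do the figuring-out for you. The sketch below assumes the page breaks use the markup from Step 1, with epub:type appearing before id inside the tag, and it takes the file contents as a dictionary so the example stays self-contained. The function names are hypothetical.

```python
import re

# Finds the page label inside each page break tag. Assumes epub:type
# comes before id within the tag, as in the Step 1 markup.
PAGE_ID = re.compile(r'epub:type="pagebreak"[^>]*?\bid="page([^"]+)"')

def map_pages_to_files(files):
    """files: dict of filename -> XHTML text, in reading order.

    Returns a list of (filename, page_label) pairs in document order.
    """
    entries = []
    for name, text in files.items():
        for page in PAGE_ID.findall(text):
            entries.append((name, page))
    return entries

def page_list_items(entries):
    """Turn (filename, page_label) pairs into pagelist <li> lines."""
    return [
        f'<li><a href="{name}#page{page}">{page}</a></li>'
        for name, page in entries
    ]
```

In practice you would fill the dictionary by reading each content file from the unzipped EPUB, then paste the resulting list items into the pagelist outline.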
I have a workaround that is still a little tedious but a lot less so than scraping through a long series of HTML files. When I export from InDesign, I don’t break the book into chapter-sized chunks. I keep it all in one long HTML file. This makes global searches a little easier since everything is in one place. My workflow is this: I do all the HTML cleanup, adjust the page-break tags, and assemble the pagelist. This last item is a little more straightforward because all of the page breaks are in one HTML file. So page i to page 344, for example, are all in “Book.xhtml”. When that work is done and the book passes EPUB Checker, I open it in Sigil and add the chapter breaks. Sigil is kind of brilliant as an ebook editor because it is purpose-built for ebooks and does only that one thing. It understands that inserting a chapter break creates a new HTML file, which means the path to that file must be updated in the navigation, the OPF, and any other internal references such as footnotes.
This workflow works really nicely. Sigil will also let you right-click on the new Section0001.xhtml file and update the name to something more meaningful like chapter1.xhtml.