Standard Generalized Markup Language was what we first became familiar with in the late nineties in order to enable us to “repurpose” our texts. SGML is not a document language, but a description of how to specify one. In the old days typescript would be marked up by the designer or copyeditor, in pretty general terms, for example CT next to a chapter title. The typesetter would mark it up in more detail so that the keyboard operator could fly through it without having to stop and figure out what type size and face was really called for here in the designer’s specifications. Thus we were familiar with the need for markup. But what we were familiar with was markup directed at creating a book, laid out in pages, not markup which would enable the text to be output in multiple ways on different platforms, including as a book. Up until then we were book publishers, and we published books.

In the nineties along came the idea that the text of a book (the content) might in fact need to be used in other ways, and the fact that we already had that content in digital form made it obvious that money could be saved by using the same digital files for any and all reuses. The main difficulty was getting people to change their minds so they could countenance the idea. Prior to the invention of computer systems the different ways a book might be used amounted to a paperback edition or a hardback, with the occasional opportunity for an extract to be published in some periodical. Magazines would just reset the extract they were running, and while we academic publishers would use the same typesetting for hardback and paperback, even if a mass-market paperback required resetting, the cost of typesetting was fairly trivial when compared with the cost of paper, presswork, and binding for the huge number of copies being printed. Now we also had the opportunity to allow people to access our content online: this required a severe adjustment of focus.

SGML is ancestral kin to XML (Extensible Markup Language) and HTML (HyperText Markup Language), which are now the primary tools used for text markup. The theory behind all markup languages is that, before it ever appears to the world, the text of any work should be described in terms general enough that any application you can imagine, in whatever form, can be generated by computer with no intervention beyond specifying the target medium. In HTML the markup codes are enclosed in angle brackets < >: <h1> denotes a first-level heading, <p> a new paragraph, <i> italic type, and <em> an emphasized word.
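A small made-up fragment (the chapter title and sentence are invented for illustration, not taken from any actual book file) shows those four tags in use:

```html
<!-- Hypothetical example of the tags described above -->
<h1>Chapter One</h1>

<!-- <p> opens a paragraph; <em> marks emphasis (meaning),
     while <i> merely asks for italic type (appearance) -->
<p>It was a <em>very</em> dark and stormy night.</p>
<p>The title <i>Paul Clifford</i> is conventionally set in italics.</p>
```

Note that <i> and <em> usually look identical on the page; the point of keeping them distinct is that <em> records *why* the word is italic, which is exactly the kind of information a different output medium (say, a screen reader) can put to use.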

Given that when you use one of these meta-languages to describe your document you have in theory prepared it for any and all applications, it may be seen as perverse not to use the markup to facilitate certain outcomes. Here's Bill Kasdorf in Publishers Weekly encouraging us to take advantage of the powers provided by our HTML markup in order to make our ebooks fully accessible to print-disabled people. This additional small step is pretty straightforward if you are doing your markup thoroughly; and if you're not, why bother?
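To sketch the sort of thing accessibility-minded markup involves (the file names, text, and image description here are all invented for illustration), the extra step amounts to using semantic structure and describing in words anything a print-disabled reader cannot see:

```html
<!-- Hypothetical ebook fragment: semantic structure a screen reader can navigate -->
<section role="doc-chapter" aria-labelledby="ch1-title">
  <h1 id="ch1-title">Chapter One</h1>

  <!-- The alt attribute gives a textual description of the image,
       so the figure is not simply silence for a blind reader -->
  <img src="figure1.png"
       alt="Line graph showing hardback sales declining and paperback sales rising, 1990 to 1999"/>

  <p>Sales figures for the decade are summarized in the graph above.</p>
</section>
```

If the markup already says "this is a chapter, this is its title, this is a figure," most of the work is done; the addition is chiefly the descriptive text itself.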

Almost parenthetically I might note that the transition to digital text processing and SGML markup, like all changes, caused a good deal of low-level turmoil. Once people got on board and accepted that text markup “was a good thing,” a kind of enthusiasm gripped those bosses with more power than knowledge. Why couldn’t we take all those digital resources which we’d been holding onto for a few years and magically get them SGML-ed? Well, I can’t imagine that at the end of the last century our digital storage system was much different from that at any other book publisher. It consisted of a cardboard box or two into which the disks of any book that had had disks were tossed. Rubber-banding together the disks from a particular book was a good idea, but rubber bands give up the struggle after a couple of years. What you had, therefore, was a mess of disks of various sorts, sizes, and formats, some of which were unreadable because the machines they drove no longer existed, some of which had lost one of their component disks, and all of which required time to assess. Publishers staff their production departments on the basis of the volume of work going through at any time. The amount of work going through was calculated on the basis of the number of books due to be published in the next 12 months, not with regard to sorting out the disks for every book you’d published over the previous five or so years. Eventually, I suspect, all publishers either threw away their old disks (and tapes) or sent them off to an overseas supplier to sort out, but we all spent a considerable amount of time trying to solve the problem of “looking back.” It’s always easier to implement a new system going forward: you just start doing your new books in the new way. Trying to catch up with the old books which were done differently is a nightmare. (This of course is why lots of older books remain unavailable as ebooks.)