Soltys Family Home | Internet Resources For Technical Communicators | Articles | Core Dump

X Marks the Spot

By Michael Skeet

Note: This article originally appeared in the June 2001 issue of Communication Times published by the Toronto Chapter of the STC.

Everyone seems to be talking about XML these days. Look at the shelves of the computer-books section of your typical Chapdigo store: XML books are almost as ubiquitous as those "Chicken Soup for the <fill in the blank> Soul" self-help tomes.

Listen, though, and look more closely. All the talk and the books seem to be about how XML will make the Web a better place; how it will make data interchange more efficient, and make databases obsolete and slice and dice and make julienne fries and even cure the Heartbreak of Psoriasis.

And some of those claims are actually true. (No points for guessing which ones.) The problem is that very little of the noise about XML concerns its most important use. Well, its most important use insofar as we're concerned anyway.

XML was developed to make documentation easier, and that's still one of the coolest things about the technology. This article is an extended reminder of that fact, and a suggestion about how documentation departments and even individual technical writers can adopt and utilize XML to make their work-lives more productive.

To begin with, a bit of background. XML is only a little over five years old, so it's understandable if it isn't yet on a lot of people's radar. Extensible Markup Language is a subset of SGML, the markup language standardized in the 1980s. XML was created as a deliberate attempt to do more than HTML, and less than SGML. As such, it offers much more of SGML's functionality than HTML does, while avoiding a lot of the complexity that gives SGML such a steep learning curve.

Unlike HTML, which in its current incarnation is pretty much completely about describing how documents look, XML is a content-oriented markup language. XML documents describe what their contents are, not how they look.

On the surface, XML appears to be much like HTML: content is surrounded by markup tags, each of which represents an element (and possibly contains attributes which further describe the content). HTML elements, though, tell us very little about the meaning of the content they surround.

XML elements, in contrast, tell you exactly what you're looking at. A chapter of a book is contained within <chapter> tags. The name of a GUI menu is (or ought to be) contained within <guimenu> tags. And so on.
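To make the contrast concrete, here is a minimal sketch in Python, using only the standard library. The sample document, its menu names, and the variable names are invented for illustration; only the <chapter> and <guimenu> element names come from the paragraph above. Because the markup says what each piece of content is, a program can simply ask for every GUI menu name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
SAMPLE = """\
<chapter>
  <title>Getting Started</title>
  <para>Open the <guimenu>File</guimenu> menu, then check the
  <guimenu>Tools</guimenu> menu for preferences.</para>
</chapter>
"""

root = ET.fromstring(SAMPLE)

# The element name says what the content is, so we can query by meaning.
menu_names = [menu.text for menu in root.iter("guimenu")]
print(menu_names)
```

Nothing in the query depends on how the menu names will eventually be displayed.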

Even better, XML is, as the first word of the acronym points out, extensible. Unlike HTML, which is a fixed set of elements, XML allows you to invent your own elements. Accountants can create documents which utilize exciting accountancy tags to identify important bits of content. Chemical documents can contain elements which let us know that the content is a <formula> or a <catalyst>.
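As a rough illustration of that extensibility, the sketch below feeds a made-up chemistry vocabulary to Python's standard XML parser. The <reaction> wrapper and the sample content are assumptions of mine, not part of any published DTD; the point is only that a general-purpose parser handles invented element names without complaint, as long as the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: these element names are not predefined anywhere.
REACTION = """\
<reaction>
  <formula>2 H2O2 -> 2 H2O + O2</formula>
  <catalyst>manganese dioxide</catalyst>
</reaction>
"""

reaction = ET.fromstring(REACTION)

# Look up content by the names we invented.
formula = reaction.findtext("formula")
catalyst = reaction.findtext("catalyst")
print(f"{formula} (catalyst: {catalyst})")
```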

So how does this make my life as a technical writer any easier? I'm glad you asked.

Once again, a bit of background. OpenCola, my current employer, is a fully fledged XML shop: All of our published (and a lot of our internal) documentation originates as XML. We use the DocBook Document Type Definition (DTD) as our template, thereby instantly giving our documents a structural commonality with those of hundreds or thousands of other technical writers. Because we're an open source shop, we try to use open source tools where possible. So we produce output using the Xalan and Cocoon transformation engines, and store our files in a CVS repository. Our documentation team is not big, and we don't have huge resources to throw at resolving technical issues. The implication, of course, is that if we can do it, just about anyone can do it.

Back, then, to the question at hand: how does XML help us?

The most important way that XML helps a writer is in allowing for single sourcing of documentation. Unlike in previous jobs, where I had to write separate versions of manuals for different user groups or different platforms, XML in my current job allows me to write material once, then reuse it as many times as I need.

At OpenCola, I write modular (I like to call them "exploded") documents. Rather than write a book, or even a chapter, I write in fragments: sections, even individual procedures. XML allows me the flexibility to save these fragments as individual files, or even as text entities. (If you've used HTML, you're familiar with entities. They're the bits of markup such as &lt;, which represents the < character.) I can then utilize the fragments in books or articles, knowing that as long as I follow the rules (in my case, the rules imposed by the DocBook DTD), I can assemble valid XML documents out of collections of these document fragments.
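Here is a small sketch of the text-entity idea, again in Python with only the standard library. The entity names (product, savestep) and their contents are hypothetical, and the sketch declares the entities inside the document itself rather than pulling fragments from separate files, which the standard-library parser does not do on its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity names and text are invented for this sketch.
DOC = """\
<?xml version="1.0"?>
<!DOCTYPE article [
  <!ENTITY product "ExampleApp">
  <!ENTITY savestep "Choose Save from the File menu.">
]>
<article>
  <para>Welcome to &product;.</para>
  <para>&savestep;</para>
</article>
"""

article = ET.fromstring(DOC)

# The parser expands each entity reference where it appears.
paragraphs = [para.text for para in article.iter("para")]
print(paragraphs)
```

Change the entity declaration once, and every place that references it picks up the new text the next time the document is processed.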

Even better, by using the DocBook attribute "condition", I can identify fragments, from individual paragraphs within a procedure all the way up to entire chapters, in such a way as to allow them to be included or excluded from the finished document when it is published.
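The include-or-exclude step can be sketched in a few lines of Python. This is an illustration of the idea only, not the tooling we actually use: the function name profile and the condition values "windows" and "linux" are assumptions, and the sketch handles just one condition value per element.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: the condition values "windows" and "linux" are invented.
SOURCE = """\
<procedure>
  <step><para>Download the installer.</para></step>
  <step condition="windows"><para>Run setup.exe.</para></step>
  <step condition="linux"><para>Run ./install.sh.</para></step>
  <step><para>Start the application.</para></step>
</procedure>
"""

def profile(element, wanted):
    """Drop every descendant whose condition attribute does not match `wanted`."""
    for child in list(element):  # copy the list because we remove while iterating
        condition = child.get("condition")
        if condition is not None and condition != wanted:
            element.remove(child)
        else:
            profile(child, wanted)
    return element

linux_guide = profile(ET.fromstring(SOURCE), "linux")
steps = [para.text for para in linux_guide.iter("para")]
print(steps)
```

Elements with no condition attribute survive every build, so shared material is written once and only the platform-specific pieces are tagged.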

Publishing an XML document, incidentally, is a snap. Okay, that's not quite true. It's a snap for me today, because while I was busy writing, someone else was configuring Cocoon and the stylesheets that are needed to transform XML into something a little more user-friendly.

At OpenCola we publish to our website, to help files, and to paper. In order to make publishing easy, we have had to create or modify XSL stylesheets (XSL is the Extensible Stylesheet Language, one of many XML offshoots) that are in effect templates for matching XML elements onto HTML elements or LaTeX commands.

But we only had to create these stylesheets once. Now that they exist, I can publish a 100-page book in less than the time it took me to write this sentence. We're talking milliseconds here: I type a one-line command into the Windows or Linux command line, and Cocoon builds an HTML file, or a LaTeX file that I can instantly convert into a PDF document. In a couple of minutes I can take my single source XML documents and build a user guide for print, a separate user guide for posting to our website, and a full set of topics for an HTML-based help file.
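To give a feel for what "matching XML elements onto HTML elements" means, here is a deliberately simplified sketch. It is plain Python rather than XSL (Python's standard library has no XSLT processor), and the mapping table, the sample source, and the function name to_html are all hypothetical. A real stylesheet expresses each pairing as a template rule and does a great deal more.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DocBook element names to HTML element names.
# A real XSL stylesheet expresses each pairing as a template rule.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "guimenu": "b"}

SOURCE = (
    "<article><title>Saving</title>"
    "<para>Open the <guimenu>File</guimenu> menu.</para></article>"
)

def to_html(element):
    """Rebuild the tree with every element renamed according to TAG_MAP."""
    html = ET.Element(TAG_MAP.get(element.tag, "span"))  # unknown names become span
    html.text, html.tail = element.text, element.tail
    for child in element:
        html.append(to_html(child))
    return html

html_text = ET.tostring(to_html(ET.fromstring(SOURCE)), encoding="unicode")
print(html_text)
```

Swap in a different table and the same source comes out in a different format, which is the whole appeal of keeping content and presentation apart.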

XML is also proving a boon in the somewhat esoteric area of document permanence. It's easy, in a period where timelines are ridiculously compressed and a veteran high-tech company is one that remembers the 'nineties, to ignore the importance of seeing that documents created today can still be read by tomorrow's users. Or even tomorrow's members of your company's documentation team. But this is the sort of requirement that will come back around and bite you in the butt if you don't think about it. It's also the sort of thing that XML allows you to stop worrying about.

I have personal documents, written only a decade ago, that I can no longer use because they were written on a now-obsolete and unsupported proprietary word processor, using an operating system that doesn't work on any computer I currently own.

This is a tragedy to me, personally. But imagine such a problem on a company-wide scale, over a time-frame measured in decades. Or imagine the expense involved in having to update everybody's software because a single user has switched to a later version. (Ever tried to open a FrameMaker 6 document using Frame 5.5?)

This problem simply doesn't exist for technical writers using XML. The current XML specification is version 1.0; this is likely to be the current XML spec ten years from now, or twenty. A document created with XML in 1998 is readable today by software that didn't exist then, and will be readable in 2008 by software that doesn't exist now. So long as you can get access to a text editor that handles Unicode (XML processors are required to support the UTF-8 and UTF-16 encodings), you can read any XML document written at any time.

There is a learning curve involved in using XML; it would be foolish to deny it. But the curve isn't unforgivably steep, and XML skills are easy to learn for anyone with even a modicum of HTML experience. The skill sets required to maintain processing engines and stylesheets are a bit more daunting. But a documentation shop needs only one individual with such qualifications. And the payoff can be immense.

That payoff isn't just for companies using XML, either. Writers who work with XML get to spend a surprising amount of their time…writing. Because processing and publishing XML documents can be so easy, I'm required to devote almost none of my resources to working on appearances. The writing gets most of my time.

And that's how it ought to be.

[Gratuitous biographical note: Michael Dennis Skeet is a technical writer at OpenCola Inc., a highly carbonated Toronto software firm. He has been a working writer for longer than some of OpenCola's managers have been alive.]
