
Chapter 1 An Eagle’s Eye View of XML
Self-describing data
Much computer data from the last 40 years is lost, not because of natural disaster or
decaying backup media (though those are problems too, ones XML doesn’t solve),
but simply because no one bothered to document the data formats. A Lotus 1-2-3
file on a 15-year-old 5.25-inch floppy disk might be irretrievable in most corporations
today without a huge investment of time and resources. Data in a less-known binary
format such as Lotus Jazz may be gone forever.
XML is, at a low level, an incredibly simple data format. It can be written in 100 percent
pure ASCII or Unicode text, as well as in a few other well-defined formats. Text
is reasonably resistant to corruption. The removal of bytes or even large sequences
of bytes does not noticeably corrupt the remaining text. This starkly contrasts with
many other formats, such as compressed data or serialized Java objects, in which
the corruption or loss of even a single byte can render the rest of the file unreadable.
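The contrast can be tried out with a small Python sketch. It assumes zlib compression as a stand-in for "compressed data" in general, and the sample record and the `drop_byte` and `try_decompress` helpers are made up for illustration. One byte is removed from the plain text and one from the compressed copy of the same text:

```python
import zlib

# A small sample record, stored as plain ASCII text (illustrative data).
text = ("<PERSON><NAME>Judson McDaniel</NAME>"
        "<BIRTH>21 Feb 1834</BIRTH></PERSON>").encode("ascii")


def drop_byte(data: bytes, index: int) -> bytes:
    """Simulate media damage by removing the single byte at `index`."""
    return data[:index] + data[index + 1:]


def try_decompress(data: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# Damage the plain text: one tag is mangled, the rest is still readable.
damaged_text = drop_byte(text, 10)

# Damage the compressed copy of the same text in the same way.
compressed = zlib.compress(text)
recovered = try_decompress(drop_byte(compressed, 10))

print(damaged_text.decode("ascii"))
print("compressed copy recovered intact:", recovered == text)
```

In the damaged text only one tag name is affected and the name and date can still be read by eye, whereas the damaged compressed stream no longer yields the original record.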
At a higher level, XML is self-describing. Suppose you’re an information archaeologist
in the twenty-third century and you encounter this chunk of XML code on an old
floppy disk that has survived the ravages of time:
<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
Even if you’re not familiar with XML, assuming you speak a reasonable facsimile of
twentieth-century English, you’ve got a pretty good idea that this fragment describes
a man named Judson McDaniel, who was born on February 21, 1834 and died on
December 9, 1905. In fact, even with gaps in or corruption of the data, you could
probably still extract most of this information. The same could not be said for a
proprietary, binary spreadsheet or word-processor format.
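A program can do the same reading a human does. The following is a minimal Python sketch, not a recommended recovery tool. It uses the standard library's `xml.etree.ElementTree` to read the intact fragment, then simulates damage by deleting two tags and falls back on a simple regular expression. The `record` and `salvaged` names and the particular damage chosen are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

FRAGMENT = """<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>"""

# Intact data: the element names say what each value means.
person = ET.fromstring(FRAGMENT)
record = {
    "given": person.findtext("NAME/GIVEN"),
    "surname": person.findtext("NAME/SURNAME"),
    "born": person.findtext("BIRTH/DATE"),
    "died": person.findtext("DEATH/DATE"),
    "sex": person.get("SEX"),
}
print(record)

# Simulated damage (an assumption for this sketch): two tags are lost.
damaged = FRAGMENT.replace('<PERSON ID="p1100" SEX="M">', "")
damaged = damaged.replace("</NAME>", "")

# A strict parser now refuses the text because it is no longer well-formed...
try:
    ET.fromstring(damaged)
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False

# ...but the surviving tag/value pairs can still be picked out of the text.
salvaged = re.findall(r"<([A-Z]+)>([^<>]+)</\1>", damaged)
print(parsed_ok, salvaged)
```

The regular expression only matches elements whose content is plain text, so it recovers the given name, surname, and both dates while ignoring the broken container elements. The `ID` and `SEX` attributes are lost along with the start tag that carried them.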
Furthermore, XML is very well documented. The XML specification from the World Wide
Web Consortium (W3C) and numerous books tell you exactly how to read XML
data. There are no secrets waiting to trip up the unwary.