process of defining XML 1.1. When they have finished their work, you will be able to supply 1.1 in
addition to 1.0 for the version number. If there is no encoding declaration, then the document must be
encoded using UTF-8 or UTF-16 (a UTF-16 document must begin with a byte order mark). If you leave out
the encoding declaration for a document in some other encoding, or specify an encoding that doesn’t
match the document’s actual bytes, your XML parser will typically report a fatal error. We’ll have more
to say about fatal errors later in the chapter.
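To see the effect of the encoding declaration, here is a minimal sketch that uses the JAXP parser bundled with the JDK (which may or may not behave exactly like the Xerces version you install). The class name, the helper methods, and the tiny title document are all made up for illustration. The same ISO-8859-1 bytes are parsed twice: once with a matching encoding declaration and once without one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDeclarationDemo {

    /**
     * Parses the given bytes and returns the text of the root element,
     * or null if the parser rejects the input.
     */
    public static String rootText(byte[] xmlBytes) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new ByteArrayInputStream(xmlBytes))
                          .getDocumentElement().getTextContent();
        } catch (SAXException | IOException e) {
            return null; // the parser could not decode or parse the bytes
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encodes a small hypothetical document as ISO-8859-1 bytes. */
    public static byte[] latin1Bytes(boolean withDeclaration) {
        String declaration = withDeclaration
                ? "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                : "<?xml version=\"1.0\"?>";
        // U+00E9 (e with acute accent) is the single byte 0xE9 in ISO-8859-1;
        // followed by '<', that byte is not a valid UTF-8 sequence.
        return (declaration + "<title>caf\u00e9</title>").getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(rootText(latin1Bytes(true)));   // declaration matches the bytes
        System.out.println(rootText(latin1Bytes(false)));  // parser falls back to UTF-8
    }
}
```

With the declaration in place, the parser decodes the accented character correctly. Without it, the parser assumes UTF-8, hits a byte sequence that is not valid UTF-8, and gives up, so the helper returns null.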
Well-Formedness
The rest of the file consists of data that has been marked up with tags (such as <title> and <author>).
The first rule or prerequisite for an XML document is that it must be well-formed. (An XML parser is
required by the XML specification to report a fatal error if a document isn’t well-formed.) Among other
things, this means every start tag (like <book>) must have an end tag (</book>). A start tag and its end
tag, along with the data in between them, are called an element. Elements may not overlap; they must be
nested within each other.
In other words, the start and end tag of an element must be inside the start and end tag of any element
that encloses it. The data between the start and end tag is also known as the content of the element; it
may contain elements, characters, or a mix of elements and characters. Note that the start tag of an
element may contain attributes. In our example, the book element contains an xsi:schemaLocation attribute
in lines 4-5. The value of an attribute must be enclosed in either single quotes (') or double quotes ("). The
type of the end quote must match the type of the beginning quote.
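The well-formedness rules above are easy to try out. The following is a rough sketch, again using the JAXP parser that ships with the JDK; the class name WellFormedCheck and the sample strings are our own inventions. It treats any SAXException from the parser as "not well-formed".

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    /** Returns true if the parser accepts the text, false if it reports a fatal error. */
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // fatal error: the document is not well-formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested elements.
        System.out.println(isWellFormed("<book><title>XML</title></book>"));
        // Overlapping elements: title is closed after book.
        System.out.println(isWellFormed("<book><title>XML</book></title>"));
        // Missing end tag for book.
        System.out.println(isWellFormed("<book><title>XML</title>"));
        // Attribute value opened with a single quote but closed with a double quote.
        System.out.println(isWellFormed("<book id='1\"/>"));
    }
}
```

Only the first string is accepted; each of the other three breaks one of the rules described above.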
Namespaces
In lines 2-4 you see a number of namespace declarations. The first declaration in line 2 sets the default
namespace for this document to http://sauria.com/schemas/apache-xml-book/book. Namespaces are
used to prevent name clashes between elements from two different grammars. You can easily imagine
the element name title or author being used in another XML grammar, say one for music CDs. If you
want to combine elements from those two grammars, you will run into problems trying to determine
whether a title element is from the book grammar or the CD grammar.
Namespaces solve that problem by allowing you to associate each element in a grammar with a
namespace. The namespace is specified by a URI, which is used to provide a unique name for the namespace.
The URI is only an identifier; you can’t expect to be able to retrieve anything from it. When you’re using
namespaces, it’s as if each element or attribute name is prefixed by the namespace URI. Writing that out
would be very cumbersome, so the XML Namespaces specification provides two kinds of shorthand. The
first shorthand is the ability to specify the default namespace for a document, as in line 2. The other
shorthand is the ability to declare an abbreviation that can be used in the document instead of the
namespace URI. This abbreviation is called the namespace prefix. In line 3, the document declares a
namespace prefix xsi for the namespace associated with http://www.w3.org/2001/XMLSchema-instance. To
declare a prefix, you place a colon and the desired prefix after xmlns (here, xmlns:xsi) and give the
namespace URI as the attribute’s value.
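Here is one way the book-versus-CD clash might play out in code. This is only a sketch: the CD namespace URI, the cd prefix, the class name, and the sample document are assumptions made for illustration, and the parser is the JAXP one bundled with the JDK. The document uses the book namespace as its default and a prefix for the hypothetical CD namespace, and the program looks up each title element by namespace URI plus local name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class NamespaceClashDemo {
    public static final String BOOK_NS = "http://sauria.com/schemas/apache-xml-book/book";
    // Hypothetical namespace for a music CD grammar; it is not part of the book's example.
    public static final String CD_NS = "http://example.com/schemas/cd";

    // Default namespace for book elements, prefix cd for CD elements.
    public static final String SAMPLE =
          "<book xmlns='" + BOOK_NS + "' xmlns:cd='" + CD_NS + "'>"
        + "<title>A book title</title>"
        + "<cd:title>A CD title</cd:title>"
        + "</book>";

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default in JAXP
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the text of the first title element in the given namespace. */
    public static String titleIn(String namespaceUri) {
        return parse(SAMPLE).getElementsByTagNameNS(namespaceUri, "title")
                            .item(0).getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(titleIn(BOOK_NS)); // the unprefixed title, via the default namespace
        System.out.println(titleIn(CD_NS));   // the cd:title element
    }
}
```

Both elements have the local name title, but because each lookup also supplies a namespace URI, the program can tell them apart.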
Line 4 shows how namespace prefixes are used. The attribute schemaLocation is prefixed by xsi, and the
two are separated by a colon. The combined name xsi:schemaLocation is called a qualified name
(QName). The prefix is xsi, and the schemaLocation portion is referred to as the local part of the
QName. (It’s important to know what all these parts are called because the XML parser APIs let you
access each piece from your program.)
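As a taste of what that access looks like, the sketch below pulls the pieces of xsi:schemaLocation apart using the standard DOM interfaces in the JDK. The class name is ours, and the sample document is a cut-down stand-in for the book example: in particular, the schemaLocation value (including the file name book.xsd) is an assumption, since only the attribute name appears in the text.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class QNamePartsDemo {
    public static final String BOOK_NS = "http://sauria.com/schemas/apache-xml-book/book";
    public static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    // Stand-in for the book example; the schemaLocation value is hypothetical.
    public static final String SAMPLE =
          "<book xmlns='" + BOOK_NS + "'\n"
        + "      xmlns:xsi='" + XSI_NS + "'\n"
        + "      xsi:schemaLocation='" + BOOK_NS + " book.xsd'/>";

    /** Parses SAMPLE with namespace support and returns the xsi:schemaLocation attribute. */
    public static Attr schemaLocation() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for prefix and local name to be reported
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)))
                    .getDocumentElement();
            // Look the attribute up by namespace URI and local part, not by prefix.
            return root.getAttributeNodeNS(XSI_NS, "schemaLocation");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Attr attr = schemaLocation();
        System.out.println("QName:         " + attr.getName());
        System.out.println("prefix:        " + attr.getPrefix());
        System.out.println("local part:    " + attr.getLocalName());
        System.out.println("namespace URI: " + attr.getNamespaceURI());
    }
}
```

Notice that the lookup uses the namespace URI and the local part. The prefix is just the document's shorthand, so a program that depends on the literal string xsi would break if another document chose a different prefix for the same namespace.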
Default namespaces have a lot of gotchas. One tricky thing to remember is that if you use a default
namespace, it only works for elements. Unprefixed attributes are in no namespace at all, so you must prefix any attributes that are supposed to be in the
default namespace. Another tricky thing about default namespaces is that you have to explicitly define a