
process of defining XML 1.1. When they have finished their work, you will be able to supply 1.1 in addition to 1.0 for the version number. If there is no encoding declaration, then the document must be encoded using UTF-8 (or UTF-16 with a byte order mark). If you omit the encoding declaration from a document stored in some other encoding, or specify an incorrect encoding declaration, your XML parser will report a fatal error. We’ll have more to say about fatal errors later in the chapter.
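As a rough illustration of how the encoding declaration interacts with the bytes of a document, here is a minimal sketch. It uses Python's standard xml.etree.ElementTree module, which is backed by the expat parser rather than Xerces, so treat it as an analogy and not as Xerces behavior. The <title> content and the parse_status helper are hypothetical names invented for this sketch.

```python
import xml.etree.ElementTree as ET

# The character "é" encoded as ISO-8859-1 is the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
body = b"<title>Caf\xe9</title>"  # hypothetical sample content


def parse_status(document: bytes) -> str:
    """Return the element text, or 'fatal error' if the parser rejects the input."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return "fatal error"


# Correct declaration: the parser decodes the byte as ISO-8859-1.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
# No encoding declaration: the parser assumes UTF-8 and hits an invalid byte.
undeclared = b'<?xml version="1.0"?>' + body
# Incorrect declaration: the bytes are not what the declaration promises.
mislabeled = b'<?xml version="1.0" encoding="UTF-8"?>' + body

results = {
    "declared": parse_status(declared),
    "undeclared": parse_status(undeclared),
    "mislabeled": parse_status(mislabeled),
}
```

Only the first document parses; the other two are rejected because their bytes do not match the encoding the parser was told (or defaulted) to expect.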

Well-Formedness

The rest of the file consists of data that has been marked up with tags (such as <title> and <author>). The first rule or prerequisite for an XML document is that it must be well-formed. (An XML parser is required by the XML specification to report a fatal error if a document isn’t well-formed.) Being well-formed means that every start tag (like <book>) must have an end tag (</book>). The start and end tag, along with the data in between them, are called an element. Elements may not overlap; they must be nested within each other. In other words, the start and end tag of an element must be inside the start and end tag of any element that encloses it. The data between the start and end tag is also known as the content of the element; it may contain elements, characters, or a mix of elements and characters. Note that the start tag of an element may contain attributes. In our example, the book element contains an xsi:schemaLocation attribute in lines 4-5. The value of an attribute must be enclosed in either single quotes (') or double quotes ("). The type of the end quote must match the type of the beginning quote.
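The well-formedness rules above can be checked with any conforming parser. The following is a minimal sketch, again assuming Python's standard xml.etree.ElementTree module in place of Xerces. The is_well_formed helper and the sample documents are hypothetical and exist only to exercise each rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per rule discussed in the text.
samples = {
    # Every start tag has an end tag, and elements nest properly.
    "nested": "<book><title>XML</title><author>A. Writer</author></book>",
    # The <book> start tag is never closed.
    "missing end tag": "<book><title>XML</title>",
    # <title> is still open when </book> appears, so the elements overlap.
    "overlapping": "<book><title>XML</book></title>",
    # The attribute value opens with ' but is "closed" with ".
    "mismatched quotes": "<book id='b1\"><title>XML</title></book>",
    # Single quotes are fine as long as both ends match.
    "single quotes": "<book id='b1'><title>XML</title></book>",
}

report = {name: is_well_formed(text) for name, text in samples.items()}
```

Here report maps each sample name to True or False; only the "nested" and "single quotes" documents are accepted.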

Namespaces

In lines 2-4 you see a number of namespace declarations. The first declaration in line 2 sets the default namespace for this document to http://sauria.com/schemas/apache-xml-book/book. Namespaces are used to prevent name clashes between elements from two different grammars. You can easily imagine the element name title or author being used in another XML grammar, say one for music CDs. If you want to combine elements from those two grammars, you will run into problems trying to determine whether a title element is from the book grammar or the CD grammar.

Namespaces solve that problem by allowing you to associate each element in a grammar with a namespace. The namespace is specified by a URI, which is used to provide a unique name for the namespace. The URI is only an identifier; you can’t expect to be able to retrieve anything from it. When you’re using namespaces, it’s as if each element or attribute name is prefixed by the namespace URI. This is very cumbersome, so the XML Namespaces specification provides two kinds of shorthand. The first shorthand is the ability to specify the default namespace for a document, as in line 2. The other shorthand is the ability to declare an abbreviation that can be used in the document instead of the namespace URI. This abbreviation is called the namespace prefix. In line 3, the document declares a namespace prefix xsi for the namespace associated with http://www.w3.org/2001/XMLSchema-instance. To declare a prefix, you place a colon and the desired prefix after xmlns (here, xmlns:xsi).
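The idea that each name behaves as if it were prefixed by its namespace URI can be made concrete. The sketch below assumes Python's standard xml.etree.ElementTree module, which happens to report names in exactly that expanded form, written {uri}local. The CD namespace URI, the cd prefix, and the title text are hypothetical; only the book namespace URI comes from the example document.

```python
import xml.etree.ElementTree as ET

BOOK_NS = "http://sauria.com/schemas/apache-xml-book/book"
CD_NS = "http://example.com/schemas/cd"  # hypothetical URI for a CD grammar

# The default namespace covers unprefixed elements; the cd prefix is an
# abbreviation for the second namespace URI.
document = f"""<book xmlns="{BOOK_NS}" xmlns:cd="{CD_NS}">
  <title>Some Book</title>
  <cd:title>Some Album</cd:title>
</book>"""

root = ET.fromstring(document)

# ElementTree reports each element name with its namespace URI in braces,
# so the two title elements are no longer ambiguous.
expanded_names = [child.tag for child in root]

# Look up each title by its expanded name.
book_title = root.find(f"{{{BOOK_NS}}}title").text
cd_title = root.find(f"{{{CD_NS}}}title").text
```

Both children are written as title in the markup, yet their expanded names differ, which is how a program tells the book title from the CD title.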

Line 4 shows how namespace prefixes are used. The attribute schemaLocation is prefixed by xsi, and the two are separated by a colon. The combined name xsi:schemaLocation is called a qualified name (QName). The prefix is xsi, and the schemaLocation portion is also referred to as the local part of the QName. (It’s important to know what all these parts are called because the XML parser APIs let you access each piece from your program.)
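To show what accessing each piece of a QName looks like in practice, here is a minimal sketch that assumes Python's standard xml.dom.minidom module instead of the Xerces APIs; the DOM property names it uses (name, prefix, localName, namespaceURI) come from the W3C DOM. The book.xsd file name in the attribute value is a hypothetical placeholder.

```python
from xml.dom import minidom

BOOK_NS = "http://sauria.com/schemas/apache-xml-book/book"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# A cut-down book element; "book.xsd" is a hypothetical schema file name.
document = (
    f'<book xmlns="{BOOK_NS}" xmlns:xsi="{XSI_NS}" '
    f'xsi:schemaLocation="{BOOK_NS} book.xsd"/>'
)

dom = minidom.parseString(document)
book = dom.documentElement

# Look the attribute up by namespace URI and local part, not by prefix.
attr = book.getAttributeNodeNS(XSI_NS, "schemaLocation")

qname_parts = {
    "qname": attr.name,                  # the full qualified name
    "prefix": attr.prefix,               # the abbreviation declared via xmlns:xsi
    "local part": attr.localName,        # the name without the prefix
    "namespace URI": attr.namespaceURI,  # what the prefix stands for
}
```

Looking the attribute up by namespace URI and local part keeps the code working even if a document chooses a different prefix for the same namespace.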

Default namespaces have a lot of gotchas. One tricky thing to remember is that a default namespace applies only to elements: an unprefixed attribute is not placed in the default namespace, so you must prefix any attributes that are supposed to be in that namespace. Another tricky thing about default namespaces is that you have to explicitly define a

