
❑ Mismatched encoding declaration: The character encoding actually used in a file must match the encoding name given in its encoding declaration. The encoding declaration is the encoding="name" part of the XML declaration, <?xml version="1.0" encoding="name"?>, at the top of an XML document. If the encoding of the file and the declared encoding don’t match, you may see errors about invalid characters.

❑ Forgetting to use namespace-aware methods: If you’re working with namespaces, be sure to use the namespace-aware versions of the methods. With SAX this is fairly easy, because most people use the SAX 2.0 ContentHandler, which has only namespace-aware callback methods. If you’re using the SAX 1.0 DocumentHandler and trying to handle namespaces, you’re in the wrong place; you need to use ContentHandler. With DOM-based parsers this is a little harder, because the namespace-aware methods sit alongside the original ones and are distinguished only by the letters NS appended to their names. For example, Element#getAttributeNS is the namespace-aware version of the Element#getAttribute method.

❑ Out of memory using the DOM: Depending on the document you’re working with, you may see out-of-memory errors if you’re using the DOM, because the DOM tends to be very memory intensive. There are several possible solutions. You can increase the size of the Java heap (for example, with the JVM’s -Xmx option). You can use the DOM in deferred mode; note that if you’re using the JAXP interfaces, you aren’t using the DOM in deferred mode. Finally, you can try to prune some of the nodes in the DOM tree by setting the feature http://apache.org/xml/features/dom/include-ignorable-whitespace to false.

❑ Using appendChild instead of importNode across DOM trees: The Xerces DOM implementation tries to enforce some integrity constraints on the contents of the DOM. One common thing developers want to do is create a new DOM tree and then copy some nodes from another DOM tree into it. Usually they try to do this using Node#appendChild, and then they start seeing confusing exceptions such as DOMException: DOM005 Wrong document. Every node belongs to the document that created it, so to copy nodes between DOM trees you need to call Document#importNode on the destination document. It returns a copy owned by that document, which you can then insert with appendChild or whichever method puts the node into its new home.
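The namespace pitfall above can be illustrated with a short sketch. It assumes the JAXP DocumentBuilderFactory, which is not namespace-aware unless you ask for it; the class name NamespaceAwareDemo, the helper parse, and the namespace URI are hypothetical names chosen for this example only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class NamespaceAwareDemo {
    // Hypothetical namespace URI, used only for this illustration.
    static final String BOOK_NS = "http://example.com/ns/book";

    // Parses an XML string into a DOM tree, with or without namespace support.
    static Document parse(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // JAXP factories default to namespace-unaware parsing.
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<b:book xmlns:b=\"" + BOOK_NS + "\" b:isbn=\"0-123\"/>";
        Element root = parse(xml, true).getDocumentElement();

        // Namespace-aware lookup: namespace URI plus local name, no prefix.
        System.out.println(root.getAttributeNS(BOOK_NS, "isbn"));

        // Original lookup: matches on the qualified name, prefix included,
        // so it breaks if the document author picks a different prefix.
        System.out.println(root.getAttribute("b:isbn"));

        // The local name is only populated by a namespace-aware parse.
        System.out.println(root.getLocalName());
    }
}
```

The NS variant keeps working when a document uses a different prefix for the same namespace, which is the main reason to prefer it once namespaces are involved.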

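The appendChild versus importNode pitfall from the list above can be sketched as follows. This is a minimal illustration using the standard JAXP and W3C DOM interfaces; the class ImportNodeDemo and its helper methods are hypothetical names, and the exact exception message you see depends on the DOM implementation in use.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ImportNodeDemo {
    // Creates an empty document with a single root element.
    static Document newDocument(String rootName) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.newDocument();
        doc.appendChild(doc.createElement(rootName));
        return doc;
    }

    // Tries the tempting approach and reports whether the DOM rejected it
    // with a "wrong document" error.
    static boolean directAppendFails(Document target, Node foreign) {
        try {
            target.getDocumentElement().appendChild(foreign);
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.WRONG_DOCUMENT_ERR;
        }
    }

    // The approach described above: import a deep copy, then insert the copy.
    static Node copyInto(Document target, Node foreign) {
        Node copy = target.importNode(foreign, true); // true = copy the subtree
        return target.getDocumentElement().appendChild(copy);
    }

    public static void main(String[] args) throws Exception {
        Document source = newDocument("source");
        Element item = source.createElement("item");
        item.setAttribute("id", "42");
        source.getDocumentElement().appendChild(item);

        Document target = newDocument("target");
        System.out.println("direct append rejected: "
            + directAppendFails(target, item));

        Node copy = copyInto(target, item);
        System.out.println("copy owned by target: "
            + (copy.getOwnerDocument() == target));
        // importNode copies; the original node stays in the source tree.
        System.out.println("original still in source: "
            + (item.getParentNode() == source.getDocumentElement()));
    }
}
```

Because importNode makes a copy, later changes to the original node do not show up in the destination tree.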
Applications

We’ve covered a lot of ground in this chapter, and yet we’ve hardly begun. XML parsing has so many applications that it’s hard to show all the ways you might use it in your own code. Here are a couple of ideas.

One place you end up interacting directly with the XML parser is in the kind of example we’ve been using throughout this chapter: turning XML documents into domain-specific objects within your application. Although there are some proposals for tools that can do this for you, it’s a task where you’ll still see developers working directly with the parser, at least for a little while longer.

Another task people use the parser for directly is filtering XML. When you have a very large XML document and you need only part of it, using SAX to cut out the parts you don’t want to deal with is a viable solution, because SAX streams the document past your handler instead of building the whole tree in memory.
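As a rough sketch of the filtering idea, the class below extends the SAX helper XMLFilterImpl and swallows every event inside an unwanted element. The class name DropElementFilter, the filter helper, and the element names in main are hypothetical. An identity Transformer is used here only as a convenient way to serialize the surviving events. The sketch assumes the dropped subtree declares no namespaces of its own and ignores comments and processing instructions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class DropElementFilter extends XMLFilterImpl {
    private final String unwanted; // local name of the elements to drop
    private int skipDepth = 0;     // > 0 while inside a dropped subtree

    public DropElementFilter(XMLReader parent, String unwantedLocalName) {
        super(parent);
        this.unwanted = unwantedLocalName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (skipDepth > 0 || unwanted.equals(localName)) {
            skipDepth++; // entering, or nesting deeper inside, a dropped subtree
            return;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        super.endElement(uri, localName, qName);
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (skipDepth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Runs the filter over an XML string and returns what is left.
    static String filter(String xml, String unwantedLocalName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is filled in
        XMLReader reader = factory.newSAXParser().getXMLReader();
        DropElementFilter filter = new DropElementFilter(reader, unwantedLocalName);

        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(
            new SAXSource(filter, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book>Keep me</book>"
                   + "<internal>Drop me</internal></catalog>";
        System.out.println(filter(xml, "internal"));
    }
}
```

Because the filter holds only a depth counter, its memory use stays flat no matter how large the input document is.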

Chapter 1
