
Deferred DOM
One of the primary difficulties with using the DOM API is performance. This issue manifests itself in a
number of ways. The DOM’s representation of an XML document is very detailed and involves a lot of
objects. This has a big impact on performance because of the time it takes to create all those objects, and
because of the amount of memory those objects use. Developers are often surprised to see how much
memory an XML document consumes when it’s represented as a DOM tree.
To reduce the overhead of using the DOM in an application, the Xerces developers implemented what is
called deferred node expansion. This is an application of lazy evaluation techniques to the creation of DOM
trees. When deferred node expansion is turned on, Xerces doesn’t create objects to represent the various
parts of an XML document. Instead, it builds a non-object-oriented set of data structures that contain the
information needed to create the various types of DOM nodes required by the DOM specification. This
allows Xerces to complete parsing in a much shorter time than when deferred node expansion is turned
off. Because almost no objects are created, the memory used is a fraction of what would ordinarily be
used by a DOM tree.
The magic starts when your application calls the appropriate method to get the DOM Document node.
Deferred node expansion defers the creation of DOM node objects until your program needs them. The
way it does so is simple: If your program calls a DOM method that accesses a node in the DOM tree, the
deferred DOM implementation creates the DOM node you’re requesting and all of its children.
Obviously, the deferred DOM implementation won’t create a node if it already exists. A finite amount of
work is done on each access to an unexpanded node.
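The idea can be illustrated with a toy model. The sketch below is not the Xerces implementation; the class LazyTreeSketch, its array layout, and its method names are all hypothetical. It keeps the parsed structure in plain arrays and creates a node object only when a caller first reaches it, creating the child objects when the children are first requested.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of lazy node expansion; NOT the actual Xerces data structures. */
class LazyTreeSketch {
    // Compact "non-object" storage: node i has name names[i] and parent parents[i] (-1 = root).
    private final String[] names;
    private final int[] parents;
    private final Node[] expanded;   // node objects, created only on demand
    private int createdCount;

    public LazyTreeSketch(String[] names, int[] parents) {
        this.names = names;
        this.parents = parents;
        this.expanded = new Node[names.length];
    }

    final class Node {
        private final int index;
        private List<Node> children; // null until someone asks for them

        private Node(int index) {
            this.index = index;
            createdCount++;
        }

        public String name() { return names[index]; }

        /** The first call creates the child objects; later calls reuse them. */
        public List<Node> children() {
            if (children == null) {
                children = new ArrayList<>();
                for (int i = 0; i < parents.length; i++) {
                    if (parents[i] == index) children.add(nodeAt(i));
                }
            }
            return children;
        }
    }

    // Never creates a node twice: an existing object is simply returned.
    private Node nodeAt(int i) {
        if (expanded[i] == null) expanded[i] = new Node(i);
        return expanded[i];
    }

    public Node root() { return nodeAt(0); }

    public int createdCount() { return createdCount; }

    public static void main(String[] args) {
        LazyTreeSketch tree = new LazyTreeSketch(
                new String[] {"book", "front", "body", "title", "author", "chapter"},
                new int[]    {-1,     0,       0,      1,       1,        2});
        Node front = tree.root().children().get(0); // creates book, front, body
        front.children();                           // creates title, author
        System.out.println(tree.createdCount() + " of 6 nodes created"); // prints "5 of 6 nodes created"
    }
}
```

In this run the chapter node is never touched, so no object is ever created for it. A real deferred DOM has to handle attributes, text, and sibling links as well, but the trade-off is the same.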
The deferred DOM is especially useful in situations where you’re not going to access every part of a
document. Because it only expands those nodes (and the fringe defined by their children) that you access,
Xerces doesn’t create all the objects the DOM specification says should be created. This is fine, because
you don’t need the nodes you didn’t access. The result is a savings of memory and processor time (spent
creating objects and allocating memory).
If your application is doing complete traversals of the entire DOM tree, then you’re better off not using
the deferred DOM, because you’ll pay the cost of creating the non-object-oriented data structures plus
the cost of creating the DOM objects as you access them. This results in using more memory and
processor time than necessary.
The deferred DOM implementation is used by default. If you wish to turn it off, you can set the feature
http://apache.org/xml/features/dom/defer-node-expansion to false. If you’re using the JAXP
DocumentBuilder API to get a DOM parser, then the deferred DOM is turned off.
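As a minimal sketch, the feature can be set explicitly so that your code doesn’t depend on any default. The example assumes a Xerces-derived JAXP implementation that accepts Xerces feature URIs through DocumentBuilderFactory.setFeature (the parser bundled with the JDK is one). The class name DeferredDomToggle and the sample document are made up for illustration. If you’re using the native Xerces API, the same URI can be passed to setFeature on org.apache.xerces.parsers.DOMParser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DeferredDomToggle {
    static final String DEFER_NODE_EXPANSION =
            "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Parses xml with deferred node expansion explicitly switched on or off. */
    public static Document parse(String xml, boolean defer) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, defer);
        } catch (ParserConfigurationException e) {
            // Assumption failed: this JAXP implementation does not know the Xerces feature.
            System.err.println("Feature not supported: " + e.getMessage());
        }
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Xerces</title><chapter/><chapter/></book>";
        for (boolean defer : new boolean[] {true, false}) {
            Document doc = parse(xml, defer);
            // The concrete class name is implementation-specific; it is printed only for inspection.
            System.out.println("defer=" + defer + " -> " + doc.getClass().getName()
                    + ", root=" + doc.getDocumentElement().getNodeName());
        }
    }
}
```

Both settings produce the same DOM as far as your application can tell. Only the cost of building it and walking it changes.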
Schema Handling
Xerces provides a number of features that control various aspects of validation when you’re using XML
Schema. The most important feature turns on schema validation:
http://apache.org/xml/features/validation/schema. To use it, the SAX namespaces feature
(http://xml.org/sax/features/namespaces) must be on (it is by default). The Xerces validator won’t
report schema validation errors unless the regular SAX validation feature
(http://xml.org/sax/features/validation) is turned on, so you must make sure that both the schema
validation feature and the SAX validation feature are set to true.
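The three settings can be sketched as follows. The example assumes a Xerces-derived parser obtained through JAXP (such as the one bundled with the JDK); the class names SchemaValidationSetup and ErrorCollector are hypothetical. Because the sample document doesn’t point to any schema, the validator should report that it cannot find a declaration for the root element, which is enough to show that errors reach the ErrorHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class SchemaValidationSetup {
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String SCHEMA = "http://apache.org/xml/features/validation/schema";

    /** Collects recoverable (validation) errors instead of ignoring them. */
    static class ErrorCollector extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add(e.getMessage());
        }
    }

    /** Parses xml with schema validation enabled and returns the reported errors. */
    public static List<String> validate(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Readers created through JAXP are not namespace-aware by default, so set it explicitly.
        reader.setFeature(NAMESPACES, true);
        // Both of these must be true, or schema errors are not reported.
        reader.setFeature(VALIDATION, true);
        reader.setFeature(SCHEMA, true);

        ErrorCollector collector = new ErrorCollector();
        reader.setErrorHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.errors;
    }

    public static void main(String[] args) throws Exception {
        // No schema is referenced, so the root element has no declaration to validate against.
        for (String message : validate("<order><quantity>three</quantity></order>")) {
            System.out.println("validation error: " + message);
        }
    }
}
```

In a real application the document would name its schema, for example with an xsi:schemaLocation or xsi:noNamespaceSchemaLocation attribute, and the same handler would receive any violations of that schema.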
Chapter 1