Xerces uses the SAX ErrorHandler interface to handle errors while parsing using the DOM API. You can
register your own ErrorHandler and customize your error reporting, just as with SAX. However, you
may want to access the DOM node that was under construction when the error condition occurred. To
do this, you can use the http://apache.org/xml/properties/dom/current-element-node property to read the
DOM node that was being constructed at the time the parser signaled an error.
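As a minimal sketch of registering an ErrorHandler for DOM parsing, the following example uses the standard JAXP DocumentBuilder rather than the Xerces parser classes directly. The class and method names (CollectingErrorHandlerDemo, CollectingErrorHandler, parseAndCollect) are hypothetical and chosen only for illustration. Reading the current-element-node property is not shown, because JAXP's DocumentBuilder offers no property getter; that step would presumably go through the native Xerces DOM parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class CollectingErrorHandlerDemo {

    /** Hypothetical handler that records each problem instead of printing it. */
    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        private void record(String level, SAXParseException e) {
            messages.add(level + " at line " + e.getLineNumber() + ": " + e.getMessage());
        }

        @Override
        public void warning(SAXParseException e) {
            record("warning", e);
        }

        @Override
        public void error(SAXParseException e) {
            record("error", e);
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            record("fatal", e);
            throw e; // well-formedness errors are not recoverable, so stop parsing
        }
    }

    /** Parses the XML text into a DOM tree and returns whatever the handler recorded. */
    static List<String> parseAndCollect(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        builder.setErrorHandler(handler);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // The handler has already recorded this failure.
        }
        return handler.messages;
    }

    public static void main(String[] args) throws Exception {
        // The <title> element is never closed, so the parser reports a fatal error.
        for (String message : parseAndCollect("<book><title>Xerces</book>")) {
            System.out.println(message);
        }
    }
}
```

The handler is registered the same way as it would be for SAX parsing; only the parser object it is attached to differs.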
Other Features and Properties
Xerces uses an input buffer that defaults to 2KB in size. The size of this buffer is controlled by the
property http://apache.org/xml/properties/input-buffer-size. If you know you’ll be dealing with files
within a certain size range, it can help performance to set the buffer size close to the size of the files
you’re working with. The buffer size should be a multiple of 1KB. The largest value you should set this
property to is 16KB.
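The following sketch shows one way the buffer size might be set through the SAX XMLReader property mechanism. It assumes the parser behind JAXP is Xerces (or a derivative) and that the property takes an Integer value; a parser that doesn't recognize the property is simply left at its default. The names BufferSizeDemo, trySetBufferSize, and countElements are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class BufferSizeDemo {
    static final String BUFFER_SIZE =
            "http://apache.org/xml/properties/input-buffer-size";

    /** Returns true if the parser accepted the buffer size, false if it left the default. */
    static boolean trySetBufferSize(XMLReader reader, int bytes) {
        try {
            // Assumption: the property value is an Integer number of bytes.
            reader.setProperty(BUFFER_SIZE, Integer.valueOf(bytes));
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false; // not a Xerces-style parser, or the value was rejected
        }
    }

    /** Parses the XML text with the requested buffer size and counts its elements. */
    static int countElements(String xml, int bufferBytes) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        trySetBufferSize(reader, bufferBytes);
        final int[] count = {0};
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // 8KB is a multiple of 1KB and below the 16KB ceiling mentioned above.
        System.out.println(countElements("<a><b/><c/></a>", 8 * 1024));
    }
}
```

The buffer size affects only how input is read, so the parse result should be the same whether or not the property is accepted.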
Xerces normally operates in a mode that makes it more convenient for users of Windows operating systems
to specify filenames. In this mode, Xerces allows URIs (Uniform Resource Identifiers) to include file
specifications that include backslashes (\) as separators, and allows the use of DOS drive letters and
Windows UNC filenames. Although this is convenient, it can lead to sloppiness, because document
authors may include these file specifications in XML documents and DTDs. The
http://apache.org/xml/features/standard-uri-conformant feature turns off this convenience mode and
requires that all URIs actually be URIs.
The XML 1.0 specification recommends that the character encoding of an XML file should be specified
using a character set name specified by the Internet Assigned Numbers Authority (IANA). However,
this isn’t required. The feature http://apache.org/xml/features/allow-java-encodings allows you to use
the Java names for character encodings to specify the character set encoding for a document. This feature
can be convenient for an all-Java system, but documents that rely on it won’t interoperate with
XML parsers that aren’t Java based.
Turning on the feature http://apache.org/xml/features/disallow-doctype-decl causes Xerces to throw
an exception when an XML document contains a DOCTYPE declaration. It’s possible to launch a denial-of-service
attack against an XML parser by providing a DTD that contains a recursively expanding entity definition:
as the parser expands the entity, the expansion eventually overflows a buffer in the parser or causes the
parser to consume all available memory. This feature can be used to prevent this attack. Of course, DTD
validation can’t be used when this flag is turned on, and Xerces is then operating in a mode that isn’t
completely compliant with the XML specification.
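As an illustration of the feature just described, here is a minimal sketch that turns it on through the JAXP DocumentBuilderFactory and checks whether a document is rejected. It assumes the underlying parser is Xerces or a derivative that recognizes the feature; the names DoctypeGuardDemo and isRejected are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class DoctypeGuardDemo {
    static final String DISALLOW_DOCTYPE =
            "http://apache.org/xml/features/disallow-doctype-decl";

    /** Returns true if the parser refuses the document, false if it parses cleanly. */
    static boolean isRejected(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Throws ParserConfigurationException if the parser doesn't know the feature.
        factory.setFeature(DISALLOW_DOCTYPE, true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String withDtd = "<!DOCTYPE note [<!ENTITY a \"x\">]><note>&a;</note>";
        String withoutDtd = "<note>x</note>";
        System.out.println("with DOCTYPE rejected: " + isRejected(withDtd));
        System.out.println("without DOCTYPE rejected: " + isRejected(withoutDtd));
    }
}
```

Because the document is refused as soon as the DOCTYPE declaration is seen, no entity definitions are ever read, which is what closes off the entity expansion attack.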
Unfortunately, there are other ways to launch denial-of-service attacks against XML parsers, so the
Xerces team has created a SecurityManager class that is part of the org.apache.xerces.util package. The
current security manager can be accessed via the http://apache.org/xml/properties/security-manager
property. It lets you replace the security manager with your own by setting the value of the property to
an instance of SecurityManager. At the time of this writing, SecurityManager provides two JavaBean
properties, entityExpansionLimit and maxOccurNodeLimit. Setting entityExpansionLimit is another way
to prevent the entity expansion attack. The value of this property is the number of entity expansions the
parser should allow in a single document. The default value for entityExpansionLimit is 100,000. The
maxOccurNodeLimit property controls the maximum number of occur nodes that can be created for an
XML Schema maxOccurs. This is for the case where maxOccurs is a number, not unbounded. The default
value for this property is 3,000.