document. This saves the overhead of creating all the internal data structures for each document. When
you combine this with grammar caching, you can get some nice improvements in performance relative
to creating a parser instance over and over again.
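As a rough illustration of reusing one parser instance, the sketch below creates a single XMLReader through JAXP and parses several in-memory documents with it. The class and method names (ParserReuseSketch, ElementCounter, countElements) are hypothetical, and grammar caching is left out because it is configured separately.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: one parser instance, many documents.
class ParserReuseSketch {

    /** Counts start tags; the count is reset at the start of every document. */
    static final class ElementCounter extends DefaultHandler {
        int count;

        @Override
        public void startDocument() {
            count = 0;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            count++;
        }
    }

    /** Parses every document with the same XMLReader and returns the total element count. */
    static int countElements(List<String> documents) {
        try {
            // Create the parser once, outside the loop.
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            ElementCounter counter = new ElementCounter();
            reader.setContentHandler(counter);

            int total = 0;
            for (String xml : documents) {
                // The same instance is reused for each document.
                reader.parse(new InputSource(new StringReader(xml)));
                total += counter.count;
            }
            return total;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(List.of("<a><b/></a>", "<c/>")));
    }
}
```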
Common Problems
This section addresses some common problems that people encounter when they use Xerces. Most of
these issues aren’t Xerces-specific, but they happen so frequently that we wanted to address them.
Classpath problems: It’s a simple mistake but a surprisingly common one. Both xml-apis.jar
and xercesImpl.jar must be on your classpath in order to use Xerces. Leaving either one out
typically shows up as a ClassNotFoundException or NoClassDefFoundError at runtime. If you want
to use the samples, you also need to include xercesSamples.jar on your classpath.
The other thing to beware of is strange interactions between your classpath and either the JDK
1.3 Extension Mechanism or the JDK 1.4 Endorsed Standards Override Mechanism. If it looks
like you aren’t getting Xerces, or aren’t getting the version you expect, look for old versions
of Xerces in these places (by default, the JRE’s lib/ext and lib/endorsed directories). You can
determine the version of Xerces by executing the following at your command line:
java org.apache.xerces.impl.Version
This command prints out the version of Xerces you’re using. You can also call the static method
org.apache.xerces.impl.Version#getVersion from inside a program to get the version string.
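A small diagnostic along these lines can be sketched as follows. It calls the static getVersion method through reflection, so the class still compiles and runs when Xerces is missing from the classpath, and it also reports where the class was loaded from, which helps spot a stray old jar. The class and method names XercesVersionCheck and describeXerces are hypothetical.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Xerces.
class XercesVersionCheck {

    /** Describes the Xerces visible to this class loader, or says that none was found. */
    static String describeXerces() {
        try {
            Class<?> version = Class.forName("org.apache.xerces.impl.Version");
            // Static method named in the text, invoked reflectively.
            Method getVersion = version.getMethod("getVersion");
            Object versionString = getVersion.invoke(null);
            // The code source shows which jar (or directory) supplied the class.
            CodeSource source = version.getProtectionDomain().getCodeSource();
            String location = (source == null) ? "unknown location" : String.valueOf(source.getLocation());
            return versionString + " loaded from " + location;
        } catch (ClassNotFoundException e) {
            return "Xerces not found on the classpath";
        } catch (ReflectiveOperationException e) {
            return "Xerces found, but getVersion could not be called: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeXerces());
    }
}
```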
Errors not reported or always reported to the console: If you don’t provide an ErrorHandler,
one of two behaviors will occur, depending on your version of Xerces. In every version prior to
2.3.0, if no ErrorHandler is registered, no error messages are displayed. You must register your
own ErrorHandler if you want error messages to be reported. This problem confused a lot of
people, so in version 2.3.0 the behavior was changed so that error messages are echoed to the
console when no ErrorHandler is registered. In version 2.3.0 and later, you need to register
your own ErrorHandler to turn off the messages to the console.
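Registering your own ErrorHandler gives the same behavior on either side of that version boundary. The following is a minimal sketch using the standard SAX and JAXP interfaces; the names CollectingErrorHandlerSketch, CollectingErrorHandler, and validate are hypothetical, and DTD validation is switched on only so that the sample input produces a recoverable error.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical sketch: collect parser messages instead of relying on defaults.
class CollectingErrorHandlerSketch {

    static final class CollectingErrorHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        @Override
        public void error(SAXParseException e) {
            // Recoverable errors, such as validity errors; parsing continues.
            messages.add("error: " + e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            messages.add("fatal: " + e.getMessage());
            throw e; // Well-formedness errors end the parse.
        }
    }

    /** Parses with DTD validation on and returns every message the parser reported. */
    static List<String> validate(String xml) {
        CollectingErrorHandler handler = new CollectingErrorHandler();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Rethrown by fatalError above, so it is already in the list.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // The DTD declares <a> as EMPTY, so the text content is a validity error.
        validate("<!DOCTYPE a [<!ELEMENT a EMPTY>]><a>text</a>").forEach(System.out::println);
    }
}
```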
Multiple calls to characters: In SAX applications, it’s common to forget that the characters
callback may be called more than once for the character data inside an element. Unless you buffer
up the text by, say, appending it to a StringBuffer until the matching endElement call arrives,
it may look like your application is randomly throwing away pieces of character data.
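The buffering approach can be sketched as follows. The handler appends every characters call to a buffer and reads the result only in endElement. The names TextBufferingSketch, TextCollector, and textOf are hypothetical; the sketch uses StringBuilder, the unsynchronized counterpart of StringBuffer, and as a simplification it handles only elements that contain text and no child elements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: accumulate character data across characters() calls.
class TextBufferingSketch {

    static final class TextCollector extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        final List<String> texts = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buffer.setLength(0); // Start a fresh buffer for this element.
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // May be called several times per element; always append.
            buffer.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            texts.add(buffer.toString()); // The text is complete only here.
        }
    }

    /** Returns the text of the first element that ends, which is the root in a leaf-only document. */
    static String textOf(String xml) {
        try {
            TextCollector collector = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
            return collector.texts.get(0);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        // Entity references are a common place for parsers to split the text.
        System.out.println(textOf("<note>one &amp; two</note>"));
    }
}
```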
When is ignorableWhitespace called? The definition of ignorable whitespace is confusing enough
on its own, and the callback’s behavior adds to the confusion. The ignorableWhitespace callback
is called for ignorable whitespace only when a DTD is associated with the document, because the
parser needs the DTD’s content models to know which whitespace is ignorable. If there’s no DTD,
ignorableWhitespace isn’t called. This is true even if there is an XML schema but no DTD.
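The difference is easy to observe with a handler that only counts ignorableWhitespace calls. In the sketch below, the names IgnorableWhitespaceSketch, WhitespaceCounter, and countIgnorableCalls are hypothetical. Validation is switched on for the document that carries a DTD as a conservative assumption, so that the content models are certainly applied; it is not a claim about what a non-validating parser does.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: count ignorableWhitespace callbacks with and without a DTD.
class IgnorableWhitespaceSketch {

    static final class WhitespaceCounter extends DefaultHandler {
        int ignorableCalls;

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorableCalls++;
        }
    }

    /** Parses the document and returns how often ignorableWhitespace was called. */
    static int countIgnorableCalls(String xml, boolean validating) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(validating);
            WhitespaceCounter counter = new WhitespaceCounter();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), counter);
            return counter.ignorableCalls;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String body = "<a>\n  <b>x</b>\n</a>";
        // The DTD says <a> contains only a <b> element, so the whitespace around <b> is ignorable.
        String withDtd = "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b (#PCDATA)>]>\n" + body;
        System.out.println("with DTD:    " + countIgnorableCalls(withDtd, true));
        System.out.println("without DTD: " + countIgnorableCalls(body, false));
    }
}
```

Without the DTD, the same whitespace arrives through the characters callback instead.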
Forgot validation switches: Another common problem is forgetting to turn on the validation
features. This is true both for DTD validation and for schema validation. A single feature,
http://xml.org/sax/features/validation, must be turned on for DTD validation. For schema
validation you must turn on that feature plus namespace support
(http://xml.org/sax/features/namespaces) and the schema validation feature
(http://apache.org/xml/features/validation/schema). That’s three features. Make sure
you have them all on.
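A helper that always sets the switches together makes them harder to forget. The sketch below is one way to do it with the SAX setFeature call; the names ValidationSwitchesSketch, newDtdValidatingReader, newSchemaValidatingReader, and isOn are hypothetical, and the schema feature URI is assumed to be recognized by the parser that JAXP hands back (it is a Xerces feature).

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical sketch: turn on the validation features as a group.
class ValidationSwitchesSketch {

    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // Xerces-specific feature; other parsers may not recognize it.
    static final String SCHEMA = "http://apache.org/xml/features/validation/schema";

    /** DTD validation needs only the general validation feature. */
    static XMLReader newDtdValidatingReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, true);
            return reader;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not configure DTD validation", e);
        }
    }

    /** Schema validation needs all three features. */
    static XMLReader newSchemaValidatingReader() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(VALIDATION, true);
            reader.setFeature(NAMESPACES, true);
            reader.setFeature(SCHEMA, true);
            return reader;
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("Could not configure schema validation", e);
        }
    }

    /** Reads a feature back, which is a quick way to check what is really switched on. */
    static boolean isOn(XMLReader reader, String feature) {
        try {
            return reader.getFeature(feature);
        } catch (SAXException e) {
            return false; // Not recognized or not supported by this parser.
        }
    }
}
```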
Multiple documents in one file: People like to try to put multiple XML documents into a single
file. A well-formed XML document has exactly one root element, so this isn’t legal XML, and
Xerces won’t swallow it. You’ll see a fatal error as soon as the parser reaches the second
document.
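A quick way to see this is a well-formedness check that treats the parser's fatal error as "not one document". The names MultipleRootsSketch and isWellFormed below are hypothetical; the sketch registers a DefaultHandler as the error handler only so that nothing is echoed to the console.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: two documents in one input fail the well-formedness check.
class MultipleRootsSketch {

    /** Returns true if the input parses as a single well-formed document. */
    static boolean isWellFormed(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            // DefaultHandler rethrows fatal errors and stays silent otherwise.
            reader.setErrorHandler(new DefaultHandler());
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false; // The parser rejected the input.
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Parsing could not be attempted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<a/>"));
        System.out.println(isWellFormed("<a/><b/>"));
    }
}
```

If you really do have several documents in one file, split the input yourself and hand each piece to the parser separately.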