
used a different or buggy version of the schema you’re using. Worse, the author of the incoming
document may intentionally specify a different version of the schema in an attempt to subvert your
application.
The second reason you may choose to ignore these hints is that you might want to provide a local copy
of the schema so the validator doesn’t have to fetch the schema document over the network every
time it validates a document. If you’re in a server environment processing thousands or even millions
of documents per day, the last thing you want is for the Xerces validator to make an HTTP
request to a machine somewhere on the Internet for each document it has to validate. Not only is this
terrible for performance, but it also makes your application vulnerable to a failure of the machine hosting the
schema. Fortunately, Xerces has a pair of properties you can use to override the schemaLocation hints.
The first property is http://apache.org/xml/properties/schema/external-schemaLocation; it overrides
the xsi:schemaLocation attribute. The value of the property is a string with the same format as the
xsi:schemaLocation attribute: a whitespace-separated list of pairs, each consisting of a namespace URI
followed by a schema document URI. The other property is
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation; it handles
the xsi:noNamespaceSchemaLocation case. Its value has the same format as
xsi:noNamespaceSchemaLocation: a single URI giving the location of the schema document.
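As a minimal sketch, assuming the copy of Xerces bundled with the JDK (reached through JAXP) and a hypothetical no-namespace "book" schema written to a temporary local file, the following sets the external-noNamespaceSchemaLocation property so the validator reads the local schema even though the instance document carries no hint. The class, method, and schema are illustrative only; with the standalone Xerces jar you would set the same property string on org.apache.xerces.parsers.SAXParser.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class ExternalSchemaLocationDemo {
    // Xerces property that overrides xsi:noNamespaceSchemaLocation.
    static final String NO_NS_LOCATION =
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation";
    // JAXP property that switches a validating SAX parser to W3C XML Schema.
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    static final String W3C_XML_SCHEMA = "http://www.w3.org/2001/XMLSchema";

    // Hypothetical schema: a <book> element containing a single <title> string.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Writes the schema to a temporary local file and returns its file: URI. */
    static String writeLocalSchema() {
        try {
            Path xsd = Files.createTempFile("book", ".xsd");
            Files.write(xsd, BOOK_XSD.getBytes(StandardCharsets.UTF_8));
            xsd.toFile().deleteOnExit();
            return xsd.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Validates xml against the schema at schemaUri; returns the error messages collected. */
    static List<String> validate(String xml, String schemaUri) {
        List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // DefaultHandler ignores validation errors by default, so record them here.
            @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.setProperty(SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            // Point the validator at the local copy instead of any document hint.
            parser.setProperty(NO_NS_LOCATION, schemaUri);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Configuration failures and fatal parse errors are reported as errors too.
            errors.add(e.toString());
        }
        return errors;
    }

    public static void main(String[] args) {
        String schemaUri = writeLocalSchema();
        System.out.println(validate("<book><title>XML</title></book>", schemaUri));
        System.out.println(validate("<book><author>Ted</author></book>", schemaUri));
    }
}
```

This removes the network fetch, but the schema text is still read and processed for every document parsed this way.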
Grammar Caching
If you’re processing a large number of XML documents that use a single DTD, a single XML schema,
or a small number of XML schemas, you should use the grammar-caching functionality built into
Xerces. You can use the http://apache.org/xml/properties/schema/external-schemaLocation or
http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation properties to force
Xerces to read XML schemas from a local copy, which improves the efficiency of your application.
However, these properties work at the entity level: they change only where the schema document is
read from (in a later section, you’ll discover that you could use entity-handling techniques to
accomplish what these two properties do).
Even if you’re reading the grammar from a local file, Xerces still has to read the grammar file and turn it
into data structures that can be used to validate an XML document, a process somewhat akin to compilation,
and this process is expensive. If your application uses a single grammar or a small fixed number of
grammars, you want to avoid the overhead of processing the grammar multiple times. That’s the
purpose of the Xerces grammar-caching functionality.
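The standard JAXP validation API offers a comparable way to pay the compilation cost only once. The sketch below is not the Xerces grammar-caching feature described in this section; it assumes the JDK's javax.xml.validation package and a hypothetical "book" schema, compiles that schema into an immutable Schema object a single time, and then creates a lightweight Validator for each document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class CompiledSchemaDemo {
    // Hypothetical schema: a <book> element containing a single <title> string.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book'><xs:complexType><xs:sequence>"
      + "<xs:element name='title' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // The expensive step: parse the grammar into validation structures exactly once.
    static final Schema BOOK_SCHEMA = compile(BOOK_XSD);

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    /** Returns true if xml is valid against the precompiled schema. */
    static boolean isValid(String xml) {
        // A Schema is thread-safe and shared; a Validator is cheap but not thread-safe.
        Validator validator = BOOK_SCHEMA.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] docs = {"<book><title>XML</title></book>", "<book><author>Ted</author></book>"};
        for (String doc : docs) {
            System.out.println(isValid(doc) + "  " + doc);
        }
    }
}
```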
Xerces provides two styles of grammar caching: passive caching and active caching. Passive caching requires
little work on the part of your application. You set a property, and Xerces starts caching grammars.
When Xerces encounters a grammar that it hasn’t seen before, it processes the grammar and then caches
the grammar data structures for reuse. The next time Xerces encounters a reference to this grammar, it
uses the cached data structures.
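The passive strategy amounts to memoizing compiled grammars by where they come from. The following hand-rolled analogue is built on the JDK's javax.xml.validation API rather than on Xerces' own grammar pool: it compiles a schema the first time its URI is requested and returns the cached result afterward. The class name, the sample schema, and the choice of URI strings as cache keys are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class PassiveSchemaCacheSketch {
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();
    // Counts real compilations so the caching effect is observable.
    final AtomicInteger compilations = new AtomicInteger();

    /** Returns the compiled schema for uri, compiling it only the first time it is seen. */
    Schema schemaFor(String uri) {
        // computeIfAbsent applies the function at most once per key, even under concurrency.
        return cache.computeIfAbsent(uri, this::compile);
    }

    private Schema compile(String uri) {
        compilations.incrementAndGet();
        try {
            // SchemaFactory is not thread-safe, so a fresh one is created per compilation.
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(uri));
        } catch (SAXException e) {
            throw new IllegalArgumentException("cannot compile " + uri, e);
        }
    }

    // Hypothetical schema used only by the demo.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='book' type='xs:string'/></xs:schema>";

    /** Writes xsd to a temporary file and returns its file: URI. */
    static String writeTempSchema(String xsd) {
        try {
            Path file = Files.createTempFile("cached", ".xsd");
            Files.write(file, xsd.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String uri = writeTempSchema(BOOK_XSD);
        PassiveSchemaCacheSketch cache = new PassiveSchemaCacheSketch();
        for (int i = 0; i < 1000; i++) {
            cache.schemaFor(uri); // only the first call compiles the schema
        }
        System.out.println("compilations: " + cache.compilations.get()); // prints "compilations: 1"
    }
}
```

A real grammar pool must also decide when two references denote the same grammar; keying on the URI string, as here, is a simplification.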
Here’s a version of the book-processing program that uses passive grammar caching:
/*
 *
 * PassiveSchemaCache.java
 *
 * Example from "Professional XML Development with Apache Tools"
 *
 */
package com.sauria.apachexml.ch1;