
used a different or buggy version of the schema you’re using. Worse, the author of the incoming document may intentionally specify a different version of the schema in an attempt to subvert your application.

The second reason you may choose to ignore these hints is that you might want to provide a local copy of the schema so the validator doesn’t have to fetch the schema document over the network every time it validates a document. If you’re in a server environment processing thousands or even millions of documents per day, the last thing you want is for the Xerces validator to make an HTTP request to a machine somewhere on the Internet for every document it validates. Not only is this terrible for performance, but it also makes your application vulnerable to a failure of the machine hosting the schema. Fortunately, Xerces has a pair of properties you can use to override the schemaLocation hints.

The first property, http://apache.org/xml/properties/schema/external-schemaLocation, overrides the xsi:schemaLocation attribute. Its value is a string with the same format as the xsi:schemaLocation attribute: a whitespace-separated list of pairs, each consisting of a namespace URI followed by the URI of the schema document for that namespace. The other property, http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation, handles the xsi:noNamespaceSchemaLocation case. Its value has the same format as xsi:noNamespaceSchemaLocation: a single URI giving the location of the schema document.
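As a rough sketch of how these properties might be set, the Java class below turns on schema validation and points external-noNamespaceSchemaLocation at a local file, so the grammar comes from that copy rather than from whatever location the incoming document names. It assumes the parser returned by SAXParserFactory.newInstance() is Xerces or the JDK's Xerces-derived built-in parser, both of which recognize the Xerces feature and property names used here. The class LocalSchemaOverride, its methods, and the schemaLocationPairs helper (which builds the namespace/location pair string that external-schemaLocation expects) are hypothetical illustrations, not part of the Xerces API.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: validate documents against a local schema copy. */
final class LocalSchemaOverride {
    static final String SCHEMA_VALIDATION =
        "http://apache.org/xml/features/validation/schema";
    static final String EXTERNAL_SCHEMA_LOCATION =
        "http://apache.org/xml/properties/schema/external-schemaLocation";
    static final String EXTERNAL_NONS_SCHEMA_LOCATION =
        "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation";

    /** Builds "nsURI schemaURI nsURI schemaURI ..." for EXTERNAL_SCHEMA_LOCATION. */
    static String schemaLocationPairs(Map<String, String> namespaceToSchema) {
        StringJoiner pairs = new StringJoiner(" ");
        namespaceToSchema.forEach((ns, uri) -> pairs.add(ns).add(uri));
        return pairs.toString();
    }

    /** Records recoverable validation errors instead of aborting the parse. */
    static final class ErrorCollector extends DefaultHandler {
        final List<String> errors = new ArrayList<>();

        @Override
        public void error(SAXParseException e) {
            errors.add(e.getMessage());
        }
    }

    /** Validates a no-namespace document against localSchema; returns error messages. */
    static List<String> validate(String xml, File localSchema) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setFeature(SCHEMA_VALIDATION, true);
        // Supply the schema location ourselves instead of trusting the
        // document's xsi:noNamespaceSchemaLocation hint.
        reader.setProperty(EXTERNAL_NONS_SCHEMA_LOCATION, localSchema.toURI().toString());
        ErrorCollector collector = new ErrorCollector();
        reader.setContentHandler(collector);
        reader.setErrorHandler(collector);
        reader.parse(new InputSource(new StringReader(xml)));
        return collector.errors;
    }
}
```

For a namespaced vocabulary, the same pattern would set EXTERNAL_SCHEMA_LOCATION to something like schemaLocationPairs(Map.of("urn:example:books", "file:/schemas/books.xsd")), where both URIs are placeholders. Building a fresh factory per call keeps the sketch short; a busy server would more likely reuse a configured reader per thread.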

Grammar Caching

If you’re processing a large number of XML documents that use a single DTD, a single XML schema, or a small number of XML schemas, you should use the grammar-caching functionality built into Xerces. You can use the http://apache.org/xml/properties/schema/external-schemaLocation or http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation properties to force Xerces to read XML schemas from a local copy, which improves the efficiency of your application. However, these properties work only at the entity level: they change where the schema document is read from, not whether it is read (in a later section, you’ll discover that you could use entity-handling techniques to accomplish what these two properties do).

Even if you’re reading the grammar from a local file, Xerces still has to read the grammar file and turn it into data structures that can be used to validate an XML document, a process somewhat akin to compilation. This process is expensive. If your application uses a single grammar or a small, fixed number of grammars, you want to avoid the overhead of processing the same grammar multiple times. That’s the purpose of the Xerces grammar-caching functionality.
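To make the compile-once idea concrete, here is a minimal sketch that uses the standard JAXP javax.xml.validation API rather than the Xerces grammar-caching mechanism this section describes. SchemaFactory.newSchema does the expensive work of turning the schema document into in-memory validation structures once, and the resulting Schema object is then reused for every document. The class name CompiledSchemaReuse is a hypothetical illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical sketch: compile a grammar once, validate many documents with it. */
final class CompiledSchemaReuse {
    // Compiled grammar; JAXP documents Schema objects as thread-safe and shareable.
    private final Schema schema;

    CompiledSchemaReuse(File schemaFile) throws SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The costly "compilation" step happens exactly once, here.
        this.schema = factory.newSchema(schemaFile);
    }

    /** Returns true if xml is valid against the already-compiled grammar. */
    boolean isValid(String xml) throws IOException {
        // Validators are cheap to create but not thread-safe, so make one per call.
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, the first validity error throws.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }
}
```

The design choice mirrors the text: the per-document cost drops to creating a Validator and scanning the document, because the schema is never re-read after construction.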

Xerces provides two styles of grammar caching: passive caching and active caching. Passive caching requires little work on the part of your application: you set a property, and Xerces starts caching grammars. When Xerces encounters a grammar that it hasn’t seen before, it processes the grammar and then caches the resulting data structures for reuse. The next time Xerces encounters a reference to this grammar, it uses the cached data structures.
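The passive strategy (process a grammar the first time it is referenced, then reuse the cached result) can be approximated in application code with a map keyed by schema URI. This is a hypothetical sketch built on the JAXP javax.xml.validation API, not the Xerces property that actually enables passive caching; the class PassiveGrammarCache and its method names are illustrative assumptions.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical passive cache: compile on first reference, reuse afterwards. */
final class PassiveGrammarCache {
    private final Map<String, Schema> cache = new HashMap<>();
    private final SchemaFactory factory =
        SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
    private int compilations = 0;

    /** Returns the grammar for schemaUri, compiling it only on the first request. */
    // synchronized because neither HashMap nor SchemaFactory is thread-safe.
    synchronized Schema grammarFor(String schemaUri) throws SAXException {
        Schema schema = cache.get(schemaUri);
        if (schema == null) {
            // Cache miss: read and compile the schema document, then remember it.
            schema = factory.newSchema(new StreamSource(schemaUri));
            cache.put(schemaUri, schema);
            compilations++;
        }
        return schema;
    }

    /** Validates xml against the grammar at schemaUri; throws SAXException if invalid. */
    void validate(String schemaUri, String xml) throws SAXException, IOException {
        grammarFor(schemaUri).newValidator()
            .validate(new StreamSource(new StringReader(xml)));
    }

    /** Number of times a schema document was actually compiled. */
    synchronized int compilations() {
        return compilations;
    }
}
```

Validating a thousand documents that share one schema URI through validate would compile that schema once; every later call finds it in the map.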

Here’s a version of the book-processing program that uses passive grammar caching:

1: /*
2: *
3: * PassiveSchemaCache.java
4: *
5: * Example from "Professional XML Development with Apache Tools"
6: *
7: */
8: package com.sauria.apachexml.ch1;
