
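    // Called by the parser before it opens any external entity.
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        // schemaURI holds the URI given in the xsi:schemaLocation hint;
        // calling equals on it avoids a NullPointerException if systemId is null.
        if (schemaURI.equals(systemId)) {
            // Serve the schema from a local copy. FileReader decodes with the
            // platform default charset; a FileInputStream would instead let the
            // parser detect the file's encoding itself.
            FileReader r = new FileReader("book.xsd");
            return new InputSource(r);
        } else {
            // null tells the parser to open the system ID itself.
            return null;
        }
    }

}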

The general flow of a resolveEntity method is to look at the publicId and/or systemId arguments and decide what to do. Once you've made that decision, your code accesses the physical storage (in this case, a file) and wraps it in an InputSource for the rest of the parser to use. In this example, you're looking for the systemId of the book schema (the URI supplied in the xsi:schemaLocation hint). If the entity being resolved is the book schema, you read the schema from a local copy, wrap the resulting FileReader in an InputSource, and hand it back. For any other entity you return null, which tells the parser to fall back to its default behavior and open the system identifier itself.
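The same pattern can be exercised end to end with nothing but the JDK's built-in SAX parser. The sketch below is illustrative, not the chapter's program: it swaps the schema for an external DTD (so no validation setup is needed), and the URI http://example.com/dtd/book.dtd, the in-memory "local copy," and the class and method names are all hypothetical. Because the entity publisher is declared only in the locally served DTD, the parse succeeds only if the resolver actually supplied it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocalEntityResolverDemo {

    // Hypothetical system ID; the resolver intercepts it, so nothing is fetched over the network.
    static final String DTD_URI = "http://example.com/dtd/book.dtd";

    // Hypothetical "local copy" of the DTD; a real resolver might read a file or a database row.
    static final String LOCAL_DTD =
        "<!ELEMENT book (#PCDATA)>\n<!ENTITY publisher \"Wrox\">\n";

    /** Serves DTD_URI from memory and defers every other entity to the parser. */
    static class LocalResolver implements EntityResolver {
        final List<String> seen = new ArrayList<>();

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            seen.add(systemId);
            if (DTD_URI.equals(systemId)) {
                InputSource source = new InputSource(new StringReader(LOCAL_DTD));
                source.setSystemId(systemId); // base URI for any relative URIs inside the DTD
                return source;
            }
            return null; // default behavior: the parser opens systemId itself
        }
    }

    /** Parses the document with the given resolver and returns the root element's text. */
    static String parse(String xml, LocalResolver resolver) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        StringBuilder text = new StringBuilder();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE book SYSTEM \"" + DTD_URI + "\">\n"
            + "<book>&publisher;</book>";
        LocalResolver resolver = new LocalResolver();
        System.out.println(parse(xml, resolver)); // prints "Wrox"
        System.out.println(resolver.seen);
    }
}
```

Setting the system ID on the returned InputSource is optional here, but it keeps relative references inside the served entity working as if the original URI had been opened.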

You could do a variety of things in your resolveEntity method. Instead of storing entities in the local file system, you could store them in a database and retrieve them with JDBC, or keep them in a content management system or an LDAP directory. If you were reading many large text entities over and over again, you could build a cache inside your entity resolver so that each entity is read from storage only once and served from the cache after that.
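A caching resolver along those lines might look like the following minimal sketch. The EntityLoader interface is a hypothetical hook standing in for whatever backing store you use (file system, JDBC, CMS, LDAP); it is not part of SAX. The cache keys on the system ID and keeps raw bytes, so each call hands the parser a fresh stream and the parser can still detect the entity's encoding from its text declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class CachingEntityResolver implements EntityResolver {

    /** Hypothetical hook for the backing store (file system, JDBC, CMS, LDAP, ...). */
    public interface EntityLoader {
        /** Returns the entity's raw bytes, or null if this loader doesn't handle it. */
        byte[] load(String publicId, String systemId) throws IOException;
    }

    private final EntityLoader loader;
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    public CachingEntityResolver(EntityLoader loader) {
        this.loader = loader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null) {
            return null; // nothing to key the cache on; let the parser decide
        }
        byte[] bytes = cache.get(systemId);
        if (bytes == null) {
            bytes = loader.load(publicId, systemId);
            if (bytes == null) {
                return null; // not ours: the parser opens systemId itself
            }
            // Two threads may both load on a miss; putIfAbsent keeps the first copy.
            byte[] previous = cache.putIfAbsent(systemId, bytes);
            if (previous != null) {
                bytes = previous;
            }
        }
        // A fresh stream per call; the cached array itself is never modified.
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger loads = new AtomicInteger();
        CachingEntityResolver resolver = new CachingEntityResolver((pub, sys) -> {
            loads.incrementAndGet(); // stands in for an expensive database or network read
            return "<!ENTITY chapter \"Xerces\">".getBytes(StandardCharsets.UTF_8);
        });
        resolver.resolveEntity(null, "http://example.com/ents.ent");
        resolver.resolveEntity(null, "http://example.com/ents.ent");
        System.out.println("backing-store reads: " + loads.get()); // prints "backing-store reads: 1"
    }
}
```

The resolver is registered like any other, with XMLReader.setEntityResolver. A production cache would also need an eviction policy, which this sketch omits.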

Remember, though, that at this level you're caching the physical storage structures, not the logical structures they contain. Even if you use the EntityResolver mechanism in preference to Xerces' xsi:schemaLocation overrides, you still aren't getting as much bang for your buck as you would with the grammar-caching mechanism. At entity-resolver time, you're caching the physical storage and saving physical retrieval costs. At grammar-caching time, you're saving the cost of converting from a physical to a logical representation, and because a cached grammar is reused without the schema file being reopened, you save the retrieval cost as well. So if you're going to cache grammars logically, it doesn't make much sense to also cache the grammar files physically. There are plenty of non-grammar uses of entities, however, and these are all fair game for speedups via the entity resolver mechanism.
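To make the physical/logical distinction concrete, here is a sketch of logical caching. It does not use Xerces' own grammar-caching API; it swaps in the standard JAXP javax.xml.validation API, which expresses the same idea: the schema is parsed and compiled once into a Schema object, and each document is then validated against that in-memory grammar without the schema text being read again. The inline BOOK_XSD is a hypothetical stand-in for book.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class CompiledSchemaCache {

    // Hypothetical schema; a real application would load book.xsd once at startup.
    static final String BOOK_XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "  <xs:element name=\"book\">"
      + "    <xs:complexType><xs:sequence>"
      + "      <xs:element name=\"title\" type=\"xs:string\"/>"
      + "    </xs:sequence></xs:complexType>"
      + "  </xs:element>"
      + "</xs:schema>";

    // The logical cache: compiled once, then shared (Schema objects are thread-safe).
    private static final Schema BOOK_SCHEMA = compile(BOOK_XSD);

    private static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema failed to compile", e);
        }
    }

    /** Validates one document against the cached grammar; no schema text is parsed here. */
    static boolean isValid(String xml) throws IOException {
        // Validators are not thread-safe, so create one per document (or per thread).
        Validator validator = BOOK_SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the default error handler throws on validation errors
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(isValid("<book><title>Professional XML</title></book>")); // true
        System.out.println(isValid("<book><author>Unknown</author></book>"));        // false
    }
}
```

Compare this with the caching entity resolver: that approach still re-parses the cached schema bytes on every document, whereas here the parse-and-compile step happens exactly once.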

Entity References

In most cases, entities should be invisible to your application: it doesn't matter whether the content in a particular section of an XML document came from the main document entity, an internal entity, or an entity stored in a separate file. Sometimes, though, your application does want to know, particularly if it is something like an XML editor that tries to preserve the input document as faithfully as possible.

SAX provides the org.xml.sax.ext.LexicalHandler extension interface, which gives you callbacks for events that the ContentHandler callbacks don't report. Among these callbacks are startEntity and endEntity, which are called at the start and end of each entity (internal or external) in the document. Ordinarily, startEntity and endEntity report only general entities and parameter entities (SAX says a parser doesn't have to report parameter entities, but Xerces does). Sometimes you'd like to know other details about the exact physical representation of a document, such as whether one of the built-in entities (&amp;, &gt;, &lt;, &quot;, or &apos;) was used, or whether a character reference (&#XXXX;) was used.
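As a rough sketch of the startEntity/endEntity callbacks, the class below extends org.xml.sax.ext.DefaultHandler2 (which supplies no-op LexicalHandler methods) and registers itself through the standard http://xml.org/sax/properties/lexical-handler property, since a LexicalHandler is not picked up through setContentHandler. It runs against the JDK's built-in SAX parser; the document and entity name pub are made up for illustration, and built-in entities and character references are not reported by this setup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

public class EntityBoundaryLogger extends DefaultHandler2 {

    // Standard SAX property name for registering a LexicalHandler.
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    final List<String> events = new ArrayList<>();

    @Override
    public void startEntity(String name) {
        // SAX naming: parameter entities begin with '%'; the external DTD subset is "[dtd]".
        events.add("start:" + name);
    }

    @Override
    public void endEntity(String name) {
        events.add("end:" + name);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        events.add("text:" + new String(ch, start, length));
    }

    /** Parses xml and returns the interleaved entity-boundary and text events. */
    static List<String> trace(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        EntityBoundaryLogger handler = new EntityBoundaryLogger();
        reader.setContentHandler(handler);
        reader.setProperty(LEXICAL_HANDLER, handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note [<!ENTITY pub \"Wrox Press\">]>"
            + "<note>Published by &pub;.</note>";
        for (String event : trace(xml)) {
            System.out.println(event);
        }
    }
}
```

Because the text events are interleaved with the entity events, an editor-style application can tell which characters came from the replacement text of pub and write &pub; back out instead of the expanded text.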

Xerces
