
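    public InputSource resolveEntity(String publicId,
                                     String systemId)
            throws SAXException, IOException {
        // Serve the book schema from a local copy instead of fetching it
        // from the URI named in xsi:schemaLocation. Comparing in this order
        // avoids a NullPointerException if systemId is null.
        if (schemaURI.equals(systemId)) {
            FileReader r = new FileReader("book.xsd");
            InputSource source = new InputSource(r);
            // Keep the original URI as the base for any relative references
            source.setSystemId(systemId);
            return source;
        }
        // Returning null tells the parser to open systemId itself
        return null;
    }

}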
The general flow of a resolveEntity method is to examine the publicId and/or systemId arguments and
decide what you want to do. Once you’ve made your decision, your code accesses the physical storage
(in this case, a file) and wraps it in an InputSource for the rest of the parser to use. In this example,
you’re looking for the systemId of the book schema (the URI supplied in the xsi:schemaLocation hint).
If the entity being resolved is the book schema, you read the schema from a local copy, wrap the
resulting FileReader in an InputSource, and hand it back. For any other entity, resolveEntity returns
null, which tells the parser to fall back to its default behavior and open the system identifier itself.
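The same decision logic generalizes to more than one entity. The sketch below assumes a hypothetical LocalEntityResolver class that takes a map from system IDs to local file paths; the class name and the example URI are illustrative, not part of SAX or Xerces. Only EntityResolver, InputSource, and InputSource.setSystemId come from the SAX API, and you would register the resolver with XMLReader.setEntityResolver.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: redirects known system IDs to local copies.
class LocalEntityResolver implements EntityResolver {
    private final Map<String, Path> localCopies;

    LocalEntityResolver(Map<String, Path> localCopies) {
        // Immutable copy; note that get(null) would throw on this map type
        this.localCopies = Map.copyOf(localCopies);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        Path local = (systemId == null) ? null : localCopies.get(systemId);
        if (local == null) {
            return null; // not one of ours: let the parser fetch systemId itself
        }
        // A byte stream lets the parser detect the file's encoding itself
        InputSource source = new InputSource(Files.newInputStream(local));
        source.setSystemId(systemId); // base URI for relative references
        return source;
    }
}
```

One design difference from the book.xsd example: this version hands the parser a byte stream rather than a FileReader, so the parser can read the encoding from the file's XML declaration instead of relying on the platform default charset that FileReader uses.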
You could do a variety of things in your resolveEntity method. Instead of storing entities in the local file
system, you could store them in a database and use JDBC to retrieve them, or keep them in a content
management system or an LDAP directory. If you were reading a lot of large text entities over and over
again, you could build a cache inside your entity resolver so that each entity is read from physical
storage only once and served from the cache after that.
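A caching resolver along those lines might look like the following sketch. CachingEntityResolver and its EntityLoader hook are hypothetical names; the loader stands in for whatever performs the physical fetch (a file read, a JDBC query, an LDAP lookup). The cache stores raw bytes and builds a fresh InputSource on every call, because a stream handed to the parser is consumed by that parse and cannot be reused.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical caching resolver: each entity is fetched physically at most once.
class CachingEntityResolver implements EntityResolver {
    /** Hypothetical hook for the expensive fetch (file, JDBC, LDAP, CMS...). */
    public interface EntityLoader {
        /** Returns the raw bytes for systemId, or null if not handled here. */
        byte[] load(String systemId) throws IOException;
    }

    private final EntityLoader loader;
    private final Map<String, byte[]> cache = new HashMap<>();

    CachingEntityResolver(EntityLoader loader) {
        this.loader = loader;
    }

    @Override
    public synchronized InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId == null) {
            return null;
        }
        byte[] bytes = cache.get(systemId);
        if (bytes == null) {
            bytes = loader.load(systemId); // physical retrieval happens only here
            if (bytes == null) {
                return null;               // unknown entity: default parser behavior
            }
            cache.put(systemId, bytes);
        }
        // Fresh stream per call; the cached bytes themselves are never consumed
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setSystemId(systemId);
        return source;
    }
}
```

This cache never evicts anything, so for a large or changing set of entities you would want a bounded cache or an invalidation policy on top of it.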
Remember, though, that at this level you’re caching the physical storage structures, not the logical
structures they contain. Even if you use the EntityResolver mechanism in preference to Xerces’
xsi:schemaLocation overrides, you still aren’t getting as much bang for your buck as you would from the
grammar-caching mechanism. At entity-resolver time, you’re caching the physical storage and saving
physical retrieval costs. At grammar-caching time, you’re saving the cost of converting from a physical to
a logical representation. If you’re going to cache grammars logically, it doesn’t make much sense to also
cache the grammar files physically, because once a grammar is cached the parser no longer needs to read
its file at all. There are plenty of non-grammar uses of entities, however, and these are all fair game for
speedups via the entity resolver mechanism.
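For comparison, the sketch below shows caching at the logical level. It does not use the Xerces grammar-caching API itself; it swaps in the standard JAXP javax.xml.validation route, where SchemaFactory.newSchema compiles a schema once into a Schema object that can be attached to a SAXParserFactory and reused for every later parse. CompiledSchemaCache and the one-element schema in the usage below are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: compiles the schema once, then validates many documents.
class CompiledSchemaCache {
    private final SAXParserFactory factory;

    CompiledSchemaCache(String xsdText) throws SAXException {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // The physical-to-logical conversion happens here, exactly once
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsdText)));
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // every parser from this factory reuses it
    }

    boolean isValid(String xml) throws Exception {
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                // DefaultHandler ignores validation errors by default; rethrow them
                @Override
                public void error(SAXParseException e) throws SAXException {
                    throw e;
                }
            });
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```

Because the schema is compiled in the constructor, each call to isValid pays only for parsing the instance document, not for reading and compiling the schema again; no EntityResolver-level caching of the schema file would add anything here.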
Entity References
In most cases, entities should be invisible to your application—it doesn’t matter whether the content in a
particular section of an XML document came from the main document entity, an internal entity, or an
entity stored in a separate file. Sometimes, though, your application does need to know, particularly if
it is something like an XML editor that tries to preserve the input document as faithfully as possible.
SAX provides the org.xml.sax.ext.LexicalHandler extension interface, which you can use to get callbacks
about events you don’t get via the ContentHandler callbacks. Among these callbacks are startEntity and
endEntity, which are called at the start and end of any entity (internal or external) in the document.
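A minimal sketch of receiving these callbacks, assuming the JDK's built-in SAX parser. A LexicalHandler is not passed to parse(); it is registered through the standard SAX property http://xml.org/sax/properties/lexical-handler. The EntityTracer class and the pub entity in the usage below are hypothetical; DefaultHandler2 is the SAX helper class in org.xml.sax.ext that implements LexicalHandler alongside the ContentHandler methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical tracer: records entity boundaries reported via LexicalHandler.
class EntityTracer extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override
    public void startEntity(String name) {
        events.add("start " + name);
    }

    @Override
    public void endEntity(String name) {
        events.add("end " + name);
    }

    static List<String> trace(String xml) throws Exception {
        EntityTracer tracer = new EntityTracer();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // LexicalHandler is registered as a property, not passed to parse()
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", tracer);
        parser.parse(new InputSource(new StringReader(xml)), tracer);
        return tracer.events;
    }
}
```

An editor could use the recorded start/end pairs to reconstruct which stretches of content came from which entity, instead of seeing only the expanded text through ContentHandler.characters.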
Ordinarily, startEntity and endEntity report only general entities and parameter entities (SAX says a
parser doesn’t have to report parameter entities, but Xerces does). Sometimes you’d like to know other
details about the exact physical representation of a document, such as whether one of the built-in entities
(&amp;, &gt;, &lt;, &quot;, or &apos;) was used, or whether a character reference (&#XXXX;) was used.