Document Scanner
The document scanner knows how to take an XML document and fire the callbacks for elements (and
attributes), characters, and anything else you might encounter in an XML document. This is the
workhorse component for any XNI application that is going to work with an XML document.
Applications that work only with a DTD or schema may not need this class. The document
scanner is implemented by the class org.apache.xerces.impl.XMLDocumentScannerImpl and uses the
URI http://apache.org/xml/properties/internal/document-scanner as its property ID. To use it, you
also need the DTD scanner, entity manager, error reporter, and symbol table.
DTD Scanner
If you’re processing DTDs, either directly or indirectly, you need the DTD scanner. It knows the syntax
of DTDs and fires XMLDTDHandler and XMLDTDContentModelHandler events as it processes the
DTD. The DTD scanner is implemented by the class org.apache.xerces.impl.XMLDTDScannerImpl and
uses the URI http://apache.org/xml/properties/internal/dtd-scanner as its property ID. To use it, you
also need the entity manager, error reporter, and symbol table.
DTD Validator
Scanning DTDs is different from validating with them. After the DTD pipeline has scanned the DTD and
assembled the necessary definitions, the document content pipeline needs to use those definitions to
validate the document. That’s where the DTD validator comes in. It takes the definitions created by
the DTD pipeline and uses them to validate the document. The validator is inserted into the pipeline
as a filter, after the document scanner. The DTD validator is implemented by the class
org.apache.xerces.impl.dtd.XMLDTDValidator and uses the URI
http://apache.org/xml/properties/internal/validator/dtd as its property ID. To use it, you also need
the entity manager, error reporter, and symbol table.
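The idea of a validator sitting in the pipeline as a filter can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not the XNI API: SketchHandler, RecordingHandler, DeclaredElementFilter, and FilterPipelineSketch are hypothetical names invented here, and the real XNI handler interfaces carry many more callbacks and arguments. The sketch shows only the shape of the arrangement: the scanner fires events at the filter, the filter checks them, and every event is passed along to the next stage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, heavily simplified stand-in for a document handler.
interface SketchHandler {
    void startElement(String name);
    void endElement(String name);
}

// Final stage of the sketch pipeline: records every event it receives.
class RecordingHandler implements SketchHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startElement(String name) { events.add("start:" + name); }
    @Override public void endElement(String name) { events.add("end:" + name); }
}

// A filter: it is a handler itself, and it forwards each event to the next handler.
// Its "validation" is a toy check that the element name was declared.
class DeclaredElementFilter implements SketchHandler {
    private final Set<String> declared;
    private final SketchHandler next;
    private final List<String> errors = new ArrayList<>();

    DeclaredElementFilter(Set<String> declared, SketchHandler next) {
        this.declared = declared;
        this.next = next;
    }

    @Override public void startElement(String name) {
        if (!declared.contains(name)) {
            errors.add("undeclared element: " + name);
        }
        next.startElement(name); // always pass the event downstream
    }

    @Override public void endElement(String name) {
        next.endElement(name);
    }

    List<String> errors() { return errors; }
}

final class FilterPipelineSketch {
    // Stands in for the document scanner by firing a fixed sequence of events.
    static void scan(SketchHandler first) {
        first.startElement("book");
        first.startElement("title");
        first.endElement("title");
        first.startElement("blurb");
        first.endElement("blurb");
        first.endElement("book");
    }

    public static void main(String[] args) {
        RecordingHandler application = new RecordingHandler();
        Set<String> declared = new HashSet<>(Arrays.asList("book", "title"));
        DeclaredElementFilter validator = new DeclaredElementFilter(declared, application);
        scan(validator);
        System.out.println(validator.errors());
        System.out.println(application.events);
    }
}
```

Running main reports one error for the undeclared blurb element, while the final handler still receives all six events. The filter observes the stream without interrupting it, which is what allows further filters to be chained behind it.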
Namespace Binder
The process of mapping namespace prefixes to namespace URIs is called namespace binding. It must
happen after DTD validation because the DTD may have provided default values for one
or more namespace attributes in the document. These namespace bindings are needed for schema
validation, so the namespace binder is inserted as a filter after the DTD validator and before the
schema validator. The namespace binder is implemented by the class
org.apache.xerces.impl.XMLNamespaceBinder and uses the URI
http://apache.org/xml/properties/internal/namespace-binder as its property ID. To use it, you also
need the error reporter and the symbol table.
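To see why the ordering matters, consider an element whose namespace declaration appears only as an attribute default in the DTD. The following Java sketch is illustrative only: NamespaceBindingSketch and its methods are hypothetical, the example URI is made up, and the code handles a single element, whereas a real binder tracks a stack of scopes. It shows that the prefix cannot be resolved until the DTD defaults have been merged into the attribute list.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NamespaceBindingSketch {
    // Merge DTD-supplied attribute defaults with the attributes written in the document.
    // A default applies only when the document did not specify the attribute itself.
    static Map<String, String> applyDefaults(Map<String, String> specified,
                                             Map<String, String> dtdDefaults) {
        Map<String, String> merged = new LinkedHashMap<>(dtdDefaults);
        merged.putAll(specified);
        return merged;
    }

    // Collect prefix-to-URI bindings from xmlns and xmlns:prefix attributes.
    // The default namespace is stored under the empty prefix.
    static Map<String, String> bindings(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            String name = entry.getKey();
            if (name.equals("xmlns")) {
                result.put("", entry.getValue());
            } else if (name.startsWith("xmlns:")) {
                result.put(name.substring("xmlns:".length()), entry.getValue());
            }
        }
        return result;
    }

    // Resolve an element's qualified name to a namespace URI, or null if the prefix is unbound.
    static String resolve(String qualifiedName, Map<String, String> bindings) {
        int colon = qualifiedName.indexOf(':');
        String prefix = colon < 0 ? "" : qualifiedName.substring(0, colon);
        return bindings.get(prefix);
    }

    public static void main(String[] args) {
        // The document writes <inv:item/> with no xmlns:inv attribute of its own.
        Map<String, String> specified = new LinkedHashMap<>();
        // The DTD declares a default value for xmlns:inv (illustrative URI).
        Map<String, String> dtdDefaults = new LinkedHashMap<>();
        dtdDefaults.put("xmlns:inv", "http://example.com/inventory");

        String tooEarly = resolve("inv:item", bindings(specified));
        String afterDtd = resolve("inv:item", bindings(applyDefaults(specified, dtdDefaults)));
        System.out.println("before DTD defaults: " + tooEarly);
        System.out.println("after DTD defaults:  " + afterDtd);
    }
}
```

Binding before the defaults are applied leaves the inv prefix unresolved; binding afterward yields the URI supplied by the DTD.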
Schema Validator
The schema validator validates the document against an XML schema. It’s inserted into the pipeline as a
filter after the namespace binder. As it processes the document, it may augment the streaming
information set with default and normalized simple type values. It may also add items to the PSVI via
the augmentations. The schema validator is implemented by the class
org.apache.xerces.impl.xs.XMLSchemaValidator and uses the URI
http://apache.org/xml/properties/internal/validator/schema as its property ID. To use it, you also
need the error reporter and the symbol table.
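The property IDs and requirements given for the five components can be collected into a small lookup table. The Java sketch below is a bookkeeping aid under stated assumptions, not part of Xerces: ComponentRequirementsSketch, its short labels, and the missing method are hypothetical. Only the property ID URIs and the requirement lists come from the descriptions above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ComponentRequirementsSketch {
    static final String PREFIX = "http://apache.org/xml/properties/internal/";

    // Property IDs as given in the text, keyed by a short label chosen for this sketch.
    static final Map<String, String> PROPERTY_IDS = new LinkedHashMap<>();
    // The other components each one needs, as listed in the text.
    static final Map<String, List<String>> REQUIRES = new LinkedHashMap<>();

    static {
        register("document scanner", PREFIX + "document-scanner",
                 "DTD scanner", "entity manager", "error reporter", "symbol table");
        register("DTD scanner", PREFIX + "dtd-scanner",
                 "entity manager", "error reporter", "symbol table");
        register("DTD validator", PREFIX + "validator/dtd",
                 "entity manager", "error reporter", "symbol table");
        register("namespace binder", PREFIX + "namespace-binder",
                 "error reporter", "symbol table");
        register("schema validator", PREFIX + "validator/schema",
                 "error reporter", "symbol table");
    }

    private static void register(String label, String propertyId, String... requires) {
        PROPERTY_IDS.put(label, propertyId);
        REQUIRES.put(label, Arrays.asList(requires));
    }

    // Returns the required components that are absent from the available set.
    static List<String> missing(String label, Set<String> available) {
        List<String> required = REQUIRES.get(label);
        if (required == null) {
            throw new IllegalArgumentException("unknown component: " + label);
        }
        List<String> result = new ArrayList<>();
        for (String needed : required) {
            if (!available.contains(needed)) {
                result.add(needed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> available = new HashSet<>(Arrays.asList("error reporter", "symbol table"));
        for (String label : PROPERTY_IDS.keySet()) {
            System.out.println(label + " (" + PROPERTY_IDS.get(label) + ") is missing "
                    + missing(label, available));
        }
    }
}
```

With only an error reporter and a symbol table available, the namespace binder and schema validator have everything they need, while the scanners and the DTD validator still lack the entity manager.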