
that converts XNI events into the SAX events that Jing already understands. This wrapped version of Jing is then inserted into the appropriate spot in the XNI pipeline within an XMLParserConfiguration called JingConfiguration. For ease of use, Andy has again provided convenience classes that work just like the Xerces SAX and DOM parser classes. For a Relax-NG-aware SAX parser, use org.cyberneko.relaxng.parsers.SAXParser; for a DOM parser, use org.cyberneko.relaxng.parsers.DOMParser. You must set the SAX validation and namespaces features to true. You must also set a property that tells the Relax-NG validator where to find the schema to use for validation, because Relax-NG doesn't specify a way of associating a schema with a document. This property is called http://cyberneko.org/xml/properties/relaxng/schema-location, and its value should be the URI of the schema file.
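A minimal sketch of the configuration steps just described, written against the standard SAX XMLReader interface. The two feature URIs are the standard SAX ones and the property URI is the one named above; the helper class RelaxNGSetup is hypothetical, and the code assumes (the text does not confirm it) that the NekoRelaxNG SAXParser implements XMLReader, as the Xerces SAXParser it imitates does.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: applies the settings a Relax-NG-validating parser needs. */
final class RelaxNGSetup {
    // Standard SAX feature URIs for validation and namespace processing.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // Property named in the text; its value is the URI of the Relax-NG schema.
    static final String SCHEMA_LOCATION =
        "http://cyberneko.org/xml/properties/relaxng/schema-location";

    private RelaxNGSetup() {}

    /**
     * Turns on validation and namespaces and points the parser at a schema.
     * Accepts any XMLReader; assumes the Neko SAXParser is one.
     */
    static void configure(XMLReader parser, String schemaUri) {
        try {
            parser.setFeature(VALIDATION, true);
            parser.setFeature(NAMESPACES, true);
            parser.setProperty(SCHEMA_LOCATION, schemaUri);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            // A parser without Relax-NG support rejects the schema-location property.
            throw new IllegalStateException("Parser cannot validate with Relax-NG", e);
        }
    }
}
```

With NekoRelaxNG on the classpath, a caller would then write something like `XMLReader parser = new org.cyberneko.relaxng.parsers.SAXParser(); RelaxNGSetup.configure(parser, "file:book.rng"); parser.parse("book.xml");`, where the file names are placeholders. Converting the checked SAX exceptions into an IllegalStateException makes a parser that lacks Relax-NG support fail at setup time.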
NekoPull
The last CyberNeko tool is NekoPull, the CyberNeko pull parser. The most commonly used APIs for XML, SAX and the DOM, are push APIs. Once your program asks the parser to parse a document, your application doesn't regain control until the parse completes. SAX does call your code through its event callbacks, but that is as much control as you get. With the DOM, you have to wait until the entire tree has been built before you can do anything.
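To make the push model concrete, here is a small sketch that uses the JDK's built-in SAX parser rather than a CyberNeko one. The handler only records the callbacks the parser pushes at it; the class names PushDemo and RecordingHandler are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Push-style parsing: the parser drives, the application only reacts. */
final class PushDemo {
    /** Records each start/end callback it receives. */
    static final class RecordingHandler extends DefaultHandler {
        final List<String> events = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            events.add("start " + qName);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + qName);
        }
    }

    /** Control comes back from parse() only after the whole document is done. */
    static List<String> parse(String xml) {
        RecordingHandler handler = new RecordingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        return handler.events;
    }
}
```

For `<book><title>T</title></book>`, parse returns the list [start book, start title, end title, end book]; by the time the caller sees it, the parse is already over.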
The difficulty with SAX is that for any nontrivial XML grammar, you end up maintaining a set of stacks and a state machine that remembers where you are in the grammar at each point in the parse. SAX also makes it very hard to modularize your application. If you have an XML grammar whose elements are turned into objects of various classes, you have to do a lot of work to keep each class's event-handling code associated with that class. You end up trying to create ContentHandlers that handle only the section of the grammar for a particular class, and then you have to build infrastructure to multiplex between these ContentHandlers. It can be done, but the process is tedious and error prone.
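The sketch below shows the bookkeeping described above in its simplest form, again using the JDK's SAX parser. The Book class with only title and author fields is hypothetical, standing in for a real Book class. The handler keeps an explicit stack of open element names so that a `<title>` nested inside some other element (a hypothetical `<series>`, say) is not mistaken for the book's own title.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxBookBuilder {
    /** Hypothetical Book with just two fields, for illustration. */
    static final class Book {
        String title;
        String author;
    }

    /** Builds a Book by tracking the open-element stack by hand. */
    static final class BookHandler extends DefaultHandler {
        private final Deque<String> path = new ArrayDeque<>(); // where we are in the grammar
        private final StringBuilder text = new StringBuilder(); // pending character data
        Book book;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("book")) {
                book = new Book();
            }
            path.push(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            path.pop();
            // Treat title/author as Book fields only when their parent is <book>.
            if ("book".equals(path.peek())) {
                if (qName.equals("title")) {
                    book.title = text.toString().trim();
                } else if (qName.equals("author")) {
                    book.author = text.toString().trim();
                }
            }
            text.setLength(0);
        }
    }

    static Book parse(String xml) {
        BookHandler handler = new BookHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse book", e);
        }
        return handler.book;
    }
}
```

Every element type the grammar adds means another case in endElement, and building contained objects would need either more state in this handler or a second ContentHandler plus code to switch between them, which is the multiplexing problem described above.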
With the DOM, you can create a constructor that knows how to construct an instance of your class from an org.w3c.dom.Element node, and then you can pass the DOM tree around to instances of the various classes. You can handle contained objects by passing the right element in the DOM tree to the constructors for those contained object types. The disadvantage of the DOM is that you have to wait until the whole document is processed, even if you only need part of it. And, of course, there's the usual problem of how much memory DOM trees take up.
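A sketch of the DOM approach, using the JDK's DocumentBuilder: each hypothetical class (Book and a contained Author) has a constructor that takes an org.w3c.dom.Element, and Book hands its `<author>` child to Author's constructor. The element names and the firstChild helper are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class DomBooks {
    /** Hypothetical contained type, built from its own element. */
    static final class Author {
        final String name;

        Author(Element e) {
            name = e.getTextContent().trim();
        }
    }

    /** Hypothetical Book that knows how to build itself from a <book> element. */
    static final class Book {
        final String title;
        final Author author;

        Book(Element e) {
            title = childText(e, "title");
            // Hand the contained element to the contained type's constructor.
            author = new Author(firstChild(e, "author"));
        }
    }

    /** Returns the first direct child element with the given name. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("Missing <" + name + "> in <" + parent.getNodeName() + ">");
    }

    static String childText(Element parent, String name) {
        return firstChild(parent, name).getTextContent().trim();
    }

    /** The whole document is parsed into memory before any Book is built. */
    static Book parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
            return new Book(root);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse book", e);
        }
    }
}
```

The constructors are easy to read and compose, but parse cannot call `new Book(...)` until DocumentBuilder.parse has read the entire document and built the full tree in memory.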
Pull-parsing APIs can give you the best of both worlds. In a pull-parsing API, the application asks the parser for the next unit in the XML document, whether that unit is an element, character data, a processing instruction, or something else. This means you can process the document in a streaming fashion, which is a benefit of SAX. You can also pass the parser instance around to the constructors of your various classes. Because the parser instance remembers where it is in the document, a constructor can ask the parser for the next bits of XML, which should be the data it needs to construct its object. Contained objects are handled just as in the DOM case: you pass the parser instance (which again remembers its place) to the constructors for the contained objects. This is a much better API.
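NekoPull's own API isn't shown at this point in the text, so the following sketch swaps in the JDK's StAX pull parser (javax.xml.stream.XMLStreamReader) to illustrate the same idea; it is not NekoPull code. As in the DOM version, the hypothetical Book and Author classes each have a constructor, but here the constructor takes the reader itself and pulls only the events it needs. Book passes the same reader, still positioned at `<author>`, to Author's constructor.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class PullBooks {
    /** Hypothetical contained type; reads its own element, then stops. */
    static final class Author {
        final String name;

        // Precondition: the reader is positioned on <author>.
        Author(XMLStreamReader r) throws XMLStreamException {
            name = r.getElementText().trim(); // leaves the reader on </author>
        }
    }

    /** Hypothetical Book that pulls just the events it needs. */
    static final class Book {
        String title;
        Author author;

        // Precondition: the reader is positioned on <book>.
        Book(XMLStreamReader r) throws XMLStreamException {
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (r.getLocalName()) {
                        case "title": title = r.getElementText().trim(); break;
                        // Pass the same reader on; it remembers its place.
                        case "author": author = new Author(r); break;
                        default: skipElement(r);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    return; // reached </book>
                }
            }
        }
    }

    /** Skips an element we don't care about, including its children. */
    static void skipElement(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    static Book parse(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            r.nextTag(); // advance to the root <book> start tag
            return new Book(r);
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse book", e);
        }
    }
}
```

Unlike the SAX version, no hand-maintained stack is needed: the position in the grammar is simply the position in the Java call stack. Unlike the DOM version, the Book is built as the events stream past, without first materializing a tree.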
Let’s walk through a pull implementation of the Book object building program:
1: /*
2: *
3: * NekoPullMain.java
4: *