
that converts XNI events into the SAX events that Jing already understands. This wrapped version of Jing is then inserted into the appropriate spot in the XNI pipeline within an XMLParserConfiguration called JingConfiguration. For ease of use, Andy has again provided convenience classes that work just like the Xerces SAX and DOM parser classes. For a Relax-NG-aware SAX parser, use org.cyberneko.relaxng.parsers.SAXParser; for a DOM parser, use org.cyberneko.relaxng.parsers.DOMParser. You must set the SAX validation and namespace features to true. You must also set a property that tells the Relax-NG validator where to find the Relax-NG schema to be used for validation, because Relax-NG doesn’t specify a way of associating a schema with a document. This property is called http://cyberneko.org/xml/properties/relaxng/schema-location, and its value should be the URI for the schema file.
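The configuration steps just described can be sketched as a small helper. This is a minimal sketch, not code from the NekoRNG distribution: the class name RelaxNgSetup and the method configure are hypothetical, and the helper takes any org.xml.sax.XMLReader so that it compiles against the JDK alone. The two feature identifiers are the standard SAX2 ones; the property name is the one quoted above.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: applies the settings a Relax-NG aware SAX parser needs.
class RelaxNgSetup {
    // Standard SAX2 feature identifiers.
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    static final String NAMESPACES = "http://xml.org/sax/features/namespaces";
    // Property name taken from the text above.
    static final String SCHEMA_LOCATION =
            "http://cyberneko.org/xml/properties/relaxng/schema-location";

    // schemaUri is the URI of the Relax-NG schema file to validate against.
    static void configure(XMLReader reader, String schemaUri) throws SAXException {
        reader.setFeature(VALIDATION, true);
        reader.setFeature(NAMESPACES, true);
        reader.setProperty(SCHEMA_LOCATION, schemaUri);
    }
}
```

With the NekoRNG jar on the classpath, you would presumably pass in an instance of org.cyberneko.relaxng.parsers.SAXParser, on the assumption that it implements XMLReader the way the Xerces SAXParser does. A stock JDK parser will not recognize the schema-location property and will throw SAXNotRecognizedException.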

NekoPull

The last CyberNeko tool is NekoPull, the CyberNeko pull parser. The commonly used XML APIs, SAX and DOM, are push APIs: once your program asks the parser to parse a document, your application doesn’t regain control until the parse completes. SAX calls your program code via its event callbacks, but that’s about as good as it gets. With the DOM, you have to wait until the entire tree has been built before you can do anything.

The difficulty with SAX is that for any non-trivial XML grammar, you end up maintaining a bunch of stacks and a state machine that remembers where you are in the grammar at any point in the parse. SAX also makes it very hard to modularize your application. If you have an XML grammar where the elements are turned into objects of various classes, you have to do a lot of work to keep each class’s event-handling code together with that class. You end up trying to create ContentHandlers that handle only the section of the grammar for a particular class, and then you have to build infrastructure to multiplex between these ContentHandlers. It can be done, but the process is tedious and error prone.
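To make the bookkeeping concrete, here is a minimal sketch of a SAX handler for a deliberately tiny, hypothetical grammar (a book element containing a title and an author). The class name SaxBookHandler and the grammar are assumptions for illustration only. It uses the JDK's JAXP SAX parser. Even at this size, the handler needs a stack and a text buffer just to know what the character data it receives belongs to.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for <book><title>..</title><author>..</author></book>.
class SaxBookHandler extends DefaultHandler {
    // Stack of open element names: the handler's only memory of where it is.
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    String title;
    String author;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.push(qName);
        text.setLength(0); // start collecting character data for this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
        String parent = path.peek(); // null once the root element has closed
        // Interpret the collected text from the element name and its parent.
        if ("book".equals(parent) && "title".equals(qName)) {
            title = text.toString().trim();
        } else if ("book".equals(parent) && "author".equals(qName)) {
            author = text.toString().trim();
        }
    }

    static SaxBookHandler parse(String xml) throws Exception {
        SaxBookHandler handler = new SaxBookHandler();
        // parse() returns only after every event has been pushed to the handler.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        SaxBookHandler h = parse("<book><title>XML Tools</title><author>A. Writer</author></book>");
        System.out.println(h.title + " by " + h.author);
    }
}
```

Every additional element type adds another branch to endElement, and splitting those branches across separate classes is where the multiplexing infrastructure comes in.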

With the DOM, you can create a constructor that knows how to construct an instance of your class from an org.w3c.dom.Element node, and then you can pass the DOM tree around to instances of the various classes. You can handle contained objects by passing the right element in the DOM tree to the constructors for those contained object types. The disadvantage of the DOM is that you have to wait until the whole document is processed, even if you only need part of it. And, of course, there’s the usual problem of how much memory DOM trees take up.
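The constructor-per-class pattern can be sketched as follows, using the JDK's JAXP DOM parser and the same hypothetical book/title/author grammar. The names DomBook, Author, and child are illustrative assumptions, not part of any library.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical class built from a <book> element in a DOM tree.
class DomBook {
    final String title;
    final Author author;

    DomBook(Element book) {
        this.title = child(book, "title").getTextContent().trim();
        // The contained object gets just the element it needs.
        this.author = new Author(child(book, "author"));
    }

    static final class Author {
        final String name;

        Author(Element author) {
            this.name = author.getTextContent().trim();
        }
    }

    // Returns the first direct child element with the given name.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("missing <" + name + "> in <" + parent.getNodeName() + ">");
    }

    static DomBook parse(String xml) throws Exception {
        // The whole tree is built before the first constructor runs.
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        return new DomBook(root);
    }
}
```

The parsing logic for each class now lives in that class, but only because the entire document was read into memory first.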

Pull-parsing APIs can give you the best of both worlds. In a pull-parsing API, the application asks the parser to parse the next unit in the XML document, regardless of whether that unit is an element, character data, a processing instruction, and so on. This means you can process the document in a streaming fashion, which is a benefit of SAX. You can also pass the parser instance around to your various object constructors. Because the parser instance remembers where it is in the document, the constructor can call the parser to ask for the next bits of XML, which should represent the data it needs to construct an object. Contained objects are handled just as in the DOM case: you pass the parser instance (which again remembers its place) to the constructors for the contained objects. This makes for a much more convenient API.
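Before turning to NekoPull itself, the general pattern can be sketched with a different pull API. The sketch below deliberately uses the JDK's StAX interface (javax.xml.stream), not NekoPull, whose classes are not shown in this section. The class name PullBook and the book/title/author grammar are hypothetical. Each constructor receives the parser, pulls exactly the events it needs, and leaves the parser positioned for whoever reads next.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class built by pulling events from a StAX reader (not NekoPull).
class PullBook {
    final String title;
    final Author author;

    // Expects the reader to be positioned on the <book> start tag.
    PullBook(XMLStreamReader in) throws XMLStreamException {
        in.require(XMLStreamConstants.START_ELEMENT, null, "book");
        in.nextTag(); // advance to <title>
        in.require(XMLStreamConstants.START_ELEMENT, null, "title");
        this.title = in.getElementText().trim(); // leaves the reader on </title>
        in.nextTag(); // advance to <author>
        this.author = new Author(in); // the contained object pulls its own events
        in.nextTag(); // advance to </book>
        in.require(XMLStreamConstants.END_ELEMENT, null, "book");
    }

    static final class Author {
        final String name;

        // Expects the reader to be positioned on the <author> start tag.
        Author(XMLStreamReader in) throws XMLStreamException {
            in.require(XMLStreamConstants.START_ELEMENT, null, "author");
            this.name = in.getElementText().trim(); // leaves the reader on </author>
        }
    }

    static PullBook parse(String xml) throws XMLStreamException {
        XMLStreamReader in = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        in.nextTag(); // skip to the root start tag
        return new PullBook(in);
    }
}
```

There is no stack and no shared handler here: the call stack of the constructors plays the role of the state machine, and nothing beyond the current event has to be held in memory.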

Let’s walk through a pull implementation of the Book object building program:

1: /*

2: *

3: * NekoPullMain.java

4: *

