
If you’re working with SAX, the first place to go is the SAX Counter sample. This sample parses your document and prints some statistics based on what it finds. To invoke Counter, type

java sax.Counter <options> <filename>

There are command-line options to turn namespace processing, validation, and schema validation on and off, and to turn on full checking of the schema document. If you omit the options and filename, you’ll get a help screen describing all the options. The key reason to start with sax.Counter is that if Xerces is throwing an exception, it will probably throw that exception when you run sax.Counter. From there, you can try to figure out whether the problem is with the XML file, your application, or Xerces (in which case you should send mail to xerces-j-user@xml.apache.org with a bug report).
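
To make the idea concrete, here is a minimal sketch of a Counter-style program. It is not the source of the sax.Counter sample: the class name SaxCounterSketch and its count method are hypothetical, and it goes through the standard JAXP SAXParserFactory rather than calling Xerces classes directly. The two boolean parameters stand in, loosely, for the namespace and validation command-line options.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the actual sax.Counter sample.
final class SaxCounterSketch extends DefaultHandler {
    long elements;    // number of startElement callbacks seen
    long attributes;  // total attributes reported on those elements
    long characters;  // total characters reported via characters()

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        attributes += attrs.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks, so sum the lengths.
        characters += length;
    }

    // Parses the source with the given settings and returns the filled-in counter.
    static SaxCounterSketch count(InputSource source, boolean namespaces, boolean validation)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaces);
        factory.setValidating(validation);
        SAXParser parser = factory.newSAXParser();
        SaxCounterSketch counter = new SaxCounterSketch();
        parser.parse(source, counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book id=\"1\"><title lang=\"en\">Xerces</title><page/></book>";
        SaxCounterSketch c = count(new InputSource(new StringReader(xml)), true, false);
        System.out.println(c.elements + " elems, " + c.attributes + " attrs, "
                + c.characters + " chars");
    }
}
```

If a document makes the parser throw, a small harness like this tends to reproduce the exception outside your application, which is the same triage step the text recommends doing with sax.Counter.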

There’s a pair of DocumentTracer samples, one for SAX and one for XNI. These samples are in classes named sax.DocumentTracer and xni.DocumentTracer, respectively. Their job is to print out all the SAX or XNI callbacks as they are fired for your document. Occasionally these samples can be useful to help you figure out which callbacks are being passed which data, especially when you’re tired and confused after a long day of programming. They can also help you debug namespace-related problems, because all the prefixes get expanded. The output of xni.DocumentTracer is more detailed and complete than that of sax.DocumentTracer, due to the higher fidelity of the XNI callbacks, but most of the time you’ll want to use sax.DocumentTracer so you can see exactly what SAX sees.
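
The tracing idea can be sketched in a few lines of SAX code. The handler below is a hypothetical illustration, not the sax.DocumentTracer sample itself: the class name SaxTracerSketch, its trace method, and the output format are all assumptions. It records each callback as a line of text and writes element names in {namespace-URI}localName form, which is one way to see how prefixes resolve.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the actual sax.DocumentTracer sample.
final class SaxTracerSketch extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    // Records the callback and echoes it to standard output.
    private void log(String line) {
        events.add(line);
        System.out.println(line);
    }

    @Override public void startDocument() { log("startDocument()"); }
    @Override public void endDocument() { log("endDocument()"); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        log("startPrefixMapping(prefix=\"" + prefix + "\", uri=\"" + uri + "\")");
    }

    @Override public void endPrefixMapping(String prefix) {
        log("endPrefixMapping(prefix=\"" + prefix + "\")");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        StringBuilder sb = new StringBuilder("startElement({" + uri + "}" + localName
                + ", qName=\"" + qName + "\"");
        for (int i = 0; i < attrs.getLength(); i++) {
            sb.append(", @{").append(attrs.getURI(i)).append('}')
              .append(attrs.getLocalName(i)).append("=\"")
              .append(attrs.getValue(i)).append('"');
        }
        log(sb.append(')').toString());
    }

    @Override public void endElement(String uri, String localName, String qName) {
        log("endElement({" + uri + "}" + localName + ", qName=\"" + qName + "\")");
    }

    @Override public void characters(char[] ch, int start, int length) {
        log("characters(\"" + new String(ch, start, length) + "\")");
    }

    // Parses the XML text with namespace processing on and returns the recorded callbacks.
    static List<String> trace(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SaxTracerSketch tracer = new SaxTracerSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), tracer);
        return tracer.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<p:a xmlns:p=\"urn:x\"><p:b>hi</p:b></p:a>");
    }
}
```

Keeping the events in a list as well as printing them is a design choice made here so the trace can be inspected programmatically; a tracer that only prints would serve the debugging purpose just as well.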

If you’re using the DOM, you can use the DOM Counter sample, which lives in dom.Counter. It does the same thing as sax.Counter, but it uses the DOM and therefore will probably exercise some of the same DOM code your application does.
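
A DOM-based counter can be sketched the same way. This is not the dom.Counter source; the class name DomCounterSketch and its methods are hypothetical, and the parser is obtained through the standard JAXP DocumentBuilderFactory. The sketch builds the whole tree first and then walks it, so it touches the DOM construction and traversal code that a DOM application also relies on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not the actual dom.Counter sample.
final class DomCounterSketch {
    int elements;    // element nodes visited
    int attributes;  // attributes on those elements (the DOM also lists xmlns declarations here)
    int characters;  // total length of text and CDATA nodes

    // Recursively visits the node and all of its descendants.
    void walk(Node node) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                elements++;
                attributes += node.getAttributes().getLength();
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                characters += node.getNodeValue().length();
                break;
            default:
                break;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child);
        }
    }

    // Builds a DOM tree from the source and returns the counts for it.
    static DomCounterSketch count(InputSource source) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(source);
        DomCounterSketch counter = new DomCounterSketch();
        counter.walk(doc);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book id=\"1\"><title lang=\"en\">Xerces</title><page/></book>";
        DomCounterSketch c = count(new InputSource(new StringReader(xml)));
        System.out.println(c.elements + " elems, " + c.attributes + " attrs, "
                + c.characters + " chars");
    }
}
```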

CyberNeko Tools for XNI

Andy Clark is one of the Xerces committers and was the driving force behind the design of XNI. He’s written a suite of tools called NekoXNI to showcase some of the things you can do with XNI. Even if you aren’t interested in using XNI, you might want to have a look, because some of the tools are pretty useful. In this section, we’ll look at a few of these tools.

NekoHTML

NekoHTML uses XNI to allow an application to process an HTML document as if it were an XML document. There are both SAX and DOM parsers in the org.cyberneko.html.parsers package. You use org.cyberneko.html.parsers.SAXParser just like the regular Xerces SAXParser; you can plug in your own ContentHandlers and so on using the regular SAX API. The org.cyberneko.html.parsers.DOMParser works like the Xerces DOMParser with one notable twist: instead of using the Xerces XML DOM, it uses the Xerces HTML DOM, which means you get a DOM implementation that is aware of some of the rules of HTML. To use NekoHTML, you need to have nekohtml.jar in your classpath, in addition to the regular jars you need for Xerces. But if you need to process HTML, it’s worth it.

ManekiNeko

Another interesting and useful component of NekoXNI is a validator for RELAX NG called ManekiNeko. This validator is based on James Clark’s Jing validator for RELAX NG, and it works by creating a wrapper

Chapter 1
