If you’re working with SAX, the first place to go is the SAX Counter sample. This sample parses your
document and prints some statistics based on what it finds. To invoke Counter, type:
java sax.Counter <options> <filename>
There are command-line options to turn namespace processing, validation, and schema validation on
and off, and to turn on full checking of the schema document. If you omit the options and filename,
you’ll get a help screen describing all the options. The key reason to start with sax.Counter is that if Xerces
is throwing an exception, it will probably throw that exception when you run sax.Counter. From there,
you can try to figure out if the problem is with the XML file, your application, or Xerces (in which case
you should send mail to xerces-j-user@xml.apache.org with a bug report).
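To make the idea concrete, here is a minimal sketch of the kind of work a counting sample does. It is not the sax.Counter source that ships with Xerces: the class name SaxCountSketch and its helper methods are hypothetical, and it uses the standard JAXP API rather than Xerces-specific classes. It tallies elements, attributes, and characters, and any parse error surfaces as an exception.

```java
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; this is not the sax.Counter source shipped with Xerces.
class SaxCountSketch extends DefaultHandler {
    int elements;
    int attributes;
    int characters;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elements++;
        attributes += attrs.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so add up the lengths.
        characters += length;
    }

    // Parses the source with a namespace-aware, non-validating parser and returns the tallies.
    static SaxCountSketch count(InputSource source) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SaxCountSketch handler = new SaxCountSketch();
            factory.newSAXParser().parse(source, handler);
            return handler;
        } catch (Exception e) {
            // Rethrow unchecked so the original parser exception stays visible as the cause.
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
    }

    static SaxCountSketch countString(String xml) {
        return count(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) {
        SaxCountSketch c = args.length > 0
                ? count(new InputSource(new File(args[0]).toURI().toString()))
                : countString("<a x='1'><b>hi</b></a>");
        System.out.println(c.elements + " elements, " + c.attributes
                + " attributes, " + c.characters + " characters");
    }
}
```

If a document fails in a small handler like this one, the problem is unlikely to be in your application code, which is the same triage the Counter sample gives you.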
There’s a pair of DocumentTracer samples, one for SAX and one for XNI. These samples are in classes
named sax.DocumentTracer and xni.DocumentTracer, respectively. Their job is to print out all the SAX or
XNI callbacks as they are fired for your document. Occasionally these samples can be useful to help you
figure out which callbacks are being passed which data—especially when you’re tired and confused
after a long day of programming. They can also help you debug namespace-related problems, because
all the prefixes get expanded. The output of xni.DocumentTracer is more detailed and complete than
that of sax.DocumentTracer, due to the higher fidelity of the XNI callbacks, but most of the time you’ll
want to use sax.DocumentTracer so you can see exactly what SAX sees.
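The tracing idea can be sketched with a small SAX handler that records each callback as it fires. This is an illustration only, not the sax.DocumentTracer source: the class name SaxTraceSketch, the trace method, and the output format are all assumptions made for this example. It prints each element name with its namespace URI, which is the prefix expansion that makes namespace problems easier to spot.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the real sax.DocumentTracer reports more callbacks than this.
class SaxTraceSketch extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override public void startPrefixMapping(String prefix, String uri) {
        events.add("startPrefixMapping " + prefix + " -> " + uri);
    }

    @Override public void endPrefixMapping(String prefix) {
        events.add("endPrefixMapping " + prefix);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Show the expanded name {namespace-uri}local-name next to the prefixed name.
        events.add("startElement {" + uri + "}" + localName + " (" + qName + ")");
        for (int i = 0; i < attrs.getLength(); i++) {
            events.add("  attribute {" + attrs.getURI(i) + "}" + attrs.getLocalName(i)
                    + " = " + attrs.getValue(i));
        }
    }

    @Override public void endElement(String uri, String localName, String qName) {
        events.add("endElement {" + uri + "}" + localName);
    }

    @Override public void characters(char[] ch, int start, int length) {
        events.add("characters \"" + new String(ch, start, length) + "\"");
    }

    // Parses the XML text with a namespace-aware parser and returns the recorded callbacks.
    static List<String> trace(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SaxTraceSketch handler = new SaxTraceSketch();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        for (String event : trace("<p:a xmlns:p='urn:x' id='1'>text</p:a>")) {
            System.out.println(event);
        }
    }
}
```

Recording the events in a list rather than printing them directly keeps the handler easy to inspect from other code; the main method simply prints the list.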
If you’re using the DOM, you can use the DOM Counter sample, which lives in dom.Counter. It does the
same thing as sax.Counter, but it uses the DOM and therefore will probably exercise some of the same
DOM code your application does.
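A DOM-based counter can be sketched the same way: build the tree, then walk it. The code below is a hedged illustration and not the dom.Counter source; the class name DomCountSketch and its methods are hypothetical, and it relies on the standard JAXP DocumentBuilderFactory rather than Xerces-specific classes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; this is not the dom.Counter source shipped with Xerces.
class DomCountSketch {
    int elements;
    int attributes;

    // Depth-first walk over the tree, tallying element nodes and their attributes.
    void walk(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            elements++;
            attributes += node.getAttributes().getLength();
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            walk(child);
        }
    }

    // Builds a DOM tree from the XML text and returns the tallies.
    static DomCountSketch count(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            DomCountSketch counter = new DomCountSketch();
            counter.walk(doc);
            return counter;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        DomCountSketch c = count("<a x='1'><b>hi</b><c y='2' z='3'/></a>");
        System.out.println(c.elements + " elements, " + c.attributes + " attributes");
    }
}
```

Note that in a DOM tree namespace declarations such as xmlns:p appear as ordinary attributes, so attribute counts from a DOM walk can differ from those reported by a SAX handler.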
CyberNeko Tools for XNI
Andy Clark is one of the Xerces committers and was the driving force behind the design of XNI. He’s
written a suite of tools called NekoXNI to showcase some of the things you can do with XNI. Even if you
aren’t interested in using XNI, you might want to have a look, because some of the tools are pretty
useful. In this section, we’ll look at a few of these tools.
NekoHTML
NekoHTML uses XNI to allow an application to process an HTML document as if it were an XML
document. There are both SAX and DOM parsers in the org.cyberneko.html.parsers package. You use
org.cyberneko.html.parsers.SAXParser just like the regular Xerces SAXParser; you can plug in your own
ContentHandlers and so on using the regular SAX API. The org.cyberneko.html.parsers.DOMParser
works like the Xerces DOMParser with one notable twist. Instead of using the Xerces XML DOM, it uses
the Xerces HTML DOM, which means you get a DOM implementation that is aware of some of the rules
of HTML. To use NekoHTML, you need to have nekohtml.jar in your classpath, in addition to the
regular jars you need for Xerces. But if you need to process HTML, it’s worth it.
ManekiNeko
Another interesting and useful component of NekoXNI is a validator for RELAX NG called ManekiNeko.
This validator is based on James Clark’s Jing validator for RELAX NG, and it works by creating a wrapper