The following script returns the root node’s type, name, and number of children:
put gParser.type, gParser.name, gParser.count(#child)
-- #element "ROOT OF XML DOCUMENT" 1
The main difference between the root node and its child nodes is that several script methods
apply to the entire XML document and operate on the root node only. These methods include
doneParsing(), getError(), ignoreWhiteSpace(), makeList(), parseString(), and parseURL().
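As a minimal sketch of how these document-level methods fit together, the following handler parses a string, checks for an error, and converts the result to a list. The handler name parseSample and the XML string are hypothetical and serve only as an illustration. The sketch assumes that getError() returns VOID when parsing succeeds.

```lingo
-- Hypothetical handler: parse a string and report any error
on parseSample
  -- create an instance of the XML Parser Xtra
  parserObj = new(xtra "xmlparser")
  -- made-up XML used only for this illustration
  XMLtext = "<sample><item>one</item></sample>"
  parserObj.parseString(XMLtext)
  -- assumption: getError() returns VOID when no error occurred
  errorString = parserObj.getError()
  if voidP(errorString) then
    -- convert the parsed document to a property list
    put parserObj.makeList()
  else
    put "XML error:" && errorString
  end if
end
```

All of the calls in this handler are made on the parser object itself, which represents the root node of the document.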
Treating white space
By default, the XML Parser Xtra ignores character data between XML tags when all the
characters are white space. This type of white space usually comes from Return characters and
superfluous space characters, but sometimes it has meaning in the XML document.
You can use the ignoreWhiteSpace() method to change the way the Xtra treats white space. By
setting ignoreWhiteSpace() to FALSE instead of its default value of TRUE, you tell the Xtra to
treat instances of white space as literal data nodes. This way, white space between elements
is treated as actual data.
The following script statements leave ignoreWhiteSpace() set to the default TRUE value and
parse the given XML into a list. The sample element has no children in the list.
XMLtext = "<sample> </sample>"
parserObj.parseString(XMLtext)
theList = parserObj.makeList()
put theList
-- ["ROOT OF XML DOCUMENT": ["!ATTRIBUTES": [:], "sample": ["!ATTRIBUTES": [:]]]]
The following script statements set ignoreWhiteSpace() to FALSE and parse the given XML
into a list. The sample element now has a child that contains one space character.
XMLtext = "<sample> </sample>"
parserObj.ignoreWhiteSpace(FALSE)
parserObj.parseString(XMLtext)
theList = parserObj.makeList()
put theList
-- ["ROOT OF XML DOCUMENT": ["!ATTRIBUTES": [:], "sample": ["!ATTRIBUTES": [:], "!CHARDATA": " "]]]
If there are non-white space characters in a !CHARDATA node, all the characters of the node,
including leading and trailing white space characters, are retained.
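As a sketch of this behavior, the following statements parse an element whose character data has one leading and one trailing space. The XML string is a made-up example, and parserObj is assumed to be an existing instance of the XML Parser Xtra, as in the preceding examples. Based on the rule above, the !CHARDATA entry for the sample element should contain the text with both spaces intact.

```lingo
-- made-up XML: text with one leading and one trailing space
XMLtext = "<sample> some text </sample>"
-- parserObj is assumed to be an existing XML Parser Xtra instance
parserObj.parseString(XMLtext)
theList = parserObj.makeList()
-- display the list to inspect the "!CHARDATA" entry
put theList
```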
XML and character sets
When you use XML, remember that different computer systems use different binary encodings
to represent text characters.
The XML Parser Xtra adheres strictly to the XML specification, which states that XML
documents are, by default, encoded using the UTF-8 character set. If the document is
not encoded in UTF-8, it must include a declaration of its character set in the first line of
the document.
The following XML declares the ISO-8859-1 character set, also known as Latin-1:
<?xml version="1.0" encoding="ISO-8859-1" ?>