User's Guide
9 Comparing XML Files
9-10
How the Matching Algorithm Works
In this section...
“Why Do I See Unexpected Results?” on page 9-10
“How the Chawathe Algorithm Works” on page 9-10
“Why Use a Heuristic Algorithm?” on page 9-12
“Examples of Unexpected Results” on page 9-12
Why Do I See Unexpected Results?
The core of the XML file comparison engine is Chawathe’s matching algorithm, a
heuristic method based on a scoring system. Because the method is heuristic, it does not
guarantee the match you might expect, and comparison results can be surprising when
many elements in each document are very similar.
See the following sections for some examples.
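To see why a score-based matcher can surprise you, consider the following minimal Python sketch. It is not the Chawathe scoring system; it assumes a simple string similarity score from the standard difflib module, and the names score and best_match and the sample strings are hypothetical. It only shows that when candidates are almost identical, their scores tie and the chosen match depends on something arbitrary, such as order.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a real scoring system."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(item: str, candidates: list[str]) -> str:
    """Return the highest-scoring candidate (first one wins on a tie)."""
    return max(candidates, key=lambda c: score(item, c))


# Hypothetical element contents: both candidates differ from the
# item by a single character, so neither is an obviously better match.
item = "block gain=2"
candidates = ["block gain=1", "block gain=3"]

scores = [score(item, c) for c in candidates]
chosen = best_match(item, candidates)

print(scores)   # the two scores are identical
print(chosen)   # the tie is broken by candidate order
```

In this sketch the tie is resolved by list order, so reordering the candidates changes the reported match even though the documents are equally similar either way.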
How the Chawathe Algorithm Works
XML text documents are hierarchical data structures. Users can insert, delete, or reorder
elements, modify their contents, or move elements across different parts of the hierarchy.
The Chawathe algorithm can detect these different types of changes within the hierarchy
of the document. Like conventional text differencing utilities, the Chawathe algorithm
detects local text that is added, deleted, or changed. In addition, it can prepare an
edit script, which can be used to create a report of the hierarchical location of the
detected differences.
The Chawathe algorithm attempts to match elements that are of the same category.
The Chawathe paper refers to these categories as labels. In the following XML example
documents (with labels A, B, and C):
• The three C elements on the left are compared with the three C elements on the right.
• The single B element on the left is compared with the two B elements on the right.
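The label-based grouping described above can be sketched in Python. This is an illustration only, not the comparison engine's implementation: it assumes that an element's label is its tag name, and the two small documents (LEFT and RIGHT) are hypothetical stand-ins with the same element counts as the bullets describe. The function names are also assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical documents: one B and three C elements on the left,
# two B and three C elements on the right.
LEFT = "<A><B/><C/><C/><C/></A>"
RIGHT = "<A><B/><B/><C/><C/><C/></A>"


def elements_by_label(xml_text):
    """Group every element in the document by its label (tag name)."""
    groups = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        groups[elem.tag].append(elem)
    return groups


def candidate_counts(left_xml, right_xml):
    """For each label, count the elements on each side that may be matched."""
    left = elements_by_label(left_xml)
    right = elements_by_label(right_xml)
    labels = sorted(set(left) | set(right))
    return {label: (len(left[label]), len(right[label])) for label in labels}


counts = candidate_counts(LEFT, RIGHT)
for label, (n_left, n_right) in counts.items():
    print(f"{label}: {n_left} on the left compared with {n_right} on the right")
```

Elements are only ever candidates for elements with the same label, so in this sketch a B element is never compared with a C element.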