User's Guide

9 Comparing XML Files

How the Matching Algorithm Works

In this section...

“Why Do I See Unexpected Results?” on page 9-10

“How the Chawathe Algorithm Works” on page 9-10

“Why Use a Heuristic Algorithm?” on page 9-12

“Examples of Unexpected Results” on page 9-12

Why Do I See Unexpected Results?

The core of the XML file comparison engine is Chawathe’s matching algorithm. Because

this algorithm is a heuristic method based on a scoring system, comparison results can

be unexpected when many elements in each document are very similar.

See the following sections for some examples.
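
As a rough illustration of why a score-based heuristic can surprise you, the following Python sketch scores two nearly identical left-side elements against one right-side element. The scoring function here is an assumption (Python's difflib ratio stands in for the engine's actual scoring, which this guide does not specify), and the element text is hypothetical.

```python
from difflib import SequenceMatcher


def score(a, b):
    """Assumed similarity score in [0, 1]; a stand-in for the engine's real scoring."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical text of two very similar elements in the left document
left = ["value 1", "value 2"]
# Hypothetical text of one element in the right document
target = "value 3"

scores = [score(text, target) for text in left]

# max() returns the first of several equal scores, so the chosen match
# depends on document order rather than on any real difference in similarity.
best = max(range(len(left)), key=scores.__getitem__)
```

Both candidates receive the same score, so a matcher that relies only on the score has no principled way to prefer one over the other.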

How the Chawathe Algorithm Works

XML text documents are hierarchical data structures. Users can insert, delete, or reorder

elements, modify their contents, or move elements across different parts of the hierarchy.

The Chawathe algorithm can detect these different types of changes within the hierarchy

of the document. Like conventional text differencing utilities, the Chawathe algorithm

detects local text that is added, deleted, or changed. In addition, it can prepare an

edit script, which can be used to report the hierarchical location of each detected

difference.
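
To make the idea of an edit script more concrete, the following Python sketch models one as a list of operations, each tied to a location in the hierarchy. The class name, field names, and path notation are hypothetical and chosen for illustration only; they do not describe the comparison engine's internal data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOp:
    """One step of a hypothetical edit script."""
    kind: str         # "insert", "delete", "update", or "move"
    path: str         # hierarchical location of the affected element
    detail: str = ""  # optional description of the change


def report(script):
    """Format each operation as one report line, with kinds aligned in a column."""
    return [f"{op.kind:<6} {op.path} {op.detail}".rstrip() for op in script]


# Hypothetical script: one element changed, one moved, one inserted
script = [
    EditOp("update", "/A/B[1]/C[2]", "text changed"),
    EditOp("move", "/A/B[1]/C[3]", "to /A/B[2]"),
    EditOp("insert", "/A/B[2]"),
]
lines = report(script)
```

Because every operation carries its own path, a report built from the script can show where in the hierarchy each difference occurs, not only which text changed.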

The Chawathe algorithm attempts to match elements that are of the same category.

The Chawathe paper refers to these categories as labels. In the following XML example

documents (with labels A, B, and C):

• The three C elements on the left are compared with the three C elements on the right

• The single B element on the left is compared with the two B elements on the right
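
The restriction to same-label comparisons can be sketched in Python as follows. This is a minimal sketch that assumes the label of an element is its tag name; the two documents are hypothetical stand-ins with labels A, B, and C, and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical example documents; the tag names A, B, and C play the role of labels.
LEFT = "<A><B><C>1</C><C>2</C><C>3</C></B></A>"
RIGHT = "<A><B><C>1</C><C>2</C></B><B><C>3</C></B></A>"


def group_by_label(xml_text):
    """Map each label (tag name) to the elements that carry it, in document order."""
    groups = defaultdict(list)
    for element in ET.fromstring(xml_text).iter():
        groups[element.tag].append(element)
    return groups


def candidate_counts(left_xml, right_xml):
    """For each label, count the left and right elements that may be compared."""
    left = group_by_label(left_xml)
    right = group_by_label(right_xml)
    labels = sorted(set(left) | set(right))
    return {label: (len(left.get(label, [])), len(right.get(label, [])))
            for label in labels}


counts = candidate_counts(LEFT, RIGHT)
```

For these documents, `counts` pairs three C elements on the left with three on the right, and one B element on the left with two on the right. Elements with different labels are never considered as candidates for each other.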