2022.1

Text formatting features such as kerning, bold, exponents, etc., may cause these fragments to
be considered as separate even if, to the naked eye, they obviously belong together.
The PDF Text Extraction Tolerance Factors are used to modify the behavior of data selections
made from PDF data files within PReS Workflow. Each factor available in this window
determines whether two fragments of text in the PDF should be part of the same data selection.
Warning
The default values are generally correct for the vast majority of PDF data files. Only
change these values if you understand what they are for.
Delta Width
Defines the tolerance for the distance between two text fragments, either positive (space
between fragments) or negative (kerning text where letters overlap). When this value is at 0, the
two fragments will need to be exactly one beside the other with no space or overlap between
them.
When this value is at 1, a very large space or overlap will be accepted. This may cause "false
positives": if the value is too high, separate words and text blocks may be considered as a
single word.
Accepted values range from 0 to 1. The default value is 0.3; recommended values are between
0.05 and 0.30.
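
As an illustration of the rule described above, here is a minimal sketch of a horizontal tolerance check. It is not the actual PReS Workflow implementation: the Fragment class, the within_delta_width function and, in particular, the choice to scale the tolerance by the height of the first fragment are all assumptions made for the example. The documentation only states that 0 requires fragments to touch exactly and that 1 accepts a very large space or overlap.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment; coordinates are in arbitrary page units."""
    x_start: float
    x_end: float
    height: float


def within_delta_width(left: Fragment, right: Fragment, delta_width: float) -> bool:
    """Return True if the horizontal gap or overlap is within tolerance.

    Assumption: the tolerance is expressed as a fraction of the left
    fragment's height. The real scaling reference is not documented.
    """
    if not 0.0 <= delta_width <= 1.0:
        raise ValueError("delta_width must be between 0 and 1")
    # Positive gap: space between fragments. Negative gap: kerning overlap.
    gap = right.x_start - left.x_end
    return abs(gap) <= delta_width * left.height


word_start = Fragment(x_start=0.0, x_end=10.0, height=10.0)
word_end = Fragment(x_start=12.0, x_end=20.0, height=10.0)

# A gap of 2 units is accepted at 0.3 but rejected at 0.05 under this sketch.
print(within_delta_width(word_start, word_end, 0.3))
print(within_delta_width(word_start, word_end, 0.05))
```

Under these assumptions, lowering the value toward 0.05 splits loosely spaced fragments apart, while raising it toward 1 merges them, which matches the "false positives" risk described above.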
Delta Height
Defines the tolerance for the height and position difference between two target fragments. The
higher the number, the greater the difference in fragment height (the height of the tallest
font character) that will be accepted, and the greater the vertical distance between fragments
that will be accepted. Exponents, for example, sit higher or lower than the surrounding text.
When this value is 0, no vertical shift is accepted between two fragments. When the value is 1,
the second text fragment can be shifted by as much as the height of the first fragment.
Accepted values range from 0 to 1. The default value is 0.15; recommended values are
between 0.00 and 0.50.
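
The vertical rule can be sketched in the same way. Again, this is only an illustration and not the product's actual algorithm: the Fragment class and the within_delta_height function are hypothetical. The shift limit follows the statement above that, at a value of 1, the second fragment may be shifted by as much as the height of the first. Applying the same limit to the difference in height is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical text fragment described by its baseline and height."""
    baseline: float
    height: float


def within_delta_height(first: Fragment, second: Fragment, delta_height: float) -> bool:
    """Return True if the second fragment is vertically close enough to the first.

    The allowed shift is delta_height times the first fragment's height.
    Assumption: the same limit is applied to the height difference.
    """
    if not 0.0 <= delta_height <= 1.0:
        raise ValueError("delta_height must be between 0 and 1")
    limit = delta_height * first.height
    shift = abs(second.baseline - first.baseline)
    height_diff = abs(second.height - first.height)
    return shift <= limit and height_diff <= limit


body_text = Fragment(baseline=100.0, height=10.0)
exponent = Fragment(baseline=104.0, height=7.0)  # raised and smaller

# Under this sketch the exponent is rejected at the default 0.15
# and accepted at 0.5.
print(within_delta_height(body_text, exponent, 0.15))
print(within_delta_height(body_text, exponent, 0.5))
```

In this sketch, a raised exponent only joins the preceding text once the factor is large enough to cover both its vertical shift and its smaller size.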