8.7

Properties
General tab
• EMF to XY group: Select this option if the file received by this task is a Windows print
file. This prompts the task to perform the first phase of the process and convert the
file to an XML file. If this option is not selected, the input file is not converted, and the
task fails if the file it receives is not already an XML file. The settings included in this
group fine-tune the conversion. They let you control precisely which text blocks are
recognized as belonging together in one line. They have a particular effect on how the task
handles font size differences between consecutive passages of text, the distance from one
text passage to another (word distance), and the baseline offset (vertical distance). To
determine whether a text passage belongs to the one found before it, the task checks the
vertical distance first, the horizontal distance second, and the font size difference last.
The two blocks are recognized as belonging together only if all three values lie within the
tolerance. Additionally, you can choose to output on one line those text passages whose
horizontal distance is out of the tolerance but whose font size difference and vertical
distance lie within the tolerance. In the output, these text passages are separated by a
tab character (ASCII code 9).
• Font size difference: Indicates the smallest acceptable ratio of minimum to maximum
font size within one line. With a value of 0.60, two text passages whose ratio of
minimum to maximum font size (in points) is less than 0.60 are not recognized as
belonging together. For example, suppose passage 1 is formatted at 10 points and
passage 2 at 18 points. The ratio 10/18 ≈ 0.56 is smaller than the configured value of
0.60, so the two text passages are recognized as not belonging together.
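The font size test, and the order in which it combines with the other two checks described for this group, can be sketched as follows. This is only an illustration under stated assumptions, not the task's actual implementation: the function names, the boolean inputs for the vertical and horizontal checks, and the `tab_separate` flag are all hypothetical, and only the 0.60 threshold and the 10/18 point example come from the text above.

```python
def font_size_ratio(size_a: float, size_b: float) -> float:
    """Ratio of the smaller to the larger font size (both in points)."""
    return min(size_a, size_b) / max(size_a, size_b)


def fonts_compatible(size_a: float, size_b: float, min_ratio: float = 0.60) -> bool:
    """True if the ratio is not below the configured 'Font size difference' value."""
    return font_size_ratio(size_a, size_b) >= min_ratio


def classify(vertical_ok: bool, horizontal_ok: bool, font_ok: bool,
             tab_separate: bool = False) -> str:
    """Combine the three checks in the documented order (hypothetical helper).

    vertical_ok and horizontal_ok stand in for the vertical distance and
    word distance tests, whose exact units are not modelled here.
    """
    if not vertical_ok:
        return "separate"
    if not horizontal_ok:
        # Optional behaviour: keep the passages on one line, separated by a tab,
        # when only the horizontal distance is out of tolerance.
        if tab_separate and font_ok:
            return "same line, tab-separated"
        return "separate"
    if not font_ok:
        return "separate"
    return "joined"


# Worked example from the text: 10 pt next to 18 pt, threshold 0.60.
print(round(font_size_ratio(10, 18), 2))   # 0.56
print(fonts_compatible(10, 18))            # False
```

With these assumptions, a 12 point passage next to an 18 point passage (ratio about 0.67) would pass the font size test, while the 10 and 18 point pair from the example would not.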
• Word distance: Indicates the largest acceptable distance between two text
passages at which they are still recognized as belonging together. This is the factor the