• Ignore unparseable lines: Ignores any line that does not correspond to the settings
above.
PDF file Input Data settings
PDF files have a natural, static delimiter in the form of pages, so the options here are
interpretation settings for the text in the PDF file.
The Input Data settings for PDF files determine how words, lines and paragraphs are detected
in the PDF when creating data selections.
Each value represents a fraction of the average font size of the text in a data selection, so
a value of 0.3 represents 30% of the average character height or width.
• Word spacing: Determines the spacing between words. Because text in a PDF is usually
spaced through positioning rather than through actual space characters, text position is
used to find new words. This option determines what fraction of the average width of a
single character needs to be empty before a new word is considered to have started. The
default value is 0.3, meaning a space is assumed if there is a blank area of at least 30%
of the width of the average character in the font.
• Line spacing: Determines the spacing between lines of text. The default value is 1,
meaning the space between lines must be equal to at least the average character height.
• Paragraph spacing: Determines the spacing between paragraphs. The default value is
1.5, meaning the space between paragraphs must be equal to at least 1.5 times the
average character height to start a new paragraph.
• Magic number: Determines the tolerance factor for all of the above values. The tolerance
is meant to avoid rounding errors. If two values are more than 70% away from each other,
they are considered distinct; otherwise they are considered the same. For example, if two
characters are separated by a space of exactly the width of the average character, any
space between 0.7 and 1.43 times this average width is considered one space. A space of
1.44 times the average width is considered to be two spaces.
• PDF file color space: Determines whether the PDF is displayed in Color or Monochrome in
the Data Viewer. Monochrome display is faster in the Data Viewer. This setting has no
influence on the actual data extraction or on data mapping performance.
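The way the spacing settings above interact can be illustrated with a short sketch. This is not the product's actual implementation; it is only one reading of the descriptions above, written in Python. The names PdfSpacingSettings, is_word_break, classify_vertical_gap and same_within_tolerance are hypothetical, and the tolerance value of 0.7 is an assumption derived from the "70%" example rather than a documented default.

```python
from dataclasses import dataclass


@dataclass
class PdfSpacingSettings:
    """Hypothetical container for the spacing settings described above."""
    word_spacing: float = 0.3       # fraction of the average character width
    line_spacing: float = 1.0       # fraction of the average character height
    paragraph_spacing: float = 1.5  # fraction of the average character height


def is_word_break(gap_width: float, avg_char_width: float,
                  settings: PdfSpacingSettings) -> bool:
    """A horizontal gap starts a new word once it reaches the word threshold."""
    return gap_width >= settings.word_spacing * avg_char_width


def classify_vertical_gap(gap_height: float, avg_char_height: float,
                          settings: PdfSpacingSettings) -> str:
    """Classify a vertical gap as a paragraph break, a line break, or neither."""
    if gap_height >= settings.paragraph_spacing * avg_char_height:
        return "paragraph"
    if gap_height >= settings.line_spacing * avg_char_height:
        return "line"
    return "none"


def same_within_tolerance(a: float, b: float, tolerance: float = 0.7) -> bool:
    """Assumed reading of the tolerance factor: two positive values count as
    the same when the smaller is at least `tolerance` times the larger."""
    if a <= 0 or b <= 0:
        # Degenerate case: only treat identical values as the same.
        return a == b
    return min(a, b) / max(a, b) >= tolerance


if __name__ == "__main__":
    settings = PdfSpacingSettings()
    # Average character is 10 units wide and 10 units high in this example.
    print(is_word_break(4.0, 10.0, settings))
    print(classify_vertical_gap(12.0, 10.0, settings))
    print(same_within_tolerance(1.0, 1.44))
```

Under this reading, the upper bound of roughly 1.43 in the Magic number example corresponds to 1 / 0.7, which is why a space of 1.44 times the average width is no longer treated as a single space.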
Database Input Data settings
Databases all return the same type of information. Therefore the Input Data options for a
database refer to the database itself instead of to the data.
The following settings apply to any database or ODBC Data Sample.