naturally by pages, so the input data settings for PDF files are interpretation settings for text in
the file.
For an overview of all options, see: "Input Data" on page 203.
For a CSV file
In a CSV file, data is read line by line, where each line can contain multiple fields, separated by
a delimiter. Even though CSV stands for comma-separated values, fields may be separated
using any character, including commas, tabs, semicolons, and pipes.
The text delimiter wraps around a field value in case that value contains the field
separator. This ensures that, for example, the field "Smith; John" is not interpreted as two
fields, even if the field separator is the semicolon.
For an explanation of all the options, see: "CSV file Input Data settings" on page 203.
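The effect of the text delimiter can be illustrated outside the application with a minimal Python sketch. It uses Python's standard csv module, not the software described here; the sample data and the function name read_rows are hypothetical and serve only as an illustration.

```python
import csv
import io

# Hypothetical sample: semicolon as field separator, double quote as text delimiter.
SAMPLE = 'ID;Name;City\n1;"Smith; John";Montreal\n'


def read_rows(text, separator=";", text_delimiter='"'):
    """Parse CSV text into a list of rows; each row is a list of field values.

    Pass text_delimiter=None to treat quote characters as ordinary text.
    """
    if text_delimiter is None:
        reader = csv.reader(io.StringIO(text), delimiter=separator,
                            quoting=csv.QUOTE_NONE)
    else:
        reader = csv.reader(io.StringIO(text), delimiter=separator,
                            quotechar=text_delimiter)
    return list(reader)


# With the text delimiter, "Smith; John" stays one field (3 fields in the row).
with_delimiter = read_rows(SAMPLE)

# Without it, the semicolon inside the name splits the value (4 fields in the row).
without_delimiter = read_rows(SAMPLE, text_delimiter=None)
```

In this sketch the second row yields three fields when the text delimiter is honored and four when it is ignored, which is the misinterpretation the text delimiter is meant to prevent.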
For a PDF file
PDF files have a clear and unmovable delimiter: pages. So, the Input Data settings are not
used to set delimiters. Instead, these options determine how words, lines and paragraphs are
detected when you select content in the PDF to extract data from it.
For an explanation of all the options, see: "PDF file Input Data settings" on page 204.
For a database
Databases all return the same type of information. Therefore the Input Data options for a
database refer to the tables inside the database. Clicking on any of the tables shows the first
line of the data in that table.
If the database supports it, you can use custom SQL, including stored procedures, inner joins,
grouping and sorting, to make a selection from the database, using whatever language the
database supports.
For an explanation of all the options, see: "Database Input Data settings" on page 204.
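As a rough illustration of what a custom SQL selection with an inner join, grouping and sorting might look like, here is a minimal sketch using Python's built-in sqlite3 module. The tables customers and invoices and their columns are hypothetical, and SQLite is only a stand-in: the exact syntax depends on the database in use, and SQLite itself does not support stored procedures.

```python
import sqlite3

# In-memory database with two hypothetical tables, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Smith'), (2, 'Jones');
INSERT INTO invoices VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# Custom selection: join the tables, group per customer, sort by total amount.
QUERY = """
SELECT c.name, COUNT(i.id) AS invoice_count, SUM(i.amount) AS total
FROM customers AS c
INNER JOIN invoices AS i ON i.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY total DESC
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
```

Each resulting row then holds one customer with the number of invoices and their total, instead of the raw contents of a single table.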
For a text file
Because text files come in many different shapes and sizes, there are a lot of input data settings
for these files. You can add or remove characters, for example if the file has a header you want
to get rid of or strange characters at its beginning; you can set a line width if you are still
working with old line printer data; etc.
It is important that pages are defined properly. This can be done either by using a set number of
lines or by using a text (for example, the character "P") to detect on the page. Be aware that
this is not a Boundary setting; it detects each new page, not each new record.
For an explanation of all the options, see: "Text file Input Data settings" on page 205.
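The two ways of defining pages can be sketched in a few lines of Python. This is only an illustration of the idea, not the software's implementation: the function names are hypothetical, and the sketch assumes that a line containing the detection text starts a new page, which may differ from how the application treats that line.

```python
def split_by_line_count(lines, lines_per_page):
    """Start a new page after every `lines_per_page` lines."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    return [lines[i:i + lines_per_page]
            for i in range(0, len(lines), lines_per_page)]


def split_by_text(lines, text):
    """Start a new page at each line that contains `text` (assumed behavior)."""
    pages = []
    for line in lines:
        # Open a new page on a match, or for the very first line of the file.
        if text in line or not pages:
            pages.append([])
        pages[-1].append(line)
    return pages


# Hypothetical line printer data; "P" marks the top of each page.
data = ["P invoice 1", "item a", "item b", "P invoice 2", "item c"]

pages_by_count = split_by_line_count(data, 2)
pages_by_text = split_by_text(data, "P")
```

Both functions only decide where pages begin. Grouping pages into records is a separate step, which is why page detection is not a Boundary setting.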