1.7

Extracting data of variable length
In PDF and Text files, transactional data isn't structured uniformly, as it is in a CSV, database or
XML file. Data can be located anywhere on a page. Therefore, data is extracted from a
certain region on the page. However, the data can be spread over multiple lines and multiple
pages:
• Line items may continue on the next page, separated from the line items on the first page
  by a line break, a number of empty lines and a letterhead.
• Data may vary in length: a product description, for example, may or may not fit on one line.
How to exclude lines from an extraction is explained in another topic: "Extracting transactional
data" on page 101 (see From a PDF or Text file).
This topic explains a few ways to extract data with variable lengths.
Finding a condition
The key to extracting data of variable length is to find one or more differences between lines
that show how big the region is from which the data needs to be extracted.
For example, while a product description may extend over two lines, other data, such as the
unit price, will never be longer than one line. Either the line above or the line below the unit
price will be empty when the product description covers two lines.
Such a difference can then be used as a condition in a Condition step or a Case in a Multiple
Conditions step.
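The check described above can be sketched outside the DataMapper in a few lines of Python. This is only an illustration of the logic, not the product's own scripting API: the function names, the sample lines and the column positions (description in columns 0 to 24, unit price in columns 24 to 34) are all assumptions made for the example.

```python
def price_cell_below_is_empty(lines, row, price_cols):
    """Return True when the unit-price column is blank on the line after `row`."""
    left, right = price_cols
    # A missing line (end of the page) is treated as an empty line.
    below = lines[row + 1] if row + 1 < len(lines) else ""
    return below[left:right].strip() == ""


def item(description, price=""):
    """Build a fixed-width sample line (hypothetical layout)."""
    return description.ljust(24) + price.rjust(10)


PRICE_COLS = (24, 34)  # assumed position of the unit price column

page = [
    item("Garden hose, reinforced", "12.50"),
    item("with brass coupling"),          # second line of the description
    item("Watering can", "8.00"),
    item("Trowel", "4.25"),
]

# The region is two lines high when no price is found on the next line.
for row in (0, 2):
    height = 2 if price_cell_below_is_empty(page, row, PRICE_COLS) else 1
    print(f"item starting on line {row}: region height {height}")
```

A blank price cell on its own cannot tell a continued description from an empty line after the last item, which is why a second condition may be needed.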
A Condition step, as well as each Case in a Multiple Conditions step, can only check for one
condition. To combine conditions, you would need a script.
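As a rough idea of what combining conditions means, the Python sketch below tests two things at once: the unit price cell on the next line is blank, and the description cell on that line contains text. It is not written against the DataMapper's scripting API; the helper names, sample lines and column positions are hypothetical.

```python
def cell(line, cols):
    """Return the trimmed text between two character columns."""
    left, right = cols
    return line[left:right].strip()


def description_continues(lines, row, desc_cols, price_cols):
    """Combine two conditions on the line that follows `row`."""
    below = lines[row + 1] if row + 1 < len(lines) else ""
    price_is_blank = cell(below, price_cols) == ""
    description_has_text = cell(below, desc_cols) != ""
    # Both conditions must hold; a fully empty line fails the second one.
    return price_is_blank and description_has_text


DESC_COLS, PRICE_COLS = (0, 24), (24, 34)  # assumed column layout

lines = [
    "Garden hose, reinforced".ljust(24) + "12.50".rjust(10),
    "with brass coupling",
    "Trowel".ljust(24) + "4.25".rjust(10),
    "",
]

print(description_continues(lines, 0, DESC_COLS, PRICE_COLS))
print(description_continues(lines, 2, DESC_COLS, PRICE_COLS))
```

The first call finds a continuation line; the second finds an empty line and so does not count it as part of the description.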
Using a Condition step or Multiple Conditions step
Using a Condition step ("Condition step" on page 120) or a Multiple Conditions step ("Multiple
Conditions step" on page 122), you can determine how big the region is that contains the data
that needs to be extracted.
In each of the branches under the Condition or Multiple Conditions step, an Extract step could
be added to extract the data from a particular region. The Extract steps could write their data to
the same field.
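The branching idea can be pictured with the following Python sketch, in which each branch reads a region of a different height but stores the result under the same key. It only mimics the workflow; the function name, the `description` key, the column positions and the choice to join the two lines with a space are assumptions, not DataMapper behavior.

```python
def extract_description(lines, row, desc_cols, two_lines):
    """Read the description from a one-line or two-line region."""
    left, right = desc_cols
    record = {}
    if two_lines:
        # Branch 1: the region covers the current line and the next one.
        first = lines[row][left:right].strip()
        second = lines[row + 1][left:right].strip()
        record["description"] = f"{first} {second}"
    else:
        # Branch 2: the region covers the current line only.
        record["description"] = lines[row][left:right].strip()
    return record


DESC_COLS = (0, 24)  # assumed position of the description column

lines = [
    "Garden hose, reinforced".ljust(24) + "12.50".rjust(10),
    "with brass coupling",
    "Trowel".ljust(24) + "4.25".rjust(10),
]

print(extract_description(lines, 0, DESC_COLS, two_lines=True))
print(extract_description(lines, 2, DESC_COLS, two_lines=False))
```

Only one branch runs for any given line item, so the two extractions never compete for the field.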
Fields cannot be used twice in one extraction workflow.
Different Extract steps can only write extracted data to the same field in the Data Model, if: