User Guide

Form Types
Form types and design elements
Forms can be divided into two major classes: structured forms,
on which the locations and sizes of all fields are exactly the same
for all forms in a batch, and flexible forms, on which the sizes and
locations of fields may vary from form to form. To capture
data from a structured form, a program has to know where to look
for the data. For this purpose a template is created, which is essentially
a skeleton of the form that contains information about the locations of
fields and the kind of data the program may expect to find in each of
them. The program then matches this template against a completed
form and separates the entered data from the field borders and
captions. Next, the entered data are "read", or recognized, i.e. converted
into text and digits.
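The idea of a template can be illustrated with a minimal Python sketch. It is not the ABBYY FormReader template format or API: the names Field, Template and extract_field are hypothetical, and the scanned page is assumed to be a list of pixel rows that is already aligned with the template.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[int]]  # rows of pixels; 1 = dark, 0 = white (assumed encoding)


@dataclass(frozen=True)
class Field:
    """One field of a structured form (hypothetical layout description)."""
    name: str
    left: int
    top: int
    width: int
    height: int
    kind: str  # kind of data expected in the field, e.g. "digits" or "text"


@dataclass(frozen=True)
class Template:
    """Skeleton of a form: the same field positions for every form in a batch."""
    fields: List[Field]


def extract_field(page: Image, field: Field) -> Image:
    """Cut the field's rectangle out of a page that is aligned with the template."""
    rows = page[field.top:field.top + field.height]
    return [row[field.left:field.left + field.width] for row in rows]


template = Template(fields=[
    Field("zip_code", left=2, top=1, width=3, height=2, kind="digits"),
])

page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

for f in template.fields:
    # Each cropped region would then be passed to a recognizer chosen by f.kind.
    print(f.name, f.kind, extract_field(page, f))
```

Because every form in the batch shares the same layout, the same template can be applied to each scanned page without searching for the fields again.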
All the forms in a batch must conform to one and the same pattern.
It is also essential that reference points and ID fields are
preserved during scanning.
If a form is not structured, it cannot be processed automatically
and requires a human operator to read the data from its fields and
type them into a database. This is a slow and tedious process that can
be avoided by designing a well-structured form that can then be read
by a computer.
Depending on their design, machine-readable forms can be
divided into the following three major types:
Colour forms.
All data fields on such forms consist of white
rectangles printed on a colour background. Backgrounds are
usually light grey, pink, orange, or green. The colours and
saturation are selected so that the background disappears during
scanning (this is why they are also known as dropout colours).
Ideally, all elements except reference points and ID fields
should disappear during scanning. Special scanners
with red or green lamps are used to scan such forms.
Alternatively, the drivers of common scanners may be adjusted
so that the scanner becomes blind to the background colour. Colour
forms provide the best recognition quality.
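The dropout effect can be imitated in software with a short Python sketch. It assumes the scan is available as RGB pixels; the function name drop_out and the threshold of 128 are illustrative assumptions, not scanner driver settings.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (red, green, blue), each 0..255


def drop_out(image: List[List[Pixel]], channel: int = 0,
             threshold: int = 128) -> List[List[int]]:
    """Binarize an image using a single colour channel.

    Looking only at the red channel (channel=0) imitates a scanner with a
    red lamp: pink or orange backgrounds are bright in red and turn white,
    while dark ink is dim in that channel and stays black.
    Returns 1 for ink and 0 for background. The threshold is an assumed value.
    """
    return [[1 if px[channel] < threshold else 0 for px in row]
            for row in image]


PINK = (255, 192, 203)   # dropout background
WHITE = (255, 255, 255)  # data field
BLACK = (0, 0, 0)        # reference point or black ink
BLUE = (0, 0, 200)       # blue ink

row = [PINK, WHITE, BLACK, BLUE, PINK]
print(drop_out([row]))             # red channel: the pink background vanishes
print(drop_out([row], channel=2))  # blue channel: the blue ink vanishes as well
```

The second call suggests why the lamp or channel has to be chosen to match the background rather than the ink used to fill in the form.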
Raster forms.
Data fields on such forms consist of white
rectangles printed on a colour background, but unlike on colour
forms, the backgrounds are made up of small dots spaced at
regular intervals. These dots do not disappear
during scanning, but ABBYY recognition software can remove
them without losing the information entered into the data
fields. There is also a subtype of raster form which has no
background at all. The borders of data fields on such forms are made
up of separate dots, which can then be filtered out by ABBYY
software.
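One simple way to filter out raster dots is to delete every connected group of dark pixels that is too small to be part of a character. The Python sketch below shows that technique only; it is not the algorithm used by ABBYY software, and the function name remove_small_dots and the max_dot_area limit are assumptions.

```python
from collections import deque
from typing import List

Image = List[List[int]]  # 1 = dark pixel, 0 = white pixel (assumed encoding)


def remove_small_dots(image: Image, max_dot_area: int = 2) -> Image:
    """Erase 4-connected groups of dark pixels no larger than max_dot_area."""
    height = len(image)
    width = len(image[0]) if image else 0
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect the whole connected component with a breadth-first search.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Small components are treated as raster dots and erased.
            if len(component) <= max_dot_area:
                for cy, cx in component:
                    result[cy][cx] = 0
    return result


scan = [
    [1, 0, 0, 0, 0, 0, 1],
    [0, 0, 1, 1, 1, 0, 0],
    [1, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1],
]
cleaned = remove_small_dots(scan)
for row in cleaned:
    print(row)
```

In this sketch a dot that touches a pen stroke becomes part of the larger component and is kept, so a practical filter would probably also need to exploit the regular spacing of the raster.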
Blackandwhite linear forms.
Field borders on such forms
consist of solid black lines which do not disappear during scan
ning.
The following field designs are available for linear forms:
(a) solid lines
(b) frames for words
(c) isolated frames for characters
(d) conjoined frames for characters
(e) lines with "combs"
(f) frames with "combs"
The recognition engine separates the data from the field
borders and then recognizes them. ABBYY FormReader uses
information about the field design provided in the template and looks for
specific design elements such as vertical lines or the number of
character cells. The program then ignores the formatting and
recognizes only the data contained within the fields. A form may also
contain "garbage", i.e. undesirable artefacts resembling field lines.
The program remembers the shape of the fields and distinguishes
between the meaningful field borders and the arbitrary "noise",
which is removed so that it does not interfere with recognition.
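To show how knowledge of the field design helps, the following Python sketch cuts a field with conjoined frames for characters into the interiors of its cells. It assumes that the number of cells and the line width come from the template and that the cells are equally wide; the function name split_conjoined_frames is hypothetical and does not describe how ABBYY FormReader works internally.

```python
from typing import List

Image = List[List[int]]  # 1 = dark pixel, 0 = white pixel (assumed encoding)


def split_conjoined_frames(field: Image, cell_count: int,
                           line_width: int = 1) -> List[Image]:
    """Return the interior of each character cell, without the frame lines.

    Assumes cell_count equally wide cells that are separated and surrounded
    by lines line_width pixels thick.
    """
    height = len(field)
    width = len(field[0])
    inner_width = (width - (cell_count + 1) * line_width) // cell_count
    cells = []
    for i in range(cell_count):
        # Skip the outer border and one separator line per preceding cell.
        left = line_width + i * (inner_width + line_width)
        cells.append([row[left:left + inner_width]
                      for row in field[line_width:height - line_width]])
    return cells


# A field with two conjoined frames holding a "1" and a "7".
field = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
cells = split_conjoined_frames(field, cell_count=2)
for cell in cells:
    print(cell)
```

Each cell interior can then be recognized as a single character, with the frame lines already excluded.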
A blackandwhite form on which characters are to be entered into separate frames.
Colour dropout form.
Raster field borders.