User Guide
Form Types
Form types and design elements
Forms can be divided into two major classes: structured forms,
on which the locations and sizes of all fields are exactly the same
for all forms in a batch, and flexible forms, on which the sizes and
locations of fields may vary from form to form. To capture data
from a structured form, a program has to know where to look for
the data. For this purpose a template is created, which is essentially
a skeleton of the form that contains information about the locations
of the fields and the kind of data the program may expect to find in
each of them. The program then matches this template with a
completed form and separates the entered data from the field borders
and captions. Next, the entered data are "read", or recognized, i.e.
converted into text and digits.
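The idea of a template can be illustrated with a minimal Python sketch.
It is not how ABBYY FormReader stores templates; the Field class, its
attribute names and the tiny 0/1 "page" are all assumptions made for
illustration. The sketch shows only the first step described above:
using field locations from a template to cut the matching regions out
of a completed form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """One template entry (hypothetical layout): where a field is and what it holds."""
    name: str
    left: int
    top: int
    width: int
    height: int
    kind: str  # expected data, e.g. "digits" or "text"


def crop(page, field):
    """Return the part of the page grid covered by the field rectangle."""
    rows = page[field.top:field.top + field.height]
    return [row[field.left:field.left + field.width] for row in rows]


def extract_fields(page, template):
    """Map each field name to the pixels found at its template location."""
    return {field.name: crop(page, field) for field in template}


# A toy scanned page: 1 = dark pixel, 0 = white pixel.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
template = [
    Field("id", left=1, top=1, width=3, height=2, kind="digits"),
    Field("name", left=5, top=1, width=2, height=2, kind="text"),
]
regions = extract_fields(page, template)
```

Because every form in a batch has its fields in the same places, the
same template can be applied to each page. The "kind" attribute is
where a recognizer could be told what data to expect in a field.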
All the forms in a batch must conform to one and the same pattern.
It is also essential that reference points and ID fields are
preserved during scanning.
If a form is not structured, it cannot be processed automatically
and requires a human operator to read the data from its fields and
type them into a database. This is a slow and tedious process that can
be avoided by designing a well-structured form that can then be read
by a computer.
Depending on their design, machine-readable forms can be
divided into the following three major types:
Colour forms.
All data fields on such forms consist of white
rectangles printed on a colour background. Backgrounds are
usually light grey, pink, orange, or green. The colours and
saturation are selected so that the background disappears during
scanning (this is why they are also known as dropout colours).
Ideally, all elements except the reference points and ID fields
should disappear during scanning. Special scanners
with red or green lamps are used to scan such forms.
Alternatively, the drivers of common scanners may be adjusted
so that they become blind to the background. Colour forms
provide the best recognition quality.
Raster forms.
Data fields on such forms consist of white
rectangles printed on a colour background, but unlike on colour
forms, backgrounds are made up of small dots located at
regular intervals from one another. These dots do not disappear
during scanning, but ABBYY recognition software can remove
such dots without losing information entered into the data
fields. There is also a subtype of raster form which has no
background at all. The borders of data fields on such forms are made
up of separate dots which can then be filtered out by ABBYY
software.
Black-and-white linear forms.
Field borders on such forms
consist of solid black lines which do not disappear during
scanning.
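The background handling described above for colour and raster forms
can be sketched in a few lines of Python. This is a simplified
illustration under stated assumptions, not the ABBYY algorithm: the
function names, the threshold value and the one-pixel dot size are all
hypothetical. The first function imitates a scanner with a red lamp
by looking at a single colour channel. The second removes dark pixels
that have no dark neighbour, which is a crude stand-in for filtering
raster dots.

```python
def drop_out(rgb_page, channel=0, threshold=128):
    """Binarize one colour channel: 1 = ink, 0 = blank.

    With channel=0 (red), a pink or orange background is about as
    bright as white paper, while dark ink stays dark.
    """
    return [
        [1 if pixel[channel] < threshold else 0 for pixel in row]
        for row in rgb_page
    ]


def remove_isolated_dots(binary):
    """Clear dark pixels that have no dark pixel among their 8 neighbours."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if not binary[y][x]:
                continue
            neighbours = sum(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


# Colour form: pink background, white field, one dark ink pixel.
PINK, WHITE, INK = (255, 200, 200), (255, 255, 255), (20, 20, 60)
rgb_page = [
    [PINK, PINK, PINK],
    [PINK, INK, WHITE],
    [PINK, PINK, PINK],
]
clean_page = drop_out(rgb_page)

# Raster form: single-pixel dots in the corners, a two-pixel stroke inside.
raster = [
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
]
despeckled = remove_isolated_dots(raster)
```

Real raster dots are larger than one pixel and may touch the entered
characters, so a practical filter would have to work with dot size and
spacing rather than with single isolated pixels.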
The following field designs are available for linear forms:
(a) solid lines
(b) frames for words
(c) isolated frames for characters
(d) conjoined frames for characters
(e) lines with "combs"
(f) frames with "combs"
The recognition engine separates the data from the field
borders and then recognizes the data. ABBYY FormReader uses
information about the field design provided on the template and looks
for specific design elements such as vertical lines or the number of
character cells. The program then ignores the formatting and
recognizes only the data contained within the fields. A form may also
contain "garbage", or undesirable artefacts resembling field lines.
The program will remember the shape of the fields and distinguish
between the meaningful field borders and the arbitrary "noise",
which will be removed so that it does not interfere with recognition.
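Separating entered data from solid field borders can be illustrated
with a simple run-length heuristic. This is only a sketch and not the
method used by ABBYY FormReader, which according to the text relies on
the field design stored in the template. Here the function names and
the min_len parameter are assumptions: any horizontal or vertical run
of dark pixels at least min_len long is treated as a border and
cleared, and shorter strokes are kept as data.

```python
def long_runs(line, min_len):
    """Yield (start, end) for each run of 1s in line at least min_len long."""
    start = None
    for i, value in enumerate(list(line) + [0]):  # sentinel closes a trailing run
        if value and start is None:
            start = i
        elif not value and start is not None:
            if i - start >= min_len:
                yield start, i
            start = None


def remove_field_borders(binary, min_len):
    """Clear long horizontal and vertical runs; keep shorter strokes."""
    height, width = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(height):
        for start, end in long_runs(binary[y], min_len):
            for x in range(start, end):
                out[y][x] = 0
    for x in range(width):
        column = [binary[y][x] for y in range(height)]
        for start, end in long_runs(column, min_len):
            for y in range(start, end):
                out[y][x] = 0
    return out


# A framed field (1 = dark pixel) with a small handwritten mark inside.
framed = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
cleaned = remove_field_borders(framed, min_len=4)
```

A heuristic like this also erases pixels where handwriting crosses a
border and cannot tell a real border from a long stray line, which is
why knowing the expected field shape from the template is useful.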
A black-and-white form on which characters are to be entered into separate frames.
Colour dropout form.
Raster field borders.