User Guide

Automated Forms Processing
When completing a form one has to enter information into blank
spaces or specially designed fields that make up the structure of
the form. This information must then be extracted and processed.
Forms from which data can be extracted, or "captured", automatically
by computer are called machine-readable. Almost any form
can be structured in such a way as to become machine-readable.
Forms can be filled in:
by hand (such forms are called hand-printed, because information
is entered in separate block letters, each letter occupying
one character space);
using a typewriter or printer;
in a printing house;
using a combination of all of the above.
Form structure
People filling in a form are sometimes careless or sloppy.
For this reason, forms are designed in such a way as to make their
completion intuitive and self-evident. The following design elements
are used to tell people what to write and where:
Entry (or data) fields. These include:
Text fields. Each text field consists of a certain number of
character spaces supplied with an explanatory caption.
Character spaces stand apart so that the entered letters do
not merge.
Check boxes. These are fields of various shapes (usually
squares, but in practice this can be any geometrical figure
with a closed boundary). A person filling in the form makes
a mark such as a check, a tick or a cross in this field to
select a particular option, or may simply ink over the
entire box.
Groups of check boxes. These are used for multiple-choice
questions. Usually check boxes within one group correspond to
mutually exclusive options, i.e. only one of them must be selected.
Service fields. Service fields contain so-called anchor or reference
points that facilitate forms processing. Anchor points are
used by a data capture program to detect the top and bottom
of a form and to correct distortions introduced by scanning.
Anchor points may also be used to identify different forms if
mixed types of forms are processed within one batch. The following
elements may be used as reference points on forms
processed by ABBYY FormReader:
black squares, corners and crosses;
vertical or horizontal lines;
static text, i.e. field captions that remain unchanged from
form to form.
ID fields or identifiers. These fields serve to identify the form.
Black squares, corners and crosses can also be used to identify
forms, but identification is more reliable with dedicated
identifiers such as numbers, bar codes or form titles.
Image areas. These areas contain objects which are not to be
recognized, e.g. seals or signatures, which will be treated as pictures.
FormReader can save such images into an ODBC database
in the following formats: TIF, BMP, JPG, PCX, and WMF.
Optional design elements: logos, headers, footers and other
formatting elements. In data capture, data contained in these
elements can also be used to identify forms. For example, by analysing
the text in a logo, the program can find out which company
issued an invoice.
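
The entry fields listed above can be pictured as a small data model. The Python sketch below is purely illustrative and is not taken from FormReader: the class names (TextField, CheckBox, CheckBoxGroup) and the validation rules are hypothetical. It assumes that a text field holds at most one character per character space and that a group of mutually exclusive check boxes is valid only when exactly one box is marked.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextField:
    """A text field: a caption plus a fixed number of character spaces (hypothetical model)."""
    caption: str
    length: int          # number of character spaces on the form
    value: str = ""

    def is_valid(self) -> bool:
        # One letter per character space, so the value cannot be longer than the field.
        return len(self.value) <= self.length


@dataclass
class CheckBox:
    """A single check box; `marked` is True if the person ticked or inked it."""
    label: str
    marked: bool = False


@dataclass
class CheckBoxGroup:
    """A group of check boxes, mutually exclusive by default (assumption)."""
    caption: str
    boxes: List[CheckBox] = field(default_factory=list)
    exclusive: bool = True

    def selected(self) -> List[str]:
        # Labels of all marked boxes, in form order.
        return [box.label for box in self.boxes if box.marked]

    def is_valid(self) -> bool:
        # Assumed rule: an exclusive group needs exactly one marked box.
        if self.exclusive:
            return len(self.selected()) == 1
        return True
```

For instance, a group built as CheckBoxGroup("Marital status", [CheckBox("Single", True), CheckBox("Married")]) passes is_valid(), while the same group with both boxes marked does not, which is the kind of result a data capture program might flag for manual review.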
Figure. Examples of form elements (callouts: service fields, text fields, check boxes, identifier).
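
The role of anchor points in correcting scanning distortions, described under Service fields, can be illustrated with a simple skew correction. The Python sketch below is an assumption-laden illustration, not FormReader's actual method: it supposes that two anchors printed on the same horizontal line have been located in the scanned image, measures how far the line between them deviates from horizontal, and rotates coordinates back by that angle. The function names skew_angle and deskew_point are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def skew_angle(left_anchor: Point, right_anchor: Point) -> float:
    """Angle (radians) between the line through two anchors and the horizontal axis.

    Assumes both anchors lie on one horizontal line on the blank form.
    """
    dx = right_anchor[0] - left_anchor[0]
    dy = right_anchor[1] - left_anchor[1]
    return math.atan2(dy, dx)


def deskew_point(p: Point, origin: Point, angle: float) -> Point:
    """Rotate p around origin by -angle, undoing the measured skew."""
    x, y = p[0] - origin[0], p[1] - origin[1]
    cos_a, sin_a = math.cos(-angle), math.sin(-angle)
    return (origin[0] + x * cos_a - y * sin_a,
            origin[1] + x * sin_a + y * cos_a)
```

After correction, the second anchor lies on the same horizontal line as the first, so field positions taken from the form template can be applied to the scanned page. Real scans may also be shifted, scaled or stretched, which this sketch does not handle.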