User Guide

Automated Forms Processing
Sometimes it is not necessary to capture all the data available on a
document. This is particularly true when digitising archives. In this
case only certain fields are selected for recognition. The program
creates a unique index on the basis of these fields and converts
the document images into a suitable storage format. The next time
someone looks for a particular page, they will be able to
find its image by searching the index.
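The indexing idea above can be illustrated with a small lookup table. This is only a sketch under stated assumptions, not the program's actual index format: the class ArchiveIndex, the field names and the image paths are all hypothetical, and the field values are assumed to have been recognized already.

```python
from collections import defaultdict


def normalize(value):
    """Lower-case and collapse whitespace so lookups tolerate formatting differences."""
    return " ".join(value.lower().split())


class ArchiveIndex:
    """Hypothetical index: maps recognized field values to stored page images."""

    def __init__(self):
        # (field name, normalized value) -> set of image paths
        self._entries = defaultdict(set)

    def add(self, image_path, fields):
        """Register one page image under each of its recognized fields."""
        for name, value in fields.items():
            self._entries[(name, normalize(value))].add(image_path)

    def search(self, **criteria):
        """Return the image paths that match every given field value."""
        result = None
        for name, value in criteria.items():
            matches = self._entries.get((name, normalize(value)), set())
            result = set(matches) if result is None else result & matches
        return sorted(result or [])


# Illustrative data only: two scanned pages with two indexed fields each.
index = ArchiveIndex()
index.add("scans/0001.tif", {"invoice_no": "A-1042", "postal_code": "10115"})
index.add("scans/0002.tif", {"invoice_no": "A-1043", "postal_code": "10115"})
```

Only the selected fields are stored, so the index stays small even when the page images themselves are large. A call such as index.search(invoice_no="A-1042") then returns the paths of the matching images.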
Documents stored in archives are usually not machine-readable:
The location of fields is not fixed,
There are no reference points (e.g. black squares or crosses),
Fields may include words in cursive writing,
Fields may be obstructed by stamps or inscriptions.
This means that such documents cannot be processed using
the conventional template approach.
Solution
1. The traditional approach must be used wherever the fields have
fixed locations: templates must be created for such documents.
Captions or tables that are present on all the documents
can be used as reference points. Usually archive documents
contain about a dozen such fields, which is sufficient for reliable
data capture. The title of the document can be used as an
identifier. This method can be used to create at least part of the
index. For example, in the case of waybills or invoices the program
can recognize bar codes, the country of origin, the telephone
number of the sender and the postal code of the addressee.
2. Fields without a fixed location can be recognized using the
FlexiCapture technology. This technology can find any field if
information about its surrounding elements is available. The
only drawback is that creating formalized descriptions is fairly
expensive and can only be done by a specialist.
3. Sometimes good results can be achieved using the Key From
Image approach, or manual input fields. This approach has
already been described on p. 27. The program helps the operator
find the required field on the form and displays its image.
Then the operator types in the data manually. This method can
be used to enter information from any document that cannot
be read by a computer and is more convenient than ordinary
manual typing.
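For the template approach in item 1, the role of a reference point can be sketched as follows. This is a minimal illustration, not the product's template format: the Box type, the WAYBILL_TEMPLATE entries and all coordinates are hypothetical, and the position of the caption on the page is assumed to have been found already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rectangle in pixels."""
    left: int
    top: int
    width: int
    height: int


# Hypothetical template: field boxes measured relative to the top-left
# corner of a caption that appears on every document of this type.
WAYBILL_TEMPLATE = {
    "bar_code": Box(left=400, top=-20, width=300, height=60),
    "postal_code": Box(left=0, top=120, width=180, height=40),
}


def locate_fields(template, anchor_x, anchor_y):
    """Shift each template box by the caption position found on this page."""
    return {
        name: Box(box.left + anchor_x, box.top + anchor_y, box.width, box.height)
        for name, box in template.items()
    }


# Illustrative values: the caption was found at (52, 310) on this scan.
fields = locate_fields(WAYBILL_TEMPLATE, anchor_x=52, anchor_y=310)
```

Because every field is stored as an offset from the caption, a page that was scanned slightly shifted still yields the correct field regions. The same idea does not help when the fields themselves move from document to document, which is the case covered by items 2 and 3.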
Capturing data from forms that are not machine-readable
Automated input of separate fields on a form that is not machine-readable.