ABBYY FormReader Automatic Form Input System A Guide to Creating Machine-Readable Forms ABBYY Software House Moscow 2001
Information in this document is subject to change without notice and does not represent any commitment on the part of ABBYY Software House. The document is supplied as a part of the ABBYY FormReader package under a license agreement. No part of this document may be reproduced or transmitted in any form or by any means, electronic or otherwise, without the express written approval of ABBYY Software House.
CONTENTS
WHAT IS A FORM? ... 5
WHAT IS A MACHINE-READABLE FORM? ... 5
FORM COMPLETION METHODS ... 5
MS Word 2000 graphic tools used to develop machine-readable forms ... 23
Positioning form elements ... 23
Protecting the form ... 24
CERTIFICATION ...
What is a form?
Questionnaires, social security forms, polling slips, warranty cards – all are different types of form used to collect different types of information. How do forms differ from other types of documents?
1. A form has a set number of fields.
2. Field content is always determined by, for example, the field name. E.g. a “Last Name” field contains only last names (if completed correctly), a “Date” field only dates, etc.
3.
Elements of machine-readable forms The following elements may be present on a form: 1) Fields for completion and automatic processing. These contain the information to be gathered. Example Field type Comments A text field for entering letters, digits and other characters Checkboxes to be marked Radio group See “Form Completion Methods” (page 5). These may take the form of squares, bubbles etc., or fields that must be underlined.
Dropout forms
All the fields on the form are white rectangles on a color background. The important thing here is the color used, as it disappears during the scanning process (see recommendations on color choice in Appendix III), leaving only the field contents and reference points on the form image for the recognition module to recognize. Dropout forms are the preferred choice in terms of recognition quality.
Raster forms
Field borders on raster forms are termed raster lines – i.e.
can also use forms with such a background color with low saturation. In this case you should find the proper color and its saturation manually, depending on the scanner model used.
Choosing the form color
Red-orange colors are preferable to green as a form color. This is because they offer the greatest possible contrast to blue, and consequently result in enhanced scanning and recognition quality if the forms are completed using blue ink. Appendix III lists the recommended colors for form processing i.
2. text image quality can result in the appearance of field borders or the background on the form image, and consequently, cause a deterioration in the recognition quality. If the printer makes unauthorized changes to the technical print parameters (i.e. different paper, other color components) then the background may become too dark and could prove difficult to remove regardless of the scanning parameters chosen.
(a) (b) The advantages and disadvantages of raster field borders are the same as for raster background. Black&white linear forms Field borders, in the case of linear black&white forms, remain on the scanned image. This means that during recognition the application has to first separate the field borders from the field contents, then recognize the content.
(a) (b) 3) Letters in separate frames This marking type is relatively effective in “disciplining” those completing the form, and the likelihood of glued letters is low. But, as in the two previous cases, any character overlapping the borders (see fig. (b) below) is likely to lose some of its parts when the application separates the field borders from the field contents, thereby lowering recognition quality.
the highest possible recognition quality is achieved; form processing is faster as there is usually no need to “clean” the form after scanning; the requirements for the location of explanatory information are less strict (information may be placed next to fields or even in field boxes themselves.) This naturally makes additional form space available.
Advantage Design Complexity Disadvantage Easy to design using any graphics editor Advantage Disadvantage Graphics editors feature a good range of tools Cannot be created using all word processors Printing Difficult to print large quantities of good quality forms in-house Easy to print inhouse Printing Cost Professional printing involves higher costs If professional printing services are used, printing costs are lower Image Size Image file sizes are smaller Image file sizes are larger Only spe
General requirements for machine-readable forms
This section summarizes the requirements for machine-readable forms.
Form background requirements
To ensure the successful separation of field contents and field borders:
1. Choose the form type best suited to your needs according to the recommendations listed above. If possible, use dropout forms or raster border forms (see “Table: Form Type Advantages and Disadvantages – Summary”, page 12).
Requirements for geometric field parameters Raster dot size If the field borders are raster dots, the thickness of the raster line (i.e. the raster size) must be 0.4 pt. The optimal distance between the raster dots is five times their size.
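As a rough illustration of the figures above, the following Python sketch converts the 0.4 pt raster size and the five-times spacing into millimeters and into device dots at a given resolution. The function name, the dictionary keys and the 600 dpi default are illustrative assumptions; they are not part of ABBYY FormReader.

```python
POINTS_PER_INCH = 72.0  # 1 pt = 1/72 inch
MM_PER_INCH = 25.4


def raster_geometry(dot_size_pt=0.4, spacing_factor=5.0, dpi=600):
    """Return raster dot size and dot spacing in pt, mm and device dots.

    dot_size_pt and spacing_factor default to the values quoted in the
    text; dpi is an assumed output resolution.
    """
    spacing_pt = dot_size_pt * spacing_factor

    def to_mm(pt):
        return pt / POINTS_PER_INCH * MM_PER_INCH

    def to_device_dots(pt):
        return pt / POINTS_PER_INCH * dpi

    return {
        "dot_pt": dot_size_pt,
        "spacing_pt": spacing_pt,
        "dot_mm": to_mm(dot_size_pt),
        "spacing_mm": to_mm(spacing_pt),
        "dot_device_dots": to_device_dots(dot_size_pt),
        "spacing_device_dots": to_device_dots(spacing_pt),
    }


if __name__ == "__main__":
    for name, value in raster_geometry().items():
        print(f"{name}: {value:.3f}")
```

With the default values, a raster dot is about 0.14 mm wide and the dots are about 0.71 mm apart.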
5. If you print your forms using a printer, do not print them with a resolution lower than 600 dpi.
6. Always use the same printer to print each form. If this is not possible, try to ensure that the same printer models are used.
7. Never use a Xerox machine to make copies of your form! Xerox copies always distort the image to some extent, i.e. frames can become thicker, raster dot size may increase, color saturation may change.
• Open the Solutions folder located in the folder containing MS Visio (the default location is C:\Program Files\Visio\). Create a new folder in the Solutions folder and give it a name, e.g. ABBYY Forms. Copy the Elements.vss file into this folder. The stencil file will be automatically incorporated into MS Visio, i.e. it will be included in the list of available stencils (MS Visio (Stencil)). To open the file, select the File>Stencil menu item and then click on the folder created.
1) The default character space size in the Elements stencil is 4×5 mm for raster fields (Field (Black Raster) and Field (Orange Raster)) and 5×6.5 mm for Field (White Rectangle). 2) Character space size for all the above field types may be altered. Field proportions are automatically retained, thus ensuring that corner raster dots do not get glued together. Even though the character space size may be easily changed, we do not recommend making it smaller than 4×5 mm. 4.
2. 3. 4. Drag&Drop the Background (Grey) element from the Elements window; determine its size and align it with the page using the appropriate MS Visio tools. Insert black squares into each corner of the form (drag&drop them from the Elements window). The distance between a page border and a black square should be no less than 8 mm (12 mm is recommended).
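The margin rule for the corner squares can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the A4 page size (210×297 mm), the 5 mm square size and the function name are hypothetical values chosen for illustration. Only the 8 mm minimum and the 12 mm recommendation come from the text.

```python
MIN_MARGIN_MM = 8.0           # minimum distance from page border (from the text)
RECOMMENDED_MARGIN_MM = 12.0  # recommended distance (from the text)


def corner_square_positions(page_w_mm=210.0, page_h_mm=297.0,
                            square_mm=5.0,
                            margin_mm=RECOMMENDED_MARGIN_MM):
    """Return the top-left (x, y) of each corner square, in mm.

    The origin is the top-left corner of the page. Page size and
    square size are assumed values, not requirements from the guide.
    """
    if margin_mm < MIN_MARGIN_MM:
        raise ValueError(
            f"margin {margin_mm} mm is below the {MIN_MARGIN_MM} mm minimum")
    right = page_w_mm - margin_mm - square_mm
    bottom = page_h_mm - margin_mm - square_mm
    return {
        "top_left": (margin_mm, margin_mm),
        "top_right": (right, margin_mm),
        "bottom_left": (margin_mm, bottom),
        "bottom_right": (right, bottom),
    }


if __name__ == "__main__":
    for corner, (x, y) in corner_square_positions().items():
        print(f"{corner}: x={x} mm, y={y} mm")
```

The positions can then be entered as object coordinates in the graphics editor, so that all four squares sit at the same distance from the page borders.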
• The distance between form fields and the lines that are to remain on the form image should be no less than 2 mm. • If you use multipage forms, we recommend that a key field be used for subsequent page assembly. This field should be present on every page of the form and be unique to each copy of the form concerned (it can be either pre-printed or filled in manually). Typical examples include a “Form Number” field (as on the “Visitor’s Questionnaire” in the example below), an “ID Number” field, etc.
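The idea of assembling pages by a key field can be sketched in Python as follows. The page records, the "form_number" and "page" keys and the function name are hypothetical; they only illustrate the grouping and do not reflect how ABBYY FormReader stores recognized pages.

```python
from collections import defaultdict


def assemble_forms(pages, key_field="form_number"):
    """Group recognized pages into forms by the value of the key field.

    Each page is a dict holding the key field value and a page number.
    Pages may arrive in any order, as they would from a mixed batch.
    """
    forms = defaultdict(list)
    for page in pages:
        forms[page[key_field]].append(page)
    # Sort the pages of each form so they end up in reading order.
    return {key: sorted(group, key=lambda p: p["page"])
            for key, group in forms.items()}


if __name__ == "__main__":
    batch = [
        {"form_number": "000124", "page": 2},
        {"form_number": "000123", "page": 1},
        {"form_number": "000124", "page": 1},
        {"form_number": "000123", "page": 2},
    ]
    for number, group in assemble_forms(batch).items():
        print(number, [p["page"] for p in group])
```

The grouping works only because the key field value is present on every page and unique to each copy of the form, which is why both conditions are required.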
Preparing an MS Visio form for professional printing As already mentioned, color separation must be carried out if you plan on printing color forms at a printing house. Color separation, however, cannot be carried out within MS Visio itself. Therefore, if you used this editor, we recommend that the form be converted to CorelDraw format and that color separation be carried out within CorelDraw: 1. In MS Visio open the *.vsd file containing your form (File>Open). 2.
Developing forms using Microsoft Word 2000 (For the purposes of this guide we assume you already have a working knowledge of MS Word. If this is not the case, please consult the extensive literature, which is available with the application.) In the absence of a graphics editor, you can also use MS Word to create your forms. MS Word is a word processing application and as such is not really suitable for complex form design.
Which is best - background or raster? The best recognition results in the case of black&white forms can be obtained using raster forms (see “Black&white forms with raster borders”, page 9). Unfortunately, MS Word features only the standard line styles, and has no tools that allow you, for example, to alter the distance between the dots. Moreover, no line style is offered that provides the proper raster dot size and distance, i.e. for the line to disappear during image despeckling.
1. Right-click the object.
2. Click on Format Object in the local menu. The Format Object dialog will open. Select the appropriate parameters in the dialog.
Protecting the form
Once form design is completed and the form is approved, the form can then be protected from accidental modifications. To protect a form (or “document” in MS Word terminology):
1. Select the Protect Document item in the Tools menu. The Protect Document dialog will open.
2. Select the Forms item in the dialog and click OK.
Note.
Appendices Useful tips. 1. Raster forms. The raster field marking type is the most useful field marking type when it comes to black&white forms. Not only is it easy to create, it also provides the best recognition quality for black&white forms. Moreover, the image size is the smallest in the case of raster forms compared to all other black&white forms. 2. Red raster field marking type. Red raster represents a good alternative to black raster if the scanning mode to be used is unknown.
Identification of different forms processed in the same batch There are certain things that must be considered during form creation: • whether the form is to be a multipage form • whether the form will be processed in the same batch with forms of a different type In both cases, additional identification reference blocks are required. These elements allow the system to identify the form type and select the proper template as well as to match it correctly (i.e. to locate the fields).
Creating a barcode using CorelDraw This appendix explains how a barcode (EAN-13 format) can be created using CorelDraw. Once created, the barcode can be saved to file and then inserted into a form, or pasted (using the “copy&paste” function) as an OLE-object into any form of your choice, including one developed using an application such as MS Word or MS Visio. Barcode creation is only possible if supported by the current CorelDraw installation (the Edit > Insert Barcode item must be enabled).
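An EAN-13 value consists of 12 data digits followed by one check digit. The Python sketch below shows the standard check digit calculation, which may be useful for verifying a value before it is entered in the barcode dialog. The rule comes from the EAN-13 standard rather than from this guide, and the function names are illustrative.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits.

    Digits in odd positions (1st, 3rd, ...) have weight 1, digits in
    even positions have weight 3. The check digit brings the weighted
    sum up to the next multiple of 10.
    """
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    total = sum(int(ch) * (3 if index % 2 else 1)
                for index, ch in enumerate(first12))
    return (10 - total % 10) % 10


def ean13(first12: str) -> str:
    """Return the full 13-digit EAN-13 value."""
    return first12 + str(ean13_check_digit(first12))


if __name__ == "__main__":
    print(ean13("400638133393"))  # prints 4006381333931
```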
9) The created barcode is displayed in the CorelDraw window.
Recommended colors for dropout forms

Form          R    G    B    Pantone paint          Saturation
Orange Form   252  127  64   Pantone 164 CV         100%
              254  191  160  Pantone 164 CV         50%
              255  230  217  Pantone 164 CV         20%
              255  243  236  Pantone 164 CV         10%
Red Form      250  64   37   PANTONE Warm Red CV    100%
              253  160  146  PANTONE Warm Red CV    50%
              254  217  212  PANTONE Warm Red CV    20%
              255  240  238  PANTONE Warm Red CV    8%
Green Form    150  218  176  PANTONE 345 CV         100%
              203  237  216  PANTONE 345 CV         50%
              234  248  240  PANTONE 3