User Guide
What is an OCR System?
Optical character recognition (OCR) is the translation of optically scanned bitmaps of printed
text characters into character codes, such as ASCII. An OCR system is an efficient way to help
you turn printed/scanned documents, image or PDF files into files that can be edited, searched
and otherwise manipulated on a computer.
ABBYY FineReader is an easy-to-use program that recognizes text in practically any font without
any prior training. The program features high recognition accuracy and low sensitivity to
print defects thanks to special recognition technology based on the principles
of Integrity, Purposefulness and Adaptability (IPA) perception.
ABBYY’s IPA Technology:
ABBYY FineReader’s recognition process is based on the principles of ABBYY’s IPA
perception. Three principles determine the behavior of the system:
● Integrity – the identification of recognition objects based on a set of basic
elements and their interrelations.
● Purposefulness – the generation and purposeful verification of recognition
hypotheses.
● Adaptability – the system’s ability to learn and be trained.
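The three principles can be pictured with a small Python sketch. It is a toy illustration under stated assumptions, not ABBYY's actual algorithm: the names STRUCTURE, verify, recognize and learn, and the element labels such as "circle" and "stem", are all hypothetical.

```python
# Toy illustration only: this is NOT ABBYY's algorithm. All names here
# (STRUCTURE, verify, recognize, learn) and element labels are hypothetical.

# Integrity: each character is described by a set of structural elements.
STRUCTURE = {
    "O": {"circle"},
    "Q": {"circle", "tail"},
    "i": {"stem", "dot"},
    "l": {"stem"},
}


def verify(hypothesis, observed):
    """Purposefulness: accept a hypothesis only if every structural element
    it requires is present among the observed elements."""
    return STRUCTURE[hypothesis] <= observed


def recognize(observed):
    """Test hypotheses from the most to the least specific and return the
    first one that verifies, or None if none does."""
    candidates = sorted(STRUCTURE, key=lambda c: len(STRUCTURE[c]), reverse=True)
    for hypothesis in candidates:
        if verify(hypothesis, observed):
            return hypothesis
    return None


def learn(char, elements):
    """Adaptability: record the structure of a character met in this text."""
    STRUCTURE[char] = set(elements)
```

In this sketch, recognize({"circle", "tail"}) returns "Q", while an unfamiliar shape such as {"hook", "dot"} returns None until learn("j", {"hook", "dot"}) has been called.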
There are two stages in the process of inputting a document for
OCR:
1. Scanning. During the scanning stage, a scanner reads the image and transfers
it into a computer. The acquired image is nothing more than a picture (a set
of black, white and color dots that is not editable with a word processor).
2. Recognition. During the recognition stage, FineReader analyzes the image
file transmitted by the scanner (layout analysis) and recognizes each character.
The layout analysis (selecting the recognition areas, tables, pictures, lines, and
individual characters) and image reading processes are closely related. Page
layout analysis is more accurate when the nature of the text is known to the
application.
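The two stages can be sketched in a few lines of Python. This is a minimal toy model, not how FineReader works internally: the "scanned" image is a hand-written grid of dots, the layout analysis only cuts the image at blank columns, and TEMPLATES, split_characters and recognize_line are hypothetical names introduced for illustration.

```python
# Toy sketch: a "scanned" image is a grid of dots; '#' is black, '.' is white.
# TEMPLATES, split_characters and recognize_line are hypothetical names.

TEMPLATES = {
    ("#.#", "###", "#.#"): "H",
    ("###", ".#.", "###"): "I",
    ("#..", "#..", "###"): "L",
}


def split_characters(image):
    """Layout analysis (toy): cut the image at fully white columns."""
    width = len(image[0])
    blank = [all(row[x] == "." for row in image) for x in range(width)]
    glyphs, start = [], None
    for x in range(width + 1):
        is_ink = x < width and not blank[x]
        if is_ink and start is None:
            start = x  # a new character area begins here
        elif not is_ink and start is not None:
            glyphs.append(tuple(row[start:x] for row in image))
            start = None
    return glyphs


def recognize_line(image):
    """Recognition (toy): map each glyph to a character, '?' if unknown."""
    return "".join(TEMPLATES.get(g, "?") for g in split_characters(image))


# Stage 1, scanning: the result is only a picture, not editable text.
scanned = [
    "#.#.###.#..",
    "###..#..#..",
    "#.#.###.###",
]

# Stage 2, recognition: the picture becomes character codes.
text = recognize_line(scanned)
codes = [ord(c) for c in text]
```

Here text is "HIL" and codes is [72, 73, 76], the ASCII codes of the three recognized characters.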
The system generates a hypothesis about a recognition object (a character, part of a character,
or several glued characters) and then accepts or rejects the hypothesis according to whether
the structural elements are present. These structural elements are computer equivalents of
character parts crucial for human perception (arcs, circles, dots, etc.). The application then
adapts itself to the text according to the degree of accuracy attained. Purposeful searching and
context information enable the system to recognize even torn and distorted characters, making
the system largely insensitive to print defects. Recognized text, which can be edited or saved in a
convenient format, is displayed in the FineReader Text window. The final result is the recognized
ABBYY FineReader 7.0 User’s Guide