User guide
Setting Data Extraction Options
www.iprotech.com Ipro eCapture User Guide 5-85
877-324-4776 Q1 2014
Setting the OCR Options
New in 2014.0.0, data sets are OCRed only once, during indexing or data
extraction, and the OCR output is stored in a common folder location at the
Project level. This ensures that results remain the same during search and
review. Because OCR work is not repeated on the same data sets, processing
is faster.
All OCR options are deselected by default for new Projects. Because OCR
options are now set at the Project level, under the new Common Options tab,
it is not necessary to set them individually for each Job type. The OCR
options apply to all Job types, with the exception of one option, OCR Pages
Missing Text, which applies to Processing Jobs only.
For Data Extract, an item will use the existing OCR output as its own output
when the following conditions are met:
• OCR is enabled
• The PDF page character threshold and Minimum OCR Confidence Level
are the same as when the OCR was first performed.
If the PDF page character threshold or the Minimum OCR Confidence Level is
higher than when the OCR was first performed, the document will be re-OCRed
to allow for more characters in the embedded text of the PDF or to produce a
higher quality of OCR, respectively.
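The reuse rule above can be sketched as a small decision function. This is only an illustration of the logic as described in this section, not eCapture code: the names OcrSettings, OcrDecision, and decide_ocr are hypothetical, and the Minimum OCR Confidence Level is assumed to be an integer. The guide does not say what happens when OCR is disabled or when a setting is lowered, so the sketch reports those cases as unspecified rather than guessing.

```python
from dataclasses import dataclass
from enum import Enum


class OcrDecision(Enum):
    REUSE = "reuse existing OCR output"
    RE_OCR = "re-OCR the document"
    UNSPECIFIED = "not covered by the guide text"


@dataclass(frozen=True)
class OcrSettings:
    # Hypothetical container; field names are illustrative,
    # not actual eCapture identifiers.
    ocr_enabled: bool
    pdf_page_char_threshold: int
    min_ocr_confidence: int  # assumed to be an integer value


def decide_ocr(original: OcrSettings, current: OcrSettings) -> OcrDecision:
    """Compare the settings used for the first OCR pass with the current ones."""
    if not current.ocr_enabled:
        # The guide only describes reuse when OCR is enabled.
        return OcrDecision.UNSPECIFIED

    same = (
        current.pdf_page_char_threshold == original.pdf_page_char_threshold
        and current.min_ocr_confidence == original.min_ocr_confidence
    )
    if same:
        return OcrDecision.REUSE

    higher = (
        current.pdf_page_char_threshold > original.pdf_page_char_threshold
        or current.min_ocr_confidence > original.min_ocr_confidence
    )
    if higher:
        return OcrDecision.RE_OCR

    # Settings were only lowered: the guide does not describe this case.
    return OcrDecision.UNSPECIFIED


first_pass = OcrSettings(ocr_enabled=True, pdf_page_char_threshold=25, min_ocr_confidence=50)
raised = OcrSettings(ocr_enabled=True, pdf_page_char_threshold=100, min_ocr_confidence=50)
print(decide_ocr(first_pass, first_pass).value)
print(decide_ocr(first_pass, raised).value)
```

Keeping an explicit "unspecified" outcome avoids implying behavior that this section does not document.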
• Select OCR Pages missing text (Processing Jobs only) to OCR
pages within documents that are missing text. Any page that does
not have OCR text will have an OCR task generated for it.
Optionally, select PDF page character threshold to perform OCR on
image-based PDFs that may contain a small amount of embedded
text, such as an image key. The default value is 25. The maximum
value is 10000. If fewer than this number of characters are
retrieved, the PDF will be OCRed.
• OCR images as necessary - Images will be OCRed for indexing/language
identification if necessary. The OCR text obtained from the image is
then passed on to dtSearch for indexing. The OCR text will be indexed
and searchable in the Flex Processor.
• OCR PDF documents - For PDFs with no embedded text, OCR is
performed prior to indexing or language identification. PDFs with
embedded text (text-behind) will have text extracted anyway. The OCR