User guide

Chapter 6, Performing QC
6-78 Ipro eCapture User Guide www.iprotech.com
Q1 2014 877-324-4776
Select OCR Pages missing text to OCR pages within documents that are
missing text. Optionally, select PDF page character threshold to perform
OCR on image-based PDFs that may contain a small amount of embedded
text, such as an image key. The default value is 25, and the maximum value
is 10000. If fewer characters than this threshold are retrieved, the PDF will
be OCRed.
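The character threshold rule above can be illustrated with a minimal Python
sketch. This is not eCapture's implementation; the function name needs_ocr
and the way characters are counted (every character, including whitespace)
are assumptions made for illustration only.

```python
DEFAULT_CHAR_THRESHOLD = 25   # default value stated in the guide
MAX_CHAR_THRESHOLD = 10000    # maximum value stated in the guide


def needs_ocr(extracted_text: str,
              char_threshold: int = DEFAULT_CHAR_THRESHOLD) -> bool:
    """Return True when fewer characters than the threshold were retrieved.

    Hypothetical helper: counting every character (whitespace included)
    is an assumption, not documented eCapture behavior.
    """
    if char_threshold < 0 or char_threshold > MAX_CHAR_THRESHOLD:
        raise ValueError("threshold must be between 0 and 10000")
    return len(extracted_text) < char_threshold


# An image-based PDF whose only embedded text is a short image key.
print(needs_ocr("Fig. 1"))        # 6 characters, below the default of 25
# A PDF with plenty of embedded text.
print(needs_ocr("x" * 200))       # 200 characters, not below the threshold
```

With the default of 25, the short image key falls below the threshold and
would be sent to OCR, while the text-rich PDF would not.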
Minimum average OCR confidence level [1-100]: Valid values range from 1 to
100. The default is 50. The OCR Confidence Level is the average confidence
per document, calculated across all pages within the document on which
OCR was performed. Whether a document is flagged depends on this average.
If the average confidence level is below the selected threshold, the document
will be flagged in QC with the OCR Low Confidence Flag.
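As a rough illustration of the averaging rule above, the following Python
sketch flags a document when the mean of its per-page confidence values falls
below the threshold. The function name is_low_confidence and the handling of
documents with no OCRed pages are assumptions, not documented eCapture
behavior.

```python
from statistics import mean

DEFAULT_MIN_CONFIDENCE = 50   # default value stated in the guide


def is_low_confidence(page_confidences,
                      min_confidence=DEFAULT_MIN_CONFIDENCE):
    """Return True when the average page confidence is below the threshold.

    page_confidences holds one value per page on which OCR was performed.
    """
    if not 1 <= min_confidence <= 100:
        raise ValueError("confidence level must be between 1 and 100")
    if not page_confidences:
        # Assumption: a document with no OCRed pages is not flagged.
        return False
    return mean(page_confidences) < min_confidence


# One strong page does not rescue two weak ones: the average is about 46.7.
print(is_low_confidence([90, 20, 30]))
# An average of exactly 50 is not below the threshold, so no flag.
print(is_low_confidence([50]))
```

Because the flag is based on the document average, a single low-confidence
page in an otherwise clean document will not necessarily trigger it.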
Select the option Remove Blank Pages and then set the Blank Page
Threshold (1 to 2000) to a value that eliminates speckled pages without
eliminating pages that contain punctuation marks. Ipro eCapture will remove
any images that have fewer "dots" than this threshold. If this setting is too
high, you may lose images that contain only a few short words. We suggest a
setting of 50 as a starting point.
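The blank page test above can be sketched in Python as a simple count
compared against the threshold. This is only an illustration: treating a
"dot" as one dark pixel in a black-and-white bitmap is an assumption, and
is_blank_page is a hypothetical name, not part of eCapture.

```python
SUGGESTED_BLANK_THRESHOLD = 50   # starting point suggested in the guide


def is_blank_page(bitmap, threshold=SUGGESTED_BLANK_THRESHOLD):
    """Return True when the page has fewer dark pixels than the threshold.

    bitmap is a list of rows, each a list of 0 (white) or 1 (dark).
    Assumption: one dark pixel counts as one "dot".
    """
    if not 1 <= threshold <= 2000:
        raise ValueError("threshold must be between 1 and 2000")
    dots = sum(sum(row) for row in bitmap)
    return dots < threshold


# A 100 x 100 page with 10 stray speckles on its first row.
speckled = [[0] * 100 for _ in range(100)]
for col in range(10):
    speckled[0][col * 7] = 1

# A 100 x 100 page with a short line of "text" (60 dark pixels).
short_text = [[0] * 100 for _ in range(100)]
for col in range(60):
    short_text[50][col] = 1

print(is_blank_page(speckled))     # 10 dots, below 50
print(is_blank_page(short_text))   # 60 dots, not below 50
```

Raising the threshold above 60 in this example would also remove the page
with the short line of text, which is the risk the guide warns about.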
Process HTML Files with Internet Explorer - Select this option to process
HTML files with Internet Explorer instead of Oracle® Outside-In Technology
(formerly Stellent).
Process CSV files with Microsoft Excel - Select this option to process .CSV
files with Microsoft Excel instead of Oracle® Outside-In Technology (formerly
Stellent).
Image to PDF - Select this option to reprocess the selected document as a
.PDF. The .PDF is stored in the Output directory.
Set a Max Page Threshold (1 to 10000) if you want to limit the number of
pages produced by larger files. By default, this option is not checked. If the
Page Threshold is reached, the items are not flagged as exceptions but are
instead flagged as Page Threshold Exceeded. All pages processed before the
threshold is reached are included in the document. The first page will be the
Page Threshold Exceeded placeholder, and subsequent pages will be those that
were processed within the Max Page Threshold setting.
Placeholder pages over threshold - Select this option to apply a
placeholder to pages exceeding the threshold value indicated.
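The Max Page Threshold behavior described above can be sketched in Python as
follows. This is an illustration under stated assumptions, not eCapture's
implementation: the function name apply_max_page_threshold is hypothetical,
the threshold is treated as reached only when a file has more pages than the
limit, and the placeholder is assumed not to count toward the limit. The
sketch does not model the Placeholder pages over threshold option.

```python
PLACEHOLDER = "Page Threshold Exceeded placeholder"


def apply_max_page_threshold(pages, max_pages):
    """Return (output_pages, exceeded_flag) for one document.

    Assumptions: the flag is set only when len(pages) > max_pages, and
    the placeholder page is added on top of the max_pages limit.
    """
    if not 1 <= max_pages <= 10000:
        raise ValueError("Max Page Threshold must be between 1 and 10000")
    if len(pages) <= max_pages:
        return list(pages), False
    # Placeholder first, then the pages processed within the threshold.
    return [PLACEHOLDER] + list(pages[:max_pages]), True


pages = ["p1", "p2", "p3", "p4", "p5"]
output, exceeded = apply_max_page_threshold(pages, max_pages=3)
print(output)     # placeholder followed by p1, p2, p3
print(exceeded)   # the document would be flagged, not treated as an exception
```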