User guide

Setting Data Extraction Options

www.iprotech.com Ipro eCapture User Guide 5-85

877-324-4776 Q1 2014

Setting the OCR Options

New in 2014.0.0, data sets are OCRed only once, during indexing or data extraction, and the OCR output is stored in a common folder location at the Project level. This ensures that results during search and review remain the same. Not repeating OCR work on the same data sets also saves processing time.

All OCR options are deselected by default for new Projects. OCR options are now set at the Project level, under the new Common Options tab, so it is not necessary to set them individually for each Job type. The OCR options apply to all Job types, with one exception: OCR Pages Missing Text applies to Processing Jobs only.
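The Project-level defaults and the single Job-type exception described above can be pictured with a small model. This is an illustrative sketch only, not eCapture's configuration format or API: the class, field names, and the job-type strings are hypothetical. Only the option OCR Pages Missing Text, its Processing-only scope, and the deselected-by-default behavior come from this guide.

```python
from dataclasses import dataclass


# Hypothetical model of Project-level OCR options; not eCapture's real schema.
@dataclass
class CommonOcrOptions:
    # All OCR options are deselected by default for new Projects.
    ocr_pages_missing_text: bool = False  # applies to Processing Jobs only
    other_ocr_option: bool = False        # placeholder for any other OCR option


# Options limited to certain Job types; an option not listed applies to all.
JOB_TYPE_RESTRICTIONS = {"ocr_pages_missing_text": {"Processing"}}


def option_in_effect(options: CommonOcrOptions, name: str, job_type: str) -> bool:
    """Return True if the named option is selected and applies to this Job type."""
    allowed = JOB_TYPE_RESTRICTIONS.get(name)
    applies = allowed is None or job_type in allowed
    return applies and getattr(options, name)
```

With both options selected, `option_in_effect(opts, "ocr_pages_missing_text", "Data Extract")` is false because that option is restricted to Processing Jobs, while the placeholder option is in effect for any Job type.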

For Data Extract, an item will use the existing OCR output as its own output when the following conditions are met:

• OCR is enabled.

• The PDF page character threshold and Minimum OCR Confidence Level are the same as when the OCR was first performed.

If the PDF page character threshold or the Minimum OCR Confidence Level is higher than when the OCR was first performed, the document will be re-OCRed to allow for more characters in the embedded text of the PDF or to produce a higher quality of OCR, respectively.
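The reuse rule above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not eCapture's implementation: the names `OcrSettings`, `OcrDecision`, and `decide_ocr` are hypothetical, and both settings are assumed to be plain integers. The guide does not say what happens when OCR is disabled or when a setting is lower than before, so the sketch reports those cases as not specified instead of guessing.

```python
from dataclasses import dataclass
from enum import Enum


class OcrDecision(Enum):
    REUSE = "reuse existing OCR output"
    RE_OCR = "OCR the document again"
    NOT_SPECIFIED = "behavior not described in this guide"


@dataclass(frozen=True)
class OcrSettings:
    pdf_page_char_threshold: int  # PDF page character threshold
    min_confidence: int           # Minimum OCR Confidence Level (assumed integer)


def decide_ocr(ocr_enabled: bool, current: OcrSettings,
               original: OcrSettings) -> OcrDecision:
    """Compare current settings with those used when OCR was first performed."""
    if not ocr_enabled:
        return OcrDecision.NOT_SPECIFIED
    if current == original:
        # Both settings unchanged: the existing OCR output is reused.
        return OcrDecision.REUSE
    if (current.pdf_page_char_threshold > original.pdf_page_char_threshold
            or current.min_confidence > original.min_confidence):
        # Either setting raised: the document is re-OCRed.
        return OcrDecision.RE_OCR
    # Settings only lowered: the guide does not describe this case.
    return OcrDecision.NOT_SPECIFIED
```

For example, with original settings `OcrSettings(25, 80)` (the confidence value 80 is made up for illustration), raising the threshold to 50 yields `OcrDecision.RE_OCR`, while unchanged settings yield `OcrDecision.REUSE`.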

• Select OCR Pages Missing Text (Processing Jobs only) to OCR pages within documents that are missing text. Any page that does not have OCR text will have an OCR task generated for it. Optionally, select PDF page character threshold to perform OCR on image-based PDFs that may contain a small amount of embedded text, such as an image key. The default value is 25. The maximum value is 10000. If fewer than this number of characters are retrieved, the PDF will be OCRed.

• OCR images as necessary - Images will be OCRed for indexing or language identification if necessary. The OCR text obtained from the image is then passed on to dtSearch for indexing. The OCR text will be indexed and available to search in the Flex Processor.

• OCR PDF documents - PDFs with no embedded text will be OCRed prior to indexing or language identification. PDFs with embedded text (text-behind) will have text extracted anyway. The OCR