User guide

QCing Items

www.iprotech.com Ipro eCapture User Guide 6-101

877-324-4776 Q1 2014

Value1 Value2

The column data is separated by a space rather than a tab (which can be, for

example, the equivalent of 5 spaces). Therefore, if the option is not selected,

then the extracted Excel data would look similar to this:

Column A     Column B

Value1     Value2

In the above example, the column data is separated by a tab (5 spaces).
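To illustrate why the separator matters, here is a minimal sketch (not eCapture code) of how a downstream script might split a line of extracted worksheet text back into columns. The function name split_columns and the sample rows are hypothetical and are used only for illustration.

```python
def split_columns(line: str, tab_separated: bool) -> list[str]:
    """Split one line of extracted worksheet text into column values."""
    if tab_separated:
        # A tab marks each column boundary, so spaces inside a value survive.
        return line.split("\t")
    # With single-space separation, a column boundary and a space inside
    # a value cannot be told apart.
    return line.split(" ")


tab_row = "Column A\tColumn B"
space_row = "Column A Column B"

print(split_columns(tab_row, tab_separated=True))     # ['Column A', 'Column B']
print(split_columns(space_row, tab_separated=False))  # ['Column', 'A', 'Column', 'B']
```

With tab separation the two header cells are recovered intact, while space separation breaks a header such as "Column A" into two pieces.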

Expand Pivot Tables when extracting Excel text: Default is unchecked. If

pivot tables exist, they will be expanded when this option is checked. A flag is

also set in QC to indicate that a pivot table exists in the worksheet.

OCR

OCR images as necessary - Images will be OCRed for indexing/language

identification if necessary. The OCR text obtained from the image is then

passed on to dtSearch for indexing. The OCR text will be indexed and available

to be searched in the Flex Processor.

OCR PDF documents - PDFs with no embedded text are OCRed prior to

indexing or language identification. PDFs with embedded text (text-behind)

will have text extracted anyway. The OCR text is added to any extracted text

from the PDF, and the combined text is passed to dtSearch for indexing. The

OCR text will be indexed and available to be searched in the Flex Processor.

Note: Selecting this option will increase the time required for the Discovery

process. Because the OCR text is appended to the extracted text file, the file

could contain duplicate words, and search hits could be inflated as a result.

Optionally, select PDF page character threshold to perform OCR on

image-based PDFs that may contain a small amount of embedded text, such as

an image key. The default value is 25. If fewer than this number of characters

are retrieved, the PDF will be OCRed.
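The threshold rule can be sketched as a simple comparison. This is an illustrative assumption about the logic, not eCapture's actual implementation: the name needs_ocr is hypothetical, and the guide does not state whether the count is taken per page or per document, or whether whitespace is counted. The sketch simply counts every character in whatever text is passed in.

```python
DEFAULT_PAGE_CHAR_THRESHOLD = 25  # default value stated in the guide


def needs_ocr(embedded_text: str,
              threshold: int = DEFAULT_PAGE_CHAR_THRESHOLD) -> bool:
    """Return True when fewer than `threshold` characters were retrieved."""
    # Assumption: every character counts, including whitespace.
    return len(embedded_text) < threshold


print(needs_ocr(""))          # True: no embedded text at all
print(needs_ocr("IMG_0042"))  # True: only 8 characters, e.g. an image key
print(needs_ocr("x" * 25))    # False: the threshold of 25 is reached
```

Raising the threshold sends more lightly texted PDFs to OCR, and lowering it sends fewer.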

OCR PowerPoint Documents: Select this option to perform OCR on Microsoft

PowerPoint files during Data Extract to get text from embedded content in

the slides. This will result in slower speeds for PowerPoint files, but more

accurate text extraction.

Minimum OCR Confidence Level [1-100]: The level can be set from 1 to 100.

The default is 50. The OCR Confidence Level is the average confidence per

document, taken over all pages within the document on which OCR was

performed. Success or failure of a document for flagging is based on the