User guide
QCing Items
www.iprotech.com Ipro eCapture User Guide 6-101
877-324-4776 Q1 2014
Value1 Value2
Here the column data is separated by a single space rather than a tab (a tab
can be, for example, the equivalent of 5 spaces). If the option is not
selected, the extracted Excel data would look similar to this:
Column A     Column B
Value1     Value2
In the above example, the column data is separated by a tab (equivalent to 5
spaces here).
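The practical difference between the two separators can be illustrated with a short Python sketch. This is not eCapture code: the helper name `join_row` and the `use_tab` parameter are hypothetical and exist only to show why a tab keeps column boundaries recoverable when cell values themselves contain spaces.

```python
def join_row(values, use_tab):
    """Join the cell values of one worksheet row into a line of text.

    Hypothetical helper for illustration only; not part of eCapture.
    """
    separator = "\t" if use_tab else " "
    return separator.join(values)


header = ["Column A", "Column B"]

tab_line = join_row(header, use_tab=True)
space_line = join_row(header, use_tab=False)

# With a tab, splitting on the separator recovers the original columns.
print(tab_line.split("\t"))   # ['Column A', 'Column B']

# With a space, the separator is indistinguishable from spaces inside a cell.
print(space_line.split(" "))  # ['Column', 'A', 'Column', 'B']
```

For indexing and searching, both forms contain the same words; the tab only matters when the column layout needs to be recognizable in the extracted text.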
Expand Pivot Tables when extracting Excel text: Default is unchecked. When
this option is checked, any pivot tables that exist are expanded. A flag is
also set in QC to indicate that a pivot table exists in the worksheet.
OCR
OCR images as necessary - Images are OCRed for indexing or language
identification when necessary. The OCR text obtained from an image is then
passed to dtSearch for indexing. The OCR text is indexed and can be searched
in the Flex Processor.
OCR PDF documents - For PDFs with no embedded text, OCR is performed prior to
indexing or language identification. PDFs with embedded text (text-behind)
will have that text extracted anyway, and the OCR text is added to any text
extracted from the PDF. The text obtained through OCR, along with the
extracted text from the PDF, is passed to dtSearch for indexing. The OCR text
is indexed and can be searched in the Flex Processor. Note: Selecting this
option will increase the time required for the Discovery process. Because text
obtained through OCR is appended to the extracted text file, the file could
contain duplicate words, and search hits could be inflated by these
duplicates. Optionally, select PDF page character threshold to perform OCR on
image-based PDFs that may contain a small amount of embedded text, such as an
image key. The default value is 25. If fewer than this number of characters
are retrieved, the PDF will be OCRed.
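The threshold rule and the duplicate-word caveat above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not eCapture's implementation: the function names `needs_ocr` and `text_for_indexing` are hypothetical, and the sketch assumes every retrieved character (including spaces) counts toward the threshold, which this guide does not specify.

```python
DEFAULT_PAGE_CHAR_THRESHOLD = 25  # default value stated in this guide


def needs_ocr(retrieved_text, threshold=DEFAULT_PAGE_CHAR_THRESHOLD):
    """Return True when fewer than `threshold` characters were retrieved.

    Hypothetical helper; assumes all characters count, including spaces.
    """
    return len(retrieved_text) < threshold


def text_for_indexing(extracted_text, ocr_text):
    """Append OCR text to the extracted text (hypothetical helper)."""
    return extracted_text + "\n" + ocr_text


# An image-based PDF whose only embedded text is a short image key.
print(needs_ocr("IMG_0042"))   # True: 8 characters is below 25

# Exactly 25 characters is not "fewer than" the threshold.
print(needs_ocr("x" * 25))     # False

# Appending OCR text to text that was already extracted duplicates words.
combined = text_for_indexing("Invoice 1001", "Invoice 1001 Total due")
print(combined.split().count("Invoice"))  # 2
```

The last line shows why search hits can be inflated: the word "Invoice" appears once on the page but twice in the text passed on for indexing.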
OCR PowerPoint Documents: Select this option to perform OCR on Microsoft
PowerPoint files during Data Extract to get text from embedded content in
the slides. This will result in slower speeds for PowerPoint files, but more
accurate text extraction.
Minimum OCR Confidence Level [1-100]: Valid values range from 1 to 100. The
default is 50. The OCR Confidence Level is the per-document average of the
confidence values for all pages within the document on which OCR was
performed. Success or failure of a document for flagging is based on the










