Creating a Discovery Job

www.iprotech.com Ipro eCapture User Guide 5-29

877-324-4776 Q1 2014

• Ignore Hyphens - Ignores hyphens entered in the search criteria. For example, a search for “first-class” will match occurrences of “firstclass” in the files being searched.

• Index all three ways - Searches for all three possible treatments of hyphens, so matches are found regardless of which of the three ways the search criteria are entered.

• Parent/Child Text Handling - These options specify how the text of parent and child documents is handled during indexing. They apply to emails (Lotus Notes and Outlook) and to any edocs (non-emails) that contain embedded documents.

• Index child text with parent text - Merges the text of a child document with that of its parent and indexes them together.

• Separate child and parent text - Indexes the text of a child document separately from its parent. During indexing, the following string is added as an include filter: *.MSG *.MSG>*.body *.EML *.EML>*.body. Two documents are produced in the index for each .EML and .MSG file: one for the body and one for the email (headers...). Attachments are not included in that index.
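
The hyphen options described above (Ignore Hyphens, Index all three ways) can be illustrated with a minimal Python sketch. This is not eCapture or dtSearch code. The function names are hypothetical, and because this section only describes the Ignore Hyphens treatment, the other two treatments (hyphen as a space, hyphen kept as searchable text) are assumed for illustration.

```python
def ignore_hyphens(term):
    """Drop hyphens entirely, as the Ignore Hyphens option is described."""
    return term.replace("-", "")


def hyphen_variants(term):
    """Return three forms of a hyphenated term.

    Assumption: the "three ways" are hyphen kept, hyphen as a space,
    and hyphen ignored. Only the last is confirmed by this section.
    """
    return {
        term,                     # hyphen kept as searchable text (assumed)
        term.replace("-", " "),   # hyphen treated as a space (assumed)
        ignore_hyphens(term),     # hyphen ignored
    }


if __name__ == "__main__":
    print(ignore_hyphens("first-class"))
    print(sorted(hyphen_variants("first-class")))
```

Under these assumptions, indexing all three forms means a match is found whether the term is entered with a hyphen, with a space, or as a single word.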

OCR

• OCR images as necessary - Images are OCRed for indexing and language identification when necessary. The OCR text obtained from an image is passed to dtSearch for indexing, and the indexed text can then be searched in the Flex Processor.

• OCR PDF documents - PDFs with no embedded text are OCRed prior to indexing or language identification. PDFs with embedded text (text-behind) have that text extracted in any case. Optionally, select PDF page character threshold to perform OCR on image-based PDFs that may contain a small amount of embedded text, such as an image key. The default value is 25 and the maximum value is 10000. If fewer than this number of characters are retrieved, the PDF is OCRed.

• OCR PowerPoint Documents - Turn this option on to perform OCR on Microsoft PowerPoint files during indexing to capture text from embedded content in the slides. This slows indexing of PowerPoint files but produces more accurate search results.
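
The PDF page character threshold described above amounts to a simple comparison, sketched below in Python. This is an illustration only, not eCapture's implementation. The function and constant names are hypothetical, and the sketch assumes the character count passed in is whatever count the application compares against the threshold (the guide does not say here whether this is per page or per document).

```python
DEFAULT_THRESHOLD = 25    # default value stated in the guide
MAX_THRESHOLD = 10000     # maximum value stated in the guide


def needs_ocr(extracted_chars, threshold=DEFAULT_THRESHOLD):
    """Return True when a PDF should be OCRed under the threshold rule.

    extracted_chars: number of embedded-text characters retrieved.
    threshold: hypothetical stand-in for the PDF page character threshold.
    """
    if threshold > MAX_THRESHOLD:
        raise ValueError("threshold exceeds the maximum of 10000")
    # The guide says OCR runs when fewer characters than the threshold
    # are retrieved, so the comparison is strictly less-than.
    return extracted_chars < threshold


if __name__ == "__main__":
    for count in (0, 24, 25, 400):
        print(count, needs_ocr(count))
```

With the default of 25, an image-based PDF carrying only a short image key falls under the threshold and is OCRed, while a PDF with substantial embedded text is not.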