User guide
Creating a Discovery Job
www.iprotech.com Ipro eCapture User Guide 5-29
877-324-4776 Q1 2014
• Ignore Hyphens - Ignores hyphens entered in the search
criteria. For example, a search for “first-class” will also match
occurrences of “firstclass” in the files being searched.
• Index all three ways - Searches for all three possible
treatments of hyphens to ensure that matches are found
regardless of which of the three ways the search criteria are
entered.
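As an illustration of the Ignore Hyphens behavior only, the following minimal Python sketch strips hyphens from both the search term and the words being searched before comparing them. The function names are hypothetical and this is not how eCapture or dtSearch implements the option; it simply shows why “first-class” and “firstclass” end up matching.

```python
def strip_hyphens(text: str) -> str:
    """Remove hyphens so 'first-class' and 'firstclass' compare equal."""
    return text.replace("-", "")


def matches_ignoring_hyphens(query: str, document: str) -> bool:
    """Return True if any word in the document equals the query once
    hyphens are removed from both (hypothetical helper, case-insensitive)."""
    wanted = strip_hyphens(query).lower()
    for word in document.split():
        # Normalize each word the same way as the query, then drop
        # trailing or leading punctuation before comparing.
        candidate = strip_hyphens(word).lower().strip('.,;:!?"')
        if candidate == wanted:
            return True
    return False
```

In this sketch a search for “first-class” matches both “first-class” and “firstclass”, but not the two separate words “first class”.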
• Parent/Child Text Handling - These options specify how the text
of parent and child documents is handled during indexing. They
apply to emails (Lotus Notes and Outlook) and to any edocs
(non-emails) that contain embedded documents.
• Index child text with parent text - Merges and indexes
the text of a child document with that of its parent.
• Separate child and parent text - Indexes the text of a
child document separately from that of its parent. During
indexing, the following string is added as an include filter:
*.MSG *.MSG>*.body *.EML *.EML>*.body. As a result, each .EML
and .MSG file produces two documents in the index: one for the
body and one for the email (headers...). Attachments are not
included in that index.
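The difference between the two Parent/Child Text Handling options can be pictured with a small Python model. This is a simplified sketch under stated assumptions: the Doc class, the index_entries function, and the file names are hypothetical, and the sketch does not reproduce the include filter or the body/header split that eCapture applies to .EML and .MSG files.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Doc:
    """Hypothetical stand-in for a parent document and its children."""
    name: str
    text: str
    children: List["Doc"] = field(default_factory=list)


def index_entries(doc: Doc, merge_children: bool) -> List[Tuple[str, str]]:
    """Return the (name, text) pairs that would be indexed for one parent.

    merge_children=True models "Index child text with parent text";
    merge_children=False models "Separate child and parent text".
    """
    if merge_children:
        # One entry: the parent's text followed by each child's text.
        combined = " ".join([doc.text] + [c.text for c in doc.children])
        return [(doc.name, combined)]
    # Separate entries: the parent first, then one entry per child.
    return [(doc.name, doc.text)] + [(c.name, c.text) for c in doc.children]
```

For example, an email with one attachment yields a single merged entry under the first option and two separate entries under the second, so a search hit can be attributed to the parent or to the child individually.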
OCR
• OCR images as necessary - Images are OCRed for indexing or
language identification when necessary. The OCR text obtained from
the image is then passed on to dtSearch for indexing. The OCR text
is indexed and can be searched in the Flex Processor.
• OCR PDF documents - PDFs with no embedded text are OCRed
prior to indexing or language identification. PDFs with embedded
text (text-behind) have that text extracted regardless. Optionally,
set the PDF page character threshold to also OCR image-based
PDFs that contain only a small amount of embedded text, such as
an image key. The default value is 25 and the maximum value is
10000. If fewer characters than this threshold are retrieved,
the PDF is OCRed.
• OCR PowerPoint documents - Turn this option on to perform
OCR on Microsoft PowerPoint files during indexing to get text from
embedded content in the slides. This slows indexing of PowerPoint
files, but produces more accurate search results.
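The PDF page character threshold described under OCR PDF documents amounts to a simple comparison, sketched below in Python. The names needs_ocr, DEFAULT_THRESHOLD, and MAX_THRESHOLD are hypothetical; only the values 25 and 10000 come from this guide. How eCapture counts characters (for example, whether whitespace is included) and the minimum allowed value are not stated here, so counting every character and allowing zero are assumptions.

```python
DEFAULT_THRESHOLD = 25      # default value stated in this guide
MAX_THRESHOLD = 10000       # maximum value stated in this guide


def needs_ocr(embedded_text: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True when a PDF should be OCRed under the threshold rule.

    Assumption: every character of the retrieved text counts toward the
    total, including whitespace.
    """
    if not 0 <= threshold <= MAX_THRESHOLD:
        raise ValueError("threshold must be between 0 and 10000")
    # Fewer characters retrieved than the threshold means the PDF is
    # treated as image-based and sent to OCR.
    return len(embedded_text) < threshold
```

With the default of 25, a PDF whose only embedded text is a short image key falls below the threshold and is OCRed, while a PDF with 25 or more retrieved characters is indexed from its embedded text.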










