LEGAL NOTICES Copyright © 2003 ScanSoft, Inc. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without prior written consent from ScanSoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960. Printed in the United States of America and in Ireland.
CONTENTS
WELCOME 7
Using this Guide 8
Getting online Help 9
Online HTML Help 9
Context-Sensitive Help 9
Tech Notes 10
Glossary 10
When to go online 10
1 INSTALLATION AND SETUP 11
System requirements 12
Installing OmniPage Pro 13
Setting up your scanner with OmniPage Pro 14
How to start the program 16
Registering your software 17
New features in OmniPage Pro 14 17
2 INTRODUCTION 19
What is optical character recognition
OmniPage Pro’s OCR capabilities
Documents in OmniPage Pro
Basic proc
Managing documents
Thumbnails
Document Manager
Customizing Document Manager columns
Deleting pages from a document
Printing a document
Closing a document
OmniPage Documents
Why save to OPD
How to save to OPD
How to load an OPD
Settings
3 PROCESSING DOCUMENTS
Quick Start Guide
Loading and recognizing sample image files
Scanning and recognizing a single page
Processing overview
Automatic processing
Stopping and restarting automatic processing
Manual processing
Combined processing
Processing with workflows
Manual zoning 54
Zone types and properties 55
Working with zones 57
Speed zoning 59
Table grids in the image 59
Using zone templates 61
4 PROOFING AND EDITING 63
The editor display and views 64
Proofreading OCR results 65
Verifying text 67
User dictionaries 68
Languages 69
Training 69
Manual training 70
IntelliTrain 70
Training files 71
Text and image editing 73
On-the-fly editing 75
Reading text aloud 76
5 SAVING AND EXPORTING 79
Saving OmniPage Documents
Export Results button
Saving original im
WORKFLOWS 93
Workflows 94
Sample workflows 94
Running workflows 96
Workflow Assistant 98
Creating workflows 98
Modifying workflows 101
Batch Manager 101
Creating new jobs 102
Modifying jobs 103
Managing and running jobs 103
Watched folders 104
Barcode driven workflows 106
Voice recognition 107
7 TECHNICAL INFORMATION 109
Troubleshooting 110
Solutions to try first 110
Testing OmniPage Pro 111
Increasing memory resources 112
Increasing disk space 112
Text does not get recognized properly 113
Problems
Welcome
Welcome to this OmniPage Pro® text recognition program, and thank you for choosing our software! The following documentation has been provided to help you get started and give you an overview of the program.
This User’s Guide
This guide introduces you to using OmniPage Pro 14. It includes installation and setup instructions, a description of the program’s commands and working areas, task-oriented instructions, ways to customize and control processing, and technical information.
used scanner models. Access ScanSoft’s web site from the OmniPage Pro Installer or afterwards from the Help menu. Using this Guide This guide is written with the assumption that you know how to work in the Microsoft Windows environment. Please refer to your Windows documentation if you have questions about how to use dialog boxes, menu commands, scroll bars, drag and drop functionality, shortcut menus, and so on.
Getting online Help In addition to using this guide, you can use OmniPage Pro’s online Help to learn about features, settings, and procedures. Online Help is available after you install OmniPage Pro. Online HTML Help Open OmniPage Pro’s online Help at its top level by choosing Help Topics at the top of the Help menu. This allows you to see topics arranged in a Table of Contents, search an alphabetical list of keywords or make full-text searches through the topics.
Tech Notes ScanSoft’s web site at www.scansoft.com contains Tech Notes on commonly reported issues using OmniPage Pro 14. Web pages may also offer assistance on the installation process and troubleshooting. Glossary This guide does not include a glossary. The online Help has a comprehensive glossary, with its own alphabetical index and a table of contents. Please consult it if you want to find the meaning of a term used in this guide or in the program.
Chapter 1 Installation and setup This chapter provides information on installing and starting OmniPage Pro 14.
System requirements
The minimum requirements to install and run OmniPage Pro 14 are:
◆ A computer with an Intel® Pentium® III processor or equivalent
◆ Microsoft® Windows® 98 (from second edition), Windows Me, Windows NT® 4.0 (from Service Pack 6), Windows 2000 (from Service Pack 2), Windows XP or Windows Server 2003
◆ Microsoft Internet Explorer 5.
Installing OmniPage Pro OmniPage Pro 14’s installation program takes you through installation with instructions on every screen. Before installing OmniPage Pro: ◆ Close all other applications, especially anti-virus programs. ◆ Log into your computer with administrator privileges if you are installing on Windows NT, 2000, XP or Server 2003.
Setting up your scanner with OmniPage Pro All files needed for scanner setup and support are copied automatically during the program’s installation, but no scanner setup occurs at installation time. Before using OmniPage Pro 14 for scanning, your scanner should be installed with its own scanner driver software and tested for correct functionality. Scanner driver software is not included with OmniPage Pro. Scanner setup is done through the Scanner Setup Wizard.
◆ By default OmniPage Pro uses its own scanning interface, located in the Scanner panel of the Options dialog box. If you want to use your scanner’s own interface instead, choose Advanced settings and select this.
◆ Choose Modify hints only if you are experienced in configuring scanners or have been advised by Technical Support to do so.
◆ Click Next to start the tests.
◆ For the Basic scan test, insert a test page into your scanner.
How to start the program
To start OmniPage Pro 14 do one of the following:
◆ Click Start in the Windows taskbar and choose All Programs ▸ ScanSoft OmniPage Pro 14.0 ▸ OmniPage Pro 14.0.
◆ Double-click the OmniPage Pro icon in the program’s installation folder or on the Windows desktop if placed there.
◆ Double-click an OmniPage Document (OPD) icon or file name; the clicked document is loaded into the program. See “OmniPage Documents” on page 29.
Registering your software ScanSoft’s online registration runs at the end of installation. Please ensure web access is available. We provide an easy electronic form that can be completed in less than five minutes. When the form is filled, click Submit. If you did not register the software during installation, you will be periodically invited to register later. You can go to www.scansoft.com to register online.
Feature | Description | See
Colored backgrounds | Get better recognition of text printed on color or shaded backgrounds. There is also improved noise removal. | page 49
Resolution control | Choose the resolution for saved page images and for images embedded in recognized pages. | page 85
Improved proofing system | The two parts of words hyphenated at line ends are now joined. The image viewer and the verifier display both image parts. |
Chapter 2 Introduction You probably use your computer for business correspondence, preparing reports, handling data and an ever-increasing number of other uses. The challenge is that, in spite of the digital revolution, certain sources of information still circulate in printed, paper form and cannot be used immediately in a computer.
What is optical character recognition Optical character recognition is the process of extracting text from an image. This image can result from scanning a paper document or opening an electronic image file. Images do not have editable text characters; they have many tiny dots (pixels) that together form character shapes. These present a picture of the text on a page. During OCR, OmniPage Pro analyzes the character shapes in an image and defines solutions to produce editable text.
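To make the idea of analyzing character shapes concrete, here is a toy sketch in Python. It is not OmniPage Pro's recognition method, which this guide does not describe. The 3-by-5 glyph bitmaps, the `TEMPLATES` table and the `recognize` function are all invented for illustration: a pixel shape is compared dot by dot with known shapes and the closest one wins.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a bitmap of 5 rows by 3 columns; '#' is a dark pixel, '.' a light one.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))

def recognize(glyph):
    """Return the template character whose shape is closest to the glyph."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(glyph, TEMPLATES[ch]))

# A slightly noisy "T": one pixel of the top bar is missing.
noisy_t = ["##.", ".#.", ".#.", ".#.", ".#."]
print(recognize(noisy_t))  # prints "T"
```

The noisy shape differs from the "T" template in one pixel, from "I" in three and from "L" in nine, so "T" is chosen. This also hints at why image quality matters: the more pixels are wrong, the closer a shape drifts toward a different character.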
Documents in OmniPage Pro OmniPage Pro 14 handles documents one at a time. When you acquire your first image (from scanner or from file) a new document is started. Further acquired images are added to the same document, until you save and close it. A document in OmniPage Pro consists of one image for each document page. After you perform OCR, the document will also contain recognized text, displayed in the Text Editor, possibly along with graphics and tables.
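The document structure described above can be pictured with a small Python model. This is only a sketch for orientation; the class names `Page` and `Document`, their fields and the file names are hypothetical and say nothing about how OmniPage Pro stores documents internally.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Page:
    image: str                              # stand-in for the page image, here just a file name
    recognized_text: Optional[str] = None   # filled in once OCR has run on the page

@dataclass
class Document:
    pages: list = field(default_factory=list)

    def acquire(self, image):
        """Add a newly acquired image as the next page of the open document."""
        self.pages.append(Page(image))

    def unrecognized(self):
        """Return the 1-based numbers of pages that have not been through OCR."""
        return [n for n, page in enumerate(self.pages, start=1)
                if page.recognized_text is None]

doc = Document()
doc.acquire("scan_001.tif")
doc.acquire("scan_002.tif")
doc.pages[0].recognized_text = "Dear Ms. Smith, ..."
print(doc.unrecognized())  # prints [2]
```

Every page always has an image, while recognized text is optional, which mirrors the fact that pages can sit in a document before OCR has been performed on them.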
The OmniPage Desktop
The OmniPage Desktop has a title bar and a menu bar along the top and a status bar along the bottom. It has three main working areas, separated by splitters: the Document Manager, the Image Panel and the Text Editor. Each has close, maximize and restore buttons top right. The Image Panel has an Image toolbar and the Text Editor has a Formatting toolbar.
[Figure: the OmniPage Desktop. Callouts: Standard toolbar; OmniPage Toolbox; Formatting toolbar; Thumbnails show a picture of each page in the document.]
We show the program with a three-page document. Page one is the current page, which has been recognized and proofed. Page two has been recognized but not proofed yet. Page three has been acquired and manually zoned, but not recognized yet. The icons at the bottom of the thumbnail images show page status. Status bar buttons let you show or hide the main screen areas and move to other pages in the document.
The Image Panel When this displays the current page image, the Image toolbar is available. All page images have a background value: process or ignore. Zones can be manually drawn on page images, or can be placed automatically after recognition. There are five zone types: Process, Ignore, Text, Table, Graphics. Areas inside process zones and on a process background outside other zones have zones automatically drawn and their zone types determined during processing. See “Zones and backgrounds” on page 53.
The OmniPage Toolbox
This Toolbox lets you drive the processing. By default it is located along the top of the OmniPage Desktop, just above the working areas. It can be floated and also be docked along the bottom of the desktop.
[Figure: the OmniPage Toolbox. Callouts: Start/Stop button; Get Page button; Get Pages drop-down list; Workflow drop-down list with two sample workflows and a user-defined one.]
Managing documents Document management can be done by thumbnails in the Image Panel or by the Document Manager, situated along the bottom of the OmniPage Desktop. Both summarize the pages in the document and are synchronized. Our pictures show these with the same seven-page document. Pages 1 and 2 are selected and page 4 is the current page, that is, the one shown in the Image Panel. Page status is shown as follows: Page Status Icon Page image has been...
the Ctrl key as you click thumbnails to add pages to a selection one by one. Then you can move or delete the selected pages as a group, or send them to (re)recognition. You can also export selected pages. Get information on an image by hovering the cursor over it with Image Info enabled in the image panel shortcut menu. A popup text displays the image size in pixels and the program’s unit of measurement. Image resolution is also shown.
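The three values in the Image Info popup are related by simple arithmetic: physical size equals pixel size divided by resolution. The following Python sketch shows the conversion; the function name `physical_size` and the sample figures are illustrative only and are not taken from the program.

```python
CM_PER_INCH = 2.54

def physical_size(width_px, height_px, dpi, unit="inch"):
    """Convert pixel dimensions to a physical size at the given resolution (dots per inch)."""
    if dpi <= 0:
        raise ValueError("resolution must be positive")
    width_in = width_px / dpi
    height_in = height_px / dpi
    if unit == "inch":
        return (width_in, height_in)
    if unit == "cm":
        return (width_in * CM_PER_INCH, height_in * CM_PER_INCH)
    raise ValueError(f"unsupported unit: {unit}")

# A 2550 x 3300 pixel image at 300 dpi measures 8.5 x 11 inches.
print(physical_size(2550, 3300, 300))  # prints (8.5, 11.0)
```

The same pixel dimensions at a lower resolution describe a physically larger page, which is why size in pixels and resolution are shown together.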
When multiple pages are being selected, the page set as current does not change. All selected pages are highlighted. Customizing Document Manager columns You can specify which columns of information you want to see in the Document Manager. Click Customize Columns... in the View menu for the following dialog box: This item is highlighted. Click a checkbox to select the item. Image sizes are expressed in pixels. Highlight an item and use these arrows to change the order of columns.
Printing a document You can print the document with the Print item in the File menu. Choose whether to print images or text (that is, recognition results as they appear in the Text Editor). You can print all pages or a range of pages. The Print tool in the Standard toolbar prints images or text, depending on whether the Image Panel or the Text Editor is active. Closing a document Choose Close in the File menu to close a document.
Why save to OPD
You do not have to save your documents to the OPD file type. You would typically do this for the following reasons:
◆ You cannot finish working with the document in the current session.
◆ You want to pass the document to other users who have OmniPage Pro. For example, you can pass an OPD file to a specialist for proofing. In an office network, you may have one scanner generating images for recognition and proofing at several workstations.
When saving, you have two file type choices: OmniPage Document or OmniPage Document (Extended). The latter allows you to embed a user dictionary, training file or zone template file in the OPD. This can increase file size considerably but makes the OPD more portable. To embed any of these items, load them before the save to the OmniPage Document (Extended) file type. How to load an OPD Select Open OPD... from the File menu. The file type OmniPage Document includes both normal and extended OPDs.
panel is not available if you requested display of your scanner’s native TWAIN interface when you set up your scanner. See “Setting up your scanner with OmniPage Pro” on page 14. Direct OCR This feature provides OCR services directly from your favorite word processor or similar application. Use this panel to register and unregister applications for Direct OCR and to enable or disable this service. You can also specify automatic or manual zoning and whether proofreading is desired or not.
Chapter 3 Processing documents This tutorial chapter describes different ways you can process a document and also provides information on key parts of this processing.
Quick Start Guide This topic takes you step-by-step through the basic OCR process. Loading and recognizing sample image files You will find sample image files in the program folder, both single-page and multi-page files. First try reading these files using the procedure presented below, except for the references to a scanner. See “Input from image files” on page 48. The results provide you with a benchmark of the recognition quality you should expect from your own files of comparable quality.
What you do: | What happens:
1. Set up your scanner using the Scanner Wizard, if this is not already done. | Configures OmniPage Pro to work with your scanner.
2. Select Start ▸ All Programs ▸ ScanSoft OmniPage Pro 14.0 ▸ OmniPage Pro 14.0 | Opens OmniPage Pro on your computer.
3. Place the document correctly in your scanner. |
4. From the Get Page drop-down list, select a scan option for your document: black-and-white, grayscale or color. |
Processing overview
The following flow diagram summarizes the processing steps:
[Flow diagram: processing steps]
◆ Get Pages: from file (page 48), from scanner (page 49), other (page 48)
◆ Describe page layout (page 51)
◆ Apply a template (page 61), Autozoning (page 53), Manual zoning (page 54)
◆ Perform OCR with current settings (page 31)
◆ Proofread (page 65), Verify and edit (page 67)
◆ Export pages: to file (page 82), to Clipboard (page 89), via Mail (page 90), other (page 92)
Here is an overview of the processing methods you can use.
recognize just those problem pages. Alternatively, you can acquire images with manual processing, draw zones on some or all of them, and then send all pages to automatic processing. Workflow A workflow consists of a series of steps and their settings. Typically it will include a recognition step, but it does not have to. Workflows are listed in the Workflow drop-down list – sample workflows plus any you create. You can choose to place an OmniPage icon on your taskbar.
Automatic processing
Automatic processing provides an efficient way of handling documents, especially larger ones. First you select all settings needed, then you can use the Start button in the OmniPage Toolbox to process a new document from start to finish or to restart and finish processing on an open document.
[Figure: the OmniPage Toolbox. Callouts: Start button; Workflow drop-down list; Get Page button; Perform OCR button; Get Pages drop-down list. Some items appear only in OmniPage Pro 14 Office, others only if the source is available.]
them as mail attachments or direct them to other targets. Save the document as an OmniPage Document file from the File menu or Standard toolbar. See “Saving and exporting” on page 79. 5. Open the Options dialog box from the Standard toolbar or with Options in the Tools menu and check that settings are appropriate for your document. You can, for instance, specify recognition languages and whether you want to proofread the document or not. See “Settings” on page 31. 6.
Manual processing Manual processing gives you more precise control over the way your pages are handled. You can process the document page-by-page with different settings for each page. The program also stops between each step: acquiring images, performing recognition, exporting. This lets you, for instance, change the page background and draw zones manually on each page. You start each step in the process by clicking the three numbered buttons on the OmniPage Toolbox. 1.
6. Select a value for the Perform OCR button. You describe the layout of the incoming pages. This value has an influence if auto-zoning runs on any pages. See “Describing the layout of the document” on page 51. You can also select a template to have its zones placed on the current page. See “Using zone templates” on page 61. 7. Click the Perform OCR button to have the current page recognized.
Start automatically and finish manually: When you have a large document with only a few pages needing special attention, you do not have to manually process the whole document. You can process it automatically and view results in the Text Editor. You can determine which pages are in order, and which need different settings or some manual zoning. After adjusting settings and/or modifying zones, use manual processing to re-recognize just those pages. 1.
backgrounds or zones to exclude areas from processing. Use process backgrounds or zones to specify areas to be auto-zoned. 4. Click the Start button, then choose Finish Processing Existing Pages in the Automatic Processing dialog box. 5. After proofing (if requested) you can save or export the document. Processing with workflows A workflow consists of a series of steps and their settings. It does not have to conform to the 1-2-3 pattern of traditional processing.
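As a rough mental model, a workflow can be thought of as an ordered list of steps, each carrying its own settings. The Python sketch below illustrates that idea only; the class names, step names, folder path and settings keys are hypothetical and do not reflect the program's actual workflow file format or step names.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str                                     # hypothetical step label
    settings: dict = field(default_factory=dict)  # settings that belong to this step

@dataclass
class Workflow:
    name: str
    steps: list = field(default_factory=list)

    def includes(self, step_name):
        """Report whether the workflow contains a step with this name."""
        return any(step.name == step_name for step in self.steps)

# A workflow that only loads and saves images, with no recognition step.
archive = Workflow("Archive images", [
    Step("Load files", {"folder": "C:/scans"}),
    Step("Save images", {"file_type": "TIFF"}),
])
print(archive.includes("Perform OCR"))  # prints False
```

The example deliberately omits a recognition step to show that a workflow need not follow the acquire, recognize, export pattern.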
Processing from other applications You can use the Direct OCR™ feature to call on the recognition services of OmniPage Pro while you work in your usual word-processor or other application. First you must establish the direct connection with the application. Then, two items in its File Menu open the door to OCR facilities. How to set up Direct OCR 1. Start the application you want connected to OmniPage Pro. Start OmniPage Pro, open the Options dialog box at the Direct OCR panel and select Enable Direct OCR.
How to use Direct OCR 1. Open your registered application and work in a document. To acquire recognition results from scanned pages, place them correctly in the scanner. 2. Use the target application’s File Menu item Acquire Text Settings... to specify settings to be used during recognition. Any settings not offered take their values from those last used in OmniPage Pro. Settings changed for Direct OCR are also changed in OmniPage Pro. 3.
How to use OmniPage Pro with PaperPort The PaperPort® program is a paper management software product from ScanSoft. It lets you link pages with suitable applications. Pages can contain pictures, text or both. If PaperPort exists on a computer with OmniPage Pro, its OCR services become available and amplify the power of PaperPort. You can choose an OCR program by right clicking on a text application’s PaperPort link, selecting Preferences and then selecting OmniPage Pro 14 as the OCR package.
Processing with the Batch Manager You can schedule processing jobs to be performed automatically at a specified time in the future. The job pages can come from a scanner with an ADF or from image files. You do not have to be present at your computer at job start time, nor does OmniPage Pro have to be running. It does not matter if your computer is turned off after the job is set up, so long as it is running at job start time.
Defining the source of page images There are two possible image sources: from image files and from a scanner. There are two main types of scanners: flatbed or sheetfed. A scanner may have a built-in or added Automatic Document Feeder (ADF), which makes it easier to scan multi-page documents. The images from scanned documents can be input directly into OmniPage Pro or may be saved with the scanner’s own software to an image file, which OmniPage Pro can later open.
Normally the Add button places each file at the bottom of the file list. To place a file at a different location, highlight a file in the list. The new file will be added immediately below the lowest highlighted file. In OmniPage Pro Office, files can also be imported from FTP locations, Microsoft SharePoint or ODMA sources. The minimum width or height for an image file is 50 pixels; the maximum is 71 cm (28 inches). See online Help for pixel limits.
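The placement rule for the Add button can be sketched in a few lines of Python. This is an illustration of the rule as stated above, not the program's code; the function name `add_file`, the use of list indexes for highlighted entries and the file names are assumptions made for the example.

```python
def add_file(file_list, new_file, highlighted=()):
    """Return a new list with new_file inserted according to the rule in the text.

    highlighted holds the indexes of highlighted entries; with none highlighted,
    the file goes to the bottom of the list.
    """
    position = max(highlighted) + 1 if highlighted else len(file_list)
    return file_list[:position] + [new_file] + file_list[position:]

files = ["page1.tif", "page2.tif", "page4.tif"]

# Nothing highlighted: the new file lands at the bottom.
print(add_file(files, "page5.tif"))

# Entries 0 and 1 highlighted: the new file lands just below the lowest one.
print(add_file(files, "page3.tif", highlighted=[0, 1]))
```

Highlighting the right entry before adding therefore lets you build the list in page order without rearranging it afterwards.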
Brightness and contrast Good brightness and contrast settings play an important role in OCR accuracy. Set these in the Scanner panel of the Options dialog box or in your scanner’s interface. The diagram illustrates an optimum brightness setting. After loading an image, check its appearance. If characters are thick and touching, lighten the brightness. If characters are thin and broken, darken it. Then rescan the page.
You can scan double-sided documents with an ADF. A duplex scanner will manage this automatically. For non-duplex scanners, select Scan double-sided pages in the Scanner panel of the Options dialog box. Then you can scan the document in just a few passes, with even pages grouped together and odd pages also grouped. OmniPage Pro will merge the pages for you.
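The merging of separately scanned odd and even pages amounts to interleaving two lists. The Python sketch below shows the idea; it is not OmniPage Pro's implementation. The function name `merge_double_sided` and the `evens_reversed` option are assumptions for illustration: when a stack is turned over to scan its back sides, the even pages often arrive last page first, but this guide does not state in which order the program expects them.

```python
def merge_double_sided(odd_pages, even_pages, evens_reversed=False):
    """Interleave separately scanned odd and even pages into reading order.

    evens_reversed is an assumption for illustration: set it when the even
    pages were scanned starting from the last page of the document.
    """
    if evens_reversed:
        even_pages = even_pages[::-1]
    # A document has either as many even pages as odd ones, or one fewer.
    if not len(even_pages) <= len(odd_pages) <= len(even_pages) + 1:
        raise ValueError("odd and even page counts do not match")
    merged = []
    for i, odd in enumerate(odd_pages):
        merged.append(odd)
        if i < len(even_pages):
            merged.append(even_pages[i])
    return merged

print(merge_double_sided(["p1", "p3", "p5"], ["p6", "p4", "p2"], evens_reversed=True))
# prints ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

The count check catches a common mishap, such as a sheet that fed twice in one pass, before the pages end up in the wrong order.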
Single column, no table Choose this setting if your pages contain only one column of text and no table. Business letters or pages from a book are normally like this. Choose it also for a page with words or numbers arranged in columns if you do not want these placed in a table or decolumnized or treated as separate columns. Graphics may be detected.
Zones and backgrounds Zones define areas on the page to be processed or ignored. Zones are rectangular or irregular, with vertical and horizontal sides. Page images in a document have a background value: process or ignore (the latter is more typical). Background values can be changed with the tools shown.
Auto-zone a page background Acquire a page. It appears with a process background. Draw a zone. The background changes to ignore. Draw text, table or graphic zones to enclose areas you want manually zoned. Click the Process background tool (shown) to set a process background. Draw ignore zones over parts of the page you do not need. After recognition the page will return with an ignore background and new zones round all elements found on the background.
No. | Type | What happens:
1 | Text zone | OCR runs and generates text.
2 | Table zone | OCR runs, text is placed in a table grid.
3 | Graphic zone | Image is embedded in recognized page.
4 | Process zone | Auto-zoning creates one or more zones, decides their types and processes their contents.
5 | Process background | Auto-zoning creates one or more zones, decides their types and processes their contents.
process zones on an ignore background. Draw a process zone to enclose columns of text to have them handled automatically. They will be decolumnized in the Text Editor’s NF view and RFP view, but kept in columns in True Page view. Ignore zone (olive) Use this to draw an ignore zone, to define a page area you do not want transferred to the Text Editor. Auto-zoning will not place zones here. To exclude a given page area from many pages (for example a header or page numbers), place an ignore zone in a template.
Working with zones The Image toolbar provides zone editing tools. One is always selected. When you no longer want the service of a tool, click a different tool. Some tools on this toolbar are grouped. Only the last selected tool from the group is visible. To select a visible tool, click it. To select a hidden tool, hold down the mouse button on the triangle at the bottom right of the visible tool until the additional tools appear, then click the tool you want.
Join two zones of the same type: Draw an overlapping zone of the same type. [Figure: existing zones, new zone, resulting zone]
Make an irregular zone by subtraction: Draw an overlapping zone of the same type as the background (in this example, on an ignore background). [Figure: existing zone on an ignore background, new ignore zone, resulting zone]
Split a zone: Draw a splitting zone of the same type as the background (in this example, on a process background).
The following zone shapes are prohibited:
◆ Indented along the bottom
◆ Indented along the top
◆ Hole in the middle
To expand a zone more quickly than using its resizing handles, draw a zone of the same type to completely enclose it. The smaller zone is replaced by the larger one. To replace a set of zones of whatever type with a single zone, draw a larger zone of the desired type to completely enclose them. All the smaller zones are replaced by the larger one.
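The enclosure rule can be expressed as a short Python sketch. It models rectangular zones only and covers just the case where a new zone completely encloses existing ones; joining, subtracting and splitting by partial overlap are not modeled. The `Zone` class, its coordinate fields and the `draw_zone` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    left: int
    top: int
    right: int
    bottom: int
    kind: str   # for example "text", "table" or "graphic"

    def encloses(self, other):
        """True when the other zone lies completely inside this zone."""
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)

def draw_zone(zones, new_zone):
    """Add new_zone, dropping every existing zone it completely encloses."""
    kept = [z for z in zones if not new_zone.encloses(z)]
    return kept + [new_zone]

zones = [
    Zone(10, 10, 100, 50, "text"),
    Zone(10, 60, 100, 120, "graphic"),
    Zone(200, 10, 300, 50, "text"),
]
# One large text zone replaces the two zones it encloses, whatever their type.
zones = draw_zone(zones, Zone(0, 0, 150, 150, "text"))
print([(z.left, z.top, z.right, z.bottom, z.kind) for z in zones])
```

After the call, only the untouched zone on the right and the new large zone remain.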
zone (provided it stays rectangular) to discard unneeded columns or rows from the outer edges of a table. The five grouped table handling tools on the Image toolbar can be used if the current page contains a table type zone. If the tool you need is not visible, click the triangle on the bottom right of the visible tool to display all the tools, then click the desired one.
Using zone templates A template contains a page background value and a set of zones and their properties, stored in a file. A zone template file can be loaded to have template zones used during recognition. Load a template file in the Layout Description drop-down list or from the Tools menu. You can browse to network locations to load templates created by others.
desired. Open the Zone Template Files dialog box. The current template is selected. Click Save and then Close. How to unload a template Select a non-template setting in the Layout Description drop-down list. The template zones are not removed from the current or existing pages, but template zones will no longer be used for future processing. You can also open the Zone Template Files dialog box, select [none] and click the Set As Current button.
Chapter 4 Proofing and editing Recognition results are placed in the Text Editor. These can be recognized texts, tables and embedded graphics.
The editor display and views
The Text Editor displays recognized texts and can mark words that were suspected during recognition with wavy underlines:
◆ Green – Non-dictionary words: These were recognized confidently, but are not found in any active dictionary: standard, user or professional.
◆ Blue – Words with suspect characters: These contain unrecognized characters or are dictionary-approved words containing characters recognized with lower confidence.
True Page view True Page® view tries to conserve as much of the formatting of the original document as possible. Character and paragraph styling is retained. All page elements, including columns, are placed in boxes and frames. Reading order can be displayed by arrows. See from page 73. The formatting level for export is chosen separately at export time. Proofreading OCR results After a page is recognized, the recognition results appear in the Text Editor.
3. If the recognized word is correct, click Ignore or Ignore All to move to the next suspect word. Click Add to add it to the current user dictionary and move to the next suspect word. 4. If the recognized word is not correct, modify the word in the Edit panel or select a dictionary suggestion. Click Change or Change All to implement the change and move to the next suspect word. Click Add to add the changed word to the current user dictionary and move to the next suspect word. 5.
Verifying text After performing OCR, you can compare any part of the recognized text against the corresponding part of the original image, to verify that the text was recognized correctly.
You should proofread and verify texts before doing large-scale editing. If you cut and paste large blocks of text, the links between text and image may be disturbed. You can use OmniPage Pro’s Text-to-Speech facility to have the recognized text read aloud as another way of verifying text. You can hear the text letter-by-letter, word-by-word, line-by-line, sentence-by-sentence or in whole pages. See the section “Reading text aloud” on page 76.
Languages The program can read over 110 languages with three alphabets: Latin, Greek and Cyrillic. See the list in the OCR panel of the Options dialog box. It shows which languages have dictionary support. A listing is also provided on the ScanSoft web site. In addition to user dictionaries, specialized dictionaries are available for certain professions (currently medical, legal and financial) for some languages. See the list and make selections in the OCR panel of the Options dialog box.
You can use training to improve recognition of special symbols such as @, ® and © or to recognize supported accented letters more reliably. The purpose of training is not to teach the program to read characters from non-supported languages or alphabets. OmniPage Pro 14 offers two types of training: manual training and automatic training (IntelliTrain). Data coming from both types of training are combined and available for saving to a training file.
scanning settings, the horizontal line in e can become very thin, leading to OCR errors that IntelliTrain can repair.
[Figure: OmniPage Pro read this as bcnefit. You changed it during proofing to benefit. IntelliTrain remembers this shape and the rule: this is not c, this is e. IntelliTrain changes: thcrc to there, likc to like, Whcncvcr to Whenever, etc.]
IntelliTrain remembers the training data it collects, and adds it to any manual training you have done.
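The effect of such a learned rule can be imitated with a small Python sketch. It is a deliberate simplification: IntelliTrain works with character shapes in the image, whereas this example only swaps letters in text and checks the result against a word list. The `DICTIONARY` set and the `apply_rule` function are invented for illustration.

```python
from itertools import product

DICTIONARY = {"benefit", "there", "like", "whenever", "check"}  # tiny stand-in word list

def apply_rule(word, wrong="c", right="e"):
    """Try replacing occurrences of `wrong` with `right` until a dictionary word appears."""
    if word.lower() in DICTIONARY:
        return word                       # already a known word: leave it alone
    positions = [i for i, ch in enumerate(word) if ch == wrong]
    # Try every combination of "keep" and "replace" for the suspicious characters,
    # because some of them may be genuine (as in "check").
    for choice in product([False, True], repeat=len(positions)):
        letters = list(word)
        for pos, replace in zip(positions, choice):
            if replace:
                letters[pos] = right
        candidate = "".join(letters)
        if candidate.lower() in DICTIONARY:
            return candidate
    return word                           # no dictionary word found: keep the OCR result

for w in ["thcrc", "likc", "Whcncvcr", "chcck"]:
    print(w, "->", apply_rule(w))
```

The last sample shows why a blanket replacement would be wrong: in "chcck" only the middle c is a misread e, and the word list is what tells the two cases apart.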
This appears if you load an OPD with an embedded training file. You can edit it and also save it to a new named training file. default location, but you can specify a different path, for instance on a local network, to share training files with other users. Click this to edit the selected training file in the Edit Training dialog box. Select this, click Save and type in a name to save a new training file. Use this also to save new training into a loaded training file.
Text and image editing OmniPage Pro has a WYSIWYG Text Editor, providing many editing facilities. These work very similarly to those in leading word processors. Editing character attributes In all views except No Formatting view, you can change the font type, size and attributes (bold, italic, underlined) for selected text. Use the Formatting toolbar or the Font dialog box from the Format menu. The latter also offers subscripts, superscripts and colored text or backgrounds.
activate the image editor associated with BMP files in your Windows system, and load the graphic. Edit the graphic, then close the editor to have it re-embedded in the Text Editor. Do not change the graphic’s size, resolution or type, because this will prevent the re-embedding. Tables Tables are displayed in the Text Editor in grids. Move the cursor into a table area. It changes appearance, allowing you to move gridlines. You can also use the Text Editor’s rulers to modify a table.
elements to be modified. You can also group elements into frames or multicolumn areas. Reading order can be displayed and changed. Click the Show reading order tool in the Formatting toolbar to have the order shown by arrows. Click again to remove the arrows. Click the Change reading order tool for a set of reordering buttons in place of the Formatting toolbar. Context-sensitive help explains their use, as does Reading order in online Help. A changed order is applied in NF and RFP views.
Reading text aloud
The ScanSoft RealSpeak™ speech facility is provided for the visually impaired, but it can also be useful to anyone during text checking and verification. The speaking is controlled by movements of the insertion point in the Text Editor, which can be mouse or keyboard driven.
To hear text one character at a time, forward or back, use the right or left arrow key. Letter, number or punctuation names are spoken.
The three basic speech keys are grouped together on the numeric keypad.
Chapter 5 Saving and exporting Once you have acquired at least one image for a document, you can export the image(s) to file. Once you have recognized at least one page, you can export recognition results – a single page, selected pages or the whole document – to a target application by saving to file, copying to Clipboard or sending to a mailing application. Saving as an OmniPage Document is always possible.
A document remains in OmniPage Pro after export. This allows you to save, copy or send its pages repeatedly, for example with different formatting levels, using different file types, names or locations. You can also add or re-recognize pages or modify the recognized text. With automatic processing and in Batch Manager jobs, you specify the first saving destination before processing starts. When the last available page is recognized (or proofread, if that was requested), an exporting dialog box appears.
Saving original images
You can save original images to disk in a wide variety of file types. See “File types for opening and saving images” on page 115.
1. Choose Save to File in the Export Results drop-down list. In the dialog box that appears, select Image under Save as.
2. Choose a folder location and a file type. Type in a file name.
3. Select to save the selected zone image(s) only, the current page image, selected page images or all images in the document.
Saving recognition results
You can save recognized pages to disk in a wide variety of file types. See “File types for saving recognition results” on page 116.
1. Choose Export Results... in the File menu, or click the Export Results button in the OmniPage Toolbox with Save to File selected in the drop-down list.
2. The Save to File dialog box appears. Select Text under Save as. Select this first, because it determines which other options are available.
5. Click OK. The document is saved to disk as specified. If Save and Launch is selected, the exported file will appear in its target application, that is, the one associated with the selected file type in your Windows system or in the advanced saving options for your selected file type converter.
Retain Fonts and Paragraphs (RFP): This exports decolumnized text with font and paragraph styling, along with graphics and tables. This is available for nearly all file types.
Flowing Page (FP): This keeps the original layout of the pages, including columns. This is done wherever possible with column and indent settings, not with text boxes or frames. Text will then flow from one column to the other, which does not happen when text boxes are used.
Selecting converter options
Click the Converter Options... button in a saving dialog box to have precise control over the export. This brings up a dialog box with the name of the current file type. It presents a series of options tailored to this file type. First, confirm or change the formatting level, because this influences which other options are presented. Select options as desired. Online Help details how to do this. Click Apply to have the changed settings applied to the current save only.
Custom converters are useful for repeated tasks, such as publishing a weekly magazine. Then all recognized pages can be exported with their formatting tailored to their intended use. You can also create a set of customized converters for a given file type defining saving options for each output formatting level, for example: RTF No Formatting, RTF Retain Fonts and Paragraphs, RTF Flowing Page and RTF True Page. You can change converter options without saving anything to file.
If you later change the saving options for the simple HTML converter, these changes will also be applied in the multiple converter. To make an independent multiple converter, using the same example, select the simple HTML converter and make a new simple converter from it, naming it for instance ‘HTML for multiple’. Similarly, make a converter ‘WordPad for multiple’. Then make a new multiple converter from the two user-defined simple converters.
Saving to PDF You have five choices when saving to Portable Document Format (PDF) files. The first four are presented as Text converters, the last one is listed among the Image converters. PDF (Normal): Pages are exported as they appeared in the Text Editor in True Page view. The PDF file can be viewed and searched in a PDF viewer and edited in a PDF editor. PDF Edited: Use this if you have made significant editing changes in the recognition results.
Converting from PDF
OmniPage Pro 14 Office is supplied with a separate program from ScanSoft: the PDF Converter for Microsoft Word. This allows you to convert PDF files into Word documents quickly and easily. Once OmniPage Pro is installed, PDF becomes available as a file type in the Microsoft Word File Open dialog box. In most cases the conversion can be done without invoking OmniPage Pro.
Copying to Clipboard is not available in workflows or jobs. You can perform a copy and paste operation for the current zone by drag-and-drop. Use the Select zone tool to select a zone. Then drag the cursor from the Image Panel to a target application with an open document. The zone contents will be pasted at the cursor position. OCR runs if necessary.
1. Choose what you want to send: Text, Image or Multiple. Text is for recognized pages, Image for page images, Multiple to save to two or more file types at once. See “Using multiple converters” on page 86.
2. Specify a file type, a page range, a formatting level and attachment options: one attachment for all pages, one attachment per page, new attachment at each blank page or one attachment for each input file. Set all options and click OK.
3. The eMail Properties dialog box appears.
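The four attachment options listed in step 2 amount to four ways of grouping pages. The following sketch only illustrates that grouping; the `Page` class, the mode names and the decision to drop the blank separator page are assumptions made for the example, not documented program behavior.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Page:
    number: int
    input_file: str
    is_blank: bool = False


def group_attachments(pages, mode):
    """Split pages into attachments according to a (hypothetical) mode name."""
    pages = list(pages)
    if mode == "all":
        # One attachment for all pages.
        return [pages] if pages else []
    if mode == "per_page":
        # One attachment per page.
        return [[p] for p in pages]
    if mode == "per_input_file":
        # One attachment for each run of consecutive pages from the same input file.
        return [list(g) for _, g in groupby(pages, key=lambda p: p.input_file)]
    if mode == "blank_page":
        # A blank page starts a new attachment. Assumption: the blank page
        # itself acts only as a separator and is not sent.
        groups, current = [], []
        for p in pages:
            if p.is_blank:
                if current:
                    groups.append(current)
                current = []
            else:
                current.append(p)
        if current:
            groups.append(current)
        return groups
    raise ValueError(f"unknown attachment mode: {mode}")
```

For example, four pages where page 3 is blank give one attachment in "all" mode, four in "per_page" mode and two (pages 1 and 2, then page 4) in "blank_page" mode.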
Other export targets In OmniPage Pro 14 Office you can export files to other targets. You can save files to a central server (an FTP site) or to Microsoft SharePoint. Exporting choices are made in the Export Options dialog box as shown on the previous page. When you click OK you are directed to FTP or SharePoint log-in and invited to specify the required path. If an ODMA-compliant Document Management System (DMS) is detected in your computing environment, it will be offered.
Chapter 6 Workflows Workflows contain a series of processing steps along with their settings that can be saved for future use. This makes them useful for handling recurring tasks efficiently. They process whole documents using the page order supplied as input. They often perform tasks in parallel, for instance recognizing a page while the following page is being loaded. Batch Manager jobs are closely related to workflows. Both are created and modified with the Workflow Assistant.
Workflows A workflow contains a series of processing steps and their settings. It can be saved for repeated use whenever you have a task needing the same processing. Workflows must begin with one and only one input step. But after that, they do not have to conform to the traditional 1-2-3 processing pattern. Usually a workflow will include a recognition step, but this is not compulsory. For instance, page images can be saved to image files in a different file type or to an OmniPage Document.
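The structural rule stated above (exactly one input step, which must come first, with recognition optional) can be expressed as a short check. The step names used here are hypothetical labels chosen for the illustration; they are not identifiers used by OmniPage Pro or stored in its workflow files.

```python
def validate_workflow(steps):
    """Check a list of step names against the rule described in the text.

    Returns True for a valid workflow and raises ValueError otherwise.
    """
    if not steps:
        raise ValueError("a workflow needs at least an input step")
    if steps[0] != "input":
        raise ValueError("a workflow must begin with an input step")
    if steps.count("input") != 1:
        raise ValueError("a workflow must contain one and only one input step")
    # No further ordering is enforced: a recognition step is usual but optional.
    return True
```

Under this sketch, `["input", "save_images"]` is accepted even though it has no recognition step, which matches the example of saving page images to another file type.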
2. To PDF and RTF
◆ Input is taken from file with run-time prompting for file names. Original resolutions are not to be kept for color and grayscale pages.
◆ Stop for manual zoning. (When running the workflow, use the Document Ready button in the Toolbox to continue.)
◆ Recognition in English optimized for accuracy rather than speed.
◆ Stop for editing and proofing. (When running the workflow, use the Document Ready button in the Toolbox to continue.)
4. From OPD to Word and TIFF
◆ Input is from OPD with run-time prompting for name and path. The idea is to open the OPD generated by workflow 3.
◆ The OPD is presented for proofing and editing. (At run-time, the Document Ready button signals that proofing is finished.)
◆ Save as OmniPage Document back to its original location and name to overwrite the previous unproofed version.
◆ Save recognized pages to Word 2000 and page images to TIFF with the pre-defined multiple converter “Word and TIFF”.
to most program functions while the workflow is running. To stop the workflow before it completes, press the Stop button.
4. If run-time input selection is specified, the Load Images dialog box awaits your choice of files.
5. If you requested a step requiring interaction (manual zoning or proofing), the program presents pages for attention.
6. When a page is zoned or proofed, click the Page Ready button in the Toolbox to move to the next page.
7.
formats using default settings: Word, Excel, PDF, TXT and WordPerfect. Only workflows with run-time prompting for input files are listed here. Pressing Stop while a workflow is running pauses it. Click Start to resume processing. You can also pause a workflow, do some manual processing if needed, and then save the document as an OmniPage Document. When you later open that OmniPage Document, the interrupted workflow will use the OPD as input and finish the processing.
The starting point for your workflow will be an existing one. Workflow diagram: The series of steps in your chosen workflow appears here. This lists your workflows. Select one to see its steps in the panel on the right. When you are satisfied with your choice, click Next. If you selected a workflow or job as source, you proceed by modifying its steps and settings. See the next section.
After defining the input settings, click Next to choose your second step. The screen offers these choices for your next step:
Recognize Images: Send document pages to OCR with auto-zoning.
Zone Images Manually: Choose this to see document pages before recognition and draw zones on them. Click Next. The recognition step will be offered again.
Apply Zone Template: Click Next to specify the template name. Both manual zoning (in addition to template zones) and recognition will be offered again.
Modifying workflows
Select the workflow you want to modify in the Workflow drop-down list and click the Workflow Assistant button in the standard toolbar. Or choose Workflows... in the Tools menu, select the desired workflow and click the Modify... button. The first panel of the Workflow Assistant appears with the workflow loaded. Click the icon in the workflow diagram that represents the step you want to modify. Click Next to move to the settings panel relating to that step. Make the desired changes.
The Options dialog box in the Batch Manager is in the Tools menu. Its General panel has an option Start Batch Manager scheduler at system startup. By default it is on. It must remain selected for jobs to run at their scheduled time. The option is provided so it is possible to prevent all jobs from running without having to disable them individually. Its state also governs the running of barcode driven workflows.
Modifying jobs
Jobs with status Not scheduled or Completed can be modified. Select the job in the left panel of the Batch Manager and choose Modify from the Edit menu. The Workflow Assistant appears with the job steps and settings loaded. Make the desired changes as already described for workflows. See “Modifying workflows” on page 101. Finally, modify timing instructions as desired, retain the job name and click Finish.
Managing and running jobs
This is done with the Batch Manager.
Click on a job and a page-by-page analysis of all pages in the job appears in the right panel. It shows where input was taken from, the page status and where output was directed to. Click on a plus icon to see more information about the page. For jobs with the error or warning status, the listing shows which pages failed or what problems occurred. Start Job in the File menu serves to start a Not Scheduled or Completed job immediately.
Click Folders to add a watched folder to the list using the Browse for Folder dialog box. Select this to make all the listed folders into watched folders. Select this to see file lists in the opened folders. Specify a folder and an image file type. When you reach the last panel of the Workflow Assistant, you set the timing instructions: a starting time and an end time for the watching to occur.
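The idea of a watched folder (a folder, an image file type, and a time window during which watching occurs) can be sketched as a single polling step. This is a conceptual illustration only; the function name, its parameters and the polling approach are assumptions and do not describe how Batch Manager monitors folders internally.

```python
from datetime import datetime
from pathlib import Path


def new_files_in_watched_folder(folder, extension, start, end, now, seen):
    """Return files of the given type that have arrived since the last check.

    `start`, `end` and `now` are datetime objects; `seen` is a set of file
    names already handed over for processing and is updated in place.
    Outside the watching window nothing is returned.
    """
    if not (start <= now <= end):
        return []
    found = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == extension.lower()
        and p.name not in seen
    )
    seen.update(p.name for p in found)
    return found
```

Calling the function repeatedly with the same `seen` set reports each incoming image file once, and only while the current time lies between the starting time and the end time.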
Barcode driven workflows In OmniPage Pro 14 Office, you can run workflows without specifying the workflow name inside the program. You do this using barcode cover pages that define which workflow should run. Workflows with image file input are handled using special watched folders. In these cases, cover page starting is available only for workflows that ask for run-time specification of file names, because this is what the cover page provides.
folders with no starting or end time; they are always monitored, so long as Start Batch Manager scheduler at system startup remains selected in the General panel of the Options dialog box. The starting time for the workflow is defined by the moment the cover page enters a barcode folder. Second, optionally define a set of folders as the locations that should be searched to find the workflow files (extension .xwf) referenced by the cover pages.
dialog box. You may need to change this temporarily if you have to use a different voice recognition language. Voice activation is applied to two fields. Workflow activation To do this, the workflow icon must be visible on the Windows taskbar. Use the Start menu or the General panel of the Options dialog box to place it there if it is not. When your ASR-1600 system is operational, just say OmniPage workflows. This will activate the shortcut menu.
Chapter 7 Technical information This chapter provides troubleshooting and other technical information about using OmniPage Pro 14. Please also read the online Readme file and other help topics, or visit the ScanSoft web pages. Its scanner section contains detailed and regularly updated information about scanner setup and support. The Readme file contains last-minute information relating to OmniPage Pro. Access to the Readme file and to ScanSoft’s web pages is provided in the Help menu.
Troubleshooting Although OmniPage Pro is designed to be easy to use, problems sometimes occur. Many of the error messages contain self-explanatory descriptions of what to do – check connections, close other applications to free up memory, and so on. Sometimes that is all the troubleshooting help you need. Please see your Windows documentation for information on optimizing your system and application performance.
Testing OmniPage Pro
Restarting Windows 98, Me, 2000, XP or 2003 Server in its safe mode or Windows NT in VGA mode allows you to test OmniPage Pro on a simplified system. This is recommended when you cannot resolve crashing problems or if OmniPage Pro has stopped running altogether. See Windows online Help for more information. Your scanner will not run with OmniPage Pro in safe mode or VGA mode, so do not test scanner problems in this configuration.
▼ To test OmniPage Pro in safe mode:
1.
5. Launch OmniPage Pro and try performing OCR on an image. Use a known image file such as one of the supplied sample files.
You can also run OmniPage Pro from a command line in its own safe mode. Choose Start ▸ Run, browse for the file OmniPage.exe and add the command line option /safe. This starts the program, but ignores previously stored settings and does not try to recover a document from an abnormal termination.
Increasing memory resources
OmniPage Pro may run poorly under low-memory conditions.
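The /safe command-line start described earlier in this section can also be scripted, for instance when the test has to be repeated on several machines. The sketch below is an illustration under assumptions: only the file name OmniPage.exe and the /safe option come from this guide, while the function names are hypothetical and the installation path must be supplied by you.

```python
import subprocess
from pathlib import Path


def safe_mode_command(exe_path):
    """Build the command line: the program file followed by the /safe option."""
    return [str(exe_path), "/safe"]


def launch_safe_mode(exe_path):
    """Start OmniPage Pro in its own safe mode and return the process handle.

    `exe_path` is the full path to OmniPage.exe on your system. The path is
    not assumed here, because it depends on where the program was installed.
    """
    exe = Path(exe_path)
    if not exe.is_file():
        raise FileNotFoundError(f"program file not found: {exe}")
    return subprocess.Popen(safe_mode_command(exe))
```

Checking that the file exists before launching gives a clear error message if the path was mistyped, instead of a less specific operating system error.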
Text does not get recognized properly
Try these solutions if any part of the original document is not converted to text properly during OCR:
◆ Look at the original page image and ensure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is generally ignored during OCR. See the section on creating and modifying zones, “Working with zones” on page 57.
◆ Make sure text zones are identified correctly.
OmniPage Pro recognizes only machine-printed text characters such as typewritten or laser-printed text. It can handle dot-matrix characters, though accuracy may be lower on draft-quality texts. It cannot read handprint or handwriting. However, it can retain signatures or other handwritten text as a graphic.
Problems with fax recognition
Try these solutions to improve OCR accuracy on fax images:
◆ Ask senders to use clean, original documents if possible.
Supported file types
The program supports a wide range of file types for images and text.
If you try to save a black-and-white image to JPEG format, the program will offer conversion to grayscale. With TIFF G3 and G4 it will offer conversion to black-and-white. Saving to PDF format is supported, with five options. Two of these, Image only and Image on text, export original images. To save image-only PDFs, choose ‘Image’ in the saving dialog box. To save to other PDF types, choose ‘Text’. See “Saving to PDF” on page 88.
Graphics
● File type supports graphics.
●● File type supports graphics, with export choice to retain or drop graphics.
Tables
● File type supports tables in grids, no table handling choices at export time.
●● File type supports tables, choose to use grids or tab separated columns.
❍● File type does not support table grids, choose to convert to tab or space separated columns.
1 These saving file types are available only in OmniPage Pro 14 Office.
Uninstalling the software
Sometimes uninstalling and then reinstalling OmniPage Pro will solve a problem. OmniPage Pro’s Uninstall program will not remove files containing recognition results or any of the following user-created files:
Zone templates (*.zon)
Training files (*.otn)
User dictionaries (*.ud)
OmniPage Documents (*.opd)
Job files (*.xjf)
Workflow files (*.xwf)
To uninstall from Windows NT, 2000, XP or Windows Server 2003 you must be logged into your computer with administrator privileges.
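Because the Uninstall program leaves these user-created files in place, it can be useful to list them before reinstalling, for example to back them up. The following sketch searches a folder tree for the file extensions named above; the function name and the choice of starting folder are assumptions, since this guide does not prescribe where such files must be stored.

```python
from pathlib import Path

# File extensions of user-created files, as listed in this guide.
USER_FILE_TYPES = {
    ".zon": "Zone templates",
    ".otn": "Training files",
    ".ud": "User dictionaries",
    ".opd": "OmniPage Documents",
    ".xjf": "Job files",
    ".xwf": "Workflow files",
}


def find_user_files(root):
    """Return a dict mapping each file kind to the matching files under `root`."""
    found = {}
    for path in Path(root).rglob("*"):
        kind = USER_FILE_TYPES.get(path.suffix.lower())
        if kind and path.is_file():
            found.setdefault(kind, []).append(path)
    return found
```

The result can be printed or passed to a copy routine of your choice; the sketch itself only reads the folder tree and changes nothing.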
THIRD PARTY LICENSES/NOTICES LZW licensed from Unisys Corporation under U.S. Patent No. 4,558,302 and foreign counterparts. The Independent JPEG Group's software, copyright © 1991-1995, Thomas G. Lane. This software is based, in part, on the work of the Independent JPEG Group, Colosseum Builders, Inc., the FreeType Team, and Catharon Productions, Inc. Zlib copyright © 1995-1998 Jean-loup Gailly and Mark Adler.