OmniPage Pro® User’s Manual

CAERE CORPORATION
100 Cooper Court
Los Gatos, California 95032-7603 USA
Caere GmbH
Innere Wiener Strasse 5
81667 München, Germany

Caere UK Information Centre
Abbey House
4 Abbey Orchard Street
Westminster, London SW1P 2JJ

Centre d’informations Caere
72, rue Baratte-Cholet
94100 Saint-Maur, France

Please Note

To use this program, you should know how to work in the Microsoft Windows environment. Please refer to Windows documentation if you have questions about how to use menu commands, dialog boxes, scroll bars, edit boxes, and so on.
Table of Contents

Welcome
  Using This Manual  viii

Chapter 1 Installation and Setup
  Minimum System Requirements  2
  Installing OmniPage Pro  2
  Starting and Closing OmniPage Pro
  Performing OCR on a Document  23
  Proofreading OCR Results  24
    Verifying Text  25
    Proofreading OCR Results in Microsoft Word  25
  Using OCR in Other Applications
  Specifying Fonts  74
  Training OCR for Special Characters  75
  Creating User Dictionaries  77
  Saving Settings Files
Welcome

Welcome to OmniPage Pro, and thank you for using our software! The following documentation has been provided to help you learn about OmniPage Pro.

This User’s Manual

This manual introduces you to the basics of using OmniPage Pro. It includes installation and setup instructions, an introduction to OmniPage Pro, task-oriented instructions, ways to customize processing, settings guidelines, and technical information. This manual is also available as an electronic PDF file.
Using This Manual

This manual is written with the assumption that you know how to work in the Microsoft Windows environment. Please refer to your Windows documentation if you have questions about how to use dialog boxes, menu commands, scroll bars, drag and drop functionality, shortcut menus, and so on.

The following conventions are used in this manual.
Chapter 1 Installation and Setup

This chapter provides installation and setup information for OmniPage Pro and the Scan Manager. For technical and troubleshooting information, please read Chapter 6, Technical Information. For information on supported scanners and scanner setup, read the Scanner Setup Notes. To open this PDF file after OmniPage Pro has been installed, click Start in the Windows taskbar and choose Programs > Caere Applications > Caere Documents > Scanner Setup Notes.
Minimum System Requirements

You need the following setup, at minimum, to install and run OmniPage Pro:
• Computer with a 486 or higher processor
• Microsoft Windows 95, Windows 98, or Windows NT 4.
To install OmniPage Pro:
1 Insert OmniPage Pro’s CD-ROM in the CD-ROM drive. The Setup program should start automatically. If it does not start, locate your CD-ROM drive in Windows Explorer and double-click the Setup.exe program at the top level of the CD-ROM.
2 Follow the instructions on each screen to install the software. During installation, you may be prompted to enter a serial number. You can find your serial number on the label of the CD-ROM envelope.
Starting and Closing OmniPage Pro

OmniPage Pro’s desktop appears when you open OmniPage Pro. See “The OmniPage Pro Desktop” on page 10 for an introduction to OmniPage Pro’s user interface.

Figure: The OmniPage Pro desktop, with callouts for the Standard toolbar, Zone toolbar, and AutoOCR toolbar.
• The thumbnail viewer displays the pages in an open document.
• The image viewer displays the current page’s original image.
• The text viewer displays the current page’s recognized text and retained graphics.
Registering OmniPage Pro

Register your copy of OmniPage Pro with Caere Corporation to receive notification of special offers and the best prices on product upgrades. Some versions of OmniPage Pro will launch only 25 times if you do not register them. If you purchased your product directly from Caere or if you were previously registered, you may not need to register again. Your version of OmniPage Pro will not display a Register menu if you do not need to register it.
Chapter 2 Introduction to OmniPage Pro

You probably use your computer for most business correspondence and other written projects. The challenge is that certain sources of information cannot be immediately used on a computer. For example, if you want to incorporate information from a magazine article into a document in your word processor, you somehow have to get the text from the article into your computer. Painstakingly retyping the article is not an appealing solution.
What Is Optical Character Recognition (OCR)?

Optical character recognition (OCR) is the process of turning an image into computer-editable text. An image is an electronic picture of text such as a scanned paper document or an electronic fax file. Images do not have editable text characters; they have many tiny dots (pixels) that together form a picture of text. During OCR, OmniPage Pro analyzes an image and defines characters to produce editable text.
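To make the step from dots to characters concrete, here is a minimal sketch in Python. It is not OmniPage Pro’s recognition method, which this manual does not describe. It assumes a made-up 3-by-5 dot pattern for three letters, compares each glyph against the stored patterns, and falls back to a reject character when nothing is close enough. The names TEMPLATES, REJECT, and recognize, and the "~" reject character, are all hypothetical.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a grid of 5 rows by 3 columns; "#" is ink, "." is blank.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}

REJECT = "~"  # hypothetical reject character for unrecognized glyphs


def distance(a, b):
    """Count the dots that differ between two glyph grids."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize_glyph(glyph, max_distance=2):
    """Return the best-matching character, or REJECT if nothing is close."""
    best_char, best_dist = min(
        ((char, distance(glyph, pattern)) for char, pattern in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_char if best_dist <= max_distance else REJECT


def recognize(glyphs):
    """Turn a sequence of glyph grids into an editable string."""
    return "".join(recognize_glyph(g) for g in glyphs)
```

A glyph with one stray dot still resolves to its nearest pattern, while a solid block of ink matches nothing and comes back as the reject character.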
Basic Steps of OmniPage Pro OCR

These are the basic steps of OmniPage Pro’s OCR process.
1 Bring a document image into OmniPage Pro. You can scan a paper document or load an image file. The resulting image appears in OmniPage Pro’s image viewer. See “Bringing Document Images into OmniPage Pro” on page 20 for more information.
2 Create zones to identify areas you want to recognize as text or retain as graphics.
The OmniPage Pro Desktop

OmniPage Pro’s desktop displays the pages of an open document in its thumbnail viewer, image viewer, and text viewer. You can use buttons in the Standard, AutoOCR, and Zone toolbars to perform various tasks on the document.

Figure: The OmniPage Pro desktop, with callouts for the Standard toolbar, Zone toolbar, and AutoOCR toolbar.
• The thumbnail viewer displays a picture of each page in the document. The current page is highlighted with a light border around it.
AutoOCR Toolbar

The AutoOCR® toolbar contains buttons that can activate each step of the OCR process.

Figure: The AutoOCR toolbar, with callouts for the AUTO button, Image button, Zone button, OCR button, and Export button. Click the down arrow to display the commands in a button’s drop-down list.

You can set different commands in the AutoOCR toolbar buttons for the operations you want to perform. Choose a command using each button’s drop-down list.
• The AUTO button allows you to activate automatic processing or use the OCR Wizard.
Standard Toolbar

The Standard toolbar contains buttons and a drop-down list for performing standard tasks.

Figure: The Standard toolbar, with callouts for New, Open, Save, Proofread OCR, Print, Copy, Cut, Undo, Paste, Image Editor, View, Rotate Image, Zoom, Options, Straighten Image, and Help.

Zone Toolbar

The Zone toolbar contains buttons that allow you to draw and define zones on a page image.
Options Dialog Box

You can select settings for OmniPage Pro in the Options dialog box. To open it, click the Options button or choose Options... in the Tools menu. Click the tabs in the Options dialog box to view and select different settings. See Chapter 4, OmniPage Pro Settings, for more information on settings.
Getting Online Help

In addition to using this manual, you can use OmniPage Pro’s online Help topics to learn about features, settings, and procedures. Online Help is available after you install OmniPage Pro. OmniPage Pro’s online Help follows the conventions of Microsoft Windows 95 Help. Choose How to Use Help... in OmniPage Pro’s Help menu to get information on using Help.

Help Menu

One way to open OmniPage Pro’s online Help is to choose commands in the Help menu.
Context-Sensitive Help

You can get on-the-spot information about a particular OmniPage Pro command, toolbar button, or dialog box option in the following ways:
• Click the Help button in the Standard toolbar and then click any toolbar button, menu command, or area of the OmniPage Pro desktop to display a Help topic explaining that item.
Product Support

For the fastest and easiest way to get help, please look for solutions in this manual or in the online Help. See “General Troubleshooting Solutions” on page 86 for troubleshooting tips. If you need additional help, please use the following resources:
• Caere’s World Wide Web site
  Go to Caere’s World Wide Web site for common questions and answers, updates, patches, troubleshooting procedures, and product information. Caere’s Web site address: http://www.caere.
Chapter 3 Processing Documents

This chapter describes how to work with documents in OmniPage Pro, including each step of the OCR process. There are different ways to accomplish the same tasks in OmniPage Pro. You can use toolbar buttons or menu commands to start procedures. OmniPage Pro can perform all OCR steps automatically, or you can start each step individually. You can even do different tasks at the same time.
Ways to Process Documents

Optical character recognition (OCR) is the process of turning an image into computer-editable text so you do not have to retype the text manually. The basic steps of OmniPage Pro’s OCR process are explained on page 9. The following is a summary of those steps.
1 Bring a document image into OmniPage Pro. See page 20 for more information.
2 Create zones to identify areas you want to recognize as text or retain as graphics.
Automatic Processing

Use the AUTO button to process a new document from start to finish or to finish processing an open document.

To process your document automatically:
1 Set AutoOCR as the command in the AUTO button’s drop-down list.
2 Set the desired Image, Zone, OCR, and Export commands. See “Setting AutoOCR Toolbar Commands” on page 40 for more information.
3 Choose Options... in the Tools menu and check that settings are appropriate for your document.
Bringing Document Images into OmniPage Pro

You can bring document images into OmniPage Pro by scanning pages or loading image files.

Scanning Pages

You can scan paper documents to convert them to electronic images in OmniPage Pro. If a document is already open, scanned pages are inserted as new pages. To scan in OmniPage Pro, you must install the Scan Manager and select your default scanner.
To load image files into OmniPage Pro:
1 Set Load Image as the command in the Image button’s drop-down list.
2 Click the Image button or choose Load Image in the Process menu. The Load Image dialog box appears. Click Advanced if you want to select files from more than one folder.
3 Select the folder location and file type of the file you want to load. See “Supported File-Format Types” on page 89 for a complete list of supported file formats.
Creating Zones for OCR

Page images are displayed in OmniPage Pro’s image viewer, where zones are created before OCR. Zones are borders that identify areas of an image that will be recognized as text or retained as graphics. Any part of an image not enclosed by a zone is ignored during OCR.

Figure: A page image with zones, with these callouts:
• This is a table zone. It will be kept in a row-and-column format during OCR.
• These are text zones. They will be converted to text during OCR.
• This is an unzoned area. It will be ignored during OCR.
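The effect of zoning can be sketched in a few lines of Python. This is an illustration under stated assumptions, not OmniPage Pro’s internal logic: zones are modeled as plain rectangles, each word is reduced to a single (x, y) position, and the Zone class and words_to_recognize function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    order: int   # the number shown on the zone
    kind: str    # "text", "table", or "graphic"
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        """Return True if the point lies inside this rectangular zone."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def words_to_recognize(words, zones):
    """Keep only words inside a text or table zone, in zone order.

    `words` is a list of (text, x, y) tuples. Anything outside every
    zone, or inside a graphic zone, is left out of the recognized text.
    """
    result = []
    for zone in sorted(zones, key=lambda z: z.order):
        if zone.kind == "graphic":
            continue  # graphic zones are retained as pictures, not text
        result.extend(text for text, x, y in words if zone.contains(x, y))
    return result
```

With a title zone numbered 1 and a body zone numbered 2, a word in the page margin is dropped and the title comes out before the body, whatever order the zones were drawn in.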
2 Click the Zone button or choose Auto Zones in the Process menu. OmniPage Pro automatically draws zones on the current page in the image viewer. Each zone has a number indicating its order and a picture indicating its zone type.

Make sure zones are identified correctly before performing OCR. For example, if you want to retain an area as a graphic, that area should be identified as a Graphic zone type. See “Changing Zone Properties” on page 71 for more information.
To schedule a group of documents for OCR at a particular time, see “Scheduling OCR” on page 80.

Proofreading OCR Results

After OCR is performed, recognized text appears in the text viewer, where you can proofread the results. Proofreading starts automatically if you chose OCR and Proof as the OCR process command. OmniPage Pro marks suspected errors in green and inserts a red “reject” character for any character it cannot recognize.
• Click Add to add the word to the current user dictionary.
After you choose an option for the word, the OCR Proofreader looks for the next possible error.
3 Click Close to stop proofreading OCR. Color markers are removed from words that have been proofread.

Verifying Text

After performing OCR, you can compare recognized text against the original image to verify that the text was recognized correctly.
2 Make sure the *.doc file extension is associated with the version of Word you plan to use. Refer to your Windows documentation for more information on associating file extensions with applications.

To proofread OCR results and correct errors in Microsoft Word:
1 Perform OCR on your document and then save it as the appropriate file type:
• Save as Word for Windows 7.0, 95 if you are using that version.
• Save as Word 97 if you are using that version.
The OCR Proofreader dialog box also appears.
4 Select one of these options for the word:
• Click Ignore to allow the word to remain as is.
• Click Ignore All to ignore all instances of the word.
• Click Change to replace the word with the word in the Change to edit box.
• Click Change All to replace all instances of the word with the word in the Change to edit box.
• Click Add to add the word to the current user dictionary.
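The five options above differ mainly in how far their effect reaches. The following Python sketch models that difference. It is an assumption-laden illustration rather than the OCR Proofreader’s actual behavior: the function proofread, the action names, and the callbacks is_suspect and choose are all hypothetical stand-ins for the marked words and the dialog-box buttons.

```python
def proofread(words, is_suspect, choose, user_dictionary):
    """Walk through suspect words and apply the chosen option to each.

    `is_suspect(word)` flags a possible error. `choose(word)` returns an
    (action, replacement) pair, where action is one of "ignore",
    "ignore_all", "change", "change_all", or "add".
    """
    ignored_everywhere = set()
    change_everywhere = {}
    result = []
    for word in words:
        if word in change_everywhere:
            # An earlier Change All already decided this word.
            result.append(change_everywhere[word])
            continue
        if (word in ignored_everywhere or word in user_dictionary
                or not is_suspect(word)):
            result.append(word)
            continue
        action, replacement = choose(word)
        if action in ("change", "change_all"):
            result.append(replacement)
            if action == "change_all":
                change_everywhere[word] = replacement
            continue
        if action == "ignore_all":
            ignored_everywhere.add(word)
        elif action == "add":
            user_dictionary.add(word)
        result.append(word)  # every other option keeps the word as is
    return result
```

In this model, Change affects one occurrence and asks again at the next one, while Change All, Ignore All, and Add settle every later occurrence of the same word without another prompt.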
You can verify only words that are marked as suspected errors. However, once the Verify Text window is open, you can use its scroll bars and zoom buttons to see any part of the original image.
3 Choose Verify Text... in the OmniPage menu. The Verify Text window opens and shows a picture of the original word and its surrounding area.
4 Repeat steps 2 and 3 to continue proofreading other suspect words. The display changes as you select new words.
Using OCR in Other Applications

You can use OmniPage Pro’s OCR Aware feature to use OCR in other applications. For example, you can scan, recognize, and paste text directly into a document without ever leaving your word-processing application. You can use OCR Aware with 32-bit applications that have been registered with OmniPage Pro. An application must be installed on your computer in order to use it with OCR Aware.
Working with Documents

OmniPage Pro’s thumbnail, image, and text viewers allow you to look at and work with pages in the current document.

Figure: The OmniPage Pro desktop, with callouts for the thumbnail viewer and image viewer. Callout on the splitter: Drag this splitter to the left or right to resize a view.
Resizing a Page View

You can resize a page displayed in the image viewer or text viewer to enlarge or reduce the view.

To resize a page view:
1 Click in the viewer you want to enlarge or reduce to make it active.
2 Choose a size option in the Zoom drop-down list in the Standard toolbar. Or, choose Zoom in the View menu and select a size option in the drop-down list. The page resizes as specified.
• Click the Next Page or Previous Page buttons at the lower-right corner of the OmniPage Pro desktop.
• Choose Next Page, Previous Page, or Go to Page... in the Edit menu.

Reordering Pages

You can reorder pages in a document by dragging their thumbnails to different positions in the thumbnail viewer. Click the thumbnail of the page you want to move and drag it above the desired page number.
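The result of dragging a thumbnail above another page can be sketched as a list operation. This is only a model of the outcome, assuming the document is a simple ordered list of pages; the function name move_page_above is hypothetical.

```python
def move_page_above(pages, page_to_move, target_page):
    """Return a new page order with one page dropped just above another.

    Page numbers are 1-based positions in the current order, as in the
    thumbnail viewer. The original list is left unchanged.
    """
    if page_to_move == target_page:
        return list(pages)
    reordered = list(pages)
    moved = reordered.pop(page_to_move - 1)
    insert_at = target_page - 1
    if page_to_move < target_page:
        # Removing an earlier page shifts the target up by one position.
        insert_at -= 1
    reordered.insert(insert_at, moved)
    return reordered
```

For a four-page document, dragging page 4 above page 2 makes it the new page 2, and the pages that follow are renumbered accordingly.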
Undoing Changes

You can click the Undo button or choose Undo in the Edit menu to cancel the last change you made in the text viewer. You can also choose Undo to cancel zone edits in the image viewer. However, page deletions cannot be undone.

Printing a Document

You can print the current document’s original page images or recognized text.

To print a document:
1 Choose Print... in the File menu and choose one of the following in the submenu:
• Choose Image...
Exporting Documents

You can export a document to other applications by:
• Saving a Document
• Copying a Document to the Clipboard
• Sending a Document as a Mail Attachment

After you export a document, a copy of the document remains open in OmniPage Pro. Save the document as an OmniPage Document (*.met) if you want to reopen it in OmniPage Pro. OmniPage Documents retain all original images, zones, and recognized text.
4 Click OK. The document is saved to disk as specified. Graphics and formatting are saved in the document only if the selected file type supports them.

To save original images:
1 Choose Save Image... in the File menu. The Save Image dialog box appears.
2 Select a folder location and file type for your document. See “Supported File-Format Types” on page 89 for a complete list of supported file types.
3 Type in a file name and select Save and Image options.
4 Click OK.
Saving a Document as You Work

Click the Save button in the Standard toolbar or choose Save in the File menu to save changes to the current document as you work. The Save As dialog box appears the first time you choose Save if a document has not been saved as an OmniPage Document or text-based file. See “Saving a Document” on page 34 for more information. If a document has been saved as an OmniPage Document (*.
Sending a Document as a Mail Attachment

You can send a recognized document as a file attached to a mail message if you have a MAPI-compliant mail application, such as Microsoft Outlook, installed.

To send a document as a mail attachment:
1 Choose Send Mail... in the File menu. You can also click the Export button with Send Mail selected in the drop-down list. The Send Mail dialog box appears.
2 Specify a file type and attachment options for your document.
Chapter 4 OmniPage Pro Settings

This chapter describes the settings in the AutoOCR toolbar and Options dialog box. Please also look in OmniPage Pro’s online Help for more detailed information on settings. The settings you select for processing documents can greatly affect OCR results. You may have to experiment with different settings to get the results you want. Settings guidelines are provided at the end of this chapter to get you started.
Setting AutoOCR Toolbar Commands

The AutoOCR toolbar buttons allow you to take a document through each step of the OCR process. Every toolbar button has different process commands that can be set for the operations you want to perform. OmniPage Pro can go through all steps automatically, or you can start each step individually.
AUTO Button Commands

Use the AUTO button to process a new document from start to finish or to finish processing an open document. The AUTO button’s drop-down list contains AutoOCR and OCR Wizard commands.

AutoOCR

Select AutoOCR to finish processing a new or open document according to the selected process commands. See “Automatic Processing” on page 19 for more information.
Zone Button Commands

Use the Zone button to automatically create zones on document images. Zones are bordered areas that specify what will be recognized as text or retained as graphics on an image. The Zone button’s drop-down list contains the Single-Column Pages, Multiple-Column Pages, Spreadsheet Pages, and Mixed Pages commands and the names of any zone templates you have created. See “Creating Zones for OCR” on page 22 for more information.
OCR Button Commands

Use the OCR button to perform the selected OCR operation on document images. The OCR button’s drop-down list contains the Perform OCR, OCR and Proof, Train OCR, and Defer OCR commands.

Perform OCR

Select Perform OCR to recognize text on document images. During OCR, OmniPage Pro analyzes the image and identifies characters to produce editable text. See “Performing OCR on a Document” on page 23 for more information.
Export Button Commands

Use the Export button to export recognized text and retained graphics to other applications. The Export button’s drop-down list contains the Save As, Send Mail, Copy to Clipboard, and Defer Export commands.

Save As

Select Save As to save a recognized document to disk in a specified file format. See “Saving a Document” on page 34 for more information.
Selecting OmniPage Pro Settings

Click the Options button or choose Options... in the Tools menu to open the Options dialog box. This is the central location for OmniPage Pro settings.

Figure: The Options dialog box, with these callouts:
• Click each tab to view and select different settings.
• Click for a description of each setting.

Default settings are shown in most examples that follow. However, documents require different settings depending on their input attributes and your output goals.
Accuracy Settings

Click the Accuracy tab to select settings that affect OCR accuracy.

Figure: The Accuracy tab, with these callouts:
• The Language Analyst evaluates and replaces unknown words with words most likely to be correct during OCR.
• Select the type of characters that are in your document.
• Training files help recognize special characters during OCR.
• Usually, these settings should be selected for optimal accuracy. Deselect any that cause overcorrection.
Page Format Settings

Click the Page Format tab to select settings that determine how the formatting of a page is handled during OCR.

Figure: The Page Format tab, with these callouts:
• Select a setting that best describes how your original page looks. The page icons change to depict the general appearance of your page original.
• Select a setting to determine what you want your page to look like after OCR.
• Click to select font options for recognized text.

Tables Settings

Click the Tables tab to select table settings for your document.
Language Settings

Click the Language tab to select language settings for your document.

Figure: The Language tab, with these callouts:
• Select the language that appears most in your document.
• This is the language that will be used in dialog boxes, windows, and menu commands.
• Select additional languages for a multilanguage document. You must have installed those languages during installation.
• This is the character used in place of unknown characters. You can enter your own choice.
Some applications may be pre-registered with OCR Aware during OmniPage Pro installation. These applications appear in the Registered list box.

To register an application with OCR Aware:
1 Launch the application you want to register and open a document in it. This will ensure that the application name appears in the list box in step 5.
2 Choose Options… in OmniPage Pro’s Tools menu.
3 Click the OCR Aware tab in the Options dialog box.
Microsoft Word Settings

Click the Microsoft Word tab to select settings for OCR proofreading directly in Microsoft Word. See “Proofreading OCR Results in Microsoft Word” on page 25 for more information.

Figure: The Microsoft Word tab, with these callouts:
• Select this if you want to check for OCR errors in Microsoft Word.
• Select the color in which you want suspected errors to appear in Microsoft Word.

Proofreading OCR in Microsoft Word is supported only in Microsoft Word 95, Word 7.0, and Word 97. Make sure you associate the *.
Settings Guidelines

The settings you select in OmniPage Pro can greatly affect OCR results. Make sure that settings are appropriate for your document before you begin processing. You may have to experiment with different settings to get the results you want. Answer the following questions to get settings recommendations for your documents. Generally, if you indicate the characteristics of your documents to OmniPage Pro, you will receive better OCR results.
What type of document are you processing?

Magazine and newspaper pages

Recommendations
• Select Multiple columns in the Page Format settings.
• Select the appropriate page size and orientation in the Scanner settings if you are scanning.
• Draw zones manually or modify automatically created zones if auto zoning does not successfully create zones around all page areas you want to process. See “Customizing Zones” on page 63 for more information.
What type of document are you processing?

Text and table

Recommendations
• In Page Format settings, select Single column or Multiple columns page layout depending on the number of columns in your document.
• Select the appropriate page size and orientation in the Scanner settings if you are scanning.
• If your table has no grid lines, draw a zone around the table and set its zone type to Table and its content to Numeric. If the table has text headings, select Alphanumeric instead.
What type of document are you processing?

Legal documents

Recommendations
• Select Single column in the Page Format settings if the document has one page-wide text column, even if the document has pleading-line numbers.
• Select Multiple columns in the Page Format settings if text appears in two or more columns.
• Select the appropriate page size and orientation in the Scanner settings if you are scanning.
What is the quality of the original document?

Poor or not sure
Degraded photocopies; colored or shaded backgrounds or text; run-together or broken text characters; thick, run-together text characters

Recommendations for scanning
• Select Grayscale with 3D OCR in the Scanner settings if you have a grayscale scanner and your page contains grayscale graphics, colored background, or colored text.
• For best accuracy, use the Black and white setting if your pages are black and white.
How much original formatting do you want to keep?

Minimal
Keep one font and one font size only

Recommendations
• Select Remove formatting in the Page Format settings.
• Click Font Mapping... in the Page Format settings and select the font and size you want mapped.
• Select one of the text-only formats in the Save As dialog box if you want to be able to open the document in any text application.
How much original formatting do you want to keep?

As much as possible
Keep font characteristics, paragraph formatting, column formatting, and graphic positioning

Recommendations
• Select True Page in the Page Format settings to retain the original appearance of a page using frames. The formatting will be precise but will be more difficult to edit.
Do you want to retain graphics in your document?

Yes
Keep graphics such as logos and photos during OCR processing

Recommendations for scanning
• Select Color in the Scanner settings if you are scanning pages with multiple-color graphics and you want to retain the graphics in color.
• Select Grayscale with 3D OCR in the Scanner settings if you are scanning with a grayscale scanner and you want to retain grayscale graphics.
How many languages are in your document?

One language

Recommendations
• Select the document language as the Main language in the Language settings. If your document contains a language that is not installed in OmniPage Pro, you can add languages to OmniPage Pro by uninstalling and then reinstalling it.
• For faster processing and more accurate results, select only the language that appears in your document in the Language settings.
Are you processing a multipage document?

Yes

Recommendations if you have an automatic document feeder (ADF)
• Select Scan until empty in the Scanner settings to scan a stack of pages at once. Otherwise, you must click the Image button to scan each subsequent page.
• Select Double-sided pages to scan pages with print on both sides. You will be prompted to turn the stack over.
• Insert blank (paper) pages to separate more than one job within a stack of pages.
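The idea of using blank pages as job separators can be sketched as follows. This is a minimal Python illustration, not a description of how OmniPage Pro detects blank pages or what it does with each job, neither of which is covered here. The function split_jobs and the is_blank callback are hypothetical.

```python
def split_jobs(pages, is_blank):
    """Group scanned pages into jobs, using blank pages as separators.

    `is_blank(page)` decides whether a page counts as a separator.
    Separator pages are dropped, and consecutive blanks do not create
    empty jobs.
    """
    jobs, current = [], []
    for page in pages:
        if is_blank(page):
            if current:  # close the job that just ended
                jobs.append(current)
                current = []
        else:
            current.append(page)
    if current:
        jobs.append(current)
    return jobs
```

A stack of eight sheets with separators after the second and third printed pages comes out as three jobs, each containing only its own pages.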
Chapter 5 Customizing OCR

OmniPage Pro has many features that allow you to customize the way your documents are handled during OCR. This chapter describes how to use these features.
Adjusting Page Images Before OCR

You can rotate and straighten page images in OmniPage Pro’s image viewer before zoning and OCR take place. This is recommended to improve OCR accuracy on pages that are not oriented correctly. If you need to rotate or straighten a page, be sure to do so before you create zones because all zones are deleted during these operations.

To rotate a page image:
1 Click on the page image to make the image viewer active.
Customizing Zones

Zones are borders created around areas of a page image to identify what will be recognized as text or retained as a graphic during OCR. Zones play a big part in determining OCR results. You can create zones automatically, manually, or with a template.
Drawing Zones Manually

You can draw zones manually on an image using buttons in the Zone toolbar. Rectangular zones are the most common, but you can also draw irregular-shaped zones for graphics and text. Only rectangular (and square) zones are allowed for tables.

To draw rectangular zones:
1 Click the Draw Rectangular Zones button. The mouse pointer in the image viewer becomes a drawing tool.
You will not be allowed to draw a line if it constitutes a restricted shape. The following zone shapes are restricted:
• Indented along the bottom
• Indented along the top

To draw a table zone:
1 Click the Zone Properties button and select Table zone as the zone type. See “Changing Zone Properties” on page 71 for more information.
2 Click the Draw Rectangular Zones button. The mouse pointer in the image viewer becomes a drawing tool.
To resize zones:
1 Deselect the buttons in the Zone toolbar. (If one of the first two drawing buttons is selected, you do not have to deselect it.)
2 Select the zone you want to resize by clicking inside it. The selected zone is shaded and handles appear on its border.
3 Place the mouse pointer over a handle so that it changes to a two-way arrow.
4 Hold down the mouse button and drag the handle in the direction that you want to enlarge or reduce the zone.
2 Position the drawing tool at the point where you want to start extending the zone.
3 Hold down the mouse button and drag the drawing tool in the direction that you want to extend the zone.
4 Release the mouse button when you are finished extending the zone. The zone border changes to display the modified zone area.

Figure: A zone being extended, with a callout for the drawing tool. The left area of this zone has been extended downward.

To subtract an area of a zone:
1 Click the Subtract from Zone button.
4 Release the mouse button when you are finished subtracting from the zone. The zone border changes to display the modified zone area.

Table zones are constrained to rectangular and square shapes. You cannot modify the area of a table zone to an irregular shape. Table zones, however, can be resized, and it is recommended that you resize the table zone as described in “To extend an area of a zone:” on page 66.
Modifying Table Zones
You can modify table zones by moving, resizing, reordering, extending, subtracting zones, and adding or removing table grids.
To move dividers in a table zone:
1 Click the Move Row or Column Dividers button.
2 Place the mouse pointer within the table zone in the image viewer. The mouse pointer becomes a vertical- or horizontal-bar tool depending on which divider is being passed over.
To remove a row or column divider from a table zone:
1 Click the Remove Row or Column Dividers button.
2 Place the mouse pointer within the table zone where you want to remove a row or column. The mouse pointer becomes a small “x” with a dimmed bar.
3 Position the bar on the divider you want to remove and click the mouse button.
Ctrl-clicking a column divider will remove only the column divider from a single cell. To remove a row divider, the whole row divider must be removed.
Deleting Zones
You can delete the current zones if you want to create new zones. You can also delete individual zones that you do not want to process during OCR. Any part of a page image not enclosed by a zone is ignored during OCR.
To delete and replace the current zones automatically, click the Zone button in the AutoOCR toolbar. You will be prompted to replace the current zones.
To delete zones:
1 Select the zone you want to delete by clicking inside the zone.
Zone Type
Every zone on a page has a zone-type setting.
2 Click the Zone Properties button to open the Zone Properties dialog box.
[Figure callout: Close button]
The settings in this dialog box will be blank if multiple zones with different settings are selected.
3 Select a zone type for the selected zones. If you change an irregular-shaped zone to a Table type zone, OmniPage Pro substitutes the largest rectangle that fully encloses the irregular area.
4 Select a zone content for the selected zones.
Specifying Fonts
You can retain the font characteristics in your document during OCR if you select an Output Format option other than Remove formatting in the Page Format tab of the Options dialog box. OmniPage Pro automatically maps detected font types to specified fonts. To map fonts, OmniPage Pro analyzes text and categorizes it as one of these font types:
• Proportional Serif
Character spacing varies depending on the character; short lines finish off the letter strokes.
Training OCR for Special Characters
A training file is a set of pre-recognized text characters that OmniPage Pro compares with characters on a page image during OCR. You can create a training file for special characters that might normally be difficult to recognize, such as the copyright symbol © or the registered trademark symbol ®.
To create a training file:
1 Open the image file or scan the page that includes characters you want to train.
The Specify Character dialog box shows how the selected character appeared in the original page image.
[Figure: Specify Character dialog box. Callouts: The original image of the selected character. Click the character you want to associate with the selected character. The associated character appears here.]
6 Specify how you want OmniPage Pro to interpret the character during OCR by entering a character in the Character edit box.
7 Click OK to return to the Train Characters dialog box.
Creating User Dictionaries
The Train Character dialog box displays characters in the selected file.
[Figure callouts: Original image. Associated characters.]
3 Edit the characters as desired.
• Double-click a character that you want to edit.
• Click a character that you want to remove and click Delete.
4 Do one of the following after editing the training file:
• Click Save to save changes in the training file.
• Click Append to add all trained characters to another training file.
Saving Settings Files
• Select a file and click Edit to edit an existing user dictionary.
• Click New to create a new user dictionary. Enter a name in the dialog box that appears and click OK.
The User Dictionary dialog box appears. Words in the user dictionary appear in this list box.
3 Add or delete words as desired:
• Type a word in the User word edit box and click Add to add it.
• Select a word in the list box and click Delete to delete it. Click Delete All to remove all words from the dictionary.
3 Click Save Settings... to open the Save Settings dialog box.
4 Select a folder location for the settings file.
5 Type in a file name for the settings file and click OK. All the current settings in the Options dialog box are saved into a settings file with an .ini extension.
6 Click OK to close the Options dialog box.
To load a settings file:
1 Choose Options... in the Tools menu to open the Options dialog box.
2 Click Load Settings...
Scheduling OCR
You can schedule OCR to take place on one or more OmniPage documents, supported image files, and pages in your scanner. This processing can take place while you are away from your computer as long as OmniPage Pro is still running. Scheduled documents are opened at the specified time, unfinished pages are recognized, and the documents are saved in a preselected format and location. Scheduled documents are deleted from the processing queue if you close OmniPage Pro.
2 Click Add... to open the Add Jobs dialog box. Click Advanced to select documents from more than one folder.
3 Locate and select the files you want to add to the schedule. You can select OmniPage Documents and supported image files.
4 Click Open after selecting the desired files. The Schedule OCR dialog box displays the newly added files.
5 Select the time that you want OmniPage Pro to process the scheduled documents.
To schedule documents from an input folder:
1 Choose Schedule OCR... in the Process menu. The Schedule OCR dialog box appears.
[Figure: Schedule OCR dialog box. Callouts: All scheduled documents are displayed in this processing queue. Click this to modify default output options. OmniPage Pro starts processing documents in the queue at the specified time. Select this to schedule documents in your scanner’s automatic document feeder (ADF).]
2 Click the Options... button to open the Schedule OCR Options dialog box.
If you use the auto-add feature to schedule documents and you do not select Delete original file after OCR, original files will be moved from the input folder to the output folder after processing.
4 Click OK in the Schedule OCR Options dialog box to accept the selected settings. The Schedule OCR dialog box reappears and adds documents from the input folder to the processing queue.
5 Select the time that you want OmniPage Pro to process scheduled documents.
To modify the output options for an individual document:
1 Choose Schedule OCR... in the Process menu. The Schedule OCR dialog box appears.
[Figure: Schedule OCR dialog box. Callouts: Select the document for which you want to modify output options. Click this to modify the output options for the selected document. Click this to modify default output options.]
2 Select a scheduled file and click Modify… to open the Modify Scheduled Job dialog box.
3 Select the desired options for the document.
Chapter 6 Technical Information This chapter provides troubleshooting and other technical information about using OmniPage Pro. Please also read the online Readme file and the Scanner Setup Notes. The Scanner Setup Notes list all supported scanners and any connection or software-driver issues. The Readme file contains last-minute information relating to OmniPage Pro.
General Troubleshooting Solutions
Although OmniPage Pro is designed to be easy to use, problems sometimes occur. Many of the onscreen error messages contain self-explanatory descriptions of what to do, such as checking connections or closing other applications to free up memory. Sometimes that is all the troubleshooting help you need. Please see your Windows documentation for information on optimizing your system and application performance.
Testing OmniPage Pro
Restarting Windows 95 or 98 in safe mode or Windows NT in VGA mode allows you to test OmniPage Pro on a simplified system. This is recommended when you cannot resolve crashing problems or if OmniPage Pro has stopped running altogether. See Windows online help for more information. Your scanner will not run with OmniPage Pro in safe mode or VGA mode, so do not test scanner problems in this configuration.
Low Memory Problems
OmniPage Pro may run poorly under low-memory conditions. Signs of this include various error messages, or OmniPage Pro working slowly and accessing the hard drive often. Try these solutions for low-memory conditions:
• Restart your computer.
• Close other open applications to release memory.
• Close unnecessary OmniPage Pro windows.
• Defragment your hard disk to free up contiguous blocks of disk space. See Windows online help for instructions.
Supported File-Format Types
OmniPage Pro can open these file-format types:
• BMP, Bitmap (*.bmp)
• OmniPage Document (*.met)†
• DCX (*.dcx)
• PCX (*.pcx)
• JPEG (*.jpg)
• TIFF uncompressed (*.tif)‡
• TIFF Packbits (*.tif)
• TIFF Group 3 or 4, compressed (*.tif)‡
† Caere Documents from version 8.0 can be opened if the original images were preserved as .tif or .jpg files.
‡ TIFF files can be single- or multiple-page; line art, grayscale, or color.
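If you prepare image files in bulk before loading them, a quick extension check can catch files that OmniPage Pro is unlikely to open. The following is a minimal sketch only, not part of OmniPage Pro: the names OPENABLE_EXTENSIONS and is_openable are hypothetical, and the extension set is simply copied from the list above. It looks at the file name alone, so it cannot tell whether a .tif file uses a supported compression type.

```python
from pathlib import Path

# Extensions copied from the list of file-format types that can be opened.
# The name of this set is illustrative, not an OmniPage Pro setting.
OPENABLE_EXTENSIONS = {".bmp", ".met", ".dcx", ".pcx", ".jpg", ".tif"}


def is_openable(file_name: str) -> bool:
    """Return True if the file's extension matches a listed input format.

    The comparison ignores case, so "SCAN.TIF" and "scan.tif" are
    treated the same way.
    """
    return Path(file_name).suffix.lower() in OPENABLE_EXTENSIONS
```

For example, is_openable("invoice.TIF") is true, while is_openable("invoice.pdf") is false because .pdf is not in the list above.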
OmniPage Pro can save recognized text to these file formats:
• dBase III, III+, IV, 5.5 (*.dbf)
• Microsoft PowerPoint (*.rtf)
• Text only with line breaks (*.txt)
• Excel 3.0, 4.0, 5.0, 6.0, 7.0, 97 (*.xls)
• Microsoft Publisher 98 (*.rtf)
• Ventura Publisher (MS Word) (*.doc)
• FrameMaker 5.5.3 (*.mif)
• OmniPage Document (*.met)
• Word for Windows 2.0, 6.0, and 7.0 (*.doc)
• Freelance Graphics (*.txt)
• PageMaker 6.5.2 (MS Word) (*.doc)
• Microsoft Word 95 and Word 97 (*.
Scanner Setup Issues
This section contains information on setting up your scanner and solutions for scanning problems you may encounter. For more detailed scanner information, read the Scanner Setup Notes: click Start in the Windows taskbar and choose Programs > Caere Applications > Caere Documents > Scanner Setup Notes.
Scanner Drivers Supplied by Caere
OmniPage Pro is shipped with special scanner drivers that allow it to communicate with supported scanners. These scanner driver files are installed on your computer when you install the Caere Scan Manager. These drivers often work in conjunction with the drivers from your scanner manufacturer. To use your scanner with OmniPage Pro, you must select the appropriate scanner in the Caere Scan Manager.
Problems Connecting OmniPage Pro to Your Scanner
Try these solutions if you experience a problem between OmniPage Pro and your scanner or if you receive a scanner error message when you launch OmniPage Pro.
• Make sure the scanner is supported by OmniPage Pro with your version of Windows 95 or 98, or Windows NT. A list of tested scanners is provided in the Scanner Setup Notes.
Missing Scan Image Command
The Scan Image command does not appear in the Image button’s drop-down list in the following cases:
• You did not install the Caere Scan Manager or select an appropriate scanner. See “Scan Manager is Needed with OmniPage Pro” on page 92 for instructions.
• Your scanner is not connected to your computer or is not functioning properly. See “Scanner Setup Issues” on page 91.
Scanner Not Listed in Supported Scanners List Box
Try these solutions if your scanner is not listed in the Scan Manager Scanner list:
• Check Caere Corporation’s web site at www.caere.com for Scan Manager updates.
• Select TWAIN Scanner as your current scanner in the Scanner list.
Scanning Tips
OCR results will be poor if an image is not scanned properly. Remember the following tips when you scan:
• Scan documents at 300 dpi.
OCR Problems
This section contains information and solutions for possible OCR problems. Topics in this section include:
• System Crash During OCR
• Text Does Not Get Recognized Properly
• Problems With Fax Recognition
System Crash During OCR
Try these solutions if a crash occurs during OCR or if processing takes a very long time:
• Resolve low memory problems. See “Low Memory Problems” on page 88 for more information.
• Resolve low disk space problems.
Text Does Not Get Recognized Properly
Try these solutions if any part of the original document is not converted to text properly during OCR:
• Look at the original page image and make sure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is ignored during OCR. See “Creating Zones for OCR” on page 22 for more information.
• Make sure text zones are identified correctly.
Problems With Fax Recognition
Try these solutions to improve OCR accuracy on fax images:
• Ask senders to select Fine or Best mode when they send you a fax. This produces a resolution of 200x200 dpi.
• Ask senders to transmit files directly to your computer via fax modem if you both have one. You can save fax images as image files and then load them into OmniPage Pro. See “Supported File-Format Types” on page 89 for more information.
• Ask senders to use clean, original documents if possible.
Uninstalling the Software
Sometimes uninstalling and then reinstalling OmniPage Pro and the Caere Scan Manager will solve a problem. OmniPage Pro’s Uninstall program will not remove any files saved to the OmniPage installation folder or subdirectories, including the following files:
• Zone templates (*.zon)
• Training files (*.trn)
• User dictionaries (*.ud)
• Temp files (*.tmp)
To uninstall from Windows NT, you must be logged into your computer with administrator privileges.
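Because the Uninstall program leaves these files behind, you may want to see what remains before you reinstall or clean up by hand. The sketch below is one possible way to list them and is not a Caere tool: the function name find_leftover_files and the folder you pass to it are assumptions, and only the four file patterns come from the list above. It searches the folder and its subdirectories and changes nothing on disk.

```python
from pathlib import Path

# File patterns copied from the list of files that Uninstall does not remove.
LEFTOVER_PATTERNS = ("*.zon", "*.trn", "*.ud", "*.tmp")


def find_leftover_files(install_folder):
    """List leftover user files under install_folder and its subdirectories.

    install_folder is whatever folder OmniPage Pro was installed to; this
    sketch does not know or guess the real default location.
    """
    root = Path(install_folder)
    found = []
    for pattern in LEFTOVER_PATTERNS:
        # rglob searches the folder itself and every subdirectory.
        found.extend(root.rglob(pattern))
    return sorted(found)
```

Review the returned list before deleting anything. Zone templates, training files, and user dictionaries are files you created, and you may want to keep them for use after reinstalling.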
Index Numerics location of 4, 10 OCR button 43 overview 40 setting process commands in 40 Zone button 42 3D OCR grayscale with 58 using for poor-quality documents 55 A Accuracy settings 46, 55, 58 Accuracy statistics see OmniPage Pro’s online help Acquiring images 20 Add to Zones button 63, 66, 68 Adding pages to a document by loading image files 20 pages to a document by scanning 20 trained characters to files 76 words to your user dictionary 27, 78 ADF 20, 60 Adjusting page images before OCR 62 view o
adding images files to 20 adding scanned pages to 20 creating zones on 22 exporting 34 finishing 19 keeping graphics in 58 processing automatically 19 processing multiple languages 59 quality of original 55 types 51, 52 Double-sided pages 60 Drag and Drop see OmniPage Pro’s online help Draw Irregular Zones button 63 Draw Rectangular Zones button 63 Drawing zones automatically 22 Drawing zones manually irregular-shaped 64 rectangular 64, 65 E Editing training files 76 user dictionaries 78 Editing graphics s
L Language Analyst using for poor-quality documents 55, 97 Language settings 47 Languages installing more 59 processing more than one 59 processing one 59 Large Buttons 40 Legal documents 54 Letter documents 52 Line-art drawings 58 Load Image command 41 Loading a settings file 79 image files 20 Logos, retaining 58 Low disk space problems 88 Low memory problems 88 M Magazine pages 52 Manual brightness see Black and white setting Manual Brightness settings 55 Manually creating zones 64, 65 MAPI-compliant mai
Proofreading OCR results checking for errors in Microsoft Word 25 checking for errors in the text viewer 24 Properties for zones 72 Proportional fonts 74 Q Quality of the original document 55 R RAM requirements 88 Recognizing text 23 Recommendations for different types of documents 51, 52 for keeping original formatting 56 for processing different languages 59 for retaining graphics during OCR 58 for varying document quality 55 Red text 24 Registering applications with OCR Aware 48 Registering OmniPage Pr
on Windows NT 87 Text and tables 53 Text characters checking for errors 24 hidden from view 97 thick and run-together 55 thin and broken 55 verifying against image 25 well-formed 55 Text frames 97 removing 57 text hidden in 97 Text recognition deferring 43 performing OCR 23 problems with 96 Text viewer 10 Text, green or red 24 The basic steps of OCR 9, 18 Thick or run-together text characters 55 Thin or broken text characters 55 Thumbnail viewer 10 changing pages in 31 reordering pages in 32 Tips for scanni