OmniPage Pro for Windows 95

CAERE CORPORATION
100 Cooper Court
Los Gatos, California 95030

European Offices:
Caere GmbH
Innere Wiener Strasse 5
81667 Munich
Germany
Please Note

In order to use this program, you should know how to work in the Microsoft Windows environment. Please refer to Windows documentation if you have questions about how to use menu commands, dialog boxes, scroll bars, edit boxes, and so on.

OmniPage Pro for Windows 95 Version 7.0
Copyright © 1996 Caere Corporation. All rights reserved.
Table of Contents

Welcome
    Help Menu Commands  vi
    Context-Sensitive Help  vi
Chapter 1 Introduction to OmniPage Pro
    What Is Optical Character Recognition (OCR)?  1-2
    What OmniPage Pro OCR Can Do
    Creating Zones Using a Template  3-12
    Deleting Zones  3-13
    Reidentifying Zone Types  3-13
    Performing OCR on a Document
Welcome

Welcome to OmniPage Pro for Windows 95. This manual introduces you to the basics of using OmniPage Pro. It includes installation instructions, basic procedures, settings guidelines, and troubleshooting information.

Please look in OmniPage Pro’s online help for detailed information on features and procedures. OmniPage Pro’s online help conforms to Windows 95 online help conventions and has been designed for quick and easy information retrieval.
Help Menu Commands

After installing OmniPage Pro, you can open its online help by choosing commands in the Help menu.
• Choose OmniPage Pro Help Topics to get contents and index listings of online help topics available for OmniPage Pro.
• Choose Getting Started to get introductory topics to OmniPage Pro.
• Choose OmniPage Limited Edition Help to find out how to use features in OmniPage Pro that are similar to features used in OmniPage Limited Edition.
Chapter 1 Introduction to OmniPage Pro

You probably do most of your business correspondence and other projects on your computer. Sometimes, however, information that you need cannot be used immediately on a computer. For example, if you want to incorporate information from a magazine article into a document in your word processor, you somehow have to get the text from the article into your computer. Painstakingly retyping the article is not an appealing solution.
What Is Optical Character Recognition (OCR)?

Optical character recognition (OCR) is the process of turning an image into computer-editable text. An image is an electronic picture of text such as a scanned paper document or an electronic fax file. Images do not have editable text characters; they have many tiny dots (pixels) that together form a picture of text. During OCR, OmniPage Pro analyzes an image and defines characters to produce editable text.
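To make the difference between a picture of text and editable text concrete, here is a toy sketch in Python. It is not how OmniPage Pro works internally; the 5x5 bitmaps, the three-letter template set, and the function names are all invented for illustration. It treats each character as a small grid of pixels and picks the known letter whose pixels differ least from the scanned grid.

```python
# Toy illustration only: real OCR engines are far more sophisticated.
# Each glyph is a 5x5 bitmap; '#' is a dark pixel, '.' is a light one.
# The templates below are hypothetical and cover just three letters.
TEMPLATES = {
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
}


def pixel_distance(a, b):
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(
        pa != pb
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )


def recognize(bitmap):
    """Return the template character whose pixels differ least from bitmap."""
    return min(TEMPLATES, key=lambda ch: pixel_distance(TEMPLATES[ch], bitmap))


# A slightly degraded scan of the letter T: one pixel of the top bar is missing.
scanned = ["####.", "..#..", "..#..", "..#..", "..#.."]
print(recognize(scanned))
```

The scanned grid is only a picture until `recognize` maps it to a character that a word processor could edit. That mapping is the step OCR performs.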
Chapter 2 Installation and Setup

This chapter provides installation and setup information for OmniPage Pro and the Scan Manager. If you have any trouble during installation, look in Chapter 5, Technical Information, for troubleshooting solutions. After installing OmniPage Pro, please look in the online help for detailed information on features, settings, and procedures.
System Requirements

You need the following setup to install and run OmniPage Pro:
• Computer with an 80386 or higher processor
• Microsoft Windows 95 or Windows NT 3.51
  Please see the Release Notes for the most up-to-date information on features supported by Windows NT 3.51 and later versions.
Setting Up Your Scanner With OmniPage Pro

6 Follow the onscreen instructions and insert the other installation disks as prompted.

To install OmniPage Pro on Windows NT 3.51:
1 Start Windows NT and open the Program Manager.
2 Insert OmniPage Pro disk #1 in your computer’s floppy disk drive.
3 Choose Run in the Program Manager’s File menu.
4 Type a:\setup (or b:\setup) in the Command Line text box and click OK.
5 Follow the onscreen instructions and insert the other installation disks as prompted.
To change your default scanner in OmniPage Pro:
1 Exit OmniPage Pro if it is running.
2 Click Start in the Windows 95 taskbar and choose Programs > Caere Applications > Scan Manager 2.0. The Scanner Setup dialog box appears.
3 Click Add>> and insert the Scan Manager disk when prompted. The dialog box expands to show a list of available scanners. Some scanners are listed more than once because they have different driver options.
Starting OmniPage Pro

Make sure your scanner is attached to your computer, turned on, and working before you start OmniPage Pro if you plan to scan.

To start OmniPage Pro:
• Windows 95 users: click Start in the taskbar and choose Programs > Caere Applications > OmniPage Pro for Windows 95. The OmniPage Pro launcher is located in the program folder you selected during installation; Caere Applications is the default.
• Windows NT 3.
The OmniPage Pro Desktop

2 Click the Call drop-down list and locate the phone number for your country.
3 Call the phone number and ask for a registration number. You will be asked to provide the serial and key numbers listed in the dialog box.
4 Enter the registration number that is given to you in the Registration Number text box and click OK.

The Registration menu disappears from the menu bar after you register.
Chapter 3 Basic Procedures

This chapter gives an overview of processing documents in OmniPage Pro from start to finish. It describes the basic steps of OCR and provides instructions for each step.

There are different ways to accomplish the same tasks in OmniPage Pro. For example, you can use toolbar buttons or menu commands to start procedures. You can have OmniPage Pro perform all OCR steps automatically, or you can start each step individually.
The Basic Steps of OmniPage Pro OCR

Optical character recognition (OCR) is the process of turning an image into computer-editable text so you do not have to retype the text manually. These are the basic steps of OmniPage Pro OCR:
1 Bring a document image into OmniPage Pro. You can scan a paper document, load an image file, or load an Exchange fax. The resulting image appears in OmniPage Pro’s image viewer.
The AutoOCR Toolbar

The AutoOCR toolbar buttons allow you to take a document through each step of the OCR process. Every toolbar button has different process commands that can be set for the operations you want to perform. OmniPage Pro can go through all steps automatically, or you can start each step individually.
Image Button Commands

Use the Image button to bring a document image into OmniPage Pro’s image viewer. The Image button’s drop-down list contains the Scan Image, Load Image, and Load Exchange Fax commands.

Scan Image
Select Scan Image to scan paper documents in your scanner. This command only appears in the drop-down list if you have installed the Scan Manager and have selected your default scanner. See “Scanning Pages” on page 3-7.
Manual Zones
Select Manual Zones to draw and order your own zones on document images. See “Creating Zones Manually” on page 3-11.

Zone Templates
Select a zone template to create zones on document images using that template. See “Creating Zones Using a Template” on page 3-12.

OCR Button Commands

Use the OCR button to perform the selected OCR operation on document images. The OCR button’s drop-down list contains the Perform OCR, OCR and Check, Train OCR, and Defer OCR commands.
Automatic Processing

Export Button Commands

Use the Export button to export recognized text and retained graphics to other applications. The Export button’s drop-down list contains the Save As, Send, Copy to Clipboard, and Defer Export commands.

Save As
Select Save As to save a copy of a document to disk in a specified file format. See “Saving a Document” on page 3-16.

Send
Select Send to attach a copy of a document to a Microsoft Exchange mail message in a specified file format.
Bringing Document Images into OmniPage Pro

4 Click AUTO or choose AutoOCR in the Process menu.
• Each page of a new document is processed in order. OmniPage Pro pauses for you to create zones if you selected Manual Zones. After drawing zones, click AUTO to continue with the selected OCR and export operations.
• Each page of an open document is finished in order. OmniPage Pro creates zones on any unzoned pages automatically or with a currently selected template.
To scan pages into OmniPage Pro:
1 Place your page in your scanner. You can scan a stack of pages if you have an automatic document feeder (ADF).
2 Set Scan Image as the command in the Image button’s drop-down list.
3 Choose Options... in the Tools menu and click the Scanner tab to make sure the appropriate settings are selected. Select Scan Until Empty if you want to scan all pages in an ADF at once.
3 Select the folder location and file type of the file you want to load. Files of that type in the selected location appear in the list box.
4 Select the files you want to load. You can Shift-click or Ctrl-click to select multiple files in the same folder.
5 Click Advanced if you want to select files in other folders.
• Select a file and click Add to put it in the Selected Files list.
• Click Add All to add all files from the current folder.
Creating Zones for OCR

3 Select the folder location of the faxes you want to load.
4 Select the faxes you want to load. You can Shift-click or Ctrl-click to select multiple faxes.
5 Click Open when you have selected all the faxes you want to load.

Exchange faxes are loaded in the order selected and combined into one working document. If a document is already open, loaded faxes are inserted as new pages.
AccuPage as the auto zoning command when you scan pages with an HP AccuPage scanner.
4 Click the Zone button or choose Auto Zones in the Process menu. OmniPage Pro automatically draws zones on the current page in the image viewer. Each zone has a number indicating its order and a letter indicating its zone type.
4 Enclose an area of the image you want as a zone by holding the mouse button down and dragging the mouse to form a rectangular box. Try to keep areas of text, such as paragraphs, together in the same zone. Each zone displays a number indicating its order and a letter indicating its zone type.
5 Repeat steps 3–4 until you have finished drawing zones around each area that you want to recognize as text or retain as a graphic.
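One way to picture a zone is as a rectangle that carries an order number and a type letter. The Python sketch below is an assumption made for illustration, not OmniPage Pro’s internal format: the `Zone` class, the pixel coordinates, and the letters "T" (text) and "G" (graphic) are hypothetical names chosen here.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """A rectangular page area (hypothetical model, coordinates in pixels)."""
    order: int   # number shown in the zone: its position in the processing order
    kind: str    # letter shown in the zone; "T" and "G" are made-up codes here
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        """True if the point (x, y) lies inside this rectangle."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def zone_at(zones, x, y):
    """Return the first zone (by order) enclosing the point, or None."""
    for zone in sorted(zones, key=lambda z: z.order):
        if zone.contains(x, y):
            return zone
    return None


page_zones = [
    Zone(order=1, kind="T", left=0, top=0, right=100, bottom=50),
    Zone(order=2, kind="G", left=0, top=60, right=100, bottom=120),
]
print(zone_at(page_zones, 10, 10))   # falls inside zone 1
print(zone_at(page_zones, 10, 55))   # falls between the zones
```

In this model a point that no rectangle encloses returns `None`, which corresponds to a page area left outside every zone.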
Deleting Zones

You can delete the current zones if you want to create new zones or if you do not want to process a particular area of a page during OCR.

To delete zones:
1 Select the zone you want to delete by clicking inside it in the image viewer. You can Shift-click to select additional zones. Choose Select All in the Edit menu to select all zones on the current page. Selected zones are shaded.
2 Press the Delete key or choose Clear in the Edit menu.
Performing OCR on a Document

Performing OCR converts an image to editable text. This is also referred to as recognizing text. OmniPage Pro recognizes printed text characters only, but it can retain handwritten text, such as a signature, as a graphic element.

You can also initiate OCR by using OCR Aware, OLE drag and drop, shortcut menus associated with image files, and other features. See OmniPage Pro’s online help for details.

To perform OCR:
1 Choose Options...
• Unrecognizable characters marked by a red reject character (~ is the default)
• Words not found in the main or user dictionary

To check and correct errors:
1 Click the Check Recognition button in the Standard toolbar or choose Check Recognition... in the Tools menu. The Check Recognition dialog box displays a possible error and its original image. Click in this window to enlarge or reduce the image view.
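The two kinds of possible error listed above can also be looked for in text after it has been exported. The following Python sketch is a hypothetical helper, not part of OmniPage Pro. It assumes the exported text still contains the default reject character (~) and that you supply your own word list; the function name and the punctuation handling are choices made here.

```python
import re

REJECT_CHAR = "~"  # the default reject character named in the manual


def find_suspect_words(text, dictionary, reject_char=REJECT_CHAR):
    """Return (offset, token, reason) for each token that may need checking.

    `dictionary` is any set of lowercase words; it stands in for a main or
    user dictionary and is an assumption of this sketch.
    """
    suspects = []
    for match in re.finditer(r"\S+", text):
        token = match.group()
        # Strip surrounding punctuation before the dictionary lookup.
        word = token.strip(".,;:!?\"'()").lower()
        if reject_char in token:
            suspects.append((match.start(), token, "unrecognized character"))
        elif word and word not in dictionary:
            suspects.append((match.start(), token, "not in dictionary"))
    return suspects


words = {"the", "quick", "brown", "fox", "jumps"}
for offset, token, reason in find_suspect_words("The qu~ck brown fxo jumps.", words):
    print(offset, token, reason)
```

A script like this only flags candidates. Comparing each one against the original image, as the Check Recognition dialog box does, is still the reliable way to decide what the text should say.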
Exporting Documents

Verifying Text

You can compare recognized text against the original image to make sure that text was recognized correctly.

To verify text against its original image:
1 Double-click any word in the text viewer or select a word and choose Verify Text in the Tools menu. The Verify Text window opens and shows a clear close-up of the original word and its surrounding area in the image.
2 Click the standard Close button in the Verify Text window to close it.
To save recognized text:
1 Choose Save As... in the File menu. (You can also click the Export button with Save As selected in the drop-down list.) The Save As dialog box appears.
2 Select a folder location and file type for your document.
3 Type in a file name and select a save option.
4 Click Save. The document is saved to disk as specified. Retained graphics are saved with the file only if the selected file format supports them.
2 Select a folder location and file type for your document.
3 Type in a file name and select the desired save and image options.
4 Click Save. The image is saved to disk as specified (zone borders and recognized text are not saved). A copy of the document remains open in OmniPage Pro.

Copying a Document to the Clipboard

You can copy every page of a document’s recognized text to the Clipboard and then paste the text directly into another application.
Chapter 4 Settings Guide

This chapter introduces you to settings in the Options dialog box and provides guidelines for selecting settings. The settings you select for processing documents can greatly affect the OCR results. You may have to experiment with different settings to get the results you want. Please look in OmniPage Pro’s online help for detailed information on settings and procedures.
Selecting OmniPage Pro Settings

Click the Options button or choose Options... in the Tools menu to open the Options dialog box. This is the central location for OmniPage Pro settings. Click each tab in the Options dialog box to display different groups of settings:
• Click the Accuracy tab to select settings that affect OCR accuracy the most.
• Click the Page Format tab to select settings that determine how the formatting of a page is handled during OCR.
What Is the Quality of the Original Document?

Poor or Not Sure
Degraded copies, colored or shaded backgrounds, run-together or broken text characters

Recommendations for Scanning
• Select 3D OCR in the Accuracy settings if you have a grayscale scanner.
• Experiment with the Manual Brightness setting to get a good scan if you have a black-and-white scanner. Lighten the setting for thick, run-together text characters and/or dark backgrounds.
What Type of Document Are You Processing?

Magazine and Newspaper Pages

Recommendations
• Select Multiple Columns in the Page Format settings.
• Select the appropriate page size and orientation in the Scanner settings if you are scanning.
• Draw zones manually if auto zoning does not successfully create zones around all page areas you want to process. Keep associated sections of text, such as paragraphs, together in one zone.
Legal Documents

Recommendations
• Select Multiple Columns in the Page Format settings if text appears in two or more columns.
• Select Single Column if the document has one page-wide text column.
• Select the appropriate page size and orientation in the Scanner settings if you are scanning.
• Draw zones manually to omit unnecessary parts of the page. For example, do not include line numbers in a zone if you plan to renumber lines in your word processor.
How Much Original Formatting Do You Want to Keep?

None
Keep plain text only

Recommendations
• Select Remove Formatting in the Page Format settings.
• Click Font Mapping in the Page Format settings and select one font and one font size to be used for all text.

Some
Keep font characteristics and some paragraph formatting

Recommendations
• Select Retain Font and Paragraph Style in the Page Format settings.
Do You Want to Retain Graphics in Your Document?

Yes
Keep graphics such as logos and photos during OCR processing

Recommendations
• Select 3D OCR or Auto Brightness in the Accuracy settings if you are scanning with a grayscale scanner and you want grayscale graphics.
• Select 3D OCR or Auto Brightness in the Accuracy settings if you are loading a grayscale image file and want to retain grayscale graphics.
How Many Languages Are in Your Document?

More Than One Language

Recommendations for Faster Processing
Use this method if you have a dictionary for only one language.
1 Create zones around all areas that you want to recognize.
2 Select the appropriate language character sets in the Language settings for all languages in the document.
3 Select the main and user dictionaries in the Language settings for the language that appears the most frequently.
One Language

Recommendations
• Select the appropriate language character set in the Language settings.
• Select the appropriate main and user dictionaries in the Language settings.
• Deselect Use Language Analyst and 3D OCR in the Accuracy settings if you do not have a main dictionary that matches the language in your document. Both of these features use dictionary information that will conflict with non-matching languages.
Are You Processing a Multiple-Page Document?

Yes

Recommendations if You Have an Automatic Document Feeder (ADF)
• Select Scan Until Empty in the Scanner settings to scan a stack of pages at once. Otherwise, you must click the Image button to scan each subsequent page.
• Select Double-Sided Pages to scan pages with print on both sides. You will be prompted to turn the stack over.
• Insert blank pages to separate more than one job within a stack of pages.
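The idea of using blank pages as job separators can be sketched in a few lines of Python. This is an illustration of the concept only, not OmniPage Pro’s implementation: the `split_jobs` function is hypothetical, and so is the rule for deciding that a page is blank, which here is simply that the page has no text.

```python
def split_jobs(pages, is_blank):
    """Group a scanned stack into jobs, treating blank pages as separators.

    `pages` is any sequence of page objects and `is_blank` is a function that
    decides whether a page is a separator. Both are assumptions of this sketch.
    Blank pages are dropped, and consecutive blanks do not create empty jobs.
    """
    jobs, current = [], []
    for page in pages:
        if is_blank(page):
            if current:
                jobs.append(current)
                current = []
        else:
            current.append(page)
    if current:
        jobs.append(current)
    return jobs


# Pages are represented by their recognized text; "" stands for a blank sheet.
stack = ["letter p1", "letter p2", "", "invoice p1", "", "", "memo p1"]
for number, job in enumerate(split_jobs(stack, lambda text: not text.strip()), 1):
    print(number, job)
```

With the stack above, the two-page letter, the invoice, and the memo end up as three separate jobs.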
Chapter 5 Technical Information

This chapter provides troubleshooting and other technical information about using OmniPage Pro. For detailed information on OmniPage Pro settings and procedures, please look in OmniPage Pro’s online help.
Troubleshooting Solutions

Topics in this section include:
• General Troubleshooting Solutions
• Setup Program Requests the Same Disk
• Uninstalling the Software
• Testing OmniPage Pro in Safe Mode
• Low Memory Problems
• Low Disk Space Problems
• Scanning Problems
• OCR Problems

Please see your Windows documentation for information on optimizing your system and application performance.
Setup Program Requests the Same Disk

Try this solution if the Setup program repeatedly requests the same installation disk:
1 Exit the Setup program and eject the disk.
2 Restart your computer.
3 Run ScanDisk to check the installation disk for errors. See Windows 95 online help for more information on ScanDisk. If any errors are detected, contact Product Support for a replacement disk. See “Product Support” on page 5-10.
4 Try installing again if ScanDisk detects no errors.
Testing OmniPage Pro in Safe Mode

Restarting Windows 95 in safe mode allows you to test OmniPage Pro on a simplified system. This is recommended when you cannot resolve crashing problems or if OmniPage Pro has stopped running altogether. See Windows online help for more information on safe mode.

Your scanner will not run with OmniPage Pro in safe mode, so do not test scanner problems in this configuration.
Low Disk Space Problems

Problems may occur if your system runs low on free disk space. Try these solutions for low disk space problems:
• Empty the Windows Recycle Bin.
• Delete the *.tmp files in the Temp folder. This folder is usually in your Win 95 folder.
• Run ScanDisk to check for errors that may be using up disk space. See Windows online help for instructions.
• Back up unneeded files onto floppy disks or other media and delete them from your hard disk.
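If you prefer to script the *.tmp cleanup rather than delete the files by hand, the Python sketch below shows one cautious way to do it. It is an illustration, not a Caere utility. The function names are invented here, and you must pass in the actual location of your Temp folder because it varies between systems. By default the function only reports what it would remove.

```python
from pathlib import Path


def find_tmp_files(temp_dir):
    """Return the *.tmp files directly inside temp_dir, sorted by name."""
    return sorted(p for p in Path(temp_dir).glob("*.tmp") if p.is_file())


def total_size(files):
    """Return the combined size of the given files in bytes."""
    return sum(p.stat().st_size for p in files)


def delete_tmp_files(temp_dir, dry_run=True):
    """Delete *.tmp files in temp_dir and return the names handled.

    With dry_run=True (the default) nothing is deleted; the returned list
    shows what would be removed. Subfolders are not searched.
    """
    handled = []
    for path in find_tmp_files(temp_dir):
        if not dry_run:
            path.unlink()
        handled.append(path.name)
    return handled
```

A typical use would be to call `delete_tmp_files(folder)` first, review the returned names, and then call it again with `dry_run=False`. Close other applications beforehand, since a temporary file that is still in use may not be removable.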
Missing Scan Image Command

The Scan Image command does not appear in the Image button’s drop-down list in the following cases:
• You did not install the Scan Manager or select an appropriate default scanner in it. See “Setting Up Your Scanner With OmniPage Pro” on page 2-3 for instructions.
• Your scanner is not connected to your computer or is not functioning properly. See “Scanning Problems” on page 5-5.
• Restart Windows in safe mode and test OmniPage Pro by performing OCR on Sample.tif. See “Testing OmniPage Pro in Safe Mode” on page 5-4.
• Uninstall and reinstall OmniPage Pro.

Text Does Not Get Recognized Properly

Try these solutions if any part of the original document is not converted to text properly during OCR:
• Look at the original page image and make sure that all text areas are enclosed by text zones. If an area is not enclosed by a zone, it is ignored during OCR.
Scanner Setup Issues

Topics in this section include:
• Scanner Drivers Supplied by the Manufacturer
• Scanner Drivers Supplied by Caere
• Scanner Message on Launch
• Using Visioneer PaperPort with OmniPage Pro
• Using TWAIN-Compliant Scanners

See also the Scanner Setup Notes included in the OmniPage Pro package.

Scanner Drivers Supplied by the Manufacturer

Many scanners are shipped with one or more scanner drivers.
Using Visioneer PaperPort with OmniPage Pro

OmniPage Pro automatically integrates with Visioneer’s PaperPort software. However, you cannot scan directly into OmniPage Pro if you use a Visioneer scanner or if your scanner is set up to work with PaperPort software (such as the HP ScanJet 4s). Instead, scan pages into PaperPort and then drag the page images onto the OmniPage Pro icon at the bottom of the PaperPort Desktop. The page images will be loaded into OmniPage Pro.
Product Support

Please look first in OmniPage Pro’s online help or in this manual if you need help; you may be able to save yourself a phone call.

Product Support Services

Product support and information are available for registered users through these services.

Service Provided Service How to Contact World Wide Web home page http://www.caere.