Pro OCR User’s Guide

Contents
Chapter 1: Introducing Visioneer Pro OCR 100
Chapter 2: Learning Pro OCR Basics
Chapter 3: Getting Documents
Chapter 4: Locating Text and Graphics
Chapter 5: Setting Recognize Options and Proofing a Recognized Document
Chapter 6: Saving and Printing Documents
Chapter 7: Creating and Processing Deferred and Batch Jobs
Chapter 8: Tips for Getting the Best Results

character recognition) application, such as Pro OCR. Every day you may spend a lot of time retyping printed text or numbers from hard copy documents. By using Pro OCR and a scanner as an input device, you can eliminate much of this retyping.
letters.
■ Recognition and retention of fonts, characters, styles, and page formatting. Pro OCR recognizes and retains the differences between serif and sans-serif fonts, styles such as bold, underline, and subscript, and formatting such as columns, tables, and indents.
■ Deferred and batch processing.
Chapter 1
Introducing Visioneer Pro OCR 100

This chapter introduces you to the Pro OCR application and to the concept of optical character recognition (OCR).

Why Pro OCR

Pro OCR is an Optical Character Recognition (OCR) application. An OCR application converts images of text, such as those obtained from scanning a document or receiving a fax through your fax-modem, into editable text.
Most basic OCR applications inspect the scanned page image, attempt to recognize the dots on the page as characters, and transform the image into a plain text file. Pro OCR does all of these basic tasks, but it can also get the entire page into your word processor or spreadsheet as is, retaining the shape, form, type, and spacing, as well as the content, of the input page.
Copyright Information

Pro OCR User’s Guide for Windows. Copyright ©1998 Visioneer, Inc. All rights reserved. Reproduction, adaptation, or translation without prior written permission is prohibited, except as allowed under the copyright laws.
Visioneer’s Limited Product Warranty

If you find physical defects in the materials or the workmanship used in making the product described in this document, Visioneer will repair, or at its option, replace, the product at no charge to you, provided you return it (postage prepaid, with proof of your purchase from the original reseller) during the 12-month period after the date of your original purchase of the product.
■ Increase the separation between the equipment and receiver.
■ Connect the equipment into an outlet on a circuit different from that to which the receiver is connected.
■ Consult the dealer or an experienced radio/TV technician for help.

This equipment has been certified to comply with the limits for a class B computing device, pursuant to FCC Rules. In order to maintain compliance with FCC regulations, shielded cables must be used with this equipment.
Chapter 1: Introducing Visioneer Pro OCR 100
    Why Pro OCR
    Features and Highlights of Pro OCR
Glossary
Glossary

A4 Letter page size; accelerator key; ADF; alphanumeric word; ASCII; As Single Column locating method; Auto OCR; Auto brightness; automatic document feeder (ADF); automatic processing; background noise; backup; backwards compatible; bit image; bitmap; bitmapped character; bold text; brightness; broken character

built-in dictionary; CCITT; character; character format; character identification error; character image; character recognition; character style; clipboard; column information; compression; confidence; consistent document; copyrighted document; deferred job; deferred processing; degraded image; dialog box; desktop; document area; dots per inch (dpi)

dpi; draft quality text; driver; exporting; export format; file extension; file formats; file type; fine resolution; flatbed scanner; font; font family; font mapping; format retention; Gallery; Get Page; grayscale image; hard page breaks; heavy character; I-beam pointer

icon; illegible character; illegible character symbol; image view; input file formats; insertion point; italic text; justification; kerning; landscape orientation; layout; layout analysis error; Legal page size; Lenient suspect threshold; letter quality text; line break; Locate; locate region; locating; locating method; menu

menu bar; multi-column text; monospaced font; monospaced font mapping; newspaper style columns; Normal locating method; Normal suspect threshold; numeric region; OCR; On-Screen Verifier™; Optical Character Recognition (OCR); order of text regions; orientation; output file formats; page controls; page format; page image; page number box; page orientation; page size

page source; PCX; picture element; picture region; pixel; pixel-for-pixel; plain text; portrait orientation; printer font; Pro OCR Deferred format; Pro OCR format; Pro OCR process; Pro OCR window; Proof; proportionally spaced font; recognition accuracy; Recognize; recognized text; recognizing; region style; resolution

Rich Text Format (RTF); RTF; sans serif; sans serif font mapping; scanner; scanner driver; scanning; screen font; scroll bars; serif; serif font mapping; settings file; sheetfed scanner; side-by-side columns; single-bit image; single-step processing; skewed text; spell checking; standard resolution; status bar

status display area; Stringent suspect threshold; stroke weight; Style ribbon; stylized font; subscript text; superscript text; supplementary dictionaries; suspect character; suspect threshold; Tag Image File Format; template; template matching; Template locating method; text quality; text region; text style; text view; throughput; TIFF; touching characters

typeface; type quality; type size; type style; underline text; User Defined page size; user dictionary; view selector; window; Windows; word wrap; zoom controls
Glossary

A4 Letter page size
An A4 size page measures 8.33" x 11.66".

accelerator key
In Windows applications, a keyboard shortcut to a menu command.

ADF
See automatic document feeder (ADF).

alphanumeric word
A word made up of the alphabetic and numeric characters (A–Z, a–z, 0–9) in a character set. Excludes punctuation and other symbol characters.

ASCII
Acronym for American Standard Code for Information Interchange (pronounced “ASK-ee”).
automatic processing
A method for using Pro OCR with minimal intervention. Automatic processing involves setting appropriate Gallery settings before using Auto Start to read in one or more image files or scan in one or more pages. Once page images have been acquired, automatic processing Locates and Recognizes each page image in succession.
bitmapped character
A character image made up of a pattern of dots that exists in a computer file or in memory as a bitmap. Bitmapped characters cannot be interpreted by a computer. In order for a computer to use bitmapped characters in a word processor or spreadsheet, the characters must first be interpreted by an OCR application and translated into ASCII text.

bold text
Text with the bold attribute looks like this. See also text style.
character format
Font and style information applied to characters. Character format information includes the font name and type size, as well as attributes such as underline, bold, italic, or some combination of these properties. Compare with page format.

character identification error
An incorrectly recognized bitmapped character. There are two kinds of character identification errors: substitutions and rejects.
consistent document
A set of pages or image files where the same Gallery settings apply to each page in the document. Pro OCR’s Auto Start feature can be used to best effect when a document is consistent.

copyrighted document
Most published or printed materials and documents are copyrighted.
dots per inch (dpi)
A measure of the visual resolution of a display or output device. Monitor screens typically have resolutions in the range of 70 to 75 dpi. Most common laser printers have a resolution of 300 dpi. The lower the resolution of a page in dots per inch, the lower the visual quality of characters on that page. Pro OCR can quickly and accurately recognize characters scanned in at resolutions down to 200 dpi.

dpi
See dots per inch (dpi).
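
As a rough illustration of what these resolutions mean for a scanned page, the sketch below converts a page size in inches to pixel dimensions. The function name and the 8.5" x 11" page size are illustrative assumptions and are not part of Pro OCR.

```python
def page_pixels(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page scanned at the given dpi."""
    return round(width_in * dpi), round(height_in * dpi)

# An 8.5" x 11" page (assumed size) at 200 dpi and at 300 dpi.
print(page_pixels(8.5, 11, 200))  # (1700, 2200)
print(page_pixels(8.5, 11, 300))  # (2550, 3300)
```

Going from 200 dpi to 300 dpi multiplies each dimension by 1.5, so every character is described by more dots in both directions.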
fine resolution
A term associated with FAX modems, referring to the highest resolution of the image files typically produced by these devices. Fine resolution is approximately 200 x 200 dpi, which is adequate for reliable recognition.

flatbed scanner
Scanner with a glass plate on which pages are placed face down.
grayscale image
An image format where individual pixels can be expressed with more than a single bit, allowing the image to contain true shades of gray. Pro OCR will not open grayscale images. Compare with single-bit image.

hard page breaks
Special formatting that you put in manually in a text or word processor document. Most word processors and text editors automatically create soft page breaks unless you explicitly specify hard page breaks.
input file formats
Pro OCR can read documents saved by other applications in TIFF, PCX and DCX formats, as well as those documents saved in its own proprietary TIFF format. See also PCX and TIFF.

insertion point
The place in a text file where text is inserted or deleted. Indicated by a blinking vertical bar.

italic text
Text with the italic attribute looks like this. See also text style.
Lenient suspect threshold
Tells Pro OCR to highlight only the suspect characters it is very uncertain of. Very few characters are marked as suspect, compared to when the suspect threshold is set to normal or stringent. Use it when you’re dealing with documents containing fonts that you know from experience have been recognized accurately or when you’re less concerned with double-checking. Set in the Display Options dialog box.
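
The idea of a suspect threshold can be sketched as a confidence cutoff: a character is flagged when the recognizer's confidence falls below the cutoff for the chosen level. The cutoff values, the confidence scale, and the function name below are hypothetical; this guide does not state the values Pro OCR uses internally.

```python
# Hypothetical cutoffs (not Pro OCR's real values): a character is marked
# suspect when its confidence is below the cutoff for the selected level.
THRESHOLDS = {"lenient": 0.50, "normal": 0.75, "stringent": 0.90}

def suspects(chars, level):
    """Return the characters whose confidence is below the level's cutoff."""
    cutoff = THRESHOLDS[level]
    return [ch for ch, conf in chars if conf < cutoff]

# Each recognized character is paired with an assumed confidence score.
recognized = [("P", 0.98), ("r", 0.80), ("o", 0.60), ("0", 0.40)]

print(suspects(recognized, "lenient"))    # ['0']
print(suspects(recognized, "normal"))     # ['o', '0']
print(suspects(recognized, "stringent"))  # ['r', 'o', '0']
```

The lenient level flags the fewest characters and the stringent level flags the most, which matches the trade-off described in the entry above.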
menu
A list of choices from which the user can choose. Menus appear when you point to and click a menu title in the menu bar, or a pop-up menu title in a window or dialog box.

menu bar
The horizontal strip at the top of a window that contains menu titles.

multi-column text
Text that is formatted into more than one column on a single page. Examples include phone books and newspapers.

monospaced font
Also known as a fixed pitch font.
numeric region
Defines a numeric area on the page image in Image View and Text View. Numeric regions may be defined using Pro OCR’s manual region drawing feature, or may be recalled using the Template locating method. Compare with text region and picture region. See also Template locating method.

OCR
See Optical Character Recognition (OCR).
page image
The bitmapped image of a scanned page, displayed in the image view in Pro OCR.

page number box
Shows which page is being viewed and how many pages are in the document. Double-click it to go to a specific page. See also page controls.

page orientation
See orientation.

page size
The width and height to use when getting a page from a scanner within Pro OCR. There are three pre-defined page sizes: US Letter, US Legal, and A4 Letter.
portrait orientation
When you hold a page of text to read it, it is in portrait orientation when the page is taller than it is wide. Compare with landscape orientation.

printer font
The representation of a font or typeface used for printing by a printer. See also font, font family, and typeface. Compare with screen font.

Pro OCR Deferred format
One of Pro OCR’s output file formats.
proportionally spaced font
Also known as a variable pitch font. Typeface in which each character takes up an amount of horizontal space consistent with its relative physical width, i.e. an “i” needs less space than a “w.” Times Roman and Helvetica are two common proportionally spaced typefaces. Compare with monospaced font.

recognition accuracy
A measure of the degree to which OCR output conforms to the individual characters in the input document.
RTF
See Rich Text Format (RTF).

sans serif
Designation for font families in which the characters do not have serifs, which are the small strokes at the ends of characters. Common sans serif font families include Helvetica, Avant Garde, and Univers. Compare with serif.

sans serif font mapping
The font chosen for displaying sans serif text characters in text views. Set in the Display Options dialog box.
settings file
A named file, created by choosing Save Settings from the File menu, that stores the current gallery, processing preferences, display preferences, proofing preferences, and selected scanner information. To use a settings file, retrieve it by choosing Retrieve Settings from the File menu.

sheetfed scanner
Scanner with an integral sheetfeeder, but no flatbed, on which pages are placed and fed through the scanner.
spell checking
Pro OCR automatically checks spelling during the Recognize step using its built-in dictionary and the current user dictionary. After Pro OCR finishes recognizing, you can check the spelling in a document using the user-configured Proof command.

standard resolution
A term associated with FAX modems, referring to the default resolution of the image files produced by these devices.
subscript text
Text with the subscript attribute is below the baseline like this.

superscript text
Text with the superscript attribute is above the baseline like this.

supplementary dictionaries
Optional dictionaries that can be used during spell checking in Pro OCR. There are four supplementary dictionaries included with Pro OCR: geographical, legal, medical, and an expanded dictionary. Compare with built-in dictionary and user dictionary.
text region
Defines a text area on the page image in the image view and the text view. Only text within defined text regions is recognized. Text regions may be defined manually or by using Pro OCR’s automatic locating settings.

text style
A piece of text’s attributes or styling, such as bold, italic, or underline. Use the Style menu or the style ribbon to set these attributes. See also bold text, italic text, underline text, and Style ribbon.
type size
The vertical height measurement of type, commonly expressed in points (72 points=1 inch). Pro OCR recognizes and preserves type ranging in size from 5 points to 64 points.

type style
The variations in characters, including font characteristics such as bold and italic, and styling characteristics such as underlining. Pro OCR recognizes and preserves many type style characteristics.

underline text
Text with the underline attribute looks like this.
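
Because type size is given in points (72 points = 1 inch), the nominal height of type on a scanned page image can be estimated from the scanning resolution. The sketch below works through that arithmetic. The function name and the sample resolutions are illustrative assumptions, and the result is the nominal point height, not the height of any particular glyph.

```python
POINTS_PER_INCH = 72  # from the type size definition: 72 points = 1 inch

def type_height_pixels(points, dpi):
    """Nominal height in pixels of type of the given point size at the given dpi."""
    return points * dpi / POINTS_PER_INCH

print(type_height_pixels(12, 300))            # 50.0
print(round(type_height_pixels(5, 200), 1))   # 13.9
print(round(type_height_pixels(64, 300), 1))  # 266.7
```

At the smallest supported size, 5 points, and a 200 dpi scan, a line of type is only about 14 pixels tall, which suggests why small type benefits from a higher scanning resolution.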
word wrap
The automatic continuation of text from the end of one line to the beginning of the next. Word wrap lets you avoid pressing the Return key at the end of each line as you type. For example, when you input text in most word processors, lines of type are automatically “wrapped” to the next line when they won’t fit within the current line margins.
Contents
Chapter 2: Learning Pro OCR Basics
    The Basic Steps
    Starting Pro OCR
    Selecting a TWAIN-Compliant Scanner
    Learning About the Gallery Toolbar
    Tutorial Examples
        Example 1: Using Auto OCR to Scan a One-Page Simple Document and Save It in Pro OCR Format
        Example 2: Opening a File and Saving It in a Word Processor Format
        Example 3: Scanning a Document of Multi-Column Text
        Example 4: Scanning a Document With Tables and Saving in a Spreadsheet Format
        Example 5: Scanning and Saving a Docume
Chapter 2
Learning Pro OCR Basics

This chapter gets you started with Pro OCR. It introduces you to the Pro OCR window features, tells you the basic steps that you use when you work with Pro OCR, and provides several tutorial examples that you can complete to practice with Pro OCR.

TIP: If you use PaperPort software or scanners, see the Working with PaperPort document that came with Pro OCR. It provides tips and other information about using Pro OCR with these Visioneer products.
Starting Pro OCR

The following procedure helps you to get acquainted with Pro OCR and make sure that everything is set up correctly.

TIP: In addition to the following procedure, Visioneer provides two other ways to start and use Pro OCR:
1) From the Windows Start menu, choose Programs, and then choose Visioneer OCR Wizard.
2) If you use PaperPort software, start PaperPort and then choose the Pro OCR link.

To start Pro OCR and select processing options:
1.
Feature            Does this...
Pull-down menus    Contain commands and options that you use to set process options and initiate actions. Many of the commands in the pull-down menus are also available by using the Gallery buttons and the Gallery button drop-down lists.
Gallery toolbar    Lets you change common settings, start Auto OCR, or individually perform any of the basic steps required to convert an image to text.
Status bar         Contains controls with which you choose how to view pages (text or image view) and which pages to view. The Status bar also contains a status display area to keep you informed of Pro OCR’s progress.
Zoom controls      Magnify or reduce the view of the document.
View controls      Display the page in a landscape or portrait orientation.
Page controls      Display the previous or next page.
Figure 2-1: Select Source Dialog Box

NOTE: If the scanner driver you want is not shown, make sure that the scanner is properly connected to the computer and that both the scanner and the computer are plugged in, turned on, and operating correctly.

2. In the Select Source dialog box, select the TWAIN scanner driver you want to use with Pro OCR.
3. Click Select. The scanner you selected is available until you select a different one.
NOTE: Often you will use Auto OCR to complete processing. However, sometimes it is better to perform each step individually. (This is also referred to as manual or single-step operation.) For example, you use the single-step procedures when you want to manually define locate regions, create a template, redo a step, recognize with different type quality settings, or scan pages that have mixed orientations (portrait and landscape).

Button    Does this...
Save As    Saves the converted document in a variety of formats, such as text, Rich Text Format (RTF), or HTML.

You can select options with the Gallery buttons by using the drop-down list next to each button.

To select an option from a Gallery drop-down list:
1. Click the arrow next to the Gallery button you want. The drop-down list for the button appears. The following figure shows the Locate button with the drop-down list displayed.
2. Select the option you want.
Example 1: Using Auto OCR to Scan a One-Page Simple Document and Save It in Pro OCR Format

This example shows how to convert (recognize) the text in a one-page document. You can find a ready-to-use sample in the back of the Getting Started Guide.

Selecting Gallery Options

Pro OCR processes a document using the options that are set in each drop-down list associated with a button of the Gallery toolbar.

To set Gallery options for this example:
1.
6. Click End. Pro OCR continues with the second task to locate text regions on the page. A progress bar moves down the page. When Pro OCR finishes locating, it displays text boxes indicating located text regions, with arrows connecting each text region to the next. Pro OCR outputs text in the order in which the arrows connect the text regions.
In the next step, Pro OCR recognizes the located text. While Pro OCR is recognizing, a progress bar again moves down the page. When Pro OCR finishes recognizing the text, the Recognition Completed dialog box appears.
7. Click OK. The document appears in the text view. You use the text view to proof the document and correct any errors.
Usually at this point you proof the document. For now, just save it.

Saving a Document

You can save the processed document to disk in different formats. For example, if you want to open the document again in Pro OCR, you select the Pro OCR format.

To save the document:
1. Choose Save from the File menu, or click the Save As button on the Gallery toolbar. The Save As dialog box appears.
2. Choose Pro OCR from the Save As drop-down list. By saving the document in this format, you can edit the pages later within Pro OCR. If you save in another file format, you must open it in an application that supports that format.
3. Type in a name for the file in the File Name box.
4. Click Save. The text and format information of the document is saved in the format you’ve selected.
5. Choose Close from the File menu.

You just completed your first job using Pro OCR. Many of the jobs for which you use Pro OCR are as quick and simple as this one.
3. In the Pro OCR directory, select the file SAMPLEB.TIF.
4. Click Get. The sample file is read in and the progress bar moves down the page.

Locating the Regions in a Document

For Pro OCR to properly convert areas of a document, you must locate the regions of the page that will be recognized. There are three types of regions: text, numeric, and picture. For example, a picture region is one that contains any kind of graphic, illustration, photograph, drawing, or picture.
Recognizing the Document

The third step is to convert (recognize) the text in a document. Pro OCR reads the text and displays the actual characters. Before recognizing the document, you should specify the quality of the image text. You can do this by using the Recognize drop-down list.
To recognize the document:
1. Select Degraded or Fax Quality from the Recognize button drop-down list.
2. Click the Recognize button in the Gallery toolbar. Pro OCR displays a bar that moves through the document as Pro OCR recognizes the text. When the process finishes, you see the document with text only.
Proofing the Document

After a document is recognized, it appears in the text view. In this view, you can proof the document for errors and make changes to the document when you find problems. When you proof, you can:
■ Inspect recognized text and edit it if necessary.
■ Search for misspelled words, numbers, punctuation, symbols, and alphanumeric words.
■ Change font style information.

NOTE: You can change the proofing options by choosing Options from the Tools menu.
Pro OCR displays the next suspect entry.
4. Repeat the previous steps until you have checked the entire document.
5. If you want to change the font style, select the text, and click the Style option.

Saving the Document

Saving the document places a permanent copy of it on disk.

To save the document:
1. Choose Save from the File menu, or click the Save As button in the Gallery toolbar. The Save As dialog box appears.
2. Type the file name in the File Name box.
3.
Locate Text Only prevents Pro OCR from locating any picture element in the document to be scanned.
3. Select Use Scanner from the Get Page drop-down list in the Gallery toolbar.
4. Click Auto OCR in the Gallery toolbar. Your scanner software dialog box appears.
5. Use the scanner software as you usually do to scan the document. After you scan the sample document, it appears in Pro OCR. A dialog box appears asking for additional pages to scan.
While Pro OCR recognizes the page, notice the boxes indicating located text regions around each column, and the arrows connecting each text region to the next. Note that by using Locate Text Only, the graphic element in the sample was not located and so a box does not appear around it. Pro OCR outputs text in the order in which the arrows connect the text regions. For this example, notice how the boxes are drawn and connected.
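
The way arrows determine output order can be pictured as a chain of regions, each pointing to the next. The sketch below is only a conceptual model under that assumption: the region names, the dictionary layout, and the function are hypothetical and do not describe Pro OCR's internal data structures.

```python
# Hypothetical model: each located text region stores its recognized text and
# the name of the region its arrow points to (None for the last region).
regions = {
    "headline": {"text": "Headline", "next": "column1"},
    "column1": {"text": "Left column text.", "next": "column2"},
    "column2": {"text": "Right column text.", "next": None},
}

def output_in_order(regions, first):
    """Follow the arrows from the first region and collect text in that order."""
    output = []
    name = first
    while name is not None:
        output.append(regions[name]["text"])
        name = regions[name]["next"]
    return output

print(output_in_order(regions, "headline"))
# ['Headline', 'Left column text.', 'Right column text.']
```

In this model, changing where an arrow points changes the order of the saved text without moving any region on the page.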
When Pro OCR finishes recognizing, the Recognition Completed dialog box appears.
7. Click OK. The document appears in the text view.

To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar. The Save As dialog box appears.
2. Select Pro OCR from the Save As Type drop-down list. The Pro OCR format saves all available information in the document.
3. Type in a name for the file in the File Name box.
4. Click Save.
Documents.”

Example 4: Scanning a Document With Tables and Saving in a Spreadsheet Format

This example introduces you to processing multi-column text in tables, where you want the text to be recognized as one text block and not broken into columns. You can use this procedure whenever you want to recognize tables and other documents that you don’t want broken into columns.

To scan multi-column table text and save in spreadsheet format:
1.
Notice that the text regions are not drawn separately around each column. By using the Single Column locating method, you force Pro OCR to ignore columns and tell it to read the page from left to right, top to bottom. When Pro OCR is finished recognizing the page, the Recognition Completed dialog box appears.
6. Click OK.
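
The difference between reading a table as a single column and reading it column by column can be shown with a small ordering sketch. The sample words, their coordinates, and both function names are assumptions made for illustration; this is not how Pro OCR is implemented.

```python
# Each word is (text, x, y): x grows to the right, y grows down the page.
# The sample table has two rows and two columns (hypothetical values).
words = [("Gold", 0, 0), ("Silver", 0, 1), ("10", 5, 0), ("20", 5, 1)]

def single_column_order(words):
    """Read left to right, top to bottom: sort by row (y), then by x."""
    return [text for text, x, y in sorted(words, key=lambda w: (w[2], w[1]))]

def column_by_column_order(words):
    """Read each column top to bottom: sort by column (x), then by y."""
    return [text for text, x, y in sorted(words, key=lambda w: (w[1], w[2]))]

print(single_column_order(words))     # ['Gold', '10', 'Silver', '20']
print(column_by_column_order(words))  # ['Gold', 'Silver', '10', '20']
```

Reading row by row keeps each label next to its number, which is the order a spreadsheet needs.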
Pro OCR displays the document in the text view.

To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar. The Save As dialog box appears.
2. Choose Microsoft Excel from the Save as Type drop-down list. Notice that the following options are already selected.
   TIP: To change these options, click the Options button.
3. Type in a name for the file in the File Name box.
4. Click Save. Pro OCR saves the text and format information of the document in the format you have selected.
5. Choose Close from the File menu.

NOTE: If you don’t save a version of this file in the Pro OCR format, you cannot open it again in Pro OCR.
After you scan the sample document, it appears in the Pro OCR window. Pro OCR begins getting the page from the scanner. When the scanning is done, a dialog box appears asking if you want to scan additional pages. For this example, you won’t be scanning any additional pages.
5. Click End. Automatic processing continues with the Locate and Recognize steps.
The Recognition Completed dialog box appears.
6. Click OK. The document appears in the text view. Notice that the graphic image appears and has a picture region drawn around it.

To save the document:
1. Choose Save As from the File menu, or click the Save As button in the Gallery toolbar. The Save As dialog box appears.
2. Choose Rich Text Format (RTF) from the Save as Type drop-down list. RTF allows you to save the pictures along with the text in the exported file.
6. Click Save. The picture from the scanned page is now saved within the RTF file along with the recognized text. If you open this file in a word processor that supports pictures in RTF files, you see the recognized text and the pictures.
7. Choose Close from the File menu.

Example 6: Locating a Document Using a Template

At times, you don’t want to recognize all the text on a page.
3. In the Temp folder, find and select the file TEMPB.TPL.
4. Click Open. Pro OCR displays the name of the template you selected next to Template in the Locate drop-down list.
5. Select Open File from the Get Page drop-down list.
6. Click the Get Page button in the Gallery toolbar.
7. In the Pro OCR directory, select the file SAMPLEB.TIF and click the Get button. The sample file is read in.
8. Click the Locate button.
copyright in the footer were not recognized. If you save this page in an application or text format, only the displayed text is saved.
10. Save and close the document. Use the same procedures described in the earlier examples.
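
The effect of a template, where only text inside the template's regions is recognized, can be sketched as a simple rectangle test. The coordinates, sample words, and function name below are hypothetical and serve only to illustrate the idea; they are not taken from TEMPB.TPL or from Pro OCR itself.

```python
def inside(region, x, y):
    """Return True if point (x, y) lies within region (left, top, right, bottom)."""
    left, top, right, bottom = region
    return left <= x <= right and top <= y <= bottom

# Hypothetical template with one text region covering only the body of the page.
template = [(0, 100, 600, 700)]

# Hypothetical word positions: (text, x, y), with y growing down the page.
words = [("Header", 50, 20), ("Body", 50, 300), ("Copyright", 50, 780)]

# Keep only the words that fall inside at least one template region.
kept = [text for text, x, y in words if any(inside(r, x, y) for r in template)]
print(kept)  # ['Body']
```

Text outside every template region, such as the header and footer in this sample, is left out of the output.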
3. Press and hold the mouse button; then drag down and to the right until the box following the pointer encloses all of the column headers.
4. Release the mouse button. You have just manually located a text region.
5. Move the pointer just above and to the left of the item labeled “Gold.”
6. Press and hold the mouse button; then drag down and to the right until the box following the pointer encloses the first column of the table.
9. Using the same steps you used to create the text regions, drag the mouse until the box following it encloses all three columns of numbers and release the mouse button. Make sure the entire image of the number columns is enclosed by the new region you have defined.
10. Choose Numeric from the Style menu. The locate region you just defined becomes a numeric region.

To make a table from the selected regions:
1. Choose Select All from the Edit menu.
You have completed this example. A message appears asking if you want to save the document.
4. Choose Close from the File menu. Close the document without saving it.

© Copyright 1998 Visioneer, Inc. Reach us at www.visioneer.com.
Chapter 3
Getting Documents

This chapter tells you how to get (acquire) documents with Pro OCR. It is assumed that you completed the procedures in “Starting Pro OCR” and “Selecting a TWAIN-Compliant Scanner” in Chapter 2.

In this chapter you learn:
■ The basic steps for getting a page
■ How to get a page using a scanner
■ How to get a page from a file

Getting a Page: The Basic Steps

There are two ways to get a page:
1) Use Auto OCR to automatically get a page.
Getting Pages From a Scanner
You can use a scanner to get one page at a time by using the Get Page button, or use a scanner with Auto OCR to get multiple pages automatically. This section tells you how to:
■ Set scanning options
■ Get one page using Get Page
■ Get pages with Auto OCR
Setting Scanning Options
To change scanner settings such as the resolution, brightness, and page orientation, see the documentation that came with your scanner.
Getting Documents To set Get Page Processing options: 1. Choose Options from the Tools menu. The Options dialog box appears with the Processing tab selected. 2. Select the options that you want to use. 3. Click OK. Selecting a Scanner as the Source When you get pages from a scanner by using Auto OCR, a deferred job, or Get Page, one or more page images are read in from the scanner.
Getting Documents NOTE: If you did not previously select a scanner, the Select Scanner dialog box appears, letting you select one now. (You can also select a scanner by choosing Select Scanner from the Tools menu.) Getting a Page Using a Scanner During the single-step Get Page operation, you scan only one side of one page at a time. You cannot automatically read stacks of pages or double-sided pages. Instead, you must manually feed pages that you want to be read.
Getting Documents Pro OCR scans the page on the flatbed or the first page in the ADF, using the current brightness, page size, orientation, and scanning resolution settings. After the single page is read in, it appears using the previous magnification. NOTE: To find the most appropriate brightness setting for a page, use Get Page to scan the same page as many times as necessary. You can change the level of brightness in your scanner’s software. To scan additional pages: 1.
Getting Documents Page drop-down list. 2. Check the Locate and Recognize options to make sure they are set the way you want them. 3. Place the first page on the flatbed. Make sure the page is oriented correctly for your scanner and the page orientation you have selected in the Gallery. 4. To scan more than one page, choose Options from the Tools menu, and then select the Enable Auto OCR Dialogs processing option. 5. Click Auto OCR. The scanner software appears. 6. Use the software as you usually do.
If the Enable Auto OCR Dialogs processing option is not selected, scanning is completed. Pro OCR begins locating and then recognizing. If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks for additional pages to scan after it finishes reading in the current page.
7. If you want to get additional pages, place another page on the flatbed. Pro OCR scans the additional page on the flatbed and displays the dialog box again, asking for the next page. Repeat this step for as many additional pages as you want to scan.
8. If you do not want to scan more pages, click End. Scanning is completed. Pro OCR displays the page you’ve scanned in the image view. Pro OCR then begins locating and then recognizing.
Getting Documents 5. Click Auto OCR. Pro OCR begins getting pages. If the Enable Auto OCR Dialogs processing option is not selected, scanning is completed. Pro OCR begins locating and then recognizing. If the Enable Auto OCR Dialogs processing option is selected, Pro OCR asks for additional pages to scan. 6. If you want to scan another stack of pages, place the next stack of pages in the ADF.
Getting Documents Scanning is completed. Pro OCR finishes getting pages and displays the first page of the scanned stack in the image view. The scanned double-sided text is correctly sequenced, in correct page order. Getting Pages from an Image File Typically, Pro OCR obtains the image of a page by working directly with your scanner. You can, however, also use Pro OCR with image files you scanned or created using other applications.
Getting Documents 1. Select Open File from the Get Page drop-down list. A checkmark appears next to it when selected. 2. Click the Get Page button in the Gallery toolbar. The Get Page dialog box appears. 3. Select the file and click Get. The file is read in and the progress bar moves down the page. Getting Files From Other Scanner Applications Pro OCR supports many of the most popular scanners directly.
Getting Documents 4. Click Auto OCR. 5. Find and select the file(s) that you want to process. 6. Click Add and then click Get. Pro OCR automatically processes the image file(s) according to the controls in the Locate and Recognize rows of the Gallery. OR 1. Click Get Page. 2. Find and select the file that you want to process. 3. Click Add then click Get. Pro OCR reads in the specified file.
Getting Documents You can specify one or more image files for the Get Page step, and then have Pro OCR automatically locate and recognize them. If you’ve selected the Enable Auto OCR Dialogs processing option, you can also select one or more additional files after reading the initial files and before locating and recognizing begin. Pro OCR can process most standard black and white TIFF, PCX, and DCX files. To automatically process from a file: 1.
Getting Documents can add available files from as many directories and disks as necessary. Files are displayed in the Selected Files list in the order in which you add them. NOTE: To remove a file from the Selected list, select the file name and click the Remove button. To remove all selected files, click Remove All. 5. Click Get. Pro OCR reads in the selected file or files. As a page is read in, the Get Page button is highlighted, and the progress bar moves down the page.
Getting Documents More About Enabling Auto OCR Dialogs By default, after you’ve used Auto OCR to scan pages or to read in one or more files, Pro OCR displays a dialog box that prompts you to continue in one of several ways: ■ Scan another page or stack of pages ■ Scan the second side of a page or stack ■ Open additional files This lets you read in and process multiple files or stacks of pages as one document.
2. To enable the dialogs, select Enable Auto OCR Dialogs. To disable the dialog boxes, deselect the option.
Chapter 6
Saving and Printing Documents
This chapter describes the input file formats and output file formats that Pro OCR supports and tells you how to save documents in a variety of these formats.
The Save As dialog box appears. If the document has been saved previously, the name of the document is displayed and selected in the File Name box. If the document has not already been saved, the File Name box is selected and contains the default file name: UNTITLED.XXX. Pro OCR adjusts the file extension, represented here as XXX, according to the document format you select in the Save as Type drop-down list.
2. Type a new file name, if necessary.
3.
Saving and Printing Documents ■ Standard text file formats ■ Word processor and spreadsheet file formats For more information about the different file formats, see “Supported Output File Formats” later in this chapter. 4. If you want to save any pictures in the document, select the Save Pictures option and choose a picture format from the Picture Format drop-down list. NOTE: Saving pictures in a document is different from saving the entire page image.
Saving and Printing Documents currently chosen in the Save as Type drop-down list, click the Options button to open the Save As Options dialog box. Most formats have additional options. If there are no options available for the format you’ve selected, the Options button is dimmed.
■ Point size
■ Justification
■ Number of columns
■ Line spacing
■ Paragraph indentation
■ Page size
■ Margin sizes
Choose one of the Split Document options to either keep all pages in one file or split the document into multiple files:
■ All Pages in One File: Choose this option to save all the pages in the document in one file.
■ Split on Blank Pages: Choose this option when you want Pro OCR to save a stack of documents into separate files.
Saving and Printing Documents one image page per file. If you save in PCX format, Pro OCR automatically selects this option, because a PCX file can only have one page. When you use this option, Pro OCR automatically creates one file for each page. Pro OCR saves each file using the name you specified followed by a sequential three-digit numeric identifier, followed by the appropriate extension.
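The naming scheme described above can be sketched in a few lines of Python. This is only an illustration, not Pro OCR's own code: the function name `page_file_names`, the starting number 001, and the lack of a separator between the name and the number are all assumptions, because the guide states only that a sequential three-digit identifier follows the name you specified.

```python
from pathlib import Path


def page_file_names(base_name: str, extension: str, page_count: int, start: int = 1) -> list[str]:
    """Return one file name per page: base name + three-digit counter + extension.

    The starting value and the missing separator are assumptions for illustration.
    """
    stem = Path(base_name).stem          # drop any extension typed with the name
    ext = extension.lstrip(".")          # accept "pcx" or ".pcx"
    return [f"{stem}{n:03d}.{ext}" for n in range(start, start + page_count)]


names = page_file_names("REPORT", "pcx", 3)
print(names)
```

With these assumptions, a three-page document saved as REPORT in PCX format would produce three files, one per page, that sort in page order.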
Saving and Printing Documents 2. Process the pages as you would normally. 3. When you save the document, choose Save As from the File menu. The Save As dialog box appears. 4. Click the Options button. The Options dialog box appears. 5. Select the Split on Blank Pages option and click OK. Pro OCR saves each stack of pages up to a blank page as a separate file, using the name you specified followed by a sequential numeric identifier, followed by the appropriate extension.
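To make the Split on Blank Pages behavior concrete, here is a minimal Python sketch of the idea. It is not Pro OCR's implementation: the function name `split_on_blank_pages`, the `is_blank` test, and the choices to drop the blank separator pages and to ignore consecutive blank pages are assumptions made for the example.

```python
def split_on_blank_pages(pages, is_blank):
    """Group pages into stacks, starting a new stack after each blank page.

    Assumptions: blank pages act only as separators and are not saved,
    and two blank pages in a row do not create an empty stack.
    """
    stacks, current = [], []
    for page in pages:
        if is_blank(page):
            if current:                  # close the stack collected so far
                stacks.append(current)
                current = []
        else:
            current.append(page)
    if current:                          # last stack has no trailing blank page
        stacks.append(current)
    return stacks


# Pages are represented by their recognized text; "" stands for a blank page.
pages = ["memo p1", "memo p2", "", "invoice p1", "", "letter p1", "letter p2"]
stacks = split_on_blank_pages(pages, is_blank=lambda text: not text.strip())
print(stacks)
```

Each inner list in `stacks` corresponds to one output file, so the seven scanned pages in this example would be saved as three separate files.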
Saving and Printing Documents 5. Click OK. Saving Multiple Page Images as Separate Image Files In addition to Pro OCR format, you can save a document in a number of image output formats. Usually, you’ll save a copy of your document in one of these graphic formats when the document you’re processing has illustrations that you want to save and use in other applications.
Saving and Printing Documents To save a template: 1. Choose Save Template As from the File menu. 2. Enter a file name. 3. Choose the format from the Save as Type drop-down list, and then click Save. You can open a saved template by double-clicking the Template button or by choosing Select Template from the File menu. Saving Settings The current settings are remembered when you open Pro OCR again.
Saving and Printing Documents formats. Pro OCR can save to a variety of output file formats at various stages of processing.
Saving and Printing Documents NOTE: If you don’t have any of the applications listed here, note that many word processor and spreadsheet applications can handle formats from other word processors and spreadsheets. Most Windows word processors can import RTF files, although some have only limited support for RTF. Internally, Pro OCR preserves all the format, character style, and font information of the input page.
Saving and Printing Documents The current state of each page in the document is saved, including any locate regions or recognized text. Saving to Pro OCR Deferred Format The Pro OCR Deferred file format is a special case of the Pro OCR file format. Use it to save work in progress so that you can open the document later for further single-step processing (using the Open command in the File menu) or to complete processing (using the Process Deferred Jobs command in the File menu).
Saving and Printing Documents suspect and illegible characters if there are any left, check spelling, and search for numbers, punctuation, symbols, and alphanumeric words. However, because you haven’t saved the page image, you can’t use the On-Screen Verifier. Text files take up a lot less space than image files. Image files are large, even when compressed. Saving to Standard Image File Formats You can save a document in a variety of TIFF formats and in PCX format.
preserved. When you output a recognized document in Plain Text format, the text is sequentially output in the order in which the text blocks were located. Margins and columns are not preserved.
■ Text with Line Breaks. Preserves text, tabs, and a carriage return at the end of each line. No page formatting, character style, or font information is preserved.
Saving and Printing Documents font, character spacing, and line length information. ■ Hyper Text Markup Language (HTML). Inserts HTML tags to format the document for viewing in an HTML browser. Saving to Application Formats When you save to a specific application format, by default Pro OCR saves as much of this format, character, and font information as possible. You can also choose to discard all formatting information, or customize the formatting that is saved with the document.
Saving and Printing Documents Doesn’t Support If you have a word processor that Pro OCR does not support directly, try saving your document in one of the other Pro OCR word processor export formats. In addition, most word processors can import RTF files, although some have only limited support for RTF.
Saving and Printing Documents Saving Pictures During the Locate and Recognize steps, if Locate Text and Pictures has been selected, Pro OCR processes any pictures, or other nontext information on the input page, as embedded graphic images. When you save to a graphic output file format or to a word processor format that supports embedded pictures, and you select the Save Pictures option in Save As, Pro OCR saves these embedded graphic images.
5. Click OK.
Chapter 4
Locating Text and Graphics
A locate region identifies an area of a page image to be recognized. You define locate regions in the image view using Pro OCR’s locating procedures.
Locating Text and Graphics Numeric Regions A numeric region is a locate region that Pro OCR recognizes as numbers (0–9) or one of the symbols shown in the following table. Table 4-1: Numeric Symbols + - ¥ / = * “ % $ # £ ¢ ¥ , E e ( ) { } .
Locating Text and Graphics region are recognized as numbers and not mistaken for letters. You can define numeric regions manually or with a template. Pro OCR does not define numeric regions automatically. You can also redefine a selected numeric region as any other kind of locate region using the Style menu or the Style ribbon. Picture Regions A picture region is a locate region that contains any kind of graphic, illustration, photograph, drawing, or picture.
Locating Text and Graphics Typically, you locate a table by putting a single text or numeric region around all of the columns of the table. However, if you have a table where some columns are text and some columns are numeric, you may want to use the Make Table command. Make Table allows you to select different types of regions and then combine them into one object so that the text is exported into a tabular format, rather than columns.
Locating Text and Graphics Deciding When to Use Multiple Columns or Single Column Only Depending on the content of the page, you can organize the actual flow of the text in different ways.
Locating Text and Graphics How to Locate Text and Picture Regions Locating is typically done after getting a page and before recognizing. You select a locating method to tell Pro OCR how to define and order locate regions on a page. Pro OCR uses the selected locating method with automatic processing and when you click the Locate button. You can also locate regions manually. For more information, see “Defining Locate Regions Manually,” later in this chapter. To locate text or picture regions: 1.
Locating Text and Graphics TIP: To select White Out Text, choose Options from the Tools menu, and then select White Out Text in Pictures in the Processing Options. Locating with a Template If the locating method you selected, Multiple Columns or Single Columns Only, doesn’t work exactly as you want, you can manually create the appropriate locate regions on a page.
Locating Text and Graphics 3. Click the Locate button in the Gallery toolbar. Pro OCR locates the document. 4. Manually adjust the locate regions. For example, to adjust the region size, such as to exclude text, click the border of the text region and drag to include or exclude text. To delete a text region, select the border of the region and press the Delete key. To apply a region type, such as numeric or picture to a region, select the region border, and then click a Region Type in the Gallery toolbar.
The Select Template dialog box appears.
3. Find and select the template that you want to use.
4. Click Open. Pro OCR displays the name of the template you selected next to Template in the Locate drop-down list.
5. Get the document using the Get Page button in the Gallery toolbar.
6. Click Auto OCR, or click the Locate and Recognize buttons in the Gallery toolbar to locate and recognize the information one step at a time.
Locating Text and Graphics In the image view, the order of locate regions is shown by arrows from the center of one locate region to the top-center of the next locate region. This sequence tells Pro OCR in what order it should process the regions: You can manually change the order of locate regions that either you or Pro OCR have defined. The order of locate regions defines the sequence in which the information on the page is processed and output to a file.
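The effect of region order on the output can be illustrated with a short Python sketch. The `LocateRegion` class, the `output_text` function, and the region labels are hypothetical names chosen for this example; the sketch shows only the principle that text is written out in the order of the locate regions, not how Pro OCR stores that order internally.

```python
from dataclasses import dataclass


@dataclass
class LocateRegion:
    label: str   # hypothetical identifier for the region
    text: str    # recognized text of the region


def output_text(regions, order):
    """Join the recognized text of each region in the given processing order."""
    by_label = {region.label: region for region in regions}
    return "\n".join(by_label[label].text for label in order)


regions = [
    LocateRegion("headline", "Quarterly Results"),
    LocateRegion("left column", "Sales rose."),
    LocateRegion("right column", "Costs fell."),
]

default_output = output_text(regions, ["headline", "left column", "right column"])
reordered_output = output_text(regions, ["headline", "right column", "left column"])
print(default_output)
print(reordered_output)
```

The same three regions produce two different documents, which is why it is worth checking the arrows in the image view before you recognize and save a page.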
Locating Text and Graphics The order of the paragraphs (and the flow of the text) on the page might be as shown in Example 5-1 or as shown in Example 5-2: When the order of text regions is defined as in Example 5-1, the text is output to the word processor as in Example 5-3.
Locating Text and Graphics Processing Resumes For Pro OCR to process resumes and legal documents properly, select Single Columns Only from the Locate drop-down list in the Gallery toolbar. Resumes often contain formatting elements that can be difficult for an OCR program to interpret, such as numerous indentations, bulleted items, and a wide mixture of both justified and centered text.
Locating Text and Graphics About Columns, Locate Regions, and Output File Formats Pro OCR preserves virtually all page layout and text flow information in the documents it processes. However, when you save to a specific word processor format, Pro OCR preserves only as much of this layout information as the particular application format is designed to use. Some applications can use (interpret, display, and print) more of this kind of information than others.
Locating Text and Graphics ■ Create tables. You can use manually located regions to create a template, just as you can create templates from automatically located text regions. As with all other locating procedures, you can only manually define locate regions in the image view by selecting Image from the View menu. When you manually locate, you specify the size and extent of one or more locate regions on the page using the mouse.
Numeric region: Hold down the Ctrl key as you drag the cross hair pointer across the page image.
Picture region: Hold down the Ctrl + Shift keys as you drag the cross hair pointer across the page image.
As you drag, a box is drawn from the corner where you started to the cross hair of the pointer. When the box encloses the desired text, release the mouse button. A box is then displayed with sizing handles in each corner and at the center of each side.
Locating Text and Graphics Overlapping Locate Regions and Skewed Text When you manually create text or numeric regions, you should be aware of the following constraints. If a text or numeric region cuts through a character, only the part of the character that is within the region is located. When these cutoff characters are recognized, most of them will be illegible.
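The constraint above is a simple rectangle test, sketched below in Python. The `Rect` type, the function names, and the pixel coordinates are assumptions for illustration; Pro OCR does not expose character bounding boxes in this way. A character is "cut" when its box overlaps the region without lying completely inside it.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    left: int
    top: int
    right: int
    bottom: int


def contains(region: Rect, box: Rect) -> bool:
    """True if the box lies completely inside the region."""
    return (region.left <= box.left and region.top <= box.top
            and box.right <= region.right and box.bottom <= region.bottom)


def overlaps(region: Rect, box: Rect) -> bool:
    """True if the box and the region share any area."""
    return (region.left < box.right and box.left < region.right
            and region.top < box.bottom and box.top < region.bottom)


def cut_characters(region: Rect, char_boxes):
    """Return the character boxes that the region border cuts through."""
    return [box for box in char_boxes
            if overlaps(region, box) and not contains(region, box)]


region = Rect(0, 0, 100, 50)
characters = [
    Rect(10, 10, 20, 30),    # fully inside the region
    Rect(95, 10, 105, 30),   # straddles the right border
    Rect(120, 10, 130, 30),  # fully outside the region
]
cut = cut_characters(region, characters)
print(cut)
```

Only the middle character is reported, which matches the advice in this section: enlarge the region until no character straddles its border.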
Locating Text and Graphics These constraints are especially important when pages are skewed (read in crooked). Because locate regions are defined by rectangles that are square to the screen, when you have skewed text in a document, you may have to overlap text or numeric regions in order to not cut off any lines and get all the text into the appropriate region. When this happens, if the locate regions are close together, sometimes a line ends up being contained within more than one text or numeric region.
For more information about Straighten Skewed Images, see “Setting Scanning Options” in Chapter 3.
Selecting and Deselecting Locate Regions
You can only select locate regions in the image view. You select a locate region to change its kind, delete it, or resize it. When any locate region is selected, sizing handles appear.
To select a single locate region:
1. Move the pointer over the locate region. When the pointer is over a locate region, it is the standard arrow pointer.
2.
Locating Text and Graphics To deselect one or more locate regions while keeping the rest selected: ■ Move the pointer over a selected locate region and shift-click. The locate region you clicked in is deselected, but all other selected locate regions stay selected. Repeat this step for each locate region you want to deselect. To select all locate regions: ■ Choose Select All from the Edit menu. All locate regions defined for the page are selected.
Locating Text and Graphics want to define a different locate region that includes the image in that region. NOTE: Only the defined locate region is deleted, not the underlying image. The underlying image never gets deleted. To delete a locate region: 1. Select the locate region to be deleted. 2. Press Delete to remove the selected locate region, or choose Clear from the Edit menu to remove the selected locate region. The box around the image of the text disappears.
Locating Text and Graphics The same conditions on the size, overlap, containment, and extent of locate regions apply to a resized locate region as to a newly created locate region. Reordering Locate Regions You can only reorder locate regions in the image view. Whenever you create a locate region, Pro OCR automatically links all locate regions on the page in sequence. You reorder locate regions when you want to change the automatic sequence in which they are processed and output.
To relink (an example):
1. Move the pointer over locate region #1 and press and hold down the mouse button.
2. Drag the pointer into locate region #2, then release the mouse button. The arrow originally leading into locate region #2 disappears, and a new arrow connects locate region #1 to locate region #2.
Chapter 5
Setting Recognize Options and Proofing a Recognized Document
When you recognize a document, you convert an image into editable text. You can then proof and edit the text. This chapter tells you how to:
■ Select the type quality option for recognizing.
■ Select display options, including the fonts that Pro OCR uses when recognizing text and when displaying the recognized document, suspect threshold level, and illegible character symbol.
Setting Recognize Options and Proofing a Recognized Document To select the type quality for recognizing: ■ Select a type quality from the Recognize drop-down list in the Gallery toolbar. Selecting Display Options Use the Options dialog box to select options that tell Pro OCR how to recognize a document and display the results.
Setting Recognize Options and Proofing a Recognized Document Pro OCR recognizes and identifies over 2,000 typefaces. It can make correct judgments about character identity even when a character isn’t absolutely clear. However, sometimes Pro OCR cannot identify with certainty what a particular character is, and other times Pro OCR cannot identify a character at all. To handle cases like these, Pro OCR tries to assign the correct character to a questionable character image.
Option: Stringent Suspect Threshold
Does this: Identifies ALL suspect characters. Use the Stringent setting when it is important that you know about all possible mistaken identifications, or when using dictionaries will not aid in identification. For example, use Stringent when you recognize tables of numbers, documents with a lot of proper names, and whenever you need to check the recognition results very carefully.
Setting Recognize Options and Proofing a Recognized Document The choices for the illegible character symbol include: ~ @ ^ # * The preset symbol is “~”. Every illegible character is represented by the same selected illegible character symbol. Choose a symbol that you otherwise don’t expect to have in your document, so that when you search for it you will only find the illegible characters. For example, you would not use the “#” sign if your document has tables with the “#” sign in them.
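The advice above amounts to picking the first available symbol that does not already occur in your document. The following Python sketch shows that check. The function name `pick_illegible_symbol` is hypothetical, and the sketch works on text you supply yourself; it is not a feature of Pro OCR.

```python
# The five choices listed in this section, with the preset symbol first.
CANDIDATES = ["~", "@", "^", "#", "*"]


def pick_illegible_symbol(document_text: str, candidates=CANDIDATES):
    """Return the first candidate symbol that does not occur in the text.

    Returns None when every candidate already appears in the document.
    """
    for symbol in candidates:
        if symbol not in document_text:
            return symbol
    return None


sample = "Item #1 costs ~$5 (see the * footnote)"
print(pick_illegible_symbol(sample))
```

For the sample text, "~" and "#" are rejected because they appear in the document, so the sketch settles on "@". A later search for "@" would then find only illegible characters.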
Setting Recognize Options and Proofing a Recognized Document you’ll always have the same fonts installed in your system as the fonts identified in the input document. To maintain as much similarity to the input document as possible, Pro OCR maps any identified fonts to three user-selectable fonts installed in your system: one monospaced font, one serif font, and one sans serif font.
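As a rough illustration of this mapping, the Python sketch below replaces each recognized font class with one of three user-selected fonts. The font names, the class labels, and the function `map_to_display_font` are assumptions chosen for the example; the guide says only that identified fonts are mapped to one monospaced, one serif, and one sans serif font installed in your system.

```python
# Hypothetical user choices: one installed font per font class.
DISPLAY_FONTS = {
    "monospaced": "Courier New",
    "serif": "Times New Roman",
    "sans_serif": "Arial",
}


def map_to_display_font(font_class: str) -> str:
    """Return the user-selected font for a recognized font class."""
    try:
        return DISPLAY_FONTS[font_class]
    except KeyError:
        raise ValueError(f"unknown font class: {font_class!r}") from None


# Each run of recognized text carries the class of the font it was printed in.
runs = [("Quarterly Report", "sans_serif"), ("Total: 1,204", "monospaced")]
mapped = [(text, map_to_display_font(font_class)) for text, font_class in runs]
print(mapped)
```

Because every identified font falls into one of three classes, changing a single entry in the table changes the display font for all text of that class.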
Setting Recognize Options and Proofing a Recognized Document TIP: You can display and proof a document using one display font or set of display fonts, and then change the settings to save the document with other fonts. This is one reason for saving a separate settings file. Indicating Whether Pictures Appear During Text View Use the Display Pictures option to tell Pro OCR whether to display pictures in the text view. If you deselect this option, a blank box appears in place of the pictures in text view.
You can locate regions automatically using Locate with any of the locate settings, or you can locate them manually. For more information about locating, see Chapter 4, “Locating Text and Graphics.”
2. Select either Letter Quality, Dot Matrix Quality, or Degraded or Fax Quality from the Recognize drop-down list in the Gallery toolbar. For most documents, you’ll select Letter Quality.
Setting the Zoom Levels
The zoom controls are active in both the image view and the text view. Use them to change between zoom levels. You cannot zoom in closer than the pixel-for-pixel level (in the image view), or 400% (in the text view), or zoom out farther away than 25% in either view. When you’re at the maximum zoom level, the zoom in control is dimmed. When you’re at the minimum zoom level, the zoom out control is dimmed.
To zoom in or out:
■ Click the Zoom In or Zoom Out icon on the Status bar.
Selecting a Page to Display
The page controls are available in both the image view and the text view. The page number box in between the page controls tells you what page of the open document is being displayed and how many pages there are in the document.
The requested page appears. The page number box changes to the new page number.
Selecting Text or Image View
The View controls are available in both the image view and the text view. Use them to change between the image view and the text view. Pro OCR highlights the selected button to indicate which view you’re currently in.
To change views:
■ Click the Image View icon or the Text View icon.
■ Edit a document
Selecting Proofing Options
Set Pro OCR Proof options to indicate if you want to proof whole lines and what combinations of words and punctuation you want to proof.
To select Proofing options:
1. Click the down arrow next to the Proof button in the Gallery toolbar. The Options dialog box appears with the Proofing options displayed.
2. Select one of these options.
■ Whole Lines.
3. (Optional) If you select Combination Of, select any of the following options:
■ Proof Suspect and Illegibles. Pro OCR selects each suspect or illegible character as it is encountered. Note that Pro OCR uses the selected suspect threshold display option to decide which characters are suspect.
■ Proofing Punctuation and Symbols. Pro OCR searches for and selects each punctuation mark or symbol as it is encountered.
Setting Recognize Options and Proofing a Recognized Document character, the suspect or illegible character is visited and selected first, and the next time you use Proof, the same word is selected again. To proof a document: 1. In the text view, click the Proof button, or choose Proof from the Recognize menu. TIP: You can also press the Tab key to start the proof. Pro OCR starts at the current insertion point, if there is one. Otherwise, it starts at the top of the current page.
Setting Recognize Options and Proofing a Recognized Document Replace to make changes. TIP: If the selected text is misspelled, and you expect to find further instances of the word in this document, don’t edit it. Instead, use Find & Replace. The selected word is displayed as the Find text. You can type in the correct spelling in the Replace text box and change all instances of the word, either with Replace or Replace All. NOTE: When you add a word to the current user dictionary, it’s available immediately.
■ Click the text view button in the Status bar, or choose Text from the View menu.
To edit text within a line:
1. Move the pointer over the text line. The pointer indicates the text selection.
2. Click anywhere within the line. The line becomes active for editing. The blinking vertical bar cursor indicates where the insertion point is on the line. Clicking anywhere outside the active line deactivates it.
To cut, copy, paste, or clear: Use keyboard equivalents or click the right mouse button.
To select or deselect characters one at a time: Hold down the Shift key while using the arrow keys.
Lines don’t wrap when more characters are added to a line. Instead, text is squeezed into the existing line, squeezing the space between characters and words proportionally and overlapping them if necessary.
Setting Recognize Options and Proofing a Recognized Document Each time you select another line, a box is drawn around it. The previously selected lines stay selected. The lines don’t have to be next to one another to be selected. 4. Repeat steps 2 and 3 for each additional text line you want to select. OR 1. Move the pointer outside all text lines. When the pointer is outside all text lines, it is the standard arrow pointer. 2. Click the mouse button and drag diagonally.
Setting Recognize Options and Proofing a Recognized Document selected. To deselect all text lines: 1. Move the pointer outside all text lines. When the pointer is outside all text lines, it is the standard arrow pointer. 2. Click the mouse button. All selected text lines are deselected. To delete one or more text lines: 1. Select the text lines to be deleted. 2. Press Delete to remove the selected text lines. OR Choose Clear from the Edit menu to remove the selected text lines.
Setting Recognize Options and Proofing a Recognized Document You can only apply a text style in the text view. You may apply a text style to any selected text. Text can be styled with any combination of Bold, Italic, and/or Underline. All text from the selected lines is changed to the selected style. 3. Repeat Step 2 for each additional style to be applied. 4.
Setting Recognize Options and Proofing a Recognized Document When you use Proof with the Misspelled Words proofing option selected, Pro OCR searches for words that are not in the General dictionary, the current user dictionary, or any supplemental Pro OCR dictionary in the dictionaries directory. Pro OCR selects the first candidate word it finds after the insertion point or the start of the current page.
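The search described above can be pictured as a lookup against the combined dictionaries, starting at the insertion point. The Python sketch below is a simplified illustration under stated assumptions: the function name `first_unknown_word`, the tiny example dictionaries, the case-insensitive comparison, and the rule that a word is a run of letters and apostrophes are all hypothetical and are not taken from Pro OCR.

```python
import re

WORD = re.compile(r"[A-Za-z']+")   # assumed definition of a word


def first_unknown_word(text, dictionaries, start=0):
    """Return (position, word) for the first word at or after `start`
    that is in none of the dictionaries, or None if every word is known."""
    known = {word.lower() for dictionary in dictionaries for word in dictionary}
    for match in WORD.finditer(text, start):
        if match.group().lower() not in known:
            return match.start(), match.group()
    return None


general = {"the", "scanner", "reads", "page"}   # stands in for the General dictionary
user = {"visioneer"}                            # stands in for the current user dictionary

text = "The Visioneer scanner reads the pagc"
print(first_unknown_word(text, [general, user]))
print(first_unknown_word(text, [general]))
```

With the user dictionary selected, only the misrecognized word "pagc" is flagged. Without it, the proper name "Visioneer" is flagged first, which is why adding frequently used names to a user dictionary shortens proofing.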
Setting Recognize Options and Proofing a Recognized Document To create a user dictionary: 1. Choose Select User Dictionary from the Tools menu. The following dialog box appears: 2. Type in the name of the new dictionary. 3. Click OK. The new dictionary is created and automatically selected. To select a user dictionary: 1. Choose Select User Dictionary from the Tools menu. The Select User Dictionary dialog box appears. 2. Find the dictionary you want to open and select it.
3. Click OK. The current user dictionary (and any changes you make to it) is used until you choose a different one.
To add to the User Dictionary while editing in the text view:
1. Select a different user dictionary, if necessary, by choosing Select User Dictionary from the Tools menu.
NOTE: Make sure you have a user dictionary open. Add to User Dictionary is only available when there’s a current user dictionary.
2.
© Copyright 1998 Visioneer, Inc. Reach us at www.visioneer.com.
Contents Chapter 3: Getting Documents Getting a Page: The Basic Steps Getting Pages From a Scanner Setting Scanning Options Selecting a Scanner as the Source Getting a Page Using a Scanner Using Auto OCR with Scanners Getting Pages from an Image File Selecting a File as the Source and Getting Pages Getting Files From Other Scanner Applications Getting Fax-modem Files Using Auto OCR With a File More About Enabling Auto OCR Dialogs Glossary
Contents Chapter 4: Locating Text and Graphics Kinds of Locate Regions Text Regions Numeric Regions Picture Regions Tables Pro OCR’s Locating Methods Locating Text and Pictures Locating with a Template Order of Locate Regions Examples of Locating Documents Processing Resumes Processing Legal Documents Processing Faxed Documents About Columns, Locate Regions, and Output File Formats Defining Locate Regions Manually Tips When Creating Locate Regions Overlapping Locate Regions and Skewed Tex
Selecting and Deselecting Locate Regions Changing the Kind of a Locate Region Deleting a Locate Region Resizing a Locate Region Reordering Locate Regions Glossary
Contents Chapter 5: Setting Recognize Options and Proofing a Recognized Document Selecting Type Quality Options Selecting Display Options Setting the Suspect Character Threshold Setting the Illegibles Character Symbol Selecting a Display Font Indicating Whether Pictures Appear During Text View Recognizing a Single Page Working with Recognized Pages in Text view Setting the Zoom Levels Selecting a Page to Display Selecting Text or Image View Proofing Selecting Proofing Options Proofing a D
Checking Spelling in a Document Adding Words to a User Dictionary Displaying a Summary of Recognized Errors Glossary
Contents Chapter 6: Saving and Printing Documents Saving Documents and Other Pro OCR Items Saving a Document Saving Templates Saving Settings Supported Output File Formats Saving to Proprietary Pro OCR Formats Saving to Standard Image File Formats Saving to Generic Text File Formats Saving to Application Formats Format Suppression and Customizing Exporting to a Word Processor that Pro OCR Doesn’t Support Saving Pictures Printing a Document Glossary
Contents Chapter 7: Creating and Processing Deferred and Batch Jobs The Advantages of Finish and Deferred Processing Guidelines for Using Finish Processing and Deferred Processing How it Works Setting Up and Processing Deferred Jobs Processing Deferred Jobs Batch Processing Glossary
Chapter 7 Creating and Processing Deferred and Batch Jobs
This chapter tells you how to process Deferred, Finish, and Batch jobs. Finish Processing lets you combine the efficiency of multi-step automatic operation with the power and flexibility of single-step interactive operation. You can process pages in your document according to their specific characteristics, while still having automatic processing available for the rest of the pages in the document.
Locate, or Recognize setting for the different pages in a document. You’ll also use these processes when pages need a mixture of settings, when more than one person works on the documents, or when more than one workstation is used.
Use Create Deferred Job to get pages and save them in the Pro OCR Deferred format for later processing. After you create a deferred job, you can use any combination of locating and recognizing on some or all pages and then save the document. When you’re ready to finish processing the saved document, use Process Deferred Jobs to automatically perform any additional processing.
To create a deferred job:
1.
from any directory or disk.
4. Type in a new file name.
5. Click Save. If Open File is selected in Get Page as the source to get pages from, the Auto Get Page dialog box appears. If Use Scanner is selected as the source, Pro OCR immediately starts to scan.
6. Scan the documents, or if you are getting a file, select a file in the Auto Get Page dialog box, and then click the Get button.
The Get Page process is the same as when you’re using Auto OCR with either a scanner or a file.
7. When you’re finished getting pages, click Finished. The pages are read in the same way that they are when you use Auto OCR. When all pages are read in, a dialog box tells you the process is completed.
8. Click OK. The pages are saved to the file you named previously.
Deferred jobs are saved in the Pro OCR Deferred format with image, locate regions (if any), and recognized text (if any).
3. Select the file you want to process and click Get. To select multiple files, click the Advanced button, choose a file, and click Add. Repeat this process until you have selected all the files that you want to get, then click the Get button.
NOTE: The Process Deferred Jobs command does not process non-Pro OCR image files. If some of your files could not be processed, read them in again (using Get Page, Auto OCR, or Create Deferred Job) and process them as you normally would.
The last page of the document is displayed at the last selected zoom level in the text view.
4. Click OK. You can now proof the document (press Tab) and edit it as needed.
Batch Process lets you specify the source directory that contains the image files, the image file type, the destination directory where the recognized results are saved, and the export format. Pro OCR automatically performs the OCR job on each image file in the source directory and exports the results to the destination directory.
To process as a batch:
1. Select Locate and Recognize options from the drop-down lists in the Gallery toolbar.
2.
The export format determines the saved format and the extension of all of the files included in this Batch Process. Batch Process names each file by combining the file name of the image and the default extension name of the export format. For example, if the image file’s name is sample.tif, and you choose Plain Text as the export format, the result file is sample.txt.
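The naming rule above can be sketched in a few lines of Python. This is only an illustration of the rule, not Pro OCR's own code: the function name `batch_output_name` and the `extensions` table are assumptions, and only the Plain Text to .txt pairing is taken from this guide.

```python
from pathlib import PurePath

# Only the "Plain Text" -> ".txt" pairing comes from the guide;
# any other entries would have to be supplied by the caller.
DEFAULT_EXTENSIONS = {"Plain Text": ".txt"}

def batch_output_name(image_name, export_format, extensions=DEFAULT_EXTENSIONS):
    """Combine the image's base name with the export format's default extension."""
    ext = extensions[export_format]
    return PurePath(image_name).with_suffix(ext).name

print(batch_output_name("sample.tif", "Plain Text"))
```

Because every output name is derived from the image name, two images that differ only in their extension would map to the same result name under this sketch.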
Contents Chapter 8: Tips for Getting the Best Results Fixing Broken and Touching Characters Adjusting Brightness for Consistent Documents Handling Documents That Are Not Consistent Processing Documents with Different Page Sizes or Orientations Processing Documents with Different Character Quality Converting Parts of a Page in a Multipage Document Changing the Gallery Options Using Get Page Again Using Locate Again Using Recognize Again Finding and Replacing Recognized Text Making Sure Pag
Chapter 8 Tips for Getting the Best Results
This chapter provides tips for getting the best results from Pro OCR by:
■ Fixing broken and touching characters
■ Adjusting the brightness to obtain consistent documents
■ Processing inconsistent documents
■ Changing a setting after completing autoprocessing
■ Getting the best recognition
■ Making sure page images are not skewed
■ Using numeric regions when you’re recognizing numeric text
■ Putting pages in the sc
■ When characters are light or broken, use a lower (darker) setting.
However, when there are both broken and touching characters on the same page or in the same document, fixing one problem may make the other worse. In that situation, this rule usually works:
■ When there are both broken and touching characters, use a lower (darker) setting; that is, fix the broken characters.
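The brightness rules in this chapter can be summarized as a small decision function. The Python sketch below is an illustration only: the function name `brightness_adjustment` and its return values are assumptions, and the "higher" branch reflects the guidance elsewhere in this chapter to increase brightness for dark or touching characters.

```python
def brightness_adjustment(has_touching, has_broken):
    """Suggest which way to move the brightness setting for a page."""
    if has_broken:
        # Broken (light) characters take priority, even when touching
        # characters are also present: use a lower (darker) setting.
        return "lower"
    if has_touching:
        # Only dark or touching characters: use a higher (lighter) setting.
        return "higher"
    return "unchanged"

print(brightness_adjustment(has_touching=True, has_broken=True))
```

The order of the checks encodes the rule that broken characters are fixed first when both problems appear together.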
You may have to experiment with different settings. If your scanner supports Auto brightness, try it first before setting brightness manually.
2. Click the Get Page button in the Gallery, or choose Get Page from the Process menu.
3. Get the file that you want to adjust.
4. Zoom in on the page to check the image’s quality. A good image has characters that are not too dark or touching and are not too light or broken. If the image looks good, skip to step 10.
that you begin to make the characters too light and/or broken.
Handling Documents That Are Not Consistent
Sometimes the pages in your document are not consistent; for example, they do not all have the same page size. To handle this, you must change the Gallery options for each page. In such cases, use Get Page, Locate, or Recognize in combination with Process Deferred Jobs or Finish Processing.
to be processed.
2. Click Get Page to get the page, or choose Get Page from the Process menu.
3. Repeat Steps 1 and 2 for each page in the document.
4. Choose Finish Processing from the Process menu. Make sure you select the Locate and Recognize options in the Gallery toolbar.
OR
Save the document in Pro OCR Deferred format. If you save the document in the Pro OCR Deferred format, you can finish processing it by choosing Process Deferred Jobs later.
5. To scan the page again with a different brightness setting, delete the page.
6. Increase or decrease the brightness. If the page image contained dark or touching characters, increase brightness. If the page image contained light or broken characters, decrease brightness. If the page image has a “noisy” (fuzzy, dotty) background (as in some multigeneration photocopies and some faxes), increase brightness in order to “fade out” the background “noise.”
7.
1. Choose Create Deferred Job from the Process menu. The Create Deferred Job dialog box appears.
2. Select the files you want to process. Create Deferred Job lets you scan a stack of pages or read in a set of image files. If you have only one page, or you want to retrieve and process a single file, you can use Get Page. Remember to set the appropriate Get Page options.
3. Manually define the locate regions on the current page.
4.
2. Determine which Locate and Recognize options in the Gallery apply to the majority of pages. For example, the locating options might be Locate Text Only and Single Columns Only. You’ll use these settings in step 4.
3. Use Locate and Recognize, as necessary, on each of the other pages. In other words, leave the “majority pages” alone and use Locate (or Locate and Recognize) only on the pages that are not “majority pages.”
4.
Recognize over again.
Using Get Page Again
You may want to use Get Page again if you scan pages in with an incorrect Page Size or Orientation setting, or if you didn’t use an appropriate brightness or scanning resolution setting. You can use Get Page again at any step in the Pro OCR process. If you get a page, the page is added after the current page. If the page’s quality is not good, you can delete it and redo the steps.
Using Recognize Again
Recognizing again may be necessary if the text on a page was not recognized accurately because of an incorrect type quality setting. You can recognize again at any step in the Pro OCR process.
To recognize the current page again:
1. Select the appropriate Recognize options from the Recognize drop-down list.
2. Click the Recognize button, or choose Recognize from the Process menu.
You can use Recognize again for individual pages in the current document.
2. Click the Proofing tab, and select the following options: Suspects (Normal), Illegibles, and Misspelled Words.
3. Click OK.
4. Choose Proof from the Process menu. When Proof selects the character to replace, select the word that contains the suspect character or illegible character you want to replace.
5. Choose Find & Replace from the Edit menu. The dialog box is displayed with the selected text.
6. Type the correct text in the Replace box.
7.
Even with good quality characters on good quality paper, Pro OCR will have trouble locating and recognizing accurately if the type in the page image is skewed (crooked). This can happen either because the text is crooked on the page or because the page is scanned at an angle. For accurate recognition, the text image must not be skewed more than 2°. The illustration to the left shows a page that has 2° skew.
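To get a feel for how much tilt 2° represents, the following Python sketch checks an angle against the limit and computes how far a text line drifts vertically over its length. The function names and the 8.5-inch line length are assumptions chosen for the illustration; only the 2° limit comes from this guide.

```python
import math

MAX_SKEW_DEGREES = 2.0  # limit stated in this guide

def within_skew_limit(angle_degrees):
    """True if the skew angle, in either direction, is within the limit."""
    return abs(angle_degrees) <= MAX_SKEW_DEGREES

def baseline_drift(line_length, angle_degrees):
    """Vertical drift of a text line over its length, in the same units."""
    return line_length * math.tan(math.radians(abs(angle_degrees)))

print(within_skew_limit(1.5))
print(round(baseline_drift(8.5, 2.0), 2))
```

At the 2° limit, a line that runs the full width of an 8.5-inch page ends roughly 0.3 inch higher or lower than it started, which is usually visible to the eye before scanning.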
Avoiding Markings on Pages
Handwritten notes on pages may slow down recognition. You can reduce the effect of markings on pages by:
■ Scanning the document first and then marking it up, or making a photocopy for scanning before you mark it up.
■ Using whiteout to remove any markings that don’t overlap text. Be very careful, however, about using whiteout on text, because you may make the text even more illegible.
Index A accuracy of recognition ADF and Auto OCR All Pages in One File (Split Document options) application formats for saving Auto Get Page dialog box (1) Auto Get Page dialog box (2) Auto OCR from a file from a scanner with a flatbed with an ADF scanner auto orientation B Batch Process dialog box batch processing explanation selecting brightness, adjusting broken characters, fixing C character quality, processing different Create Deferred Job dialog box D DCX file format
Index deferred processing advantages continuing job creating job explanation guidelines setting up Degraded or Fax Quality command deleting locate regions dictionary adding words creating General user See also user dictionary directories deferred jobs dictionaries discarding format when saving Display Options command (1) Display Options command (2) Display Options command (3) Display Pictures option E editing all lines applying styles copying deleting text deselecting in text view more than one line singl
Index exporting to unsupported word processor F fax example faxed document processing fax-modem files features file formats input DCX fax-modem files files from other scanner applications PCX TIFF output Pro OCR Pro OCR Text Only spreadsheet standard text word processor File menu Process Deferred Job command (1) Process Deferred Job command (2) Save As command File Properties dialog box file, getting multiple files from other scanner applications finding and replacing recognized text finish processing adv
Index retrieving saving source controls (1) source controls (2) type quality controls Get Info get page basic steps files from unsupported scanners from file from scanner getting fax-modem files getting multiple files (1) getting multiple files (2) one scanned page scanning additional pages setting options (1) setting options (2) setting options (3) single-step operation using Auto OCR with files Get Page dialog box (1) Get Page dialog box (2) Go to Page dialog box H hints HTML (1) HTML (2) I illegible c
Index L legal document processing locate regions changing the kind of defining manually defining the order deleting kinds of legal document example locating manually method to use numeric order of overlapping regions and skewed text overlapping text and pictures picture redefining reordering resizing resume example selecting and deselecting single or multiple columns tables text text and pictures (1) text and pictures (2) tips using a template M magnifying the view misspelled words N normal suspect thres
Index numbers and alphanumeric words Numeric Region icon numeric regions O One Page Per File option (Split Document options) On-Screen Verifier example of use (1) example of use (2) showing in Text View turning on or off opening a file Optical Character Recognition (OCR) defined uses for Options dialog box Display options Process options (1) Process options (2) Proof options order of locate regions overlapping text and pictures P Page controls (Status bar) (1) Page controls (Status bar) (2) Page Image Ro
Index PCX file format Picture Region icon picture regions defined white out text pictures, saving preserving format when saving printing Pro OCR file format (1) Pro OCR file format (2) Pro OCR Text Only file format Pro OCR window Process Deferred Job command (File menu) (1) Process Deferred Job command (File menu) (2) Process Deferred Jobs Complete dialog box Processed Deferred dialog box processing options (1) processing options (2) Proof command proofing combinations of characters and words misspelled wo
resume processing retrieve settings rotate RTF S Save As command (File menu) Save As dialog box Save As Options Save As Options dialog box saving a template as HTML as plain text as RTF as spreadsheet as text for database for spreadsheet for wordprocessor Gallery settings multiple documents as separate files pictures pictures (example) to application formats to generic text file format to MS Word (example) to Pro OCR (example) to Pro OCR deferred format to Pro OCR format to Pro OCR text only to propr
Index scanning additional pages one page second side selecting a scanner setting options with Auto OCR and ADF with Auto OCR and scanner with flatbed Select Source dialog box Select Template dialog box Select User Dictionary dialog box selecting a scanner single-step operation get page locate Recognize when to use skewed images adjusting for straightening source selecting file selecting scanner source controls (Gallery) (1) source controls (Gallery) (2) speed of recognition spellcheck Split Document option
Index T tables defined scanning mixed single column template creating saving selecting using (1) using (2) using (3) text applying styles copying deleting deselecting regions selecting all lines selecting more than one line selecting single line Text Region icon Text Region icon text view editing operations editing text editing within a line selecting Text View icon TIFF tips for locating toolbar tutorial scanning a document using a template scanning a document with mixed tables scanning a document with ta
U user dictionary adding words adding words in text view creating selecting V view changing displaying pages zooming in and out view controls (1) view controls (2) Visioneer format W White Out Text option (1) White Out Text option (2) Wizard word processor exporting to unsupported saving to Z Zoom controls (Status bar) zoom in and out Zoom In and Zoom Out icons