User’s Guide
COPYRIGHT INFORMATION Copyright © 1999 by ScanSoft, Inc. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without the prior written consent of ScanSoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960. This digital version contains the full copyrighted text.
CONTENTS
PREFACE
About This User’s Guide
Organization of this user’s guide
Documentation conventions
Related Documentation
Technical Support
1 INTRODUCTION TO TEXTBRIDGE
Basic OCR Concepts
Features and Benefits
  New Features
  Enhanced Features
2 INSTALLING AND SETTING UP TEXTBRIDGE
What Comes with TextBridge
Supported Scanners
Installing and Testing Your Scanner
System Requirements
Before Installing TextBridge
  Using TextBridge with Pagis
  Uninstalling a Previous Version of TextBridge
  Learning about TextBridge
4 LEARNING TO USE TEXTBRIDGE
Before You Begin
Ways You Can Use TextBridge
Starting TextBridge
Using Automatic Processing
Using Manual Processing
Performing Basic Operations
  Selecting the Page Source
  Selecting the Page Type
  Previewing the Page
6 ADVANCED SAMPLE SESSIONS
Session 1: Processing a Document to Use in a Database
Session 2: Using Zone Templates and Page Types
Session 3: Training TextBridge OCR
Where to Go From Here
PREFACE
ScanSoft, Inc. welcomes you to TextBridge Pro 9.0 for Windows® 95, 98, 2000, and Windows NT 4.0 (referred to in this guide as “TextBridge”). The documentation that comes with TextBridge should provide all the information you need to operate TextBridge. It includes this user’s guide, a Help system, and Release Notes. ScanSoft invites your comments about the information provided in the documentation.
To view the user’s guide, you need Adobe Acrobat Reader, which TextBridge installs unless it is already on your PC. You can open the user’s guide from the installation menu, from the TextBridge program folder on the Start menu, or from within Adobe Acrobat Reader. Once it is open, you can view it on your PC and print all or part of it using Adobe Acrobat Reader.
Documentation conventions
TextBridge documentation uses certain graphical elements and formatting to emphasize information and give more meaning to text.
Table 1: Documentation Conventions
bold: Introduces a new term or the first use of an important term in a chapter. Sometimes used to denote strong emphasis.
RELATED DOCUMENTATION
TextBridge provides a comprehensive set of printed and digital documentation designed to help you learn and operate the product. The documentation provided with TextBridge covers all aspects of installation and operation.
Note Information provided in these documents is not duplicated in other documents, except for basic information about TextBridge. If you do not find the information you want in one document, check another.
TECHNICAL SUPPORT
If you experience problems with TextBridge that you cannot resolve using the documentation and software, contact TextBridge Technical Support through the ScanSoft Web site: www.scansoft.com. The site links to the TextBridge pages, including Technical Support with Frequently Asked Questions, technical information bulletins, and a problem report form.
1 INTRODUCTION TO TEXTBRIDGE
Welcome to ScanSoft’s TextBridge™ Pro 9.0, optical character recognition (OCR) software for Microsoft Windows™ 95, 98, 2000, and Windows NT 4.0.
You can use TextBridge to scan and convert printed pages to text documents for your word processor, spreadsheet program, web browser, database program, or other text application. Pages may be from most sources, including computer printers, fax machines, photocopiers, magazines, and newspapers. Pages can be black and white or color. TextBridge can also recognize standard page image files from fax modems, image applications, and other sources.
In most cases, TextBridge understands your original document’s format and maintains the layout, including columns, headers, footers, pictures, and picture captions. Pictures can be black and white, grayscale, or color. Recomposition is possible only if your text program supports pictures and layout. For example, recomposition is supported in Microsoft Word and Corel WordPerfect but not in Notepad.
New Features
TextBridge offers these major new features to increase your productivity:
◆ Improved OCR accuracy. Saves time and reduces retyping.
◆ Color and grayscale pictures and text. Recognition and output of color and grayscale pictures. Recognition of color text and of text on a color or shaded background, with output as black on white or white on black.
◆ Improved table recomposition. Advanced analysis results in improved table reformatting.
◆ TextBridge Assistant. An easy-to-use assistant guides you through each step of the most common TextBridge activities, such as scanning a page and sending it to Word, recognizing an image file, and recognizing just part of a page.
◆ Improved batch processing. You can select multiple files and process each file separately, and you can schedule processing for a specific time in the future.
◆ Integration with e-mail programs.
Enhanced Features
In addition to the new features, TextBridge offers enhanced versions of features that were available in previous versions. These features are described in the following list:
◆ Instant Access™. Start TextBridge from within most Windows text programs, such as Word or Excel. After recognizing and converting the page, TextBridge automatically pastes the recognition data (text and pictures) directly into the program’s open document.
◆ ToolTips.
Retaining pictures is independent of retaining layout. Some text programs retain pictures even though they do not retain layout. ◆ Page Types. TextBridge provides many predesigned Page Types to make processing more efficient. You do not have to go through a complicated process of determining and specifying settings for common types of pages. These Page Types automatically provide appropriate settings for the type of page you want to process.
Other Features
In addition to the features listed in the previous sections, TextBridge provides these other features:
◆ Windows 98 and 2000 compatibility.
◆ Broad scanner support. TextBridge supports most popular desktop scanners that use the TWAIN device interface standard.
◆ Image processing. TextBridge accepts a wide range of images from a variety of sources for processing.
◆ Preview of page images. TextBridge provides a set of tools for previewing page images before processing them. You can manually define areas of page images as zones to be processed and capture only the text, tables, or pictures you want. You can also edit the automatic zoning by adjusting the text, table, and picture zones. ◆ Zone templates. After you create a set of zones, TextBridge lets you save and reload zone templates for new jobs.
DOCUMENTS TEXTBRIDGE CAN RECOGNIZE
TextBridge includes a number of advances developed at ScanSoft, Inc. and at the Xerox Palo Alto Research Center (PARC). As a result, TextBridge provides highly accurate OCR and format retention on a wide range of documents.
INPUT IMAGE FILE FORMATS
The source of page images for TextBridge can be your scanner or image files. TextBridge can recognize the following image file formats:
Image File Format: File Name Extension
Windows bitmap: .bmp
PCX: .pcx
Multi-page PCX used in some fax programs: .dcx
Tag image file format (including Alacrity TIFF): .tif, .ala
Delrina WinFax fax image files: .fxr, .fxd, .fxm, .fxs
eXtended image file: .xif
Image files can be black and white (binary), grayscale, or color.
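The input formats listed above amount to a lookup from file name extension to format. As an illustration only, the following Python sketch shows how you might pre-check a folder of image files before handing them to TextBridge; it is not part of TextBridge, the function name `input_format` is hypothetical, and the table is simply transcribed from the list above.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format table transcribed from the input formats list above.
# Illustrative pre-check only; this is not a TextBridge API.
INPUT_FORMATS = {
    ".bmp": "Windows bitmap",
    ".pcx": "PCX",
    ".dcx": "Multi-page PCX (fax programs)",
    ".tif": "Tag image file format",
    ".ala": "Tag image file format (Alacrity TIFF)",
    ".fxr": "Delrina WinFax fax image",
    ".fxd": "Delrina WinFax fax image",
    ".fxm": "Delrina WinFax fax image",
    ".fxs": "Delrina WinFax fax image",
    ".xif": "eXtended image file",
}


def input_format(filename: str) -> Optional[str]:
    """Return the format name for a listed image file type, or None."""
    # Windows file names are case-insensitive, so compare lowercase suffixes.
    return INPUT_FORMATS.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    for name in ["page1.TIF", "fax.dcx", "scan.jpg"]:
        print(name, "->", input_format(name) or "not in the list")
```

Files that return None here would need to be converted to one of the listed formats first.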
OUTPUT TEXT FILE FORMATS
TextBridge can convert its recognized text and pictures to files for the following programs and formats:
Programs and Formats
Ami Pro 2.0 and 3.0
dBase IV
DisplayWrite 5
Excel 97 and 2000
Excel 3.0, 4.0, and 5.0
Excel for the Macintosh 3.0 to 7.0
FrameMaker
HTML
WYSIWYG HTML
Interleaf
Lotus 1-2-3
Lotus Word Pro
MultiMate Advantage II
PostScript
Professional Write 2.0 and 2.
Programs and Formats: File Name Extension
Word 6.0 and 7.0 (RTF): .doc
Word 97 and 2000 (RTF): .doc
WordPerfect 4.2 and 5.1: .wpf
WordPerfect 6.0, 6.1, 7.0, and 8.0: .wpd
WordStar: .wsd
Works: .rtf
☞ Microsoft Word (RTF) format is also accepted by a number of other applications, including ClarisWorks®, Adobe® PageMaker®, and WordPad. See the documentation for your application for more information about importing files in RTF format.
Note Refer to the ScanSoft Web site at www.scansoft.
WHERE TO GO FROM HERE
To learn how to install and set up TextBridge on your system, go to Chapter 2. To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3, which explains the basic concepts and functions of the software. To learn how to use TextBridge to process simple and complex documents, refer to Chapter 4. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor.
2 INSTALLING AND SETTING UP TEXTBRIDGE
This chapter describes the TextBridge software installation and setup procedures.
WHAT COMES WITH TEXTBRIDGE
TextBridge comes with the following items:
◆ One installation CD-ROM. The CD-ROM includes software programs, language packs, sample document image files, release notes, Help files, the online user’s guide in Adobe PDF format, and Adobe Acrobat Reader.
◆ A printed user’s guide to get you started.
Check that you have all the items listed above. If any item is missing from your TextBridge package, call your authorized ScanSoft dealer.
Depending on the design of your TWAIN driver, you may not be able to scan in color with TextBridge. If you have a triple-pass scanner, use it in single-pass, black-and-white mode only. If you have a Visioneer sheetfed scanner, use the Visioneer PaperPort software to drag and drop an image onto TextBridge or your word processor.
☞ TextBridge Pro 9.0 installs an ISIS driver to support Hewlett-Packard ScanJet 5100C scanners.
INSTALLING AND TESTING YOUR SCANNER
Refer to the manufacturer’s detailed instructions for installing your scanner; they provide the most precise information for setting up your scanner. The basic steps for installing a scanner are:
1. Install the correct scanner interface card (if one is necessary) in the PC bus. Note that many scanners simply plug into the PC’s parallel port, universal serial bus (USB) port, or occasionally the standard serial port.
2.
SYSTEM REQUIREMENTS
To install and run TextBridge, your Windows-compatible PC must be equipped with the following:
◆ An Intel (or compatible) 80486 or Pentium™ microprocessor. We recommend a Pentium for the best performance.
◆ A VGA, SVGA, or multi-sync color monitor.
◆ A minimum of 24 megabytes (MB) of random access memory (RAM). We recommend 32 MB for the best performance.
◆ Microsoft Windows™ 95, 98, or 2000 or Windows NT 4.0.
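If you are checking several PCs before installing TextBridge, the memory requirement above reduces to a comparison against two thresholds. The following Python sketch is illustrative only and is not part of TextBridge; the function name `ram_status` is hypothetical, and it assumes you supply the installed RAM in megabytes yourself (for example, as reported by your system properties) rather than querying Windows.

```python
# Thresholds taken from the System Requirements list above, in megabytes.
MIN_RAM_MB = 24
RECOMMENDED_RAM_MB = 32


def ram_status(ram_mb: int) -> str:
    """Classify installed RAM against the stated minimum and recommendation."""
    if ram_mb < MIN_RAM_MB:
        return "below minimum"
    if ram_mb < RECOMMENDED_RAM_MB:
        return "meets minimum"
    return "recommended"


if __name__ == "__main__":
    for mb in (16, 24, 32, 64):
        print(mb, "MB:", ram_status(mb))
```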
BEFORE INSTALLING TEXTBRIDGE
After you install your scanner and check that it is working properly, you are ready to complete other preparations for installing TextBridge and to learn more about TextBridge.
Using TextBridge with Pagis
The Pagis program from ScanSoft is a suite of color scanning software that lets you scan, copy, fax, view and edit, index, search, and manage electronic documents. Pagis includes TextBridge.
To restore your PC to the state it was in before you installed the previous version of TextBridge, use the following procedure:
1. Close all active applications, including TextBridge.
2. On the Windows task bar, click Start.
3. Point to Programs, then point to the TextBridge folder.
4. Click TextBridge Uninstall. The TextBridge Uninstall dialog box appears.
5. Click Yes to continue the uninstall process. TextBridge proceeds with the uninstall. When it is finished, the Uninstall Complete dialog box appears.
Learning about TextBridge
When you insert the TextBridge CD-ROM into your CD-ROM drive, an autorun program on the CD-ROM launches TextBridge setup. After setup starts, select one of the options in the following list:
◆ Install TextBridge Pro 9.0. Starts the setup program, which installs the components of TextBridge.
◆ View Release Notes. Displays the Release Notes so that you can review them before you install TextBridge.
INSTALLING TEXTBRIDGE
This section provides procedures to install TextBridge.
Note If you want TextBridge to run on more than one version of Windows on a dual-boot system, install TextBridge separately under each operating system.
Before you begin installation, quit any open applications so that only Windows is running. If you typically run programs in the background, close them as well. There should be no applications listed in the task bar and no floating toolbars on the Windows desktop.
2. Click Install TextBridge Pro 9.0.
3. If you have Pagis 2.0 or later installed, a message asks if you want Pagis to use TextBridge Pro 9.0. Click Yes to use TextBridge Pro 9.0, or No to keep using your current version of TextBridge with Pagis.
4. Read the information in the Welcome dialog box, then click Next.
5. Read the Software License Agreement, then click Yes to proceed with the installation. If you do not accept the license agreement, click No; setup then ends without installing TextBridge.
When you select Typical, TextBridge uses the language of your PC’s user interface as the default recognition language for your documents. It also installs recognition support for English, French, German, Italian, and Spanish. When you select Custom, you can install additional recognition languages. Select only the languages you need, because each language requires additional space on your hard drive.
If your PC is not set up for electronic registration, please fill in the registration information and use the print or fax option to send it to ScanSoft. Registration helps you if you need to contact Technical Support and keeps you up-to-date about TextBridge and the ScanSoft family of programs. Click Cancel in the ScanSoft TextBridge Product Registration dialog box if you do not want to register now. If you do not complete the registration at this time, TextBridge reminds you to register after two weeks.
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 9.0 folder, and then point to the Setup folder.
3. Click Scanner Setup.
☞ Scanner Setup is also available from the TextBridge Tools menu.
Follow the instructions in the Scanner Setup wizard to install or test your scanner setup.
SETTING UP INSTANT ACCESS TO TEXTBRIDGE
Instant Access enables you to use TextBridge directly from a number of other programs, such as Word.
To provide Instant Access to TextBridge from an application, use the following procedure:
1. On the Windows task bar, click Start.
2. Point to Programs, then point to the TextBridge Pro 9.0 folder, and then point to the Setup folder.
3. Click Instant Access Control Panel. The TextBridge Instant Access Control Panel dialog box appears. TextBridge automatically lists the programs from which Instant Access is available, as well as any programs that are currently open.
UNINSTALLING TEXTBRIDGE
To restore your PC to the state it was in before you installed TextBridge, use the following procedure:
1. Close all active applications, including TextBridge.
2. On the Windows task bar, click Start.
3. Point to Programs, then point to the TextBridge Pro 9.0 folder, and then point to the Setup folder.
4. Click TextBridge Uninstall. The TextBridge Uninstall dialog box appears.
5. Click Yes to continue the uninstall process, or click No to quit it.
6.
WHERE TO GO FROM HERE
To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3, which explains the basic concepts and functions of the software. To learn how to use TextBridge to process simple and complex documents, refer to Chapter 4. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor. Chapters 5 and 6 provide sample sessions.
3 OCR AND BASIC TEXTBRIDGE OPERATIONS
This chapter provides information about the process of page recognition. Use this chapter to learn about optical character recognition (OCR), page recognition, recomposition, and the operations that help you use TextBridge effectively, including automatic and manual processing, page types, and recognition settings.
WHAT IS TEXTBRIDGE OCR?
TextBridge is OCR software that turns paper documents or page image files into text documents on your PC. Page image data is electronic information about the pages of a document that comes from a source such as your scanner or fax software. This data becomes an image document and is stored in an image file. Text documents are files containing information about the text and pictures in your document.
Figure 3–1. Original Page tab in Page Type Settings dialog box (page types include Fax, Legal, Letter, Magazine (b & w), Magazine (color), and Newspaper, each with Scan Size, Print Type, Page Layout, and Picture Output settings)
The page type also specifies Scanner Settings that control how pages of this type are scanned.
☞ Scanning in grayscale (or color) rather than black and white can improve recognition on pages with difficult-to-recognize text. However, grayscale scanning is slower than black-and-white scanning.
Page sources
You can get pages to process from your scanner or from page image files.
In addition, some complex, free-form layouts defeat TextBridge’s recomposition capabilities. For these types of documents, it is often best to preview the pages and manually create zones for the text and images that you want to capture.
Retain pictures keeps pictures in the saved document if the document format supports pictures. If you do not select Retain page layout, pictures are saved at the end or beginning of the document, depending on your word processor.
RUNNING TEXTBRIDGE STANDALONE AND INSTANT ACCESS
You can run TextBridge as a standalone program or invoke it from within another program with Instant Access. You can also invoke TextBridge through image file context menus and drag-and-drop.
Note Instant Access is also available from the Start menu.
Standalone Program
The TextBridge standalone program is a conventional, document-oriented Windows program.
Instant Access
Instant Access runs more automatically than standalone TextBridge, with a minimal, dialog-box-based user interface. The entire document is processed with little intervention from you. Instant Access gives you direct access to TextBridge from programs such as Word and WordPerfect. Programs with Instant Access have a TextBridge command on the File menu. Clicking TextBridge on the File menu starts TextBridge, which recognizes the pages and pastes the results directly into the program’s open document.
The programs in the following list do not have Instant Access capability:
Acrobat Exchange
Acrobat Reader
Clipboard Viewer
Corel Quattro Pro
File Manager
HotMetal Light
Netscape
Netscape Editor
IMPROVING PAGE RECOGNITION WITH SETTINGS
There are a number of settings that you select in TextBridge at the beginning of the recognition process to help it recognize a document more accurately. Many of these options are related to the manual processes described in the previous section.
Figure 3–2. Original Page tab in Page Type Settings dialog box
This dialog box has three tabs: Original Page, Scanner, and Processing. Each lets you view or change Page Type settings.
Original Page Settings
On the Original Page tab, you can choose the following settings:
◆ Set the page orientation for the way text and images are printed on the original page:
• Any orientation
• Portrait
• Landscape
If you select Any orientation, TextBridge automatically determines the page orientation.
◆ Select the page layout of the original page:
• Any layout
• Single column
• Multi-column
• Table
• As zoned by template
When you select Any layout, TextBridge automatically determines the page layout. Use Any layout when pages in your document have different layouts or when your pages have complex layouts that do not fit the layouts above. Select Table for pages with a table or spreadsheet and single-column text.
Scanner Settings
You can view and change the settings for your scanner on the Scanner tab of the Page Type Settings dialog box (Figure 3–3). On the Scanner tab you can set:
◆ Original Page quality:
• Good print
• Difficult or degraded
◆ Picture Output:
• Black and White
• Gray
• Color
Figure 3–3.
◆ TextBridge determines the best scan resolution and color for the Original Page and Picture Output settings. Click Custom if you want to override this default scan resolution setting. ◆ Set the scan page size to reflect the actual size of the original page. ◆ To override the default scan Brightness setting, uncheck Adjust Automatically and move the slider.
On the Processing tab:
◆ Select the primary language of the document. If you select more than one language, all of them must be in the same language group. You cannot change the language group after you begin processing a document.
◆ Select the user dictionary you want TextBridge to use when processing pages. You can add technical terms and proper names to a user dictionary during proofreading and training. The user dictionary helps TextBridge recognize words it does not know.
☞ For Auto Save and Send To, use the Auto Save Settings dialog box available from the Process menu to make these settings. You can view and change the settings for the output document in the Save As dialog box, each time you save a document. Except for the File name, these settings are “sticky” and do not change from document to document, unless you change them. When you save a document, you can change the settings to be sure they are the best ones for your document.
◆ Specify where you want to save the results of document processing.
◆ Specify the format in which to save the results, from the list of options.
◆ Specify the default name of the scanned document to save. By default, the name is taken from text at the top of the first recognized page; you can type another name if you prefer. If you are saving more than one document, each document has the same base name appended with an integer in parentheses. For example, ScanSoft, Inc., (2).
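The default naming rule above can be sketched as two small steps: take a base name from the top of the first recognized page, then append an integer in parentheses when several documents share that base. The Python below is a hypothetical illustration, not TextBridge's code. It assumes that "text at the top of the first page" means the first non-blank line, that characters Windows forbids in file names are dropped, and that the first document keeps the plain base name; the guide does not state these details, and the helper names are invented.

```python
import re
from typing import List

# Characters Windows does not allow in file names: \ / : * ? " < > |
INVALID_CHARS = r'[\\/:*?"<>|]'


def default_base_name(first_page_text: str, fallback: str = "Document") -> str:
    """Hypothetical: use the first non-blank line of recognized text as the name."""
    for line in first_page_text.splitlines():
        cleaned = re.sub(INVALID_CHARS, "", line).strip()
        if cleaned:
            return cleaned
    return fallback  # assumed behavior when the page has no usable text


def numbered_names(base: str, count: int) -> List[str]:
    """Assumption: document 1 keeps the base name; later ones get ' (n)'."""
    return [base if n == 1 else f"{base} ({n})" for n in range(1, count + 1)]


if __name__ == "__main__":
    base = default_base_name("\n  ScanSoft, Inc.,\nQuarterly report")
    print(numbered_names(base, 3))
```

With this sketch, the second document in the run gets the name shown in the guide's example, ScanSoft, Inc., (2).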
Language Installation
When you install TextBridge, you select one or more languages to use. If a language you want is not available at that time, check the TextBridge Web site to see if additional languages are available. TextBridge assumes your PC has the fonts needed to display text in the recognized language. If your PC does not have an available font for the code page of the recognized text, a message informs you and suggests that you load a font for the code page.
The following items describe methods for recognizing multiple languages in the same document:
◆ Document Language Group
Before you begin to process any pages, you can change the language group using the Document Language Group drop-down list on the Processing tab of the Page Type Settings dialog box. However, once your document contains a page, the language group control is disabled and you cannot change the language group.
TextBridge assumes that all text and table zones are in the languages that you have specified for the document. You can change the language of the selected zone, table, or table cells from the document language to any other language in the same language group. Right-click the zone and click Properties on the context menu. In the Properties dialog box, select the language for the zone from the language drop-down list.
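Taken together, the two language rules above form a small set of constraints. First, the document's language group can change only while the document has no pages. Second, a zone can use any language, provided it belongs to that group. The Python sketch below models those rules for illustration only. The `Document` class and its methods are hypothetical and are not a TextBridge interface. The group names and memberships in `LANGUAGE_GROUPS` are placeholders, not TextBridge's actual language groups; use the groups shown in the Document Language Group list instead.

```python
from typing import Dict, List, Set

# Placeholder groups for illustration only; NOT TextBridge's real language groups.
LANGUAGE_GROUPS: Dict[str, Set[str]] = {
    "Group A": {"English", "French", "German", "Italian", "Spanish"},
    "Group B": {"Example Language 1", "Example Language 2"},
}


class Document:
    """Hypothetical model of the language-group rules described above."""

    def __init__(self, group: str) -> None:
        if group not in LANGUAGE_GROUPS:
            raise KeyError(group)
        self.group = group
        self.pages: List[str] = []
        self.zone_languages: Dict[str, str] = {}

    def set_group(self, group: str) -> None:
        # The Document Language Group control is disabled once a page exists.
        if self.pages:
            raise ValueError("language group cannot change after pages are added")
        if group not in LANGUAGE_GROUPS:
            raise KeyError(group)
        self.group = group

    def add_page(self, page_id: str) -> None:
        self.pages.append(page_id)

    def set_zone_language(self, zone_id: str, language: str) -> None:
        # A zone may use a different language, but only within the document's group.
        if language not in LANGUAGE_GROUPS[self.group]:
            raise ValueError(f"{language} is not in {self.group}")
        self.zone_languages[zone_id] = language


if __name__ == "__main__":
    doc = Document("Group A")
    doc.add_page("page 1")
    doc.set_zone_language("zone 1", "French")
    print(doc.zone_languages)
```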
4 LEARNING TO USE TEXTBRIDGE
The previous chapters introduced you to TextBridge and document recognition. This chapter describes how to use the most basic capabilities of TextBridge.
BEFORE YOU BEGIN
The following checklist will take you through the most important questions to ask before you start to process a document.
1. Is this document a good candidate for OCR? If you have difficulty reading a page, TextBridge may also have trouble recognizing it.
2. Is my document coming from my scanner or an image file?
3. What type of page is the document?
4. Do I want to retain the original layout and format of the document?
5.
TextBridge provides flexibility in performing the steps of the OCR process. You can:
◆ Process your pages automatically, or interact with processing in manual mode
◆ Specify the type of page to optimize processing settings
◆ View and mark the parts (zones) of pages to be recognized
◆ View and manipulate the pages of a document with page thumbnails
☞ A thumbnail is a small image of a page in your document.
To start TextBridge:
1. On the Windows taskbar, click Start.
2. Point to Programs, then point to the TextBridge Pro 9.0 folder.
3. Click the TextBridge Pro 9.0 icon. The TextBridge main window appears (Figure 4–1).
Figure 4–1. TextBridge Pro main window (callouts: Menu bar, Main toolbar, Process toolbar, View area showing Welcome, Thumbnail area, Tip and Status bar)
Note The Welcome message appears when you start TextBridge for the first time, and each time after that until you clear the Show this welcome when starting check box.
USING AUTOMATIC PROCESSING
When you use TextBridge’s automatic processing feature, TextBridge processes pages with very little interaction from you. In automatic mode, after you select the page type and page source, TextBridge automatically recognizes your page(s). TextBridge stops only for you to add more pages and to save the results of recognition. TextBridge can also automatically save the recognized documents and open them in another application, such as a word processor or editor.
Figure 4–2. Click the Auto button in the TextBridge window (callout: Click Auto button)
2. If you are scanning, you can do one of the following:
• Click the More Pages button in the Add More Pages to Scanner dialog box (Figure 4–3) to scan another page.
• Click the Other Side button to scan the other side of two-sided pages.
• Click the Done button when there are no more pages to add.
TextBridge recognizes the text, saves any pictures to be placed in your output, and remembers the format for your output.
Figure 4–3. Add Pages to Scanner dialog box (callouts: Click to scan more pages; Click to scan second side(s) of a two-sided document; Click Done to proceed when all pages are scanned)
3. Save the text, with any picture(s), in a file format of your choice.
Figure 4–4. Save the document using the Save As dialog box
4. Proofread the results of the OCR process in your selected application, or process a new document.
USING MANUAL PROCESSING
TextBridge enables you to get remarkably accurate results from page recognition. Page recognition is a complex process, however, and some documents can require your interaction with TextBridge to get the best output. Manual processing gives you several opportunities during page recognition to improve the results for a particular document. In manual mode, you step TextBridge through document processing.
3. View and zone the page images. Click Find Zones to have TextBridge automatically find the text, tables, and pictures on the page, or use the zoning tools to mark the zones yourself.
4. Click the Recognize button. TextBridge recognizes the page, including its text, pictures, and formatting.
☞ To recognize all your pages at once, click the drop-down arrow on the Recognize button and make sure it is set to recognize all pages.
5. Proofread the results of recognition. Correct any errors using the tools in the Text view.
6.
Selecting the Page Source
Before you start processing a new document, you can indicate whether the pages come from your scanner or from an image file. To do so, click the drop-down arrow on the Get Pages button and select the source of the page image: your scanner, scanner feeder, or image file (Figure 4–5).
Figure 4–5. Select the page image source (callout: Click the drop-down arrow to display page image source options)
Note Some scanners have a scanner feeder in which you can place a stack of pages to be scanned.
Selecting the Page Type
For the best OCR results and performance, select the page type that best matches your original page(s). Page types make it easy to choose the best settings for processing your pages, because each page type bundles all of the processing settings. Most documents can be processed using the default page type, Any Page (b&w). To select a different page type, click the Page Type button and select it (Figure 4–6).
Figure 4–7. Change settings for this page type
TextBridge provides page types for the most common kinds of pages. You can also define your own page types with settings optimized for other specialized documents.
Previewing the Page
When you process pages manually, TextBridge displays the image of each page in the Image view (Figure 4–8). Processing stops after TextBridge gets the pages and displays the image of the original page.
◆ Delete the page from the document.
◆ Add more pages to the document.
◆ Cancel the process by creating a new file or opening another file.
◆ Look at the properties of the page.
◆ Continue processing the page.
You can use the Image tab toolbar, or the View and Page menu commands, to examine and orient the acquired page.
Figure 4–8. (callout: Preview tools)
Zoning the Page
Before recognizing text on a page, TextBridge finds the text, table, and picture areas (or zones) on the page (Figure 4–9). TextBridge does this automatically when processing in automatic mode. In manual mode, you can mark the zones yourself, or you can click Find Zones to have TextBridge zone the page automatically. A zoned page is divided into one or more zones. There are three types of zones: text, table, and picture.
You can use Find Zones to generate zones automatically. You can then adjust these zones before you continue zoning and recognize the page. You can also zone the page manually. Use the text marker, table marker, picture marker, and erase marker zoning tools on the Image toolbar as you would use highlighting markers, to create and adjust zones.
Figure 4–9. Zone the page using the zoning tools (callouts: Find zones, Manual zoning tools, Highlighted zones)
TextBridge also orders zones for output.
You can perform these activities related to zones:
◆ Mark text, table, and picture zones.
◆ Draw irregularly shaped zones.
◆ Have TextBridge automatically Find Zones.
◆ Edit automatic zoning.
◆ Erase a zone or part of a zone.
◆ Drag a selected zone to adjust its position.
◆ Display and edit the properties of a zone (such as language).
◆ Zone only part of a page.
◆ Delete zones using the Clear command, so that their text, tables, or pictures are not included in the final document.
Proofreading the Document
In manual mode, after TextBridge recognizes each page, it stops for you to proofread the recognition results (Figure 4–10). TextBridge displays recognized pages in the Text view, laid out like the original page. Pictures found by OCR are displayed in the same locations as in the original page.
Figure 4–10. (callouts: Proofreading tools, Original image, View area)
You can add corrected words to the user dictionary, which can improve recognition on subsequent pages of the same document and in subsequent documents. The user dictionary is most useful for non-standard words that you frequently need to recognize, such as proper nouns and technical terms. While you are still proofreading, you can add pages to the final document by getting more pages with either automatic or manual processing.
Figure 4–11. Saving the page using the Save As dialog box
After you save the document, it remains open in TextBridge.
GETTING HELP WHILE USING TEXTBRIDGE
TextBridge is designed to be easy to learn and use, and it contains many user assistance options to guide you. The goal of user assistance is to give you information when you need it, primarily from within the program.
Using the Show Me How Window
In the Welcome window, click Show Me How to display the Show Me How window (Figure 4–12). The Show Me How window guides you through a specific task. It explains how to:
◆ Use the TextBridge tools
◆ Scan a document into your word processor
◆ OCR an existing image file, such as a fax file or a TIF file
◆ OCR part of a page rather than the entire page
Click the activity that you want to learn about. An animated character describes the activity for you.
Note If you want to end the Assistant’s explanation early, right-click the character and select Hide.
Using Tips
Context-sensitive tips provide explanations, alternative activities, and related suggestions. They are embedded throughout the application and appear at the bottom of the screen or of the current dialog box, depending on the context in which you are working. Click Next Tip to cycle through the tips.
Meaning of a word: Click TextBridge Help in the Help menu, and in the Contents tab, click Glossary, or use the Index.
Entire dialog box: Click the Help button in the dialog box.
A menu command: Select the menu item and press F1.
You can get Help by using the main Help Topics window (Figure 4–13) and doing one of the following:
◆ Select a topic from a book in the Contents tab.
◆ Select a topic from the Index tab.
Using the TextBridge Web Site
The TextBridge Web site provides the latest product information, an up-to-date scanner list, tips, and links to related Web sites. To see this information, select Visit TextBridge Web Site from the Help menu.
WHERE TO GO FROM HERE
Proceed to Chapter 5 of this booklet for step-by-step sample sessions showing how to use TextBridge.
5 SAMPLE SESSIONS WITH TEXTBRIDGE
The previous chapters introduced you to TextBridge and document recognition. This chapter provides step-by-step instructions that teach you how to use the most important capabilities of TextBridge. The learning sessions build on each other and assume that you understand the procedures explained in the previous sessions. It is best to do them in order, or at least to skim the earlier sessions to familiarize yourself with the steps.
USING THE SAMPLE DOCUMENTS
In this section, you will learn about the sample documents and how to open a sample document. Use the sample documents provided with TextBridge for the learning sessions in this chapter. You can find the seven sample documents in the following location:
C:\Program Files\TextBridge Pro 9.0\Image Files\Samples
This is the default location for these files; however, you may have installed TextBridge in another location. The sample documents are:
◆ complex.xif
◆ dual page.
☞ In this session, you will learn to open a sample document. For this session, use letter.tif (Figure 5–1).
Figure 5–1. Letter sample document
To find and open a sample document:
1. Select the page source. Click the drop-down arrow on the Get Pages button and select Image File.
2. Select the page type. Click the Page Type button and select Any Page (b&w), which handles most black-and-white pages (Figure 5–2).
Figure 5–2. Select Page Type (callouts: Click the page type button; Select a page type)
3. Click the Get Pages button. The Get Pages dialog box appears with the default folder, Samples, open. The sample files are listed in the Get Pages dialog box (Figure 5–3).
Figure 5–3. Get Pages dialog box with letter.tif selected (callout: Select an image file)
If Samples is not the open folder, use the Look in box in the Get Pages dialog box to go to the sample documents folder:
C:\Program Files\TextBridge Pro 9.0\Image Files\Samples
This is the default location unless you installed TextBridge in another directory.
4. In the Get Pages dialog box, double-click a file name to open it. In this case, double-click letter.tif.
Figure 5–4. TextBridge - Image view
For this lesson, you just want to return to where you started without recognizing the document.
5. Click the New command in the File menu to discard the current page. A dialog box appears and tells you that the current page has not been saved.
6. Click OK to return to the original TextBridge screen.
Now you know how to find and open a sample document. Proceed to the learning sessions to work with TextBridge and become familiar with its capabilities.
SESSION 1: RECOGNIZING A SIMPLE DOCUMENT USING AUTO PROCESSING
TextBridge provides a range of powerful features, but it is also designed to be very easy to use. For many documents, you can use the default settings and process the document automatically.
☞ For this learning session, use the sample document named letter. This document has a single column of text and a logo.
To process a simple document, use the following procedure:
1. Start TextBridge. The TextBridge main window appears.
2. Select the page source. Click the drop-down arrow on the Get Pages button and select Image File.
3. Select the page type. Click the Page Type button and select Any Page (b&w) (Figure 5–5).
Figure 5–5. (callouts: Click the page type button; Select Any Page (b&w))
4. Click the Auto process button. The Get Pages dialog box appears (Figure 5–6).
Figure 5–6. Get Pages dialog box with letter.tif selected (callout: Select an image file)
5. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file as shown in Figure 5–7.
Figure 5–7. TextBridge - Getting Page dialog box
TextBridge then zones the page and identifies text, tables, and pictures, as shown in the Zoning dialog box (Figure 5–8).
Figure 5–8.
TextBridge automatically recognizes the characters and page layout, as shown in the Recognizing dialog box (Figure 5–9).
Figure 5–9. TextBridge - Recognizing dialog box
After TextBridge reads and processes the page image, it asks you to save the document (Figure 5–10).
Figure 5–10. (callouts: Accept the default name, or type a new name; Select the output format; Click Save)
6. In the Save As dialog box, complete the following steps:
• In the Save in list, select the folder in which to save the text file.
☞ Be sure to notice where the document is saved so that you can find it easily.
• In the File name box, type a file name.
• In the Save as type list, select the output format for your word processor or other text application.
• Check that Retain pictures, Retain page layout, and Open file when done are selected.
• Click the Save button.
Figure 5–11. Letter sample document
In page layout view in a word processor such as Word or WordPerfect, the recognized document should have the same or a similar layout as the sample TIFF image. The difference is that you now have formatted, fully editable text, just as if you had typed it yourself. At this point, you could spell check the document and make any other changes in your word processor. Close the word processing application. Notice that TextBridge is still running.
SESSION 2: USING INSTANT ACCESS TO TEXTBRIDGE
You can use TextBridge Instant Access to run TextBridge from within another application, such as a word processor. To use Instant Access, simply start TextBridge from within an application such as Word or WordPerfect. During Instant Access, TextBridge processes a document and then pastes the result into the open document in your text application.
☞ For this learning session, use the sample document named letter.
If TextBridge is still running from the previous learning session, exit TextBridge. Although you can run more than one copy of TextBridge at the same time, doing so is not recommended. Before you run Instant Access to TextBridge, you may need to use the Instant Access Control Panel (Figure 5–12) to choose which applications have Instant Access. TextBridge automatically provides Instant Access for the applications listed in the control panel.
The Enable access to TextBridge list shows the text applications from which TextBridge can be invoked. The list includes applications commonly used with TextBridge and applications that are currently running. If your application does not appear in this list, close the TextBridge Instant Access Control Panel, start your application, and reopen the TextBridge Instant Access Control Panel. Your application should now appear in the list. Note: Only applications with a File menu will appear in the list.
Figure 5–13. TextBridge... command in File menu (callout: Start Instant Access to TextBridge)
The Instant Access dialog box appears (Figure 5–14). Notice that the Instant Access dialog box looks similar to the Page Type dialog box in the standalone version of TextBridge. Auto OCR and Manual buttons have been added, as well as choices for Page Source.
Figure 5–14. (callouts: 1. Select Letter; 2. Select Image File; 3. Click Auto OCR to start processing)
3. In the Instant Access dialog box:
• In the Page Type box, click Letter. Using Letter instead of the default, Any Page (b&w), refines the settings. It tells TextBridge that the page is single-column and that the print is good enough for black-and-white scanning, which is faster.
• In the Page Source box, select Image file.
• In the Output box, select Retain pictures and Retain page layout.
• Click Auto OCR. The Get Pages dialog box appears (Figure 5–15).
4. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file and automatically performs OCR on it, as indicated by the progress dialog boxes. After acquiring and recognizing the page, TextBridge pastes the recognized document into the open document in your word processor. Compare the recognized document in your word processor with the reproduction of the sample document, letter.tif, shown below (Figure 5–16).
Figure 5–16.
SESSION 3: RECOGNIZING A COMPLEX DOCUMENT USING MANUAL PROCESSING
You can often process more complex documents, such as magazine articles, in automatic mode. However, a few additional steps in manual mode can sometimes produce a more accurate result in less time.
☞ For this learning session, use the sample document named complex. This document has color pictures, multiple columns, a dropped capital letter, headings, paragraphs, a table, and reverse video text.
When you select Magazine (color) as the page type, it automatically specifies the following settings:
◆ Multi-column page layout
◆ Good print type
◆ Portrait orientation
◆ Color picture output
For scanning, the Magazine (color) page type specifies:
◆ Color scan
◆ Letter page size
Run the standalone version of TextBridge from the Start button for this learning session.
1. Start the TextBridge standalone version.
2.
4. Click the Get Pages button. The Get Pages dialog box appears (Figure 5–17).
Figure 5–17. Get Pages dialog box with complex.xif selected (callout: Select complex.xif)
5. Double-click complex.xif. TextBridge gets the page and displays it in the Image view. The page you see should be a four-column magazine article beginning with a title and a pie chart. Notice that the pie chart is already marked as a locked image. This is a segmented XIF file.
☞ If this is not the correct page, click New in the File menu.
6. Click the Find Zones button. TextBridge automatically zones the page. TextBridge locates areas on the page to recognize and designates each area as text, table, or picture. TextBridge then stops for you to check and change the zones if necessary (Figure 5–18).
Figure 5–18. (callouts: Preview and zoning tools, Page thumbnail, Text zones)
7. Check the results of automatic zoning. There should be text zones, a locked picture zone, and a table zone.
• Click the Zoom In and Zoom Out buttons to enlarge and reduce the page to examine the zones, if necessary.
• Modify automatic zoning, if necessary. If a zone is not assigned the desired type, right-click the zone. In the shortcut menu, click Properties. Then, in the Properties dialog box, click the type of zone you want.
Note You cannot change locked picture zones.
• Erase the area of the zone that connects the regular text to the reverse video text. Press and hold the left mouse button at the upper-left corner of the area you want to erase, and drag the mouse diagonally across the area. When you have defined the area, release the mouse button. The highlight is erased, which means the area is no longer included in a zone to be recognized.
When the zones are accurate, continue with the next step, page recognition.
Figure 5–19. Proofreading a page (callouts: Proofreading tools, Word Image window, Suspect word)
9. Correct any words that were not accurately recognized, using the proofreading tools.
• Examine the word in the Suspect word box. For a closer look at the word as it appears in the original page, look in the Word Image window, or display the word image pop-up by moving the pointer over the highlighted word on the page.
☞ To view the entire original page image, click the Image tab.
• If the suspect word is not the word you want, type the correct word in the Suspect box. The Suspect box drop-down list contains alternative suggestions for the suspect word. Click a suggestion to change the word to it.
• Click the Add to Dictionary button if you want the TextBridge dictionary to store a word for recognizing subsequent documents.
☞ The dictionary is most useful for non-standard words that you frequently need to recognize, such as proper nouns and technical terms.
11. Save the page as Magazine. TextBridge suggests a file name, uses the file type you selected last, and automatically appends the appropriate extension. Rich Text Format (RTF) supports recomposition and is compatible with most word processing applications. If you prefer another name, enter it in the File name box. Make any other changes, and then click the Save button. TextBridge formats the document and saves the file.
The page resembles the original page, including its layout. The document is a fully editable version of Complex in your word processor.
Note If Retain page layout is not selected, or if your text application does not support retaining page layout, the page will be a single column of text (also called galley text) followed by the pictures.
13. Edit the document in your word processor. You can make changes to the text and layout in your word processor and spell check the page.
☞ For this example, use the sample document named scanning.tif. This document has a title heading, text with headings, a grayscale graphic, line art, reverse video text, and a multiple-column cell table.
In this session you will learn to:
◆ Compare page types to decide which to select.
◆ Modify a page type.
◆ Zone a page with text, pictures, and a table.
◆ Change the proofreading confidence level.
◆ Save a page.
To process text, pictures, and a table:
1. Select the page source. Click the drop-down arrow on the Get Pages button and select Image File.
2. Select the page type. Click the Page Type button and select Table.
☞ You may need to scroll to see the icon for the Table page type.
3. In the Page Type dialog box, compare the descriptions of Table and Magazine (b&w). Click each page type and read its description in the information area near the bottom of the dialog box.
Figure 5–21. Original Page tab in the Page Type Settings dialog box with Table and Multi-column selected
5. In the Page Layout area of the Original Page tab in the Page Type Settings dialog box, select Multi-column. The page layout is now multi-column instead of table, and the original settings of any orientation and good print type are unchanged. If you like, you can click the other tabs of the Page Type Settings dialog box to look through the other settings.
6.
7. Click OK to close the Page Type dialog box.
8. Click the Get Pages button. The Get Pages dialog box appears.
9. Double-click scanning.tif in the Get Pages dialog box. TextBridge gets the page and displays it in the Image view, where you can preview it. The page you see should be titled “Scanning Industry is Booming.”
☞ When working with your own files, use the Shift and Control keys to select multiple image files for processing.
10. Click the Find Zones button.
Figure 5–22. Page with text, picture, and table zones (callouts: Find zones, Manual zoning tools, Highlighted zones)
11. Check the results of automatic zoning. There should be two picture zones, several text zones, and one table zone. Check that the entire table is included in one table zone. Make sure that the title of the table and the reverse video text are each in separate text zones. Zoom in to verify the zoning, if necessary.
• If you need to resize a zone, extend it with the zoning tools or erase parts of it with the erase tool. Use the erase tool to separate the page title from the first paragraph.
12. Click Recognize. TextBridge recognizes the page, then stops for you to proofread the text (Figure 5–23).
Figure 5–23. (callouts: Confidence level, Proofreading tools)
13. Change the recognition confidence level. The default confidence level is Show Suspect Words. If you change the confidence level to Show Highly Suspect Words, TextBridge raises its confidence level, and fewer words appear as suspects. If you change it to Show Somewhat Suspect Words, TextBridge lowers its confidence level, and more words appear as suspects. To change the confidence level:
• Click the Suspect drop-down arrow and select a level from the menu.
14.
16. Save the page as Scanning.rtf. Be sure to select Retain pictures and Retain page layout. TextBridge formats and saves the document.
17. Open Scanning.rtf in your word processor.
Figure 5–24. Scanning sample document
The page resembles the original page and keeps the original layout, including the pictures and table. The document is fully editable in your word processor.
18. Edit the document in your word processor. You can spell check the page and make other changes.
19. Reset the Table page type in TextBridge.
• Click the Page Type button.
• Select Table.
• Click the Settings button.
• Click the Reset button. The original settings for the Table page type are restored.
WHERE TO GO FROM HERE
The learning sessions in this chapter were designed to give you a solid basis for using TextBridge with your own documents. Additional learning sessions for more advanced topics are available in PDF format in Chapter 6, “Advanced Sample Sessions.
6 ADVANCED SAMPLE SESSIONS
Previous chapters introduced you to basic TextBridge capabilities. This chapter provides sample sessions with step-by-step instructions for using several more advanced TextBridge functions. This chapter covers the following topics:
◆ Processing a document to use in a database
◆ Using zone templates and page types
◆ Training TextBridge OCR
☞ This chapter uses the same sample documents described in Chapter 5.
☞ For this learning session, use the image file named table.bmp. This image file has a heading followed by a table in cell format, with gridlines, containing dates, names, and telephone numbers.
To process this document for use in a database:
1. Select the page type.
• Click the Page Type button and select Table.
• Click OK. The settings are automatically changed to table page layout, good print type, and any page orientation.
2. Click the Get Pages button. The Get Pages dialog box appears (Figure 6–1).
3. In the Get Pages dialog box, double-click table.bmp. After TextBridge reads and processes the page image, it displays the page in the Image view.
4. Click the Find Zones button. TextBridge automatically finds the zones on the page. Notice that the table is zoned with lines marking the cell borders (Figure 6–2).
Figure 6–2. Zoned table (callouts: Click the Select button; Table zoned with cell borders)
5. Click the Select button on the toolbar and double-click the table. The table editing tools replace the zoning tools (Figure 6–3).
Figure 6–3. Table editing tools (callouts: Draw hidden table border, Merge table cells, Draw visible cell table border, Erase table cell border)
You can use these tools to correct any errors in the recognition of the cell borders.
6. Click on the page outside the table. The zoning tools replace the table editing tools in the toolbar.
7. Click the Recognize button.
Figure 6–4. Table in Text view
8. Click the Save As button. The Save As dialog box appears (Figure 6–5).
Figure 6–5. Save As dialog box (callouts: Accept the default name, or type a new name; Select Text tab-delimited output format; Deselect Open file when done; Click Save)
9. Save the document in tab-delimited format.
• In the Save As dialog box, TextBridge suggests a file name. If you prefer another name, enter it in the File name box.
• Select Text tab-delimited (*.txt).
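After you save the table as tab-delimited text, a database tool can usually import it. The following is a minimal sketch of one way to do this with Python's standard `csv` and `sqlite3` modules. It assumes a hypothetical three-column file (date, name, phone) whose first line is a header row. The sample rows and the `contacts` table name are invented for illustration and are not TextBridge output.

```python
import csv
import io
import sqlite3

# Hypothetical contents of a tab-delimited file saved from TextBridge.
# The header row and the sample values are invented for illustration.
SAMPLE = (
    "Date\tName\tPhone\n"
    "03/01/99\tJ. Smith\t555-0100\n"
    "03/02/99\tA. Jones\t555-0101\n"
)


def load_contacts(text: str, conn: sqlite3.Connection) -> int:
    """Insert tab-delimited rows into a 'contacts' table; return rows added."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (date TEXT, name TEXT, phone TEXT)"
    )
    # Keep only complete three-column rows; blank or partial lines are skipped.
    rows = [row for row in reader if len(row) == 3]
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# To read a real saved file instead (file name and encoding are assumptions):
#   with open("table.txt", encoding="cp1252", newline="") as f:
#       text = f.read()
conn = sqlite3.connect(":memory:")
print(load_contacts(SAMPLE, conn))  # 2
for name, phone in conn.execute("SELECT name, phone FROM contacts"):
    print(name, phone)
```

The sketch skips rows that do not have exactly three fields, so stray text lines in the saved file, such as a page heading, are not inserted as malformed records.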
SESSION 2: USING ZONE TEMPLATES AND PAGE TYPES
TextBridge provides zone templates so that you can repeatedly process or ignore specific areas on pages of the same type, saving the time of rezoning each page. After you create a set of zones, TextBridge lets you save the current set of zones (including their size, location, and type) as a zone template. You can then use the zone template on other documents by specifying it within a page type, or you can load the zone template directly.
2. Click Get Pages. The Get Pages dialog box appears.
3. Double-click Scanning.tif in the Get Pages dialog box. TextBridge gets the page and displays it in the Image view, where you can create a zone template. The page you see should be titled “Scanning Industry is Booming.”
4. Click Find Zones. TextBridge automatically zones the page, then stops for you to check and change the zones (Figure 6–6). You can now adjust the zoning using the zoning tools.
Figure 6–6. Page with text, picture, and table zones
5. Save the zone template.
• In the Tools menu, select Save Zone Template. The Save Zone Template dialog box appears (Figure 6–7).
Figure 6–7. Save Zone Template dialog box (callouts: Specify the default location; Specify the file name; Save the template)
• Select the default location to save the zone template file.
☞ To specify your zone template in Page Type settings, you must save the template in the default folder, Zone Templates. However, if you save the zone template to another location, you can still load it using the Load Zone Template command available from the Tools menu.
• Specify the zone template file name, “My Newsletter.
Figure 6–8. Page Type Settings–Magazine (b&w) dialog box (callout: Click to create a new page type)
7. Create a new page type.
• In the Page Type Settings dialog box, click New to open the New Page Type dialog box (Figure 6–9).
Figure 6–9. (callouts: Type the new name; Enter a description)
• Type a description for your page type.
• Click OK to close the New Page Type dialog box and return to the Page Type Settings dialog box (Figure 6–10).
Figure 6–10. Page Type Settings with zone template selected (callout: Zone template selected)
8. Select the zone template in the Settings dialog box.
• Click As zoned by template.
• Select My Newsletter in the drop-down list.
• Click OK to close the Page Type Settings dialog box.
• Click OK again to close the Page Type dialog box.
9. Begin a new document. You are now ready to process next month’s Scanning News with your page type and zone template.
• Select the New command from the File menu. TextBridge warns you that you have not saved the current pages.
• Click OK.
10. Select the new page type.
• Click the Page Type button.
• Select My Newsletter in the Page Type dialog box.
• Click OK.
11. Click the Auto button. The Get Pages dialog box appears.
12. In the Get Pages dialog box, double-click Scanning.tif.
SESSION 3: TRAINING TEXTBRIDGE OCR
To ensure the highest possible accuracy, TextBridge provides an interactive training capability. This feature lets you participate in the OCR process and train TextBridge by verifying correctly recognized words and correcting recognition errors. With training, TextBridge achieves higher accuracy on the specific page and on other pages like it. Interactive training is especially effective for degraded documents, such as faxes and multi-generation photocopies.
2. Enable training. Click the drop-down arrow on the Recognize button and select Enable Training (Figure 6–11).
Figure 6–11. Enable training
☞ This “sticky” setting remains in effect for all subsequent documents until you disable training.
3. Click the Auto button. The Get Pages dialog box appears (Figure 6–12).
Figure 6–12.
4. In the Get Pages dialog box, double-click fax.pcx. TextBridge opens the page and begins recognition. When TextBridge is unsure of a word, it pauses so that you can train OCR, and the Training dialog box appears (Figure 6–13).
Figure 6–13. Training OCR using the Training dialog box
5. Change any words that were not accurately recognized.
☞ Sometimes TextBridge recognizes stray marks, handwritten notes, or dirt on the original page as characters. If the word image is not a word, click Not a Word. TextBridge moves on to the next word.
☞ To undo your last action, click the Undo button.
For the purposes of this session, repeat this process until you have trained OCR on at least a few words.
Note: Do not use training to correct spelling errors in the original document or to change the original text in any way.
6.
7. In the Save Training Data dialog box:
• Save the training data in the Training Data folder.
• Enter a file name.
• Check Open file when done.
• Save the file with a .trn extension.
• Click the Save button.
The Save Training Data dialog box closes, and the Save As dialog box opens (Figure 6–15).
Figure 6–15. Save As dialog box
8. Save the page as fax.rtf. TextBridge saves the document in the selected format and opens your word processing application.
9. View the file in your word processor.
Figure 6–16. Fax sample document
Notice that, even though the input document was a low-quality fax image, TextBridge recognized both its characters and its formatting with a high degree of accuracy. You can use the saved training data to improve the recognition of documents of similar quality that use the same fonts.
☞ Using training data on dissimilar documents may actually degrade recognition.
WHERE TO GO FROM HERE
The learning sessions in this chapter were designed to give you a solid foundation for using TextBridge with your own documents. For more information about TextBridge, refer to the TextBridge Help.
INDEX A Accept button, 6–16 Accepting a suspect word, 5–26 Adding a word to the dictionary, 5–27 Adobe Acrobat Reader, viii Any Page page types, 3–2, 5–8 Application formats supported, 1–6 Applications supporting recomposition, 1–6 Assistant, 1–5, 4–21 Automatic processing, 4–5, 5–8 Automatic zoning, 5–33, 6–8 Autorun program, 2–9 B Basic operations, 4–9 Before starting to OCR, 4–2 Built-in proofreader, 1–7 C Cell tables, 5–29, 6–3 Changing confidence level, 5–36 Closing the current document, 5–22 Colo
D Database documents, 5–1 Deferred processing, 1–8 De-installing TextBridge, 2–7, 2–15 Dialog boxes Get Pages, 5–4 Getting Page, 5–6 Instant Access control panel, 5–15 Instant Access to TextBridge, 5–17 New Page Type, 6–11 Page Type Settings, 5–31, 6–11 Page Type, 5–4 Recognizing, 5–11 Save As, 5–12 Save Training Data, 6–17 Save Zone Template, 6–10 Training, 6–16 Zoning, 5–10 Dictionary, 5–27 Displaying zone numbers, 4–15 Document Language, 3–16, 3–17 Document recomposition, 1–6 features and limitations, 3
F Fax documents, 1–10 Fax page type, 3–2 Find Zones button, 5–33 Formats supported, 1–6 Formatting with paragraph styles, 3–5 Forms, 4–14 G Get Pages button, 5–4 Get Pages dialog box, 5–4 Getting Page dialog box, 5–6 Grayscale images, 1–4, 1–11, 3–4 Grid lines, 5–29, 6–3 H Help system, x, 4–20 HTML output, 1–8 I Image documents, 3–2 procedure for opening, 5–3 Image file formats supported, 1–11, 1–13 Image processing, 1–8 Image view, 5–7 Improving page recognition, 3–8 Installing TextBridge, 2–1 langu
L Language and Zones, Tables, and Cells, 3–17 Language installation, 1–2, 2–5, 3–15 Language recognition, 1–4, 1–10, 3–15 Learning sessions, 5–1, 6–1 Legal page type, 3–2 Letter page type settings, 3–2, 5–14 M Magazine (b&w) and (color) page type, 3–2, 5–21 Manual processing, 4–8, 5–20 Manual zoning, 1–9 Memory requirements, 2–5 Microprocessor needed to run TextBridge Pro, 2–5 Microsoft Word (RTF), 1–13 Modifying page types, 5–31 Monitors, 2–5 N New Page Type dialog box, 6–11 Not a Word button, 6–17
P Page image data, 1–2, 2–2 Page image file formats supported, 1–11 Page image processing, 1–8, 3–4, 5–2 Page layout, 5–32 Page recognition, 3–1, 3–8 Page thumbnails, 4–3 Page type settings, 1–7, 2–8, 3–2 dialog box, 5–4, 5–31, 6–11 original page tab, 3–9 processing tab, 3–11 scanner tab, 3–10 Page types, 1–7, 2–2, 3–2, 4–10 creating, 6–11 modifying, 5–31 Pagis Pro and TextBridge Pro, 2–6 Picture zone, 4–14 Previewing the page, 4–12, 5–24 Processing a complex document, 5–20 Processing settings, 3–11 Proces
R ReadMe, 1–13 Recognition process, 3–1 Recognize button, 5–25 Recognizing dialog box, 5–11 Recomposition, 1–6, 2–4 limits of, 3–5 text program support for, 1–3 Registration card, 1–2 Release Notes, x, 1–13 Requirements, 2–5 Resizing a zone, 5–35 Retain page layout, 1–6, 3–4, 5–12 Retain pictures, 3–5, 5–13 Reverse video, 5–24 RTF, 1–13 Running multiple copies of TextBridge, 5–15 S Sample documents, 5–2 folder, 5–5 procedure for opening, 5–3 Save As dialog box, 3–12, 4–7, 5–12 Save Training Data dialog b
Selecting page source, 4–10 Serial number, xi Setup program, 2–9 Show Me How window, 4–21 Software registration card, 1–2 Software serial number, xi Software version number, xi Spreadsheet recomposition, 1–6 Starting a new document, 6–13 Starting TextBridge, 4–3 Suspect word box, 5–26 System requirements, 2–5 T Tab tables, 5–29 Tab-delimited format, 6–6 Table page type, 3–2 Table, 4–14 cell borders, 6–3 editing, 6–4 zone, 4–14 Technical Support, xi Text document settings, 3–2, 3–12 Text file formats suppo
TextBridge (cont.
W Ways You Can Use TextBridge, 4–2 Web site, 4–23 Welcome window, 4–20 What’s This? Help, 1–6 Windows, 2–5 Word Image window, 5–26 X Xerox PARC, 1–10 Z Zone order, 4–15 Zone templates, 1–9, 6–7 files from older versions of TextBridge, 2–7 saving, 6–9 Zones, 1–9, 4–14, 5–23 automatic, 1–7, 5–10, 5–23, 6–8 changing type, 5–24 dialog box, 5–10 dividing, 5–24 erasing, 5–24 locating, 5–23 pictures, 5–10 resizing a zone, 5–35 reverse video text, 5–34 tables, 5–10, 5–29, 5–34 text, 5–10, 5–23 Zoom In and Zoom