COPYRIGHT INFORMATION Copyright © 1995–2000 by ScanSoft, Inc. All rights reserved. No part of this publication may be transmitted, transcribed, reproduced, stored in any retrieval system or translated into any language or computer language in any form or by any means, mechanical, electronic, magnetic, optical, chemical, manual, or otherwise, without the prior written consent of ScanSoft, Inc., 9 Centennial Drive, Peabody, Massachusetts 01960.
CONTENTS
PREFACE
  About This User’s Guide
    Organization of this user’s guide
    Documentation conventions
  Related Documentation
  Technical Support
1 INTRODUCTION TO TEXTBRIDGE
  Basic OCR Concepts
  Features and Benefits
  New Features
2 INSTALLING AND SETTING UP TEXTBRIDGE
  What Comes with TextBridge
  Supported Scanners
  Installing and Testing Your Scanner
  System Requirements
  Before Installing TextBridge
    Uninstalling a Previous Version of TextBridge
    Using TextBridge with Pagis
    Learning about TextBridge before you install it
4 LEARNING TO USE TEXTBRIDGE
  Before Beginning to Process a Document
  Using TextBridge to Process a Document
  Starting TextBridge
  Using Automatic Processing
  Using Manual Processing
  Performing Basic Operations
    Selecting the Page Source
    Selecting the Page Type
ADVANCED SAMPLE SESSIONS
  Session 1: Processing a Document to Use in a Database
  Session 2: Using Zone Templates and Page Types
  Session 3: Training TextBridge OCR
  Where to Go From Here
PREFACE ScanSoft, Inc. welcomes you to TextBridge Pro Millennium Business Edition for Windows® 95, 98, 2000, and Windows NT® 4.0. The documentation that comes with TextBridge provides you with the information you need to operate TextBridge. The documentation includes this user’s guide, a Help system, and Release Notes. ScanSoft invites your comments about the information provided in the documentation.
To view the user’s guide in PDF format you need Adobe Acrobat Reader, which is installed with TextBridge, unless you already have it on your PC. You can access the user’s guide from the installation menu and the TextBridge Help menu, or you can open it from Adobe Acrobat Reader. After you open it, you can view it on your PC and print all or part of it using Adobe Acrobat Reader.
Documentation conventions TextBridge documentation uses certain graphical elements and formatting to emphasize information and give more meaning to text. Table 1: Documentation Conventions bold Introduces a new term or the first use of an important term in a chapter. It is sometimes used to denote strong emphasis.
Refer to the documentation in the following list for information: ◆ Online Release Notes. Before or after you install TextBridge, read the Release Notes. These provide the most up-to-date information about TextBridge. They describe technical information, including specifics about using a particular scanner. Release Notes also include information unavailable at the time that the user’s guide and Help were finalized. During installation you can access the Release Notes from the installation menu.
The ScanSoft Web site provides a link to TextBridge pages, including Technical Support with Frequently Asked Questions, technical information bulletins, and a problem report form. Before sending a problem report form to ScanSoft Technical Support, be sure to visit FastTrack, ScanSoft’s electronic support system on the web site.
INTRODUCTION TO TEXTBRIDGE 1 Welcome to ScanSoft’s TextBridge Pro Millennium, optical character recognition (OCR) software for Microsoft Windows® 95, 98, 2000 and Windows NT® 4.0.
Using the latest document recognition technology from ScanSoft, TextBridge OCR uses its recomposition capability to produce a fully editable electronic document with the original pictures and document layout (Figure 1–1). Original document Recomposed document in word processor Figure 1–1. TextBridge document recomposition In most cases, TextBridge understands your original document’s format and maintains the layout, including columns, headers, footers, pictures, and picture captions.
Recomposition is possible only if your text program supports pictures and layout. For example, recomposition is supported in Microsoft Word and Corel WordPerfect but not in Notepad. Forms and documents created in desktop publishing programs are usually too complex for recomposition by TextBridge as well as your word processor. As a result, the text and pictures are retained but the full layout is not.
Enhanced Features In addition to the new features, TextBridge offers improved versions of features that were available in previous releases. They are described in the following list: ◆ Instant Access. Start TextBridge from within most Windows text programs such as Word or Excel. After recognizing and converting the page, TextBridge automatically pastes the recognition data (text and pictures) directly into the program’s open document. ◆ OCR accuracy.
◆ Convenient batch processing. The ability to select multiple files and process each file separately plus the ability to schedule processing for a specific time in the future. ◆ Integration with e-mail programs. Input to popular programs such as Lotus cc:Mail, Microsoft Outlook, and America Online (AOL). ◆ Integration with the latest scanners. TextBridge works with the most recent scanners. The Release Notes and the ScanSoft Web site at www.scansoft.
TextBridge supports output formats for the following programs, which retain page layout: ◆ Internet Explorer ◆ Netscape ◆ Word 6.0, 7.0, 97, and 2000 ◆ WordPerfect 6.0, 6.1, 7.0, 8.0, and 9.0 ◆ Any word processor that supports RTF Retaining pictures is independent of retaining layout. Some text programs retain pictures even though they do not retain layout. ◆ Page Types. TextBridge provides many predesigned Page Types to make processing easier and more efficient.
◆ Output files to the latest version of programs. These include Microsoft Word 2000, Excel 2000, FrontPage 2000, WordPerfect 9.0, and Adobe FrameMaker 5.0. ◆ Custom dictionaries. To improve recognition accuracy further, you can create specialized word lists (scientific terminology, proper names, acronyms, and so on) within TextBridge or in ASCII text files and load them into TextBridge. You can also use your Microsoft Word or Office custom dictionary with TextBridge.
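The custom dictionary item above mentions building a word list in an ASCII text file. The guide does not describe the file layout, so the following is only a sketch that assumes one term per line; the function name `write_word_list` is made up for this illustration and is not part of TextBridge.

```python
from pathlib import Path
from typing import Iterable


def write_word_list(path: Path, terms: Iterable[str]) -> int:
    """Write unique, ASCII-only terms to `path`, one per line.

    ASSUMPTION: one term per line. The guide only says the list is an
    ASCII text file; it does not specify the layout.
    Returns the number of terms written.
    """
    seen = set()
    kept = []
    for term in terms:
        word = term.strip()
        # Skip blanks, duplicates, and anything that is not plain ASCII.
        if word and word.isascii() and word not in seen:
            seen.add(word)
            kept.append(word)
    path.write_text("\n".join(kept) + "\n", encoding="ascii")
    return len(kept)
```

A file produced this way would then be loaded through TextBridge itself; check the Help for the layout the program actually expects before relying on it.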
◆ Zone templates. After you create a set of zones, TextBridge lets you save and reload zone templates for new jobs. In this way you can consistently process or ignore specific areas on the same type of pages and save time without rezoning each page. ◆ Re-usable training data. After you interactively train OCR, you can save the training data in a file. You can reload this training file for similar documents of the same page type.
◆ Hard-copy faxes ◆ Documents with point sizes ranging from 5-point to 72-point type in practically any typeface ◆ Documents composed in any of many Eastern, Central, or Western European languages, including documents that mix several languages from the same group INPUT IMAGE FILE FORMATS The source of page images for TextBridge can be your scanner or it can be image files.
OUTPUT TEXT FILE FORMATS TextBridge can convert its recognized text and pictures to files for the following programs and formats:
◆ Adobe Acrobat Portable Document Format (PDF)
◆ Ami Pro 2.0 and 3.0
◆ dBase IV
◆ DisplayWrite 5
◆ Excel 97 and 2000
◆ Excel 3.0, 4.0, and 5.0
◆ Excel for the Macintosh 3.0 to 7.0
◆ FrameMaker
◆ HTML
◆ WYSIWYG HTML
◆ Interleaf
◆ Lotus 1-2-3
◆ Lotus Word Pro
◆ MultiMate Advantage II
◆ PostScript
◆ Professional Write 2.0 and 2.
PDF files can be transferred and shared across computer platforms. The PDF format was originally developed by Adobe Systems, Inc., and PDF files can be viewed with the Adobe Acrobat Reader. The following table lists the PDF file types you can create with TextBridge as well as the equivalent Adobe names: TextBridge Adobe Acrobat 4.0 Adobe Capture 3.
☞ Note PDF formats are available for languages in the American/European language group only. Refer to “Recognizing Other Languages” in Chapter 3, “OCR and Basic TextBridge Operations” for more information about language groups. Microsoft Word (RTF) format is also accepted by a number of other applications, including ClarisWorks®, Adobe® PageMaker®, and WordPad. See the documentation for your particular application for more information about importing files in RTF format.
WHERE TO GO FROM HERE To learn how to install and set up TextBridge on your system, go to Chapter 2. To learn how TextBridge recognizes a document and how you prepare TextBridge to do this, read Chapter 3. This chapter explains the basic concepts and functions of the software. To learn how you use TextBridge to process simple and complex documents, refer to Chapter 4. It also explains how to view, zone, train, and proofread your document in TextBridge and edit your document in your word processor.
INSTALLING AND SETTING UP TEXTBRIDGE 2 This chapter describes the TextBridge software installation and setup procedures.
Note Be sure to register electronically or print and return the software registration form. Registration qualifies you for technical support and ensures that you are kept up-to-date on new software releases and other information related to TextBridge and the ScanSoft family of products. SUPPORTED SCANNERS TextBridge works with many popular desktop scanners using your scanner's TWAIN interface.
After installing your scanner, test that the scanner is functioning. Refer to the scanner manufacturer’s documentation to answer any questions about the scanner. Note Your scanner must be working independently of TextBridge prior to connecting it to TextBridge. In general, we recommend that you turn on your scanner before you turn on your PC. Next, install and test your scanner. INSTALLING AND TESTING YOUR SCANNER Refer to the manufacturer's detailed instructions for installing your scanner.
SYSTEM REQUIREMENTS To install and run TextBridge, your Windows-compatible PC must be equipped with the following: ◆ An Intel (or compatible) 80486 or Pentium™ microprocessor. We recommend Pentium for the best performance. ◆ A VGA, SVGA, or multi-sync color monitor. ◆ A minimum of 24 megabytes (MB) of random access memory (RAM) for Windows 95 and 98; a minimum of 32 MB for Windows 2000 or Windows NT. We recommend 64 MB for the best performance. ◆ Microsoft Windows 95, 98, or 2000 or Windows NT 4.0.
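The memory figures above can be summarized as a small lookup. This is a sketch for illustration only: the function name `check_ram` and the dictionary keys are this example's own naming, and TextBridge does not expose anything like it.

```python
# Minimum RAM in MB per operating system, as stated in the requirements list.
# The key strings are this sketch's own labels.
MIN_RAM_MB = {
    "Windows 95": 24,
    "Windows 98": 24,
    "Windows 2000": 32,
    "Windows NT 4.0": 32,
}
# Recommended RAM in MB for the best performance, per the same list.
RECOMMENDED_RAM_MB = 64


def check_ram(os_name: str, installed_mb: int) -> str:
    """Classify installed RAM against the stated minimum and recommendation."""
    if os_name not in MIN_RAM_MB:
        return "unsupported OS"
    if installed_mb < MIN_RAM_MB[os_name]:
        return "insufficient"
    if installed_mb < RECOMMENDED_RAM_MB:
        return "minimum"
    return "recommended"
```

For example, `check_ram("Windows NT 4.0", 24)` reports `"insufficient"` because Windows NT needs at least 32 MB, while the same 24 MB on Windows 98 meets the minimum.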
When you insert the TextBridge CD-ROM into your CD-ROM drive, if there is an older version of TextBridge installed, a dialog box appears and recommends that you uninstall that older version. To save disk space, you can uninstall any older versions of TextBridge; however, you are not required to do so. If you choose not to do this before installing the new version of TextBridge, you can uninstall the older version at a later time. To uninstall a previous version of TextBridge, use the following procedure: 1.
Training data created with TextBridge 9.0 can be used with TextBridge Pro Millennium Business Edition. Move the training data files to the Windows folder ...All Users\Application Data\TextBridge\Bin\Training Data Training data and zone templates created with versions of TextBridge earlier than TextBridge 9.0 cannot be used with this version of TextBridge and can be deleted. You can delete the entire TextBridge folder after you have moved any files that you want to keep.
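The compatibility rule above reduces to a version comparison. The sketch below is illustrative only; `parse_version` and `training_data_reusable` are hypothetical helper names, and the treatment of versions later than 9.0 is an assumption because the guide does not discuss them.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '9.0' into a comparable tuple (9, 0)."""
    parts = [int(piece) for piece in text.split(".")]
    while len(parts) < 2:
        parts.append(0)  # treat '9' the same as '9.0'
    return tuple(parts)


def training_data_reusable(created_with: str) -> bool:
    """True if training data from this TextBridge version can be carried over.

    Per the text, data from 9.0 can be reused and anything earlier cannot.
    ASSUMPTION: versions later than 9.0 are treated as reusable.
    """
    return parse_version(created_with) >= (9, 0)
```

Files for which this returns `False` are the ones the guide says can be deleted.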
◆ View Online Documentation. If Adobe Acrobat Reader is not already installed on your PC, TextBridge starts Acrobat’s installation program. The complete online user’s guide appears for you to read and review. ◆ Browse the CD. Windows Explorer opens the TextBridge CD for you to view the folders and files that come with the TextBridge installation program. ◆ Visit ScanSoft’s Web site.
2. Click Install TextBridge Pro Millennium Business Edition. Follow the onscreen prompts and instructions to install TextBridge Pro Millennium Business Edition. Congratulations! TextBridge setup is now complete, and your new software is installed on your PC. Note Updates to your TextBridge software may be available on the ScanSoft web site. Refer to "Updating your TextBridge Software" later in this chapter for more information.
SETTING UP INSTANT ACCESS TO TEXTBRIDGE Instant Access enables you to use TextBridge directly from a number of other programs, such as Word. With Instant Access you can select TextBridge from the File menu of another program. TextBridge starts, recognizes your pages, and then pastes the results at the cursor in the open document. TextBridge automatically includes Instant Access to many of the applications on your PC.
UPDATING YOUR TEXTBRIDGE SOFTWARE You can get live updates to TextBridge from the ScanSoft Web site. These updates can include new scanner support, software patches, and other updates. To update TextBridge: 1. In the TextBridge Help menu, select ScanSoft on the Web and click TextBridge Updates. If your computer is set up for Internet access, your Web browser opens at the ScanSoft Web site. 2. Check for updates to your TextBridge software. 3.
7. The Uninstall Complete dialog box appears. Click OK to restart your computer. When you complete these steps, TextBridge is uninstalled from your PC. ☞ If you have created any files in the TextBridge folder, your files and the TextBridge folder are not deleted by the uninstall process. You can delete the entire TextBridge folder and its contents after you have moved any files that you want to keep to another location.
OCR AND BASIC TEXTBRIDGE OPERATIONS 3 This chapter provides information about the process of page recognition. Use this chapter to learn about optical character recognition (OCR), page recognition, recomposition, and operations that will help you use TextBridge effectively, including automatic and manual processing, page types, and settings for recognition.
Page types TextBridge can recognize a wide variety of pages. All you need to do is select the page type that most closely matches your original page. TextBridge gives you common page types with settings that are used most often to process pages of that type. You can also define your own page types to handle processing of other types of pages. Using page types makes it quick and easy for you to perform page recognition. You can modify these page types or create new page types and save them for future use.
Figure 3–1. Original Page tab in Page Type Settings dialog box The page type also specifies Scanner Settings controlling how pages of this type will be scanned. The scan page size is set according to the Size setting. Your scanner’s capabilities, together with the Print Type and Picture Output settings, determine the scan resolution and whether scanning is color, grayscale, or black and white.
Recomposition TextBridge recomposition lets you keep the layout of the original page. When you select Retain page layout in the Save As dialog box, TextBridge recomposes the layout, while maintaining full ability to edit in the output file (except WYSIWYG HTML). After recomposition, text, pictures, and tables are in the same position in relation to each other as in the original page.
For some documents, you may want only the text in simple galley (one-column) form. In this case, you would not want to retain the layout. The output document will have a single column of all the text in the original document. If you choose to format with paragraph styles, the text formatting but not the page layout will be retained. For example, the final document will have paragraphs and headings in styles like the original document and in the order of the original document.
Instant Access Instant Access runs more automatically than standalone TextBridge and uses a minimal, dialog box-based user interface. You process the entire document with little intervention. Instant Access gives you direct access to TextBridge from programs such as Word and WordPerfect. Programs with Instant Access have a TextBridge command in the File menu. Clicking TextBridge in the File menu starts TextBridge, which recognizes pages and pastes them directly into the open document in the program.
IMPROVING PAGE RECOGNITION WITH SETTINGS There are a number of settings that you select in TextBridge at the beginning of the recognition process to help it recognize a document with more accuracy. Many of these options are related to the manual processes described in the previous section. Use the Page Type Settings, Save As, and Options dialog boxes to specify which options of the software you want to use. Usually, you will want to use the settings automatically assigned to a page type.
Original Page Settings On the Original Page tab, you can choose the following settings: ◆ Set the page orientation for the way text and images are printed on the original page: • Any orientation • Portrait • Landscape If you select Any orientation, TextBridge automatically determines the page orientation. Use this setting if you don’t know the orientation of your pages or have pages with different orientations. Use Portrait or Landscape for faster processing.
◆ Set the print type of the document to be processed: • Any print type • Good • Fax • Dot matrix • Newspaper When you select Any print type, TextBridge automatically determines the print type. Scanner Settings You can view and change the settings for your scanner in the Scanner tab of the Page Type Settings dialog box (Figure 3–3). Figure 3–3.
◆ Picture Output: • Black and White • Gray • Color ◆ TextBridge determines the best scan resolution and color for the Original Page and Picture Output settings. Click Custom if you want to override this default scan resolution setting. ◆ Set the scan page size to reflect the actual size of the original page. ◆ To override the default Brightness setting, uncheck Adjust Automatically and move the slider.
On the Processing tab: ◆ Select the primary language of the document. If you select more than one language, they all must be in the same language group. You cannot change the language group after you begin processing a document. ◆ Select the user dictionary you want used when processing pages. You can add technical terms and proper names to a user dictionary during proofreading and training. The user dictionary assists TextBridge in recognizing words it does not know.
Except for the File name, these settings are “sticky” and do not change from document to document, unless you change them. When you save a document, you can change the settings to be sure they are the best ones for your document. ◆ Specify one or more recomposition settings to reflect the output results you want based on the original page: • Retain the layout of the original page. • Retain the pictures of the original page.
RECOGNIZING OTHER LANGUAGES TextBridge can recognize text in the languages in the following list: Afrikaans Breton Croatian English Flemish Galician Hawaiian Italian Lithuanian Norwegian Romanian Slovenian Tahitian Welsh Albanian Bulgarian Czech Estonian French German Hungarian Kurdish Lower Sorbian Pidgin English Russian Spanish Turkish West Frisian Aymara Byelorussian Danish Faroese Friulian Greek Icelandic Latin Macedonian Polish Serbo-Croatian Swahili Ukrainian Zulu Basque Catalan Dutch Finnish Ga
You could run TextBridge more than once with different language groups and zoning to recognize a document that contains languages in more than one language group. The following items describe methods for recognizing multiple languages in the same document: ◆ Document Language Group Before you begin to process any pages, you can change the Language Group using the Document Language Group drop down list in the Processing tab of the Page Type Settings dialog box.
◆ Language and Zones, Tables, and Cells TextBridge assumes that all text and table zones are in the languages that you have specified for the document. You can change the language of the selected zone, table, or table cells from the document language to any other language in the same language group. Right-click the zone and click Properties in the context menu. In the Properties dialog select the language for this zone using the language drop down list.
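The same-group rule for document and zone languages can be sketched as a lookup against a language-to-group table. The table below is entirely hypothetical and uses placeholder group names; the actual group membership is defined by TextBridge and is not reproduced here. The function names are likewise this example's own.

```python
from typing import Dict, Iterable, Optional

# HYPOTHETICAL mapping for illustration only. The real language groups are
# defined by TextBridge; "group A" and "group B" are placeholders.
EXAMPLE_GROUPS: Dict[str, str] = {
    "English": "group A",
    "French": "group A",
    "German": "group A",
    "Russian": "group B",
    "Ukrainian": "group B",
}


def common_group(languages: Iterable[str], groups: Dict[str, str]) -> Optional[str]:
    """Return the single group shared by all languages, or None if they differ.

    Raises KeyError for a language that is missing from the mapping.
    """
    found = {groups[name] for name in languages}
    return found.pop() if len(found) == 1 else None


def can_assign_zone_language(document_group: str, zone_language: str,
                             groups: Dict[str, str]) -> bool:
    """A zone may use any language that belongs to the document's group."""
    return groups.get(zone_language) == document_group
```

With this placeholder table, a document that mixes "English" and "Russian" has no common group, which corresponds to the case where the guide suggests running TextBridge more than once with different language groups.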
LEARNING TO USE TEXTBRIDGE 4 The previous chapters introduced you to TextBridge and document recognition. This chapter describes the most basic capabilities of TextBridge. You will become familiar with the basic functionality of TextBridge so that you can understand how TextBridge works. The following chapters take you from the beginning to the end of using TextBridge to process different kinds of documents.
5. Does the original document have pictures? If so, do I want to retain the pictures? 6. Do I want to capture the whole page or just part of it? 7. Are the pages I am processing all similar in layout and style? 8. Do I want to proofread the results before saving? 9. Are there any other settings I want to check and change? The rest of this chapter provides information that helps you to answer these questions.
STARTING TEXTBRIDGE There are two ways to start TextBridge. You can start TextBridge as a standalone application or you can start TextBridge through Instant Access from most Windows-based text applications or directly from Explorer. In this section you will learn to start TextBridge as a standalone application. To start TextBridge as a standalone application: 1. On the Windows task bar, click Start. 2. Point to Programs, then point to the TextBridge Pro Millennium BE folder. 3. Click TextBridge.
USING AUTOMATIC PROCESSING When you use TextBridge’s automatic processing feature, TextBridge processes pages with very little interaction with you. In automatic mode, after you select the page type and page source, TextBridge automatically recognizes your page(s). TextBridge only stops for you to add more pages and to save the results of recognition. TextBridge also allows you to automatically save and open the recognized documents in another application, such as a word processor or editor.
2. If TextBridge is getting a document from an image file, in the Get Pages dialog box, select the file to process. If TextBridge is getting a document from your scanner, after scanning the first page, you may do one or more of the following: Click the More Pages button in the Add More Pages to Scanner dialog box (Figure 4–3) to scan another page. or Click the Other Side button to scan the other side of two-sided pages. or Click the Done button when there are no more pages to add.
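The three buttons in the Add More Pages to Scanner dialog box amount to a simple loop. The sketch below simulates that loop; the choice strings and the `collect_pages` function are invented for this illustration, and modelling Other Side as "scan the back of the current sheet" is an assumption about behavior the guide does not spell out.

```python
from typing import Iterable, List


def collect_pages(choices: Iterable[str]) -> List[str]:
    """Simulate the dialog that appears after the first page has been scanned.

    `choices` is the sequence of buttons clicked:
    "more" (More Pages), "other" (Other Side), or "done" (Done).
    Returns one label per scanned image, in scan order.
    """
    pages = ["sheet 1 front"]  # the first page is already scanned
    sheet = 1
    for choice in choices:
        if choice == "done":
            break
        if choice == "more":
            sheet += 1
            pages.append(f"sheet {sheet} front")
        elif choice == "other":
            # ASSUMPTION: Other Side scans the back of the current sheet.
            pages.append(f"sheet {sheet} back")
        else:
            raise ValueError(f"unknown choice: {choice!r}")
    return pages
```

Clicking Done immediately leaves a one-page document, which matches the single-page case described in the text.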
3. Save the text with any picture(s) in a file format of your choice (Figure 4–4). Figure 4–4. Save the document using the Save As dialog box USING MANUAL PROCESSING TextBridge enables you to get remarkably accurate results from page recognition. However, page recognition is a complex process, and with some documents it can require your interaction with TextBridge to get the best output.
Note Select the page source from the Get Pages drop down menu and the Page Type before beginning OCR. Refer to “Performing Basic Operations” in this chapter or TextBridge Help for more information about these steps. If scanning, insert your page in the scanner. 1. Click the Get Pages button. TextBridge scans your page or reads the selected image file(s). To get more pages, click Get Pages again. 2. View and zone the page images.
PERFORMING BASIC OPERATIONS When you OCR your document automatically or manually, certain basic operations allow you to refine the procedures. They are: ◆ Selecting the Page Source (with Get Pages) ◆ Selecting the Page Type ◆ Previewing the Page (manual only) ◆ Zoning the Page (manual only) ◆ Proofreading the Document ◆ Saving the Document Selecting the Page Source Before you start processing a new document, you can indicate whether pages are from your scanner or an image file.
Selecting the Page Type TextBridge provides Page Types for the following kinds of documents: Any Page (b&w) Any Page (color) Book (Dual page) Business Card Fax Table Letter Legal Magazine (b & w) Magazine (color) Newspaper For the best OCR results and performance, you can select the page type that best matches your original page(s). Page types make it easy to get the best settings for processing specific kinds of pages.
To view or change the settings for the selected page type, click the Settings button in the Page Type dialog box (Figure 4–7). Figure 4–7. Change settings for this page type TextBridge provides page types for the most common types of pages. You can also define your own page types with settings optimized for other specialized types of documents. Previewing the Page When manually processing, TextBridge displays the image of each page in the Image view (Figure 4–8).
◆ Check the “scan” quality of the scanned page. ◆ Delete the page, adjust scanner settings, and rescan the page. ◆ Rotate the page to make the page upright. ◆ Delete the page from the document. ◆ Add more pages to the document. ◆ Cancel the process by creating a new file or opening another file. ◆ Look at the properties of the page. ◆ Continue processing the page. You can use the Image Tab toolbar or View and Page menu commands to examine and orient the acquired page.
Zoning the Page Before recognizing text on a page, TextBridge finds the text, table, and picture areas on the page (Figure 4–9). These areas are called zones. TextBridge does this automatically when processing in Automatic mode. In Manual mode, you can mark the zone yourself or click Find Zones to have TextBridge automatically zone the page. Find zones Manual zoning tools Highlighted zones Figure 4–9. Zone the page using the Zoning tools A zoned page is divided into one or more zones.
Each type of zone has a different transparent color so you can easily distinguish among them. TextBridge assigns the default colors to each type of zone: yellow for text, blue for pictures, and brown for tables. You can change the assigned colors in the Color Tab of the Options dialog box, available from the Tools menu. Only those parts of the page that are marked with zones are recognized by TextBridge. If you want to recognize only part of a page, mark only that portion.
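One way to picture zones is as typed rectangles on the page image. The following sketch models that idea; the `Zone` class, its pixel-coordinate fields, and `is_recognized` are assumptions made for illustration and do not describe TextBridge's internal data. Only the default colors are taken from the text.

```python
from dataclasses import dataclass
from typing import List

# Default zone colors as stated in the text.
DEFAULT_COLORS = {"text": "yellow", "picture": "blue", "table": "brown"}


@dataclass
class Zone:
    kind: str    # "text", "picture", or "table"
    left: int    # rectangle edges in page-image pixels (assumed units)
    top: int
    right: int
    bottom: int

    @property
    def color(self) -> str:
        """Default display color for this kind of zone."""
        return DEFAULT_COLORS[self.kind]

    def contains(self, x: int, y: int) -> bool:
        """True if the point lies inside this zone's rectangle."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def is_recognized(zones: List[Zone], x: int, y: int) -> bool:
    """Only points that fall inside some zone are part of the recognized output."""
    return any(zone.contains(x, y) for zone in zones)
```

Marking a single zone around one paragraph therefore leaves everything outside that rectangle out of the result, which is how partial-page recognition works in the description above.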
◆ Zone only part of a page. ◆ Delete zones so that text, tables, or pictures are not included in the final document using the Clear command. ◆ Change a zone from one type, such as text, to another type, such as table. ◆ Zoom In or Out to enlarge or reduce the page view. You can also perform these less common activities related to zones: ◆ Find and edit the cell structure of a table. ◆ Use the same zoning for subsequent pages of the same document.
Proofreading tools Original image Recognized text ready for proofing Figure 4–10. Proofreading the page using the Proofreading tools Words that TextBridge suspects may not have been recognized correctly are color coded. Suspect words are identified by one color and unrecognized characters are highlighted in another color. By default, the suspect words are blue, and the current word in the Suspect box is yellow in the view. Use the Proofreading tools to correct words.
Saving the Document After you finish proofreading the document, you are ready to save it. You can specify the location, name, and format of the output document. TextBridge converts the document to the format of your choice and saves it. You can choose to save the pictures and retain the original page format in your output. Note Not all formats can retain pictures and page layout. Some formats preserve only part of the original page layout. Refer to the Help for more detailed information.
After you save the document, your document remains in TextBridge. You can then do any of the following: ◆ Save the document in another format. ◆ Add or delete pages. ◆ Change zoning. ◆ Recognize the document again. ◆ “Send To” a text application. ◆ Use the New command in the File menu to begin a new job. ◆ Close TextBridge. GETTING HELP WHILE USING TEXTBRIDGE TextBridge is designed to be easy to learn and use. It contains many user assistance options to guide you.
Using the Welcome Window When you start TextBridge for the first time, the Welcome window appears (Figure 4–12). This window describes the basic steps for using TextBridge. This window appears every time you start TextBridge until you uncheck Show this welcome when starting. Click the Show Me How button to learn more about using TextBridge or use the main toolbar to begin processing your document.
Click a topic to call the Assistant Figure 4–13. Show Me How window The Show Me How Window guides you through a specific task. It explains how to: ◆ Use the TextBridge tools ◆ Scan a document into your word processor ◆ OCR an existing image file such as a fax file or a TIF file ◆ OCR part of a page rather than the entire page Click on the activity that you want to learn about. An animated character describes the activity for you. ☞ Note You can also click Show Me How in the Help menu.
Using Tips Context-sensitive tips provide explanations, alternative activities, and related suggestions. They are embedded throughout the application and appear at the bottom of the screen or current dialog box based upon the context within which you are working. You can click on Next Tip to loop through the tips. You can choose not to display the tips in the main window from the Toolbars dialog box available from the View menu.
You can get Help by using the main Help Topics window (Figure 4–14) and by performing one of the activities in the following list: ◆ Select a topic from a book in the Contents tab. ◆ Select a topic from the Index tab. ◆ Search for information about a specific word or phrase using the Find tab. ◆ Jump from one topic to a related topic. Figure 4–14.
WHERE TO GO FROM HERE Proceed to Chapters 5 and 6 of this booklet for step-by-step sample sessions showing how to use TextBridge. Chapter 5 shows you how to use auto processing and Instant Access, recognize a document with complex layout, and process a document with text, pictures, and a table. Chapter 6 shows how to process a document for a database, use advanced settings for zones and page types, and train TextBridge’s OCR.
SAMPLE SESSIONS WITH TEXTBRIDGE 5 The previous chapters have introduced you to TextBridge and document recognition. This chapter provides step-by-step instructions to teach you how to use the most important capabilities of TextBridge. The learning sessions build on each other and assume that you understand the procedures explained in the previous sessions. It’s best to do them in order or skim through prior sessions to familiarize yourself with the steps.
You can find the seven sample documents in the following location: C:\Program Files\TextBridge Pro Millennium BE\Image Files\Samples This is the default location for these files; however, you may have installed TextBridge in another location. The sample documents are: ◆ complex.xif ◆ dual page.tif ◆ fax.pcx ◆ letter.tif ◆ multipage.xif ◆ scanning.tif ◆ table.bmp ☞ For this session, use letter.tif (Figure 5–1). Figure 5–1.
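If you are not sure whether the sample documents were installed, a short script can compare a folder against the list of seven file names given above. This is a sketch outside TextBridge itself; `missing_samples` is a made-up helper, and the folder you pass in depends on where TextBridge was installed.

```python
from pathlib import Path
from typing import List

# File names as listed in this guide.
SAMPLE_FILES = [
    "complex.xif", "dual page.tif", "fax.pcx", "letter.tif",
    "multipage.xif", "scanning.tif", "table.bmp",
]


def missing_samples(folder: Path) -> List[str]:
    """Return the sample file names that are not present in `folder`."""
    return [name for name in SAMPLE_FILES if not (folder / name).is_file()]
```

For the default installation you would call it with `Path(r"C:\Program Files\TextBridge Pro Millennium BE\Image Files\Samples")`; an empty list means all seven files were found.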
After you have started TextBridge, to find and open a sample document: 1. Select image file as the page source. Click the drop down arrow on the Get Pages button and select Image File. 2. Select the page type. Click the Page Type button and select Any Page (b&w), which will handle most black and white pages (Figure 5–2). Click the page type button Select a page type Figure 5–2.
3. Click the Get Pages button. The Get Pages dialog box appears and lists the sample files (Figure 5–3).
Figure 5–3. Get Pages dialog box with letter.tif selected (callouts: Select an image file; Location of sample image files)
Note The default folder for image files is
C:\My Documents\TextBridge\Image Files
However, unless you installed TextBridge in another directory, sample image files are installed in
C:\Program Files\TextBridge Pro Millennium BE\Image Files\Samples
If Samples is not the open folder, access
Figure 5–4. TextBridge—Image view ☞ 5. For this lesson, you just want to go back to where you started without recognizing the document. This can be useful if you change your mind and want to start over without processing a document further. Click the New command in the File menu to discard the current page. A dialog box appears and tells you that the current page has not been saved. 6. Click OK to return to the original TextBridge screen. Now you know how to find and open a sample document.
SESSION 1: RECOGNIZING A SIMPLE DOCUMENT USING AUTO PROCESSING
TextBridge provides a range of powerful features. However, TextBridge is also designed to be very easy to use. For many documents, you can use default settings and automatically process a document.
☞ For this learning session, use the sample document named letter.tif. This document has a single column of text and a logo.
2. Select the page source. Click the drop down arrow on the Get Pages button to select Image File.
3. Select the page type. Click the Page Type button and select Any Page (b&w) (Figure 5–5).
Figure 5–5. Page Type dialog box with Any Page (b&w) selected (callouts: Click the page type button; Select Any Page (b&w))
4. Click the Auto process button. The Get Pages dialog box appears (Figure 5–6).
Figure 5–6. Get Pages dialog box with letter.tif selected (callout: Select an image file)
5. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file as shown in Figure 5–7.
Figure 5–7.
TextBridge then automatically zones the page and identifies text, tables, and pictures as shown in the Zoning dialog box (Figure 5–8). Figure 5–8. TextBridge - Zoning dialog box TextBridge automatically recognizes the characters and page layout as shown in the Recognizing dialog box (Figure 5–9). Figure 5–9.
After TextBridge reads the page image and processes it, it asks you to save the document (Figure 5–10).
Figure 5–10. Save As dialog box (callouts: Accept the default name, or type a new name; Select the output format; Click Save)
6. In the Save As dialog box, complete the following steps:
• In the Save in list, select the folder in which to save the text file.
☞ Be sure to notice where the document is saved so that you can find it easily.
• In the File name box, type a file name.
With Retain page layout selected, TextBridge includes information describing the layout of the original page in the output file. When you open the file or print the page in an application that supports recomposition, such as Word, WordPerfect, or Excel, in most cases the page is recomposed like the original page. Text, tables, and pictures are in the same position in relation to each other as in the original document. With Retain pictures selected, TextBridge includes pictures in the final output document.
8. Close the word processing application. Notice that TextBridge is still running. The recognized page is displayed in the Text view. You can save the document in another format or recognize a new document.
9. For now, simply close TextBridge.
SESSION 2: USING INSTANT ACCESS TO TEXTBRIDGE
You can use TextBridge Instant Access to run TextBridge from within another application, such as a word processor.
If TextBridge is still running from the previous learning session, exit from TextBridge. You can have more than one copy of TextBridge running at the same time, but it is not recommended. Before you run Instant Access to TextBridge, you may need to use the Instant Access Control Panel (Figure 5–12) to choose which applications have Instant Access. TextBridge automatically provides Instant Access for the applications listed in the control panel.
Note Only applications with a File menu will appear in the list. Click on applications in the list to check or uncheck them. Click All to check all items in the list. Click None to uncheck all items in the list. Instant Access to TextBridge will be available from all checked applications. Note The Instant Access Control Panel may also list applications that are not compatible with Instant Access. Be sure to select only those applications that you intend to use.
Figure 5–14. TextBridge Instant Access dialog box (callouts: 1. Select Letter; 2. Select Image File; 3. Click Auto OCR to start processing)
3. In the TextBridge Instant Access dialog box:
• In the Page Type box, click Letter. Using Letter instead of the default Any Page (b&w) is a refinement of the settings. In using Letter, you are telling TextBridge that the page is single-column and the print is good enough for black and white scanning, which is faster.
• In the Page Source box, select Image File.
The Get Pages dialog box appears (Figure 5–15).
Figure 5–15. Get Pages dialog box with letter.tif selected (callout: Select an image file)
4. In the Get Pages dialog box, double-click the sample document, letter.tif. TextBridge reads the image file and automatically performs OCR on it, as indicated by the progress dialog boxes. After acquiring and recognizing the page, TextBridge pastes the recognized document into the open document in your word processor.
With a word processor such as Word or WordPerfect in the print or page layout view, the recognized document should have the same or a similar layout as the TIFF image or sample document. The difference is that you now have formatted, fully editable text.
☞ If this document continues to a second page, delete any additional spacing that was inserted into the document. You can save the document or make any changes you’d like to the document just as if you’d typed it yourself.
When you select Magazine (color) as the page type, it automatically specifies the following settings:
◆ Multi-column page layout
◆ Good print type
◆ Portrait orientation
For scanning, the Magazine (color) page type specifies:
◆ Letter page size
◆ Color picture output
1. Start the TextBridge standalone version from the Start button.
2. Select the page source. Click the drop down arrow on the Get Pages button to select Image File.
3. Select the page type.
Select complex.xif Figure 5–17. Get Pages dialog box with complex.xif selected 5. Double click complex.xif. TextBridge gets the page, and displays it in the Image view. The page you see should be a four-column magazine article beginning with a title and pie chart. Notice that the pie chart is already marked as a locked image. This is a segmented XIF file. ☞ 6. If this is not the correct page, in the File menu, click New. Click OK to close the current document. You can begin again by selecting Get Page.
Figure 5–18. Zoned magazine page (callouts: Preview and zoning tools; Page thumbnail; Text zones)
7. Check the results of automatic zoning. There should be text zones, a locked picture zone, and a table zone.
• Click the Zoom In and Zoom Out buttons to enlarge and reduce the page to examine the zones, if necessary.
• Modify automatic zoning, if necessary. If a zone is not assigned the desired type, right-click the zone. In the shortcut menu, click Properties.
☞ Reverse video text must be in a separate text zone that includes no regular text. If the reverse video text is not in one zone by itself, manually rezone the reverse video text. One way to separate the reverse video text from the regular text is to use the Erase Markup tool. Determine which area of the page you want to include in the reverse video text zone. To divide one zone into two zones:
• Click the Erase Markup button.
Figure 5–19. Proofreading a page (callouts: Proofreading tools; Word Image window; Suspect word)
Suspect words (words that TextBridge was not sure of) are displayed in blue text. The current suspect word is highlighted in yellow and displayed in the Suspect word box.
9. Change any words that were not accurately recognized using the Proofreading tools.
• Examine the word in the Suspect word box.
• If the suspect word is not the word you want, type the word you want in the Suspect box. The Suspect box drop down contains alternative suggestions for the suspect word. Click on the suggestion to change to that word. • Click the Add to Dictionary button if you want the TextBridge dictionary to store a word for recognition of subsequent documents. ☞ The dictionary is most useful for non-standard words that you frequently need to recognize, such as proper nouns and technical words.
• Check to be sure that Retain pictures and Open file when done are selected.
☞ Selecting Retain layout has no effect on a PDF file. Retain pictures does control whether or not pictures are included.
• Make any other changes, then click the Save button. TextBridge formats the document and saves the file. If Open file when done is checked in the Save As dialog box, Adobe Acrobat Reader will automatically start up and open Magazine.
12. View Magazine in your Adobe Acrobat Reader.
Figure 5–20.
SESSION 4: PROCESSING TEXT, PICTURES, AND A TABLE
Some complex documents include text and one or more pictures and tables. When this is the case, you may not be certain which page type to select. If the text is single column, select Table. If the text is multi-column, you can either modify the Table page type or select Magazine (b&w). TextBridge can produce two types of tables: cell tables and tab tables. A cell table is divided into rows and columns made of areas called cells.
Note Before you begin, be sure TextBridge is ready to start a new document. If necessary, select the New command from the File menu.
To process text, pictures, and a table:
1. Select the page source. Click the drop down arrow on the Get Pages button to select Image File.
2. Select the page type. Click the Page Type button and select Table.
☞ You may need to scroll to see the icon for the Table page type.
3. In the Page Type dialog box, compare the descriptions of Table and Magazine (b&w).
Figure 5–21. Original Page tab in the Page Type Settings dialog box with Multi-column selected 6. Click OK to close the Page Type Settings dialog box. TextBridge modifies the settings for the Table page type based on your changes. These settings are retained until you change them. 7. Click OK to close the Page Type dialog box. 8. Click the Get Pages button. The Get Pages dialog box appears. 9. Double click scanning.tif in the Get Pages dialog box.
10. Click the Find Zones button. TextBridge automatically zones the page then stops for you to check and change the zones (Figure 5–22). Find zones Manual zoning tools Highlighted zones Figure 5–22. Page with text, picture, and table zones 11. Check the results of automatic zoning. There should be two picture zones, several text zones, and one table zone. Check that the entire table is included in one table zone.
If you want to change a zone, check or modify the properties of the zone:
• Right-click the zone and use the zone shortcut menu to zoom, clear, or delete zones, or see the zone’s properties.
• If you need to resize a zone, draw more with the zoning tools, or erase parts of the zone with the erase tool. Use the erase tool to separate the page title from the first paragraph.
12. Click Recognize. TextBridge recognizes the page, then stops for you to proofread the text (Figure 5–23).
13.
Confidence level Proofreading tools Figure 5–23. Confidence level in Proofread toolbar 14. Change any words that were not accurately recognized using the Proofreading tools. Check suspect words and correct any that were not correctly recognized. ☞ You need not make any corrections in order to save your document. If TextBridge had no trouble recognizing words, there will not be many suspect words to proofread. 15. Click the Save As button when you have completed proofreading.
16. Save the page as Scanning.rtf. TextBridge provides a suggestion for the file name and uses the file type you selected last, automatically appending the appropriate extension. Change the file type to Rich Text Format (RTF), which supports recomposition and is compatible with most word processing applications. Be sure to select Retain pictures and Retain layout. TextBridge formats and saves the document.
17. Open Scanning.rtf in your word processor.
Figure 5–24.
19. Reset the Table page type in TextBridge.
• Click the Page Type button.
• Highlight Table.
• Click the Settings button.
• Click the Reset button. The original settings for the Table page type will be restored.
WHERE TO GO FROM HERE
The learning sessions in this chapter were designed to give you a solid basis on which to use TextBridge for your own documents. Additional learning sessions for more advanced topics are available in Chapter 6, “Advanced Sample Sessions.
6 ADVANCED SAMPLE SESSIONS
Previous chapters have introduced you to basic TextBridge capabilities. This chapter provides sample sessions with step-by-step instructions for using several more advanced TextBridge functions. This chapter covers the following topics:
◆ Processing a document to use in a database
◆ Using zone templates and page types
◆ Training TextBridge OCR
☞ This chapter uses the same sample documents described in Chapter 5.
To process this document for use in a database:
1. Select the page type.
• Click the Page Type button to select Table.
• Click OK.
The settings are automatically changed to table page layout, good print type, any page orientation.
2. Click the Get Pages button. The Get Pages dialog box appears (Figure 6–1).
Figure 6–1. Get Pages dialog box (callout: Select table.bmp)
Note The default folder for image files is C:\My Documents\TextBridge\Image Files However, unless you installed TextBridge in another directo
3. In the Get Pages dialog box, double click table.bmp. After TextBridge reads and processes the page image, it displays the page in the Image view.
4. Click the Find Zones button. TextBridge automatically finds the zones on the page. Notice that the table zone has lines marking the cell borders (Figure 6–2).
Figure 6–2. Zoned table.bmp in Image View (callouts: Click the Select button; Table zoned with cell borders)
5. Click the Select Zone button on the toolbar and double-click on the table.
6. Click on the page outside the table. The zoning tools replace the table editing tools in the toolbar.
7. Click the Recognize button. TextBridge recognizes the page and displays it in the Text view, where you can proofread and correct any poorly recognized words (Figure 6–4).
Figure 6–4. Table in Text view
8. Click the Save As button. The Save As dialog box appears (Figure 6–5).
Figure 6–5. Save As dialog box (callouts: Accept the default name, or type a new name; Select Text tab-delimited output format; Deselect Open file when done; Click Save)
9. Save the document in text tab-delimited format.
• In the Save As dialog box, TextBridge provides a suggestion for the file name. If you prefer another name, enter the new name in the File name box.
• Select Text tab-delimited (*.txt).
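Once the table is saved as tab-delimited text, most database and spreadsheet programs can import it directly. If you prefer to check or load the file with a script first, the following Python sketch shows one way to read tab-delimited text using the standard `csv` module. It is an illustration only, not a TextBridge feature: the function name `read_tab_delimited` and the sample values are assumptions, and the actual contents of your saved file depend on the recognition results.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of rows (lists of strings).

    Quote marks are treated as ordinary characters, because recognized
    text may contain them. Blank lines are skipped.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row]


# Made-up sample content standing in for a file saved from TextBridge.
sample = "Name\tCity\nSmith\tBoston\n"
for row in read_tab_delimited(sample):
    print(row)
```

To read a saved file instead of the sample string, open it with Python's built-in `open` and pass the result of `read()` to the function.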
SESSION 2: USING ZONE TEMPLATES AND PAGE TYPES
TextBridge provides zone templates so that you can repeatedly process or ignore specific areas on pages of the same type, which saves time because you do not have to rezone each page. After you create a set of zones, TextBridge lets you save the current set of zones (including their size, location, and type) as a zone template. You can then use the zone template on other documents by specifying it within a page type, or you can load the zone template directly.
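To make the idea of a zone template concrete, the following Python sketch models a template as a saved list of zones, each with a location, a size, and a type. This is a conceptual illustration only. The `Zone` class, its field names, the units, and the use of JSON are all assumptions made for the example; they do not describe the file format that TextBridge actually uses for zone templates.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Zone:
    """One rectangular area on a page (hypothetical model, arbitrary units)."""
    left: int
    top: int
    width: int
    height: int
    zone_type: str  # for example "text", "picture", or "table"


def save_template(zones):
    """Serialize a list of zones to a JSON string (stand-in for a template file)."""
    return json.dumps([asdict(zone) for zone in zones])


def load_template(data):
    """Rebuild the list of zones from a string produced by save_template."""
    return [Zone(**item) for item in json.loads(data)]


# Illustrative zones only; real zones come from zoning an actual page.
newsletter = [
    Zone(left=50, top=40, width=500, height=60, zone_type="text"),
    Zone(left=50, top=120, width=240, height=200, zone_type="picture"),
]
template = save_template(newsletter)
assert load_template(template) == newsletter
```

The point of the sketch is that a template stores only where the zones are and what kind they are, not the page content, which is why the same template can be reused on any page with the same layout.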
3. Double click Scanning.tif in the Get Pages dialog box. TextBridge gets the page, and displays it in the Image view where you can create a zone template. The page you see should be titled “Scanning Industry is Booming.” 4. Click Find Zones. TextBridge automatically zones the page then stops for you to check and change the zones (Figure 6–6). You can now adjust the zoning using the zoning tools. For the purposes of this session, assume that the zones that TextBridge found are fine. Figure 6–6.
Figure 6–7. Save Zone Template dialog box (callouts: Specify the default location; Specify the file name; Save the template)
• Select the default location to save the zone template file.
• Specify the zone template file name, “My Newsletter.”
☞ TextBridge provides a suggestion for the file name, based on the current Page Type, Magazine (b&w).
• Click the Save button. TextBridge saves the zone template file and closes the Save Zone Template dialog box.
6. Open Page Type Settings.
Figure 6–8. Page Type Settings–Magazine (b&w) dialog box (callout: Click to create a new page type)
7. Create a new page type.
• In the Page Type Settings dialog box, click New to open the New Page Type dialog box (Figure 6–9).
Figure 6–9. New Page Type dialog box (callouts: Type the new name; Enter a description)
• In the New Page Type dialog box, type a new name for the custom page type, “My Newsletter.”
• Type a description for your page type.
8. Select the new page type.
• Select My Newsletter in the Page Type dialog box.
• Click Settings to open the Page Type Settings dialog box (Figure 6–10).
Figure 6–10. Page Type Settings with zone template selected (callout: Zone template selected)
9. Select the zone template in the Settings dialog box.
• Click As zoned by template.
• Select My Newsletter in the drop down list.
• Click OK to close the Page Type Settings dialog box.
• Click OK again to close the Page Type dialog box.
10.
11. Click the Auto button. The Get Pages dialog box appears.
12. In the Get Pages dialog box, double click Scanning.tif. TextBridge gets the file and processes it using the settings defined in My Newsletter, including the zones saved in the template. The Save As dialog box appears.
13. Save the document as ScanningNews.rtf. TextBridge formats and saves the document.
SESSION 3: TRAINING TEXTBRIDGE OCR
To ensure the highest possible accuracy, TextBridge provides an interactive training capability.
To process this page and use interactive training: 1. Select the page type. • Click the Page Type button and then select Fax. • Click OK. The settings are automatically set to any page layout, any page orientation, and fax quality. 2. Enable training. Click the drop down arrow on the Recognize button and select Enable Training (Figure 6–11). Click the Recognize drop down arrow Select Enable Training Figure 6–11. Enable training ☞ 3.
Figure 6–12. Get Pages dialog box (callout: Select fax.pcx)
4. In the Get Pages dialog box, double click fax.pcx. TextBridge opens the page and begins recognition. When TextBridge is unsure of a word, it stops to enable you to train OCR. The Training dialog box appears (Figure 6–13).
Figure 6–13. (callouts: Suspect word; Word image; Click when the word is correct; Click if the recognized word image is not a word; Click to undo last action; Click when you are done training)
5. Change any words that were not accurately recognized. The words shown in the Word box are the results of TextBridge OCR analysis.
• Examine each word in the Word box and compare it to the word image in the image window.
• If the word in the box is the word shown in the word image, click the Accept button. TextBridge continues to the next suspect word.
or
• If the word in the box is not the word shown in the word image, type the word correctly.
• Click the Accept button.
Figure 6–14. Save Training Data dialog box
7. In the Save Training Data dialog box:
• Save training data in the Training Data folder.
• Enter a file name.
• Save the file with a .trn extension.
• Click the Save button. The Save Training Data dialog box closes, and the Save As dialog box opens (Figure 6–15).
Figure 6–15.
8. Save the page as fax.rtf. TextBridge saves the document to the selected format and opens your word processing application. 9. View the file in your word processor. Figure 6–16. Fax sample document Notice that, even though the input document was a low-quality fax image, TextBridge recognized it with a high degree of character recognition and formatting accuracy. You can use the saved training data to improve the recognition of documents of similar quality and with the same fonts.