KOFAX Transformation Modules Invoice Pack 1.0 Configuration Guide 10300804-000 Rev 1.
© 2008 Kofax, Inc., 16245 Laguna Canyon Road, Irvine, California 92618, U.S.A. All rights reserved. Use is subject to license terms. Third-party software is copyrighted and licensed from Kofax’s suppliers. THIS SOFTWARE CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF KOFAX, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF KOFAX, INC.
Contents

How To Use This Guide
Kofax Transformation Modules
Installation Guide for Kofax Transformation Modules
User Guide for Kofax Transformation Modules
Help
Locators
Field Settings
Validation
PONumber
Chapter 1
How To Use This Guide

Introduction

This guide provides detailed technical and configuration information for the Kofax Invoice Pack. The Invoice Pack enables high-performance extraction of invoice header information without requiring an in-depth knowledge of Kofax Transformation Modules. This guide shows how to modify and configure an Invoice Pack within the Kofax Transformation - Project Builder, including scripts and settings.
Related Documentation

This section contains information about related documentation included with Kofax Transformation Modules and Invoice Pack.

Kofax Transformation Modules

The following documents are included within Kofax Transformation Modules.

Installation Guide for Kofax Transformation Modules

This installation guide is provided as a separate document in the Kofax Transformation Modules software package.
Kofax Transformation Modules Release Notes

Late-breaking product information is available from the release notes. You should read the release notes carefully, as they contain information that may not be included in other Kofax Transformation Modules documentation.

Kofax Transformation Modules Invoice Pack

The following documents are included within Kofax Transformation Modules Invoice Pack.
If you need to contact Kofax Technical Support, please have the following information available:

Kofax Capture software version
Kofax Transformation Modules software version
Kofax Transformation Modules Invoice Pack version
Operating system and service pack version
Network and client configuration
Copies of your error log files
Scanner make and model
Scanner engine (board) type
Special/custom configuration or integration information
Chapter 2
Configuration

Overview

This section covers the advanced configuration of Kofax Transformation Modules Invoice Pack. The basic configuration is covered in the Kofax Transformation Modules Invoice Pack Getting Started Guide. This section discusses classification with reference to the Kofax Transformation Modules scripting that is used to classify documents, along with the other script calls that occur at classification. This section also covers extraction.
locators, but are instead populated based on logic performed in script. Therefore this section will focus first on these locators, then on the scripting which populates the fields.

DB_Locator

This locator performs a fuzzy lookup on the supplier database. In order to find a match, the locator checks data from the document and attempts to match it to a record in the database.
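The idea of a fuzzy lookup can be illustrated with a small Python sketch. This is not the matching algorithm used by the DB_Locator, which this guide does not describe; it uses the standard-library `difflib` ratio as a stand-in. The supplier records, the field names and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical supplier records; the real supplier database schema is not shown in this guide.
SUPPLIERS = [
    {"id": "S001", "name": "Acme Industrial Supplies Ltd"},
    {"id": "S002", "name": "Northwind Traders"},
    {"id": "S003", "name": "Contoso Paper Company"},
]


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(ocr_text, records, min_score=0.8):
    """Return (record, score) for the closest record, or (None, score) if it is too weak."""
    best = max(records, key=lambda r: similarity(ocr_text, r["name"]))
    score = similarity(ocr_text, best["name"])
    return (best if score >= min_score else None), score


# OCR text with two misread characters still resolves to the intended supplier.
match, score = fuzzy_lookup("Acme lndustrial Suppiies Ltd", SUPPLIERS)
no_match, low_score = fuzzy_lookup("Completely Different GmbH", SUPPLIERS)
```

A threshold keeps weak matches from being accepted as a supplier; where to set it depends on the quality of the OCR text and the database.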
Document_BeforeProcessXDoc

This event is triggered before any processing is carried out. The scripted event is called to check that the incoming document is an image to be OCRed and extracted, rather than an image with data from an electronic transaction. If it finds that the required information already exists, it suppresses the OCR and sets a flag to be used in later script calls.

Document_AfterClassifyXDoc

This event is triggered after the initial document classification.
All format locators used by the project have dynamic keywords. This means that the keywords visible in the locators are not necessarily those used when the configuration is being run in a live system. The locator keywords are populated from a “LanguageDict.xml” file which can be found in the configuration directory under the installation path in “Project\Knowledgebases\Config\”.

Total

The Total field is used to store the total amount shown on the invoice.
Field Settings

The Total field takes its result from the SE_FinalAmounts locator mentioned above. It is formatted using the DefaultAmountFormatter, which does basic amount formatting. The field requires a confidence of 80%. The “min. confidence to set reread result” is set dynamically based on field OCR Confidence settings taken from the config.xml file.

Validation

This field is part of a group validation which includes SubTotal, TotalTax and Total.
SE_SubTotal – This is a standard evaluator which returns the best of the following locator sub-fields: KBa.SubTotal, SL_ResolvedAmounts.NetAmount, IHL.Netto1.

SE_Net0 – This is a standard evaluator which returns the best of the following locator sub-fields: KBa.NetAmount0, IHL.Netto0.

SL_FinalAmounts – This is a script locator which performs some additional logic before the amounts are returned.
FL_Discount – This is another standard format locator that looks for amounts which have the text “discount” nearby. This is used only for the en-GB locale to deal with the invoice problem. Results from this locator are used by the SL_ResolvedAmounts script locator.

IHL – Invoice Header Locator is overridden at the locale level to include locale specific keywords and tax rates. This attempts to find all amounts and other header information (such as invoice number, invoice date).
SL_DefaultTaxRate – Script locator which is used to retrieve a default tax rate to use. This is defined at the locale level so that the default amount can be modified.

SE_TaxRate – This is a standard evaluator which takes the best result from the following locators/sub-fields: KBa.TaxRate1 and SL_DefaultTaxRate. In most cases, this will simply return the default tax rate.

Field Settings

The TaxRate1 field takes its result from the SE_TaxRate locator mentioned above.
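The “best of” behavior of a standard evaluator such as SE_TaxRate can be sketched in Python as follows. This is an illustration only and not the Kofax implementation; the `Candidate` type, the confidence scale and the sample values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    source: str        # locator or sub-field that produced the value
    value: str
    confidence: float  # assumed scale of 0.0 to 1.0


def best_of(candidates: Sequence[Optional[Candidate]]) -> Optional[Candidate]:
    """Return the highest-confidence candidate, skipping inputs that found nothing."""
    found = [c for c in candidates if c is not None]
    return max(found, key=lambda c: c.confidence) if found else None


# Hypothetical values: KBa.TaxRate1 found nothing, so the default rate is returned.
default_rate = Candidate("SL_DefaultTaxRate", "20.00", 0.90)
result = best_of([None, default_rate])

# If KBa.TaxRate1 does return a stronger result, it takes precedence.
kba_rate = Candidate("KBa.TaxRate1", "5.00", 0.95)
result_with_kba = best_of([kba_rate, default_rate])
```

Under these assumptions, the default rate acts as a fallback that is always present, which matches the observation that the evaluator usually returns it.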
Field Settings

The TotalTax field takes its result from the SE_FinalAmounts locator mentioned above. It is formatted using the DefaultAmountFormatter, which does basic amount formatting. By default, the field requires a confidence of 80% to be valid. The “min. confidence to set reread result” is set dynamically based on the field OCR Confidence settings taken from the config.xml file.
SL_SetRegEx locator above. The results found by this locator are used by the FL_PONumber_West and FL_PONumber_North locators mentioned below.

FL_PONumber_West – This format locator takes the results of the FL_PONumber_RegExp locator above. It contains keywords which exist to the west of the possible results.

FL_PONumber_North – This format locator takes the results of the FL_PONumber_RegExp locator above. It contains keywords which exist to the north of the possible results.
available to the standard evaluator, so results from the SL_ResetRegEx locator are set to 0% in this scenario.

SE_PONumber – This is a standard evaluator which selects a result from the locators above. As mentioned, this evaluator works on a “first of” rather than “best of” approach to allow for the enhanced dynamic regular expression logic.

Field Settings

The PONumber field takes its result from the SE_PONumber evaluator mentioned above.
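The difference between “first of” and “best of” selection can be illustrated with a short Python sketch. It is not the evaluator's actual code. The input order, the sample PO numbers and the rule that a 0% result counts as no result are assumptions made for the example.

```python
from typing import NamedTuple, Optional, Sequence


class Alternative(NamedTuple):
    locator: str
    value: str
    confidence: float  # assumed scale of 0.0 to 1.0


def first_of(results: Sequence[Optional[Alternative]]) -> Optional[Alternative]:
    """Return the first input with a usable result, in the configured order."""
    for alt in results:
        # Assumption: a result whose confidence was set to 0% is treated as absent.
        if alt is not None and alt.confidence > 0.0:
            return alt
    return None


# Hypothetical inputs, listed in an assumed priority order.
inputs = [
    Alternative("SL_ResetRegEx", "PO-10482", 0.0),
    Alternative("FL_PONumber_West", "PO-10482", 0.55),
    Alternative("FL_PONumber_North", "PO-99999", 0.85),
]
chosen = first_of(inputs)

# A "best of" selection over the same inputs would pick the highest confidence instead.
highest = max(inputs, key=lambda a: a.confidence)
```

In this sketch the order of the inputs, not their confidence, decides the outcome, so a lower-confidence result from an earlier locator is preferred over a higher-confidence result from a later one.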
database for the current supplier, the dynamic regular expression is updated and enabled while the generic regular expression is disabled. Should no specific expression be found, the dynamic expression will be disabled and the generic enabled. This allows a higher precision of results where possible, while providing a fallback for when customer information is not present.

FL_InvoiceNumber_RegExp – This is a format locator which contains two regular expressions and no keywords.
expression) is updated.
3. The settings of the format locator are then changed so that the generic expression is disabled and the dynamic expression is enabled. If no dynamic expression is available, then the generic expression is enabled.
4. There are then more format locators (mentioned above) which run keywords on the SL_ResetRegEx results in order to identify the correct possibility.
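The switching between the generic and the dynamic regular expression can be sketched in Python as follows. This is a simplified model rather than the project script. The `FormatLocator` class, both patterns and the supplier table are hypothetical and stand in for the format locator settings and the supplier database.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical generic pattern: any run of 5 to 15 upper-case letters, digits, "-" or "/".
GENERIC_PATTERN = r"\b[A-Z0-9][A-Z0-9\-/]{4,14}\b"

# Hypothetical supplier-specific expressions, keyed by supplier ID.
SUPPLIER_PATTERNS: Dict[str, str] = {"S001": r"\bINV-\d{4}-\d{4}\b"}


@dataclass
class RegexSlot:
    pattern: str
    enabled: bool


class FormatLocator:
    """Holds one generic and one dynamic expression; only enabled slots are run."""

    def __init__(self, generic_pattern: str) -> None:
        self.generic = RegexSlot(generic_pattern, True)
        self.dynamic = RegexSlot("", False)

    def configure_for(self, supplier_pattern: Optional[str]) -> None:
        if supplier_pattern:
            # A specific expression exists: update and enable it, disable the generic one.
            self.dynamic = RegexSlot(supplier_pattern, True)
            self.generic.enabled = False
        else:
            # No specific expression: fall back to the generic one.
            self.dynamic.enabled = False
            self.generic.enabled = True

    def find(self, text: str) -> List[str]:
        hits: List[str] = []
        for slot in (self.dynamic, self.generic):
            if slot.enabled:
                hits.extend(m.group(0) for m in re.finditer(slot.pattern, text))
        return hits


text = "Invoice INV-2008-0042 dated 12/03/2008 ref 778812"
locator = FormatLocator(GENERIC_PATTERN)

locator.configure_for(SUPPLIER_PATTERNS.get("S001"))
specific_hits = locator.find(text)

locator.configure_for(SUPPLIER_PATTERNS.get("UNKNOWN"))
generic_hits = locator.find(text)
```

With the supplier-specific expression enabled, only the invoice number is returned. The generic expression also returns the date and the reference number as candidates, which is why keyword-based format locators are still needed to pick the correct one.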
Locators

The InvoiceDate field relies upon the following locators:

KBi – This is a standard invoice group locator which uses generic and specific knowledge bases in an attempt to retrieve invoice header information such as invoice date and invoice number.

FL_Dates – This is a regular-expression-based format locator that retrieves all dates on a document. This is used to feed results into the FL_InvoiceDate locator and the Invoice Header Locator.
Locators

The DocumentType field uses the following locators:

FL_DocumentType – This format locator contains “regular expressions” which are in fact just keywords used to identify a credit note. No settings exist at the invoice level; all setup is done at the locale level with the locator overridden.

SL_DocumentType – This script locator is also blank at the invoice level, with all code sitting instead at the locale level with the locator overridden.
This rereading returns the OCR confidence of the text at that location.
2. The retrieved confidence is then compared with the threshold value stored in the “min. confidence to accept reread result” field.
3. If the reread confidence is equal to or greater than the threshold value, the field's valid status is left as is.
4. If the reread confidence is less than the threshold value, the field is marked as invalid and a localized error message is set to indicate the reason in validation.
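The comparison described in the steps above can be sketched in Python as follows. The `Field` type, the confidence scale and the wording of the error message are assumptions made for the example; the real message is localized and is not quoted in this guide.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    valid: bool
    error: str = ""


def apply_reread_check(field: Field, reread_confidence: float, threshold: float) -> Field:
    """Invalidate the field if the reread OCR confidence is below the threshold."""
    if reread_confidence >= threshold:
        # Equal to or greater than the threshold: the valid status is left as is.
        return field
    field.valid = False
    # Hypothetical message text standing in for the localized error message.
    field.error = (
        f"OCR confidence {reread_confidence:.0%} is below the required {threshold:.0%}"
    )
    return field


accepted = apply_reread_check(Field("Total", valid=True), 0.80, 0.80)
rejected = apply_reread_check(Field("Total", valid=True), 0.62, 0.80)
```

Note that a passing reread never sets a field to valid; it only leaves the existing status unchanged, so a field that was already invalid stays invalid.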