HP Neoview Character Sets Administrator's Guide HP Part Number: 546188-001 Published: April 2009 Edition: HP Neoview Release 2.
© Copyright 2009 Hewlett-Packard Development Company, L.P. Legal Notice Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Table of Contents
About This Document
  Intended Audience
  New and Changed Information in This Edition
  Document Organization
A Character Set Mapping Tables
B Capabilities and Limitations of Multiple Client Locales in the Unicode Configuration
C Configuring Neoview Client Applications
  How Character Encoding Is Implemented in the Neoview Transporter Client
List of Tables
1-1 Character Sets Stored in ISO88591 and UCS2 Columns for the Neoview Character Set Configurations
1-2 Default Prefixes for Character String Literals
About This Document This manual contains the information needed by database administrators and end users to use, configure, and troubleshoot the Neoview Character Sets feature for Release 2.4. Intended Audience This manual is intended for database administrators and other users of the Neoview Character Sets feature on the Neoview platform.
Appendix B (page 47) Describes the capabilities and limitations imposed on multiple client locales in the Unicode configuration for Release 2.4. Appendix C (page 49) Describes how to configure and enable the translation functions of Neoview client applications. Appendix D (page 55) Provides information about mapping character sets and language ID values for the Neoview ODBC and JDBC drivers.
A group of items enclosed in braces is a list from which you are required to choose one item. The items in the list can be arranged either vertically, with aligned braces on each side of the list, or horizontally, enclosed in a pair of braces and separated by vertical lines.
[ESCAPE esc-char-expression] Related Documentation This manual is part of the HP Neoview customer library. Neoview Customer Library The manuals in the Neoview customer library are listed here for your convenience.
Neoview System Monitor Quick Start: Instructions for starting, using, customizing, and troubleshooting the Neoview System Monitor.
Neoview Workload Management Services Guide: Information about using Neoview Workload Management Services (WMS) to manage workload and resources on a Neoview data warehousing platform.
1 Introduction to Neoview Character Sets The Neoview Character Sets feature allows clients to store data encoded in any supported character set, including multibyte data, into SQL database objects on the Neoview platform. Clients include customer applications running on other systems and users accessing Neoview client applications from client workstations.
client locale character set, SJIS characters, or UTF8 characters, depending on the selected Neoview character set configuration. Table 1-1 (page 14) identifies the character set encodings that the Neoview database uses to store characters in ISO88591 and UCS2 columns for the three Neoview character set configurations.
Table 1-2 Default Prefixes for Character String Literals
Neoview Character Set Configuration | Default Column Character Set Definition (Does Not Need to Be Specified) | Non-Default Column Character Set Definition (Must Be Specified) | Default Prefix for Character String Literals (Does Not Need to Be Specified) | Non-Default Prefixes for Character String Literals (Must Be Specified)
ISO88591 | CHARACTER SET ISO88591 | CHARACTER SET UCS2 | _ISO88591 | _UCS2, N
SJIS | CHARACTER SET ISO88591 | CHARACTER SET UCS2 | _ISO8859
character is mapped to a plane within the Unicode encoding and equivalent characters are mapped to the same Unicode code point. Java-based Neoview client applications such as Neoview DB Admin, Neoview Command Interface, and JDBC applications can display data from multiple client locale characters. Neoview ODBC drivers can display only data that matches the client locale characters and replace all other characters (by default, with question marks).
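The question-mark substitution described above for the ODBC drivers can be pictured with a minimal Python sketch. It is not the driver's implementation: the function name `to_client_locale` is hypothetical, and the Python codec `cp932` is only an assumed stand-in for a Japanese client locale.

```python
def to_client_locale(text: str, locale_codec: str = "cp932") -> bytes:
    """Encode text for a client locale, substituting '?' for unmappable characters."""
    return text.encode(locale_codec, errors="replace")


# Japanese text fits the assumed Japanese locale; the Korean text does not.
mixed = "日本語 한국어"
shown = to_client_locale(mixed)
print(shown.decode("cp932"))
```

Each character that has no mapping in the client locale is replaced individually, so the length of the displayed string is preserved while its content is lost.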
Table 1-3 Features, Behaviors, and Limitations of the Neoview Character Set Configurations Configuration Features and Behaviors Limitations ISO88591 For this release, the ISO88591 configuration has these limitations and restrictions: • Character string literals in an SQL statement are assumed to be in the ISO8859-1 encoding. An invalid translation might occur when users attempt to store a character encoding other than ISO8859-1, such as the client locale, in a UCS2 column.
Table 1-3 Features, Behaviors, and Limitations of the Neoview Character Set Configurations (continued) Configuration Features and Behaviors Limitations SJIS • Neoview platforms with the SJIS configuration require Release 2.4 or Release 2.3 Neoview ODBC and JDBC drivers. If you connect a Release 2.2 driver to a Release 2.4 Neoview platform with the SJIS configuration, the connection fails and a connection error is generated.
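The ISO88591 limitation listed in Table 1-3, where character string literals are assumed to be ISO8859-1, can be illustrated with a short Python sketch. It only mimics the effect of the assumption and does not show Neoview internals. The codec `cp932` is an assumed stand-in for a Japanese client locale, `utf-16-be` is an assumed stand-in for UCS2 storage, and all variable names are illustrative.

```python
# Bytes a Japanese (cp932) client would send for a two-character literal.
client_bytes = "日本".encode("cp932")

# Treating those bytes as ISO8859-1 maps every byte to its own character,
# so two double-byte characters become four unrelated characters.
misread = client_bytes.decode("latin-1")

# What would land in a UCS2 column under that assumption (UTF-16 used as a stand-in).
stored = misread.encode("utf-16-be")

print(len("日本"), len(misread), len(stored))
```

The translation itself succeeds, which is why the result is an invalid translation rather than an error: the stored value is well-formed but no longer represents the original text.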
Compatibility Between Neoview ODBC and JDBC Drivers and Neoview Platforms
Table 1-4 summarizes the compatibility between Release 2.4, 2.3, and 2.2 ODBC and JDBC drivers and Release 2.4, 2.3, and 2.2 Neoview platforms.
Table 1-4 Driver and Neoview Platform Compatibility
Driver Release Version | Release 2.4 Neoview Platform | Release 2.3 Neoview Platform | Release 2.2 Neoview Platform
2.4 | Yes | Yes | Yes
2.3 | Yes | Yes | Yes
2.
2 Selecting a Neoview Character Set Configuration
This chapter provides this information:
• “Criteria for Selecting a Neoview Character Set Configuration” (page 21)
• “Process for Implementing a Neoview Character Set Configuration” (page 22)
• “Rules for Migrating to Neoview Release 2.4” (page 22)
Criteria for Selecting a Neoview Character Set Configuration
Table 2-1 identifies the criteria you should use to identify the correct Neoview Character Set configuration for your Neoview platform.
Process for Implementing a Neoview Character Set Configuration
If you are a new customer, follow this process to select and implement the correct Neoview character set configuration:
1. When you order your new Neoview platform, you receive a Neoview Order processing form in the Customer Process Systems Architecture (CPSA) that describes the selection criteria for the supported Neoview character set configurations.
2. Complete the form, including your configuration choice, and return it in the CPSA.
3 Using SQL Language Elements to Define and Manage Database Encoding
This chapter includes:
• “Rules for Encoding SQL Language Elements” (page 23)
• “Behavior of SQL Functions” (page 26)
• “Behavior of SQL String Functions” (page 27)
• “Guidelines for the LIKE Predicate in the SJIS and Unicode Configurations” (page 31)
• “Locating Invalid Characters in Syntax Error Messages” (page 33)
Rules for Encoding SQL Language Elements
Table 3-1 describes the rules that govern the use of character set data in SQL la
Table 3-1 Summary of SQL Language Rules by Neoview Character Set Configuration (continued)
SQL Language Rule: Explicitly specify a valid character set prefix value (_character-set) for every string literal in a column that is not in the default character set for the configuration.
ISO88591 Configuration: Use these prefixes:
• In ISO8859-1 string literals, you can but are not required to specify:
  - _ISO88591 for ISO8859-1 characters
• In UCS2 string literals, you must specify:
Table 3-1 Summary of SQL Language Rules by Neoview Character Set Configuration (continued)
SQL Language Rule: SQL Identifiers. Size SQL identifiers as dictated by the selected configuration. SQL identifiers are limited to 128 byte lengths in
ISO88591 Configuration:
• SQL identifiers can be up to 128 characters (bytes) in length.
• Regular identifiers are
SJIS Configuration:
• SQL identifiers can be a maximum of 64 to 128 characters in length, depending on the SJIS characters stored.
Table 3-1 Summary of SQL Language Rules by Neoview Character Set Configuration (continued)
SQL Language Rule: EMS Event Messages. EMS event messages from the Neoview platform are normally sent in UTF8 encoding.
ISO88591 Configuration: EMS event messages use client locale character encoding and might not be readable from Neoview DB Admin.
SJIS Configuration: Uses UTF8 encoding.
Unicode Configuration: Uses UTF8 encoding.
Behavior of SQL String Functions String functions behave differently in the Neoview Release 2.3 and Neoview Release 2.4 environments. Table 3-3 describes these differences. Table 3-3 String Function Behaviors for Neoview Release 2.3 and Neoview Release 2.4 Neoview Release Limitations Storage Length vs. Character Boundaries Release 2.
Table 3-4 Behaviors of SQL String Functions in the Three Configurations ISO88591 Configuration Considerations SJIS Configuration Considerations Unicode Configuration Considerations If the value of the first byte in the string is greater than 127, Neoview SQL returns error 8428 (“The argument to function ASCII is not valid”). If the value of the first byte in the string is greater than 127, Neoview SQL returns error 8428 (“The argument to function ASCII is not valid”).
Table 3-4 Behaviors of SQL String Functions in the Three Configurations (continued) ISO88591 Configuration Considerations SJIS Configuration Considerations Unicode Configuration Considerations CONCAT: Returns the concatenation of two character value expressions as a character string value. Both character value expressions must be either ISO8859-1 character expressions or UCS2 character expressions.
Table 3-4 Behaviors of SQL String Functions in the Three Configurations (continued)
RPAD: Pads the right side of a string with the specified string.
ISO88591 Configuration Considerations: Every character, including multibyte characters, is treated as one character.
SJIS Configuration Considerations: Every character, including multibyte characters, is treated as one character.
Unicode Configuration Considerations: Every character, including multibyte characters, is treated as one character.
Table 3-4 Behaviors of SQL String Functions in the Three Configurations (continued)
TRANSLATE: Translates a character string from a source character set to a target character set. These six translation-name options can be used:
• ISO88591TOUCS2
• SJISTOUCS2
ISO88591 Configuration Considerations:
• The ISO88591TOUCS2 option can be used to translate ISO8859-1 characters to UCS2 characters.
In Neoview Release 2.4, one underscore always matches one character, regardless of the byte length of the character. When matching a single multibyte character in the database, specify one underscore in the LIKE pattern. In Neoview Release 2.3, the query in the following example requires you to put two underscores in the LIKE pattern on a Neoview platform with the SJIS configuration, and three underscores in the LIKE pattern on a system with the Unicode configuration.
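The underscore counts quoted above follow from the number of bytes each encoding uses for one character. The sketch below is only an illustration in Python, not Neoview SQL. It assumes `cp932` as a stand-in for the SJIS database encoding and `utf-8` for the encoding used in the Unicode configuration, and it assumes that Release 2.3 matched underscores byte by byte, which the surrounding text implies but does not state outright.

```python
ch = "ソ"  # one katakana character

sjis_len = len(ch.encode("cp932"))   # bytes for this character in SJIS
utf8_len = len(ch.encode("utf-8"))   # bytes for this character in UTF8

# Assumed Release 2.3 behavior: one underscore per stored byte.
print("Release 2.3 underscores (SJIS configuration):", sjis_len)
print("Release 2.3 underscores (Unicode configuration):", utf8_len)

# Release 2.4 behavior from the text: one underscore per character.
print("Release 2.4 underscores:", len(ch))
```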
>>select * from t1 where c1 like x'25 84 5C 25' escape '\';
--- 0 row(s) selected.
In Neoview Release 2.4, character strings are compared at the character level, not the byte level. Therefore, the second byte of a double-byte SJIS character in a LIKE pattern is treated as part of the SJIS character and not as an escape sequence or a wild-card character, as this example shows:
>>select * from t1 where c1 like x'25 83 5F 84 5F 25';
--- 0 row(s) selected.
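The difference between a byte-level and a character-level reading of the first pattern above can be sketched in Python. This is an illustration only: `cp932` is an assumed stand-in for the SJIS encoding, and the variable names are hypothetical.

```python
pattern = bytes.fromhex("25 84 5C 25")  # the LIKE pattern from the first query

# Byte-level view: 0x25 is '%' and 0x5C is the escape character '\'.
byte_view = [f"{b:02X}" for b in pattern]

# Character-level view: 0x84 0x5C decodes as one double-byte character,
# so no standalone backslash remains in the pattern.
char_view = list(pattern.decode("cp932"))

print(byte_view)
print(len(char_view), "\\" in char_view)
```

Read character by character, the pattern is a percent sign, one double-byte character, and a percent sign, which matches the Release 2.4 behavior described above.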
^ (25 characters from start of SQL statement)
*** ERROR[8822] The statement was not prepared.
>>
Example of a Syntax Error in a Multi-Line SQL Statement
This is an example of a syntax error on a command that uses more than 945 characters. The character-identifying text with a caret is provided at the end of the syntax error.
SELECT T.WK_END_DT , TRIM(TRAILING FROM T.FCL_YR_ID) , TRIM(TRAILING FROM T.FCL_PER_ID) , P.PKY_ID AS PRIM_KEY , TRIM(TRAILING FROM P.PKY_DSC_TX) AS KEY_DSCR , SUM(A.
4 Capabilities and Limitations of Neoview Client Applications
This chapter describes the capabilities and limitations of these Neoview client applications with respect to the Neoview Character Sets feature for this Neoview release:
• “Neoview Command Interface (NCI)” (page 35)
• “Neoview DB Admin” (page 35)
• “Neoview Loader” (page 36)
• “Neoview Management Dashboard” (page 36)
• “Neoview Manageability Repository” (page 37)
• “Neoview Transporter Client” (page 37)
This chapter also describes the capabilit
For this Neoview release, Neoview DB Admin imposes these restrictions:
• If the characters you enter from Neoview DB Admin are not recognized by or compatible with the SQL database, the Neoview DB Admin operation will be rejected by SQL with an error.
• When the ISO88591 configuration is used, Neoview DB Admin reverts to using only 7-bit ASCII characters and will not support the use of 8-bit ASCII or multibyte characters from the client locales.
• Non-ASCII characters do not occur in the other fields defined for these entities, except Show Related displays.
• Non-ASCII character set support has been provided for the East Asian locales supported by the latest Neoview release.
• In Show Related displays, characters outside the ASCII numeric code set are not displayed.
• Capture of EDL with non-ASCII data is not supported.
Neoview Workload Management Services (WMS)
For this Neoview release, WMS has these character encoding behaviors:
• WMS service names can be defined and created through ODBC, JDBC, or NCI. The service names can be provided in any character set that is supported by the Neoview platform for this Neoview release.
• Service names are sent to EMS logs in UTF8.
5 Troubleshooting Guidelines for Neoview Character Sets Users Table 5-1 identifies the Neoview Character Sets-related problems that you might need to troubleshoot. For each problem type, the symptoms, probable causes, and recommended corrective actions are provided. If you encounter problems that are not described here, contact your HP support provider.
Table 5-1 Troubleshooting Symptoms, Causes, and Recommended Corrective Actions for Users (continued)
Problem Type: Correct character string literal prefixes not provided in an SQL statement
Symptoms:
• An incompatible character set error (4039) is displayed at a client workstation.
• The DDL or DML statement fails.
Table 5-1 Troubleshooting Symptoms, Causes, and Recommended Corrective Actions for Users (continued)
Problem Type: Incompatible client locale errors in the ISO88591 configuration
Probable Causes: Causes can include:
• You inserted characters from one client workstation into an SQL table column and attempted to retrieve incompatible characters from another workstation.
Symptoms: These symptoms occur when a client workstation attempts to query the Neoview database:
• An error is generated stating
Table 5-1 Troubleshooting Symptoms, Causes, and Recommended Corrective Actions for Users (continued) Problem Type Symptoms Probable Causes Recommended Corrective Actions Incompatible client locale errors when an ODBC driver-connected query application is used in the Unicode configuration When a client workstation attempts to view character data from a Neoview ODBC driver-connected query application, these symptoms occur: • A translation error is generated stating that you inserted characters from anoth
Unicode format conversions (UTF16, UTF8, or UCS2) along the way. Once these conversions map the SJIS character hexadecimal value to a Unicode code point value that is shared by other SJIS character hexadecimal values, it can be difficult to determine the original SJIS character hexadecimal value. When the hexadecimal values of SJIS characters with the same Unicode code point value are compared and found not to be the same, a SJIS character mismatch has occurred.
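The mismatch described above can be reproduced outside the Neoview platform with a short Python sketch. It assumes the Python `cp932` codec as a stand-in for the MS932 encoding; the two byte values are taken from the Microsoft code page 932 mapping, in which both decode to the same Unicode code point. The sketch shows only the encoding behavior, not how the drivers or the database perform the conversions.

```python
# Two different MS932 byte sequences that decode to the same Unicode character.
originals = [bytes.fromhex("81E6"), bytes.fromhex("879A")]

decoded = [raw.decode("cp932") for raw in originals]
round_tripped = [ch.encode("cp932") for ch in decoded]

# After the round trip both values collapse to a single byte sequence,
# so at least one of them no longer equals its original hexadecimal value.
for raw, ch, back in zip(originals, decoded, round_tripped):
    status = "match" if raw == back else "MISMATCH"
    print(raw.hex(), f"U+{ord(ch):04X}", back.hex(), status)
```

Once both values have passed through Unicode, a comparison against the untranslated original bytes fails for at least one of them, which is the mismatch condition the text describes.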
• hexadecimal values as parameters untranslated to the Neoview database. If the Neoview database attempts to match the hexadecimal values of SJIS characters with the same Unicode code point value from both sources, they will not match. The ODBC driver manager sends the Neoview ODBC driver an SQL statement that contains both SJIS MS932-encoded character string literals and character values as parameters that have the same original hexadecimal values.
A Character Set Mapping Tables
The Neoview platform and its clients use mapping tables for these character sets to support the Neoview Character Sets feature for this Neoview release:
• Big5
• EUC-JP
• GB2312
• GB18030
• GBK
• KSC5601-1987
• SJIS
To access these mapping tables, see Mapping Tables for Neoview Character Sets.
B Capabilities and Limitations of Multiple Client Locales in the Unicode Configuration This appendix describes the capabilities and limitations imposed on multiple client locales in the Unicode configuration for this Neoview release.
C Configuring Neoview Client Applications
The Neoview Transporter, Neoview Loader, Neoview ODBC drivers, and Neoview JDBC driver each provide certain translation functions on client locale character encoding inserted into the Neoview database and database encoding retrieved by the client workstations.
Table C-1 How Pass-Through Mode and UTF16 Conversion Are Implemented From the Neoview Transporter Client How Pass-Through Mode is Enabled and Disabled for the Transporter Client How to Enable and Disable UTF16 Conversion for Java Strings Additional Guidelines The JDBC connectivity server communicates the current ISO_MAPPING value to the Transporter client so that it knows what character set to store in ISO88591 columns in each of the three configurations: • If the value of ISO_MAPPING is ISO88591 (ISO8859
NOTE: A syntax error occurs if the encoding option is specified for a JDBC or JMS source. On a load operation, the data file is read using the specified or default encoding and converted to UTF16 Java strings, then encoded in the character set specified by ISO_MAPPING for ISO88591 columns or retained in UTF16 encoding for UCS2 columns. On an extract job, the reverse actions occur. Data is extracted from the Neoview database and converted from its database encoding into UTF16 Java strings.
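The load and extract conversions described above can be sketched as a pair of Python functions. This is a rough model, not the Transporter client's code. The function names, the `ISO_MAPPING_CODECS` table, and the codec choices (`latin-1`, `cp932`, `utf-8`, and big-endian UTF-16 for UCS2 columns) are assumptions, a Python `str` stands in for the UTF16 Java string, and the final re-encoding step of the extract is inferred from the statement that the reverse actions occur.

```python
# Assumed mapping from ISO_MAPPING values to Python codec names (illustrative only).
ISO_MAPPING_CODECS = {"ISO88591": "latin-1", "SJIS": "cp932", "UTF8": "utf-8"}


def load_field(raw: bytes, file_encoding: str, iso_mapping: str, column_charset: str) -> bytes:
    """Mimic a load: file bytes -> Unicode string -> column encoding."""
    text = raw.decode(file_encoding)  # read with the specified or default encoding
    if column_charset == "UCS2":
        return text.encode("utf-16-be")  # UCS2 columns keep UTF16 data
    return text.encode(ISO_MAPPING_CODECS[iso_mapping])  # ISO88591 columns follow ISO_MAPPING


def extract_field(stored: bytes, file_encoding: str, iso_mapping: str, column_charset: str) -> bytes:
    """Mimic an extract: column encoding -> Unicode string -> file bytes."""
    codec = "utf-16-be" if column_charset == "UCS2" else ISO_MAPPING_CODECS[iso_mapping]
    return stored.decode(codec).encode(file_encoding)


raw = "データ".encode("utf-8")
stored = load_field(raw, "utf-8", "SJIS", "ISO88591")
print(stored.hex(), extract_field(stored, "utf-8", "SJIS", "ISO88591") == raw)
```

Routing every value through one intermediate Unicode string keeps the file encoding and the column encoding independent of each other, which is why the same data file could be loaded under any of the three configurations.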
• JMS URL
• Startseq
• endseq
• Comments
• Named control file elements including type format, data format, map, source, and job
When the Transporter client operates in pass-through mode, incoming data is recognized and managed as single-byte containers, not as distinct and separate characters. Consequently, field delimiters, nullstring, startseq, and endseq values should always be limited to single-byte characters that will not be mistaken for the second byte of multibyte character data.
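One way to check a candidate delimiter against the pass-through guideline above is sketched below for SJIS data. The function names are hypothetical and the check is not part of the Transporter client. It relies on the Shift JIS second-byte ranges 0x40 through 0x7E and 0x80 through 0xFC, and it uses the Python `cp932` codec only to show a concrete collision.

```python
def can_be_sjis_trail_byte(value: int) -> bool:
    """True if value falls in the Shift JIS second-byte ranges 0x40-0x7E or 0x80-0xFC."""
    return 0x40 <= value <= 0x7E or 0x80 <= value <= 0xFC


def is_safe_delimiter(delimiter: str) -> bool:
    """A delimiter is safe here if it is one ASCII byte that cannot be a trail byte."""
    raw = delimiter.encode("ascii", errors="ignore")
    return len(delimiter) == 1 and len(raw) == 1 and not can_be_sjis_trail_byte(raw[0])


for candidate in [",", "\t", "|", "\\"]:
    print(repr(candidate), is_safe_delimiter(candidate))

# Why '|' is risky: the second byte of this katakana character is 0x7C.
print("ポ".encode("cp932").hex())
```

Under this check a comma or tab is a safer choice for SJIS data than a vertical bar or backslash, because neither byte value can appear as the second byte of a double-byte character.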
For information about input file encoding features of the Neoview Loader for this Neoview release, see “Neoview Loader” (page 36). How Character Encoding Is Implemented in the Neoview ODBC Driver for Windows The Neoview ODBC driver for Windows loads a separate translation DLL to perform translations on character data sent through the driver. Table C-3 (page 53) describes the translation behavior of the Neoview ODBC driver for Windows for the three Neoview character set configurations.
How Character Encoding Is Implemented in the Neoview JDBC Driver The Neoview JDBC driver, which uses the Java runtime environment to perform character set translation, automatically enables or disables translation between client locale and database encoding based on the Neoview character set configuration of the Neoview platform. Table C-6 (page 54) describes the translation behavior of the Neoview JDBC driver for the three Neoview character set configurations.
D Neoview ODBC and JDBC Driver Mappings of Character Sets and Language IDs This appendix provides information about the language ID values that map to the client locale character sets supported by the Neoview ODBC drivers and Neoview JDBC driver. The Language attribute used on the client side of the Neoview platform can take one of several values. The default value for the character set is SYSTEM_DEFAULT. For this Neoview release, users cannot specify other values for these character sets.
Because there is no Microsoft driver manager on the *nix side, the Neoview ODBC drivers for UNIX take as input any character that is sent “as is” by the client application. If your language settings match any of those listed in Table D-2 (page 55), the Neoview ODBC drivers for UNIX perform the required translations. If the language settings do not match, the driver uses pass-through mode, meaning that all character data is sent to the server “as is.
Glossary
character set
  A mapping of characters to code point values.
client locale
  In the context of the Neoview Character Sets feature, the character set used by a client.
compatible character sets
  Two or more character sets are compatible when every character in one character set can be successfully mapped to a character in the other character set, although not necessarily with the same code point values.