Neoview Character Sets Administrator's Guide (R2.4, R2.5)
NOTE: A syntax error occurs if the encoding option is specified for a JDBC or JMS source.
On a load operation, the data file is read using the specified or default encoding and converted
to UTF16 Java strings, then encoded in the character set specified by ISO_MAPPING for ISO88591
columns or retained in UTF16 encoding for UCS2 columns. On an extract job, the reverse actions
occur. Data is extracted from the Neoview database and converted from its database encoding
into UTF16 Java strings. Those strings are then encoded using the encoding specified in the
control file or, if none is specified there, the default encoding, and written to the target.
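The two-step conversion described above can be sketched with the standard Java charset classes. This is a minimal illustration under stated assumptions, not the Transporter client's actual code: the class and method names are hypothetical, and UTF_16BE stands in for the UCS2 column encoding (the byte order is an assumption).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the load and extract conversions; not Transporter code.
class LoadExtractEncodingSketch {

    // Load: decode the data file bytes into a Java String (UTF-16 internally),
    // then encode that String in the column's character set.
    static byte[] load(byte[] fileBytes, Charset fileEncoding, Charset columnEncoding) {
        String text = new String(fileBytes, fileEncoding);
        return text.getBytes(columnEncoding);
    }

    // Extract: the reverse direction, from database encoding to target encoding.
    static byte[] extract(byte[] columnBytes, Charset columnEncoding, Charset targetEncoding) {
        String text = new String(columnBytes, columnEncoding);
        return text.getBytes(targetEncoding);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);   // 5 bytes in UTF-8
        byte[] latin1 = load(utf8, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length);                            // 4 bytes in ISO-8859-1
    }
}
```

Note that `new String(bytes, charset)` and `getBytes(charset)` silently substitute a replacement for bad input, so this sketch does not model any error handling.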
You can control how encoding and decoding errors are handled when user data is loaded. The
NVT.encoding-error-disposition system property determines what happens to unmappable or
malformed characters. Allowed values are REPLACE, REPORT, and IGNORE; the values are
case-insensitive. The default is REPORT, which rejects the record containing the offending
characters as a bad record. REPLACE substitutes a replacement for the offending character;
the replacement defaults to a question mark (?) unless the
NVT.encoding-error-replacementString system property specifies another value. IGNORE
skips the offending character and continues with the next character.
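The three dispositions resemble the REPLACE, REPORT, and IGNORE actions of the standard Java class java.nio.charset.CodingErrorAction. The following sketch illustrates the behavior with that standard API. It is an assumption that the Transporter client works this way internally; the class and method names are hypothetical, and only the two property names come from the text above.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of the three dispositions; not Transporter code.
class EncodingDispositionSketch {

    // Reads the disposition property, falling back to the documented default.
    static String configuredDisposition() {
        return System.getProperty("NVT.encoding-error-disposition", "REPORT");
    }

    // Maps a disposition name (case-insensitive) to the standard Java action.
    static CodingErrorAction actionFor(String disposition) {
        switch (disposition.toUpperCase(Locale.ROOT)) {
            case "REPLACE": return CodingErrorAction.REPLACE;
            case "IGNORE":  return CodingErrorAction.IGNORE;
            case "REPORT":  return CodingErrorAction.REPORT;
            default:
                throw new IllegalArgumentException("Unknown disposition: " + disposition);
        }
    }

    // Encodes text in the target charset. An empty result models a rejected
    // (bad) record under REPORT.
    static Optional<byte[]> encodeOrReject(String text, Charset target,
                                           String disposition, String replacement) {
        CodingErrorAction action = actionFor(disposition);
        CharsetEncoder encoder = target.newEncoder()
                .onUnmappableCharacter(action)
                .onMalformedInput(action);
        if (action == CodingErrorAction.REPLACE) {
            // Java limits the replacement to the encoder's maximum bytes per
            // character, so a single-byte charset accepts only one byte here.
            encoder.replaceWith(replacement.getBytes(target));
        }
        try {
            ByteBuffer out = encoder.encode(CharBuffer.wrap(text));
            byte[] bytes = new byte[out.remaining()];
            out.get(bytes);
            return Optional.of(bytes);
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }
}
```

For example, encoding the string "a\u3042b" in ISO-8859-1 yields "a?b" under REPLACE with a question mark, "ab" under IGNORE, and an empty result under REPORT. Java system properties are conventionally set with -Dname=value on the JVM command line; how the Transporter client is launched is not covered here.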
Control File Option Syntax
[ encoding = "encoding" ]
where encoding is the name of any valid Java character set encoding.
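Whether a given name is a valid Java character set encoding depends on the Java runtime in use. The following sketch shows one way to check a name with the standard java.nio.charset.Charset class. The class and method names are hypothetical, and the check the Transporter client itself performs may differ.

```java
import java.nio.charset.Charset;

// Hypothetical helper for checking an encoding name; not Transporter code.
class EncodingNameCheckSketch {

    // Returns true if this Java runtime supports the named encoding.
    static boolean isValidJavaEncoding(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalArgumentException e) {
            // Thrown for null or syntactically illegal names
            // (IllegalCharsetNameException is a subclass).
            return false;
        }
    }

    public static void main(String[] args) {
        // Results for names such as "SJIS" depend on the charsets
        // installed with the runtime.
        for (String name : new String[] {"UTF-8", "SJIS", "no-such-charset-xyz"}) {
            System.out.println(name + " supported: " + isValidJavaEncoding(name));
        }
    }
}
```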
Control File Example
options {
    # encoding to use if not specified by data source
    encoding = "UTF-8",
    truncate = "true"
    .
    .
    .
}
sources {
    # encoding overrides the UTF-8 specified in the
    # global options section
    ex_file_1 file "/data/ex_file_1" options (encoding = "SJIS"),
    # encoding is UTF-8 as specified in global options
    ex_file_2 pipe "./data-files/test_data_FSR030-pipe"
    .
    .
    .
}
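The precedence shown in the example (a source-level encoding overrides the global option, which in turn overrides the default) can be sketched as follows. The class, method, and parameter names are hypothetical, and null stands for "not specified"; this illustrates the lookup order only and is not the Transporter client's implementation.

```java
// Hypothetical sketch of the encoding lookup order; not Transporter code.
class EncodingPrecedenceSketch {

    // Returns the first encoding that is specified, checking the source-level
    // option first, then the global option, then the default.
    static String resolve(String sourceEncoding, String globalEncoding, String defaultEncoding) {
        if (sourceEncoding != null) {
            return sourceEncoding;
        }
        if (globalEncoding != null) {
            return globalEncoding;
        }
        return defaultEncoding;
    }

    public static void main(String[] args) {
        String fallback = "ISO-8859-1"; // placeholder default, chosen for illustration
        System.out.println(resolve("SJIS", "UTF-8", fallback)); // ex_file_1: source option wins
        System.out.println(resolve(null, "UTF-8", fallback));   // ex_file_2: global option applies
    }
}
```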
Encoding Control Files
A control file is a text file that tells the Transporter client how to move your data from
source to target for a load or extract operation.
Control files are encoded in UTF8. This encoding remains compatible with existing control
files and allows non-ASCII characters to be used in newly created control files.
These areas of the control file can contain non-ASCII values:
• Filenames (source and include)
• SQL identifiers (schema, table, column/field)
• String literals in SQL statements
• Datasource name
• Field delimiter
• Nullstring
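One reason a UTF8 control file encoding can accommodate older files is that ASCII text has the same byte representation in UTF-8. The sketch below demonstrates this with standard Java classes. It assumes that existing control files contain only ASCII characters, which the text above does not state explicitly; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch comparing ASCII and UTF-8 bytes; not Transporter code.
class ControlFileUtf8Sketch {

    // True when the text encodes to identical bytes in US-ASCII and UTF-8,
    // which holds exactly when the text contains only ASCII characters.
    static boolean sameBytesInAsciiAndUtf8(String text) {
        return Arrays.equals(text.getBytes(StandardCharsets.US_ASCII),
                             text.getBytes(StandardCharsets.UTF_8));
    }

    // Decodes control file bytes as UTF-8 text.
    static String decodeControlFile(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An ASCII-only line is unchanged by the move to UTF-8.
        System.out.println(sameBytesInAsciiAndUtf8("encoding = \"UTF-8\","));
        // A line with a non-ASCII value needs UTF-8 to be represented.
        System.out.println(sameBytesInAsciiAndUtf8("\u30c7\u30fc\u30bf"));
    }
}
```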