HP-UX 11i Release Notes (December 2000)

New and Changed Internationalization Features
Chapter 15
Unicode Character Set
HP-UX 11i provides system-level support for the Unicode 2.1/ISO-10646
character set. Hewlett-Packard’s support for Unicode provides a basis
for heterogeneous interoperability across all locales.
ISO-10646 is an industry standard that defines a single encoding which
uniquely encodes all the world’s characters. Unicode 2.1 is the
companion specification to ISO-10646. Unicode support conforms to
existing X/Open (The Open Group), POSIX, ISO C, and other relevant
UNIX-based standards.
HP-UX 11i supports Unicode/ISO-10646 by using the UTF-8 (UCS
Transformation Format, 8-bit) representation for persistent storage.
UTF-8 is an industry-recognized multibyte representation of Unicode
built from 8-bit units. It allows data to be transmitted successfully
over 8-bit networking protocols and to be stored and retrieved safely
within a historically byte-oriented operating system such as HP-UX.
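To make the byte-oriented nature of UTF-8 concrete, the following is a minimal sketch of how a single code point maps to a sequence of one to four 8-bit units. The function name utf8_encode is hypothetical and is not an HP-UX interface. The sketch also assumes the modern upper limit of U+10FFFF, whereas the ISO-10646 definition of that era permitted longer sequences.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one code point as UTF-8 into out[].
 * Returns the number of bytes written (1 to 4), or 0 if cp is a
 * surrogate or lies above U+10FFFF (an assumed upper limit).
 * utf8_encode is a hypothetical helper, not an HP-UX API. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 7-bit ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx followed by three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

Because ASCII characters encode as themselves and every byte of a multibyte sequence has its high bit set, byte-oriented tools that treat bytes such as '/' or NUL specially do not misinterpret part of a multibyte character as one of those bytes.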
For internal processing, HP-UX uses the four-octet (32-bit) canonical
form specified in ISO-10646. This provides parity with HP-UX’s
current wchar_t implementation, which is based on a 32-bit
representation.
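The relationship between the stored UTF-8 form and the 32-bit processing form can be illustrated with a small decoder. This is only a sketch under stated assumptions: utf8_decode is a hypothetical name, the result is held in a uint32_t rather than in wchar_t, and sequences longer than four bytes are rejected. In an application, the standard multibyte conversion routines such as mbstowcs() would normally perform this step.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence from s (n bytes available) into a 32-bit
 * value stored in *cp. Returns the number of bytes consumed, or 0 if
 * the input is truncated, malformed, or an overlong encoding.
 * utf8_decode is a hypothetical helper, not an HP-UX API. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t v, min;   /* min = smallest value allowed for this length */

    if (n == 0)
        return 0;
    if (s[0] < 0x80) {
        len = 1; v = s[0]; min = 0;
    } else if ((s[0] & 0xE0) == 0xC0) {
        len = 2; v = s[0] & 0x1F; min = 0x80;
    } else if ((s[0] & 0xF0) == 0xE0) {
        len = 3; v = s[0] & 0x0F; min = 0x800;
    } else if ((s[0] & 0xF8) == 0xF0) {
        len = 4; v = s[0] & 0x07; min = 0x10000;
    } else {
        return 0;      /* stray continuation byte or unsupported lead byte */
    }
    if (n < len)
        return 0;      /* sequence is cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;  /* every trailing byte must look like 10xxxxxx */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (v < min || v > 0x10FFFF || (v >= 0xD800 && v <= 0xDFFF))
        return 0;      /* overlong form, out of range, or surrogate */
    *cp = v;
    return len;
}
```

Once decoded, each character occupies one fixed-width 32-bit unit, which is what makes indexing and character classification straightforward during internal processing.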
Full system-level support is provided for all locales supplied in this
release.
For more information on the Unicode features of the Asian System
Environment, see /usr/share/doc/ASX-UTF8.
A select subset of locale binaries has been provided for 32-bit
application processing:
Table 15-1 Base
    C.utf8        C UTF-8
    univ.utf8     universal
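As an illustration of how an application might select one of these locales, the sketch below switches to a named locale with the standard setlocale() call and converts a UTF-8 string to wide characters with mbstowcs(). The helper name count_wide_chars and the 64-element buffer are assumptions made for this example. Whether a given locale name is accepted depends on which locale binaries are installed.

```c
#include <locale.h>
#include <stdlib.h>

/* Convert a multibyte string to wide characters under the named locale
 * and return how many wide characters it contains (at most 64 are
 * counted), or -1 if the locale is not available or the string is not
 * valid in that locale's encoding.
 * count_wide_chars is a hypothetical helper, not an HP-UX API.
 * Note: setlocale() changes the locale for the whole process. */
long count_wide_chars(const char *locale_name, const char *mbs)
{
    wchar_t buf[64];
    size_t n;

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;                      /* locale not installed */

    n = mbstowcs(buf, mbs, sizeof buf / sizeof buf[0]);
    if (n == (size_t)-1)
        return -1;                      /* invalid multibyte sequence */
    return (long)n;
}
```

For example, calling count_wide_chars("C.utf8", "\xE2\x82\xAC") passes the three-byte UTF-8 encoding of U+20AC. Where the locale is installed, the three bytes are expected to convert to a single wide character.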
Table 15-2 European
    fr_CA.utf8    French Canadian