HP-UX 11.0 - 11i Internationalization Features White Paper

Miscellaneous Modifications
Multibyte Support Extension and Unix98 Support [11i v1]
A new set of multibyte APIs has been added to libc, following the C99 specification (ISO/IEC 9899:1999) and
the Unix98 specification.
These APIs extend the existing multibyte and wide-character APIs in order to:
• Perform input and output of wide characters, multibyte characters, or both.
• Perform general wide-string manipulation.
• Provide extended capabilities for converting between multibyte and wide-character sequences.
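As a rough illustration of how conversion and wide-string manipulation fit together, the following sketch converts a multibyte string to wide characters, upper-cases it, and converts it back. The function name mb_toupper and the WBUF_LEN limit are hypothetical and chosen only for this example; mbsrtowcs, towupper, and wcsrtombs are the standard library calls. A real program would normally call setlocale(LC_ALL, "") first so that non-ASCII text is interpreted in the user's locale.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

#define WBUF_LEN 256   /* arbitrary limit chosen for this sketch */

/* Hypothetical helper: upper-case the multibyte string src into dst
 * (capacity dstcap bytes). Returns the number of bytes written, not
 * counting the terminating null, or (size_t)-1 if the input is invalid
 * or one of the buffers is too small. */
size_t mb_toupper(const char *src, char *dst, size_t dstcap)
{
    wchar_t wbuf[WBUF_LEN];
    const char *in = src;
    const wchar_t *win = wbuf;
    mbstate_t st;
    size_t n, i;

    /* 1. Multibyte -> wide, starting from the initial conversion state. */
    memset(&st, 0, sizeof st);
    n = mbsrtowcs(wbuf, &in, WBUF_LEN, &st);
    if (n == (size_t)-1 || in != NULL)
        return (size_t)-1;      /* invalid sequence, or wbuf too small */

    /* 2. Wide-string manipulation, one wide character at a time. */
    for (i = 0; i < n; i++)
        wbuf[i] = (wchar_t)towupper((wint_t)wbuf[i]);

    /* 3. Wide -> multibyte, again from the initial conversion state. */
    memset(&st, 0, sizeof st);
    n = wcsrtombs(dst, &win, dstcap, &st);
    if (n == (size_t)-1 || win != NULL)
        return (size_t)-1;      /* unconvertible character, or dst too small */
    return n;
}
```

Working on wide characters in step 2 means each loop iteration sees one whole character, regardless of how many bytes it occupies in the multibyte encoding.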
The following new design concepts have been introduced:
• Stream orientation
• Restartable APIs and the conversion state
Stream Orientation
A stream can be either wide-oriented or byte-oriented. Stream orientation is based on an input/output model
in which characters are handled as wide characters within an application and stored as multibyte characters
in files. The model also assumes that every wide-character input/output function begins executing with the
stream positioned at the boundary between two multibyte characters.
After a stream is associated with a file, but before any operations are performed on the stream, the stream is
without orientation. If a wide-character input or output function is applied to a stream without orientation,
the stream becomes wide-oriented implicitly. Likewise, if a byte input or output operation is applied to a
stream without orientation, the stream becomes byte-oriented implicitly. Once the stream becomes oriented,
the orientation is fixed and cannot be changed until the stream is closed.
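The orientation rules above can be observed with the fwide function, which reports a stream's orientation when called with a mode of 0. The following is a minimal sketch, not taken from the white paper; the helper names orientation and demo_orientation are hypothetical, and temporary files stand in for real data files.

```c
#include <stdio.h>
#include <wchar.h>

/* Query a stream's orientation without changing it.
 * fwide(fp, 0) returns >0 for wide, <0 for byte, 0 for none;
 * the result is normalized here to 1, -1, or 0. */
int orientation(FILE *fp)
{
    int r = fwide(fp, 0);
    return (r > 0) - (r < 0);
}

/* Returns 1 if the streams behave as described, 0 if not,
 * and -1 if the temporary files could not be created. */
int demo_orientation(void)
{
    FILE *w = tmpfile();
    FILE *b = tmpfile();
    int ok = 1;

    if (w == NULL || b == NULL) {
        if (w != NULL) fclose(w);
        if (b != NULL) fclose(b);
        return -1;
    }

    ok = ok && orientation(w) == 0;   /* new stream: no orientation */
    fputwc(L'A', w);                  /* wide output: becomes wide-oriented */
    ok = ok && orientation(w) == 1;
    fwide(w, -1);                     /* request byte orientation: no effect */
    ok = ok && orientation(w) == 1;

    ok = ok && orientation(b) == 0;   /* new stream: no orientation */
    fputc('A', b);                    /* byte output: becomes byte-oriented */
    ok = ok && orientation(b) == -1;
    fwide(b, 1);                      /* request wide orientation: no effect */
    ok = ok && orientation(b) == -1;

    fclose(w);
    fclose(b);
    return ok;
}
```

Calling fwide with a nonzero mode on a stream that has no orientation yet sets the orientation explicitly, which can be useful when a program wants to fix the orientation before the first read or write.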
Restartable APIs and the Conversion State
A new set of APIs has been introduced to facilitate conversion between multibyte-character and wide-character
representations. These APIs use a new object type, mbstate_t, which holds the conversion state information
needed to convert between sequences of multibyte characters and wide characters. The conversion state
determines the behavior of a conversion between multibyte and wide-character encodings. For conversion from
multibyte characters to wide characters, the conversion state stores information such as the position within
the current multibyte character (as a sequence of characters or a wide-character accumulator). For
conversions in either direction, the conversion state stores the current shift state, if any, and possibly
the encoding rule.
Because these APIs store partial character information, a multibyte sequence can be processed one byte at a
time, and processing can be interrupted and continued (restarted) at a later point. The information held in
the mbstate_t object is what makes the new multibyte/wide conversion utilities restartable.
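The byte-at-a-time processing described above can be sketched with mbrtowc, one of the restartable conversion functions. The decoder type and the decoder_init and decoder_feed helpers are hypothetical names introduced for this example; only mbrtowc and mbstate_t come from the library.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper type for this sketch. */
typedef struct {
    mbstate_t state;   /* conversion state carried between calls */
    size_t    count;   /* wide characters completed so far */
} decoder;

/* A zero-filled mbstate_t describes the initial conversion state. */
void decoder_init(decoder *d)
{
    memset(d, 0, sizeof *d);
}

/* Feed a single byte to the decoder.
 * Returns  1 if the byte completed a character (stored in *wc),
 *          0 if more bytes are needed to finish the character,
 *         -1 if the bytes seen so far form an invalid sequence. */
int decoder_feed(decoder *d, char byte, wchar_t *wc)
{
    size_t r = mbrtowc(wc, &byte, 1, &d->state);

    if (r == (size_t)-1)
        return -1;     /* invalid sequence; re-initialize before reuse */
    if (r == (size_t)-2)
        return 0;      /* incomplete: the state remembers the partial character */
    d->count++;        /* r is 0 (null character) or 1 (one byte consumed) */
    return 1;
}
```

Because the partial character lives in the mbstate_t object rather than in the caller's buffer, the bytes of one character can arrive in separate reads. In a locale with a multibyte encoding, the leading byte of a two-byte character would be expected to return 0 and the trailing byte to return 1. The mbsinit function can then confirm that the decoder is back in the initial state.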
How to Get MSE/Unix98 Behavior
To get MSE/Unix98 behavior, programs must be compiled with the -D_XOPEN_SOURCE=500 macro definition, and
the variable UNIX_STD must be defined in the environment.
Under the Korn, Bourne, and POSIX shells, this is accomplished with:
UNIX_STD=98
export UNIX_STD
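A program that depends on this behavior could verify both conditions at startup. The following is only a sketch under stated assumptions: the function names built_for_unix98 and env_selects_unix98 are hypothetical, and the check assumes that the value 98 shown above is the one required.

```c
#include <stdlib.h>
#include <string.h>

/* Compile-time half: returns 1 if this file was built with
 * -D_XOPEN_SOURCE=500 (or a higher value), 0 otherwise. */
int built_for_unix98(void)
{
#if defined(_XOPEN_SOURCE) && (_XOPEN_SOURCE - 0) >= 500
    return 1;
#else
    return 0;
#endif
}

/* Run-time half: returns 1 if UNIX_STD is set to "98" in the
 * environment, 0 otherwise. Assumes "98" is the required value. */
int env_selects_unix98(void)
{
    const char *v = getenv("UNIX_STD");
    return v != NULL && strcmp(v, "98") == 0;
}
```

A program might print a warning and exit when either function returns 0, rather than continue with behavior it was not written for.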