HP-UX 11i June 2003 Release Notes
New and Changed Internationalization Features
Chapter 16
Multibyte Support Extension and Unix98 Support
New at 11i (original release)
A new set of multibyte APIs has been added to libc, following the C99 specification
(ISO/IEC 9899:1999) and the Unix98 specification.
These APIs extend the existing multibyte and wide-character APIs so that applications
can:
• perform input and output of wide characters, multibyte characters, or both
• perform general wide string manipulation
• use extended capabilities for conversion between multibyte and wide-character
sequences
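As a minimal sketch of general wide string manipulation, the following C fragment
upper-cases a wide string in place and returns its length. The helper name upcase_wide is
hypothetical. towupper() and wcslen() are standard C wide-character functions declared in
<wctype.h> and <wchar.h>; these notes do not list the individual functions added in this
release, so treat the selection as an assumption.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: upper-case a wide string in place and
 * return its length in wide characters (not bytes). */
static size_t upcase_wide(wchar_t *s)
{
    assert(s != NULL);
    for (wchar_t *p = s; *p != L'\0'; p++)
        *p = (wchar_t)towupper((wint_t)*p);
    return wcslen(s);
}
```

Applied to a buffer holding L"multibyte", upcase_wide leaves L"MULTIBYTE" in the buffer
and returns 9.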
Several new design concepts have been introduced:
• Stream orientation
• Restartable APIs and the conversion state
Stream Orientation
A stream can be either wide-oriented or byte-oriented. The orientation of a stream is a
concept based on an input/output model in which characters are handled as wide
characters within an application and stored as multibyte characters in files. The model
also assumes that every wide-character input/output function begins executing with the
stream positioned at the boundary between two multibyte characters.
After a stream is associated with a file, but before any operations are performed on the
stream, the stream is without orientation. If a wide-character input or output function is
applied to a stream without orientation, the stream becomes wide-oriented implicitly.
Likewise, if a byte input or output operation is applied to a stream without orientation,
the stream becomes byte-oriented implicitly. Once the stream becomes oriented, the
orientation is fixed and cannot be changed until the stream is closed.
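The orientation rules above can be observed with fwide(), the standard C function that
queries or sets the orientation of a stream. The sketch below is illustrative only: the
names orientation, query_orientation, wide_write_fixes_orientation, and
byte_write_fixes_orientation are hypothetical, and a temporary file from tmpfile() stands
in for a real file.

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

typedef enum { ORIENT_BYTE = -1, ORIENT_NONE = 0, ORIENT_WIDE = 1 } orientation;

/* Query the orientation without changing it (mode 0 only asks). */
static orientation query_orientation(FILE *fp)
{
    assert(fp != NULL);
    int r = fwide(fp, 0);
    return r > 0 ? ORIENT_WIDE : (r < 0 ? ORIENT_BYTE : ORIENT_NONE);
}

/* Returns 1 if a fresh stream starts without orientation, becomes
 * wide-oriented after a wide write, and ignores a later request to
 * become byte-oriented; 0 otherwise; -1 if no stream was available. */
static int wide_write_fixes_orientation(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int ok = query_orientation(fp) == ORIENT_NONE;
    ok = ok && fputwc(L'A', fp) != WEOF;          /* first operation is wide */
    ok = ok && query_orientation(fp) == ORIENT_WIDE;
    ok = ok && fwide(fp, -1) > 0;                 /* request has no effect */
    fclose(fp);
    return ok;
}

/* Same check for the byte-oriented case. */
static int byte_write_fixes_orientation(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    int ok = query_orientation(fp) == ORIENT_NONE;
    ok = ok && fputs("A", fp) != EOF;             /* first operation is byte */
    ok = ok && query_orientation(fp) == ORIENT_BYTE;
    ok = ok && fwide(fp, 1) < 0;                  /* request has no effect */
    fclose(fp);
    return ok;
}
```

Both helpers are expected to return 1 on an implementation that follows the rules
described in this section.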
Restartable APIs and the Conversion State
A new set of APIs has been introduced to facilitate conversion between multibyte and
wide-character representations. These APIs use a new object type, mbstate_t, that holds
the conversion state information needed to convert between sequences of multibyte
characters and wide characters. The conversion state determines the behavior of a
conversion between the multibyte and wide-character encodings. For conversion from
multibyte characters to wide characters, the conversion state stores information such as
the position within the current multibyte character (as a sequence of characters or a
wide-character accumulator). For conversions in either direction, the conversion state
stores the current shift state, if any, and possibly the encoding rule.
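As a sketch of how an mbstate_t object is typically passed to a restartable conversion
function, the fragment below converts a whole multibyte string with mbsrtowcs(), starting
from the initial conversion state. The helper name to_wide is hypothetical, and its choice
to treat a destination buffer that is too small as an error is an assumption made for
this example.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert the multibyte string src into dst,
 * which has room for cap wide characters including the terminator.
 * Returns the number of wide characters stored (excluding the
 * terminator), or (size_t)-1 if src is invalid in the current
 * locale or does not fit. */
static size_t to_wide(const char *src, wchar_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);

    mbstate_t st;
    memset(&st, 0, sizeof st);   /* a zeroed object is the initial state */

    const char *p = src;
    size_t n = mbsrtowcs(dst, &p, cap, &st);
    if (n == (size_t)-1)
        return n;                /* invalid multibyte sequence */
    if (p != NULL)
        return (size_t)-1;       /* stopped early: dst was too small */
    return n;
}
```

With an 8-element buffer, to_wide("abc", buf, 8) stores L"abc" and returns 3, while an
8-character source string is rejected because no room remains for the terminator.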
Because these APIs store partial character information in the mbstate_t object, a
multibyte sequence can be processed one byte at a time, and processing can be interrupted
and later continued (that is, restarted) at some other point in time. This is what makes
the new multibyte/wide-character conversion utilities restartable.
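The byte-at-a-time processing described above can be sketched with mbrtowc(), which
returns (size_t)-2 when the bytes seen so far form an incomplete character and keeps the
partial result in the mbstate_t object. The helper names feed_bytes and try_utf8_locale
are hypothetical, and the locale names tried are assumptions that may not exist on a
given system.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: try to select a UTF-8 locale; returns 1 on success. */
static int try_utf8_locale(void)
{
    return setlocale(LC_ALL, "C.UTF-8") != NULL ||
           setlocale(LC_ALL, "en_US.UTF-8") != NULL;
}

/* Hypothetical helper: feed n bytes to mbrtowc() one at a time.
 * Completed wide characters are stored in out (capacity cap).
 * Returns the number of wide characters produced by this call, or
 * (size_t)-1 on an invalid sequence or if out is full. Any partial
 * character is carried in *st, so a later call can continue. */
static size_t feed_bytes(const char *bytes, size_t n,
                         wchar_t *out, size_t cap, mbstate_t *st)
{
    assert(bytes != NULL && out != NULL && st != NULL);

    size_t produced = 0;
    for (size_t i = 0; i < n; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, &bytes[i], 1, st);
        if (r == (size_t)-1)
            return (size_t)-1;   /* invalid multibyte sequence */
        if (r == (size_t)-2)
            continue;            /* incomplete: *st remembers the partial character */
        if (produced == cap)
            return (size_t)-1;   /* no room left in out */
        out[produced++] = wc;    /* r is 0 (null character) or 1 here */
    }
    return produced;
}
```

In a UTF-8 locale, passing the single byte 0xC3 yields no wide character and leaves the
state non-initial (mbsinit() returns zero). A second call with the byte 0xA9 then
completes the character and returns the state to initial.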