User Guide

Chapter 17: Developing Globalized Applications
About character encodings
A character encoding maps each character in a character set to a numeric value that can be
represented by a computer. These numbers can be represented by a single byte or multiple bytes.
For example, the ASCII encoding uses seven bits to represent the Latin alphabet, punctuation,
and control characters.
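The following is a minimal illustration of this mapping for ASCII. It is written in Python only because this section has no code of its own; it is not CFML. Each character becomes an integer below 128, which fits in seven bits and is stored in one byte. The sample strings are arbitrary.

```python
# Map each character of a string to its ASCII numeric value and encoded byte.
text = "Hi!"

codes = [ord(ch) for ch in text]   # numeric value assigned to each character
encoded = text.encode("ascii")     # one byte per character

print(codes)          # [72, 105, 33]
print(list(encoded))  # [72, 105, 33]

# Every ASCII value fits in seven bits (0-127).
assert all(c < 128 for c in codes)

# Characters outside the ASCII set cannot be encoded.
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.object[err.start])  # not ASCII: é
```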
You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese
text. These encodings can vary slightly, but they include a common set of approximately 10,000
characters used in Japanese.
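To see how the Japanese encodings differ, the Python sketch below encodes the same text with each of them. The byte sequences differ, but each one decodes back to the same characters. The codec names `shift_jis`, `euc_jp`, and `iso2022_jp` are Python's names for these encodings, not the values you pass in CFML attributes, and the sample text is arbitrary.

```python
# Encode the same Japanese text with three Japanese encodings.
text = "日本語"  # "Japanese (language)"

for name in ("shift_jis", "euc_jp", "iso2022_jp"):
    data = text.encode(name)
    # Different bytes, but each encoding round-trips to the same text.
    assert data.decode(name) == text
    print(f"{name:<11} {data.hex(' ')}")
```

ISO-2022-JP output is longer than the other two because it inserts escape sequences that switch between character sets.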
The following terms apply to character encodings:
SBCS Single-byte character set; a character set encoded in one byte per character, such as
ASCII or ISO 8859-1.
DBCS Double-byte character set; a method of encoding a character set in no more than two
bytes per character, such as Shift-JIS. Many character encoding schemes that are referred to as
double-byte, including Shift-JIS, allow single-byte and double-byte encoded characters to be mixed.
Others, such as UCS-2, use two bytes for all characters.
MBCS Multiple-byte character set; a character set encoded with a variable number of bytes per
character, such as UTF-8.
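The following Python sketch makes these three terms concrete by counting the bytes each character occupies. Python has no UCS-2 codec, so the sketch assumes UTF-16BE as a stand-in. For characters in the Basic Multilingual Plane, such as those used here, UTF-16BE produces the same two-byte values as UCS-2. The helper name `bytes_per_char` is hypothetical.

```python
def bytes_per_char(text: str, encoding: str) -> list[int]:
    """Return the number of bytes each character occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in text]

# SBCS: ISO 8859-1 always uses one byte per character.
print(bytes_per_char("Aé", "latin-1"))     # [1, 1]

# DBCS with mixing: Shift_JIS uses one byte for "A" and two bytes for "日".
print(bytes_per_char("A日", "shift_jis"))  # [1, 2]

# UCS-2 style: two bytes for every character (approximated with UTF-16BE,
# which matches UCS-2 for characters in the Basic Multilingual Plane).
print(bytes_per_char("A日", "utf-16-be"))  # [2, 2]

# MBCS: UTF-8 uses a variable number of bytes per character.
print(bytes_per_char("Aé日", "utf-8"))     # [1, 2, 3]
```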
The following table lists some common character encodings; browsers and web servers also support
many additional character encodings:

Encoding              Type  Description
ASCII                 SBCS  7-bit encoding used by English and Bahasa Indonesia
Latin-1 (ISO 8859-1)  SBCS  8-bit encoding used for many Western European languages
Shift_JIS             DBCS  16-bit Japanese encoding
                            Note: In CFML attributes, you must use an underscore
                            character (_), not a hyphen (-), in the name.
EUC-KR                DBCS  16-bit Korean encoding
UCS-2                 DBCS  Two-byte Unicode encoding
UTF-8                 MBCS  Multibyte Unicode encoding. ASCII characters use one byte;
                            non-ASCII characters used in European and many Middle
                            Eastern languages use two bytes; and most Asian characters
                            use three bytes.

The World Wide Web Consortium maintains a list of character encodings used on the Internet.
You can find this information at www.w3.org/International/O-charset.html.
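The UTF-8 byte counts listed for UTF-8 in the table can be checked with a short Python sketch. The sample characters are arbitrary choices, one per script, and the variable names are hypothetical.

```python
# UTF-8 byte length of one sample character from several scripts.
samples = {
    "ASCII letter": "A",
    "Western European (é)": "é",
    "Greek (Ω)": "Ω",
    "Hebrew (א)": "א",
    "Arabic (ع)": "ع",
    "Japanese (日)": "日",
    "Korean (한)": "한",
}

# Number of bytes UTF-8 uses for each sample character.
sizes = {label: len(ch.encode("utf-8")) for label, ch in samples.items()}

for label, ch in samples.items():
    print(f"{label:<22} U+{ord(ch):04X}  {sizes[label]} byte(s)")
```

ASCII characters take one byte, the European and Middle Eastern samples take two, and the Japanese and Korean samples take three, consistent with the table.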