Chapter 17: Developing Globalized Applications

About character encodings

A character encoding maps each character in a character set to a numeric value that a computer can represent. Depending on the encoding, each value occupies a single byte or multiple bytes. For example, the ASCII encoding uses seven bits to represent the Latin alphabet, punctuation, and control characters.
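As a small illustration of this mapping, the following sketch prints the numeric values behind two ASCII characters. It is written in Python purely for demonstration, which is an assumption of this example and not something the guide prescribes; the sample string is arbitrary.

```python
# Illustrative sketch (Python chosen only for demonstration).
text = "A~"

# ord() returns the numeric value assigned to each character.
code_points = [ord(ch) for ch in text]

# Encoding to ASCII yields one byte per character.
ascii_bytes = text.encode("ascii")

# Every ASCII byte fits in seven bits, so each value is below 128.
fits_in_seven_bits = all(b < 0x80 for b in ascii_bytes)

print(code_points)         # [65, 126]
print(list(ascii_bytes))   # [65, 126]
print(fits_in_seven_bits)  # True
```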

You use Japanese encodings, such as Shift-JIS, EUC-JP, and ISO-2022-JP, to represent Japanese text. These encodings can vary slightly, but they include a common set of approximately 10,000 characters used in Japanese.
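To make the difference between these encodings concrete, the sketch below encodes the same short Japanese string with each of the three. It assumes Python and its built-in codec names (shift_jis, euc_jp, iso2022_jp); the sample text is an arbitrary choice for illustration.

```python
# Illustrative sketch: one Japanese string, three Japanese encodings.
text = "日本語"  # sample text chosen for this example

# Codec names as spelled in Python's standard library.
encodings = ["shift_jis", "euc_jp", "iso2022_jp"]

# Encode the same characters with each encoding.
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    # The characters are identical, but the byte sequences are not.
    print(f"{name:<12} {len(data):>2} bytes  {data.hex()}")

# Each byte sequence decodes back to the original text with its own codec.
round_trips = all(data.decode(name) == text for name, data in encoded.items())
print(round_trips)
```

Decoding bytes with the wrong encoding of the three generally produces garbled text or an error, which is why the encoding must be known alongside the data.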

The following terms apply to character encodings:

SBCS  Single-byte character set; a character set encoded in one byte per character, such as ASCII or ISO 8859-1.

DBCS  Double-byte character set; a method of encoding a character set in no more than two bytes per character, such as Shift-JIS. Many character encoding schemes that are referred to as double-byte, including Shift-JIS, allow mixing of single-byte and double-byte encoded characters. Others, such as UCS-2, use two bytes for all characters.

MBCS  Multiple-byte character set; a character set encoded with a variable number of bytes per character, such as UTF-8.
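The three terms can be compared by counting the bytes that each character occupies. The following Python sketch does this with a hypothetical helper, bytes_per_char. Python has no codec named UCS-2, so the example uses utf-16-be as a stand-in, on the assumption that the sample characters lie in the range where the two encodings produce the same bytes.

```python
# Illustrative sketch: bytes used per character under each kind of encoding.

def bytes_per_char(text, encoding):
    """Return how many bytes each character of text occupies in encoding."""
    return [len(ch.encode(encoding)) for ch in text]

# SBCS: every character is one byte.
sbcs = bytes_per_char("Aé", "latin-1")

# DBCS that mixes widths: one byte for "A", two bytes for the kana.
dbcs_mixed = bytes_per_char("Aあ", "shift_jis")

# Fixed two-byte encoding (utf-16-be stands in for UCS-2 here).
dbcs_fixed = bytes_per_char("Aあ", "utf-16-be")

# MBCS: the width varies from character to character.
mbcs = bytes_per_char("Aéあ", "utf-8")

print(sbcs)        # [1, 1]
print(dbcs_mixed)  # [1, 2]
print(dbcs_fixed)  # [2, 2]
print(mbcs)        # [1, 2, 3]
```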

The following table lists some common character encodings; however, browsers and web servers support many additional character encodings:

Encoding               Type  Description
ASCII                  SBCS  7-bit encoding used by English and Bahasa Indonesia
Latin-1 (ISO 8859-1)   SBCS  8-bit encoding used for many Western European languages
Shift_JIS              DBCS  Japanese encoding that uses up to 16 bits per character
                             Note: You must use an underscore character (_), not a hyphen (-), in the name in CFML attributes.
EUC-KR                 DBCS  Korean encoding that uses up to 16 bits per character
UCS-2                  DBCS  Two-byte Unicode encoding
UTF-8                  MBCS  Multibyte Unicode encoding. ASCII characters are single-byte (7-bit); non-ASCII characters used in European and many Middle Eastern languages are two-byte; and most Asian characters are three-byte.

The World Wide Web Consortium maintains a list of the character encodings used on the Internet. You can find this information at www.w3.org/International/O-charset.html.
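The byte widths given for UTF-8 in the table follow from the numeric value of each character: the larger the value, the more bytes UTF-8 needs. The Python sketch below uses a hypothetical helper, utf8_length, to predict the width from the standard UTF-8 value ranges and checks the prediction against the actual encoded length. The sample characters and their labels are assumptions chosen for illustration.

```python
# Illustrative sketch: UTF-8 width is determined by the character's numeric value.

def utf8_length(ch):
    """Predict how many bytes UTF-8 uses for a single character."""
    value = ord(ch)
    if value < 0x80:       # ASCII range
        return 1
    if value < 0x800:      # covers Latin accents, Greek, Cyrillic, Hebrew, Arabic
        return 2
    if value < 0x10000:    # covers most Chinese, Japanese, and Korean characters
        return 3
    return 4               # characters with values of 0x10000 and above

samples = {
    "A": "ASCII",
    "é": "Western European",
    "ש": "Hebrew",
    "ع": "Arabic",
    "日": "Japanese",
    "한": "Korean",
}

for ch, label in samples.items():
    predicted = utf8_length(ch)
    actual = len(ch.encode("utf-8"))
    print(f"{label:<17} U+{ord(ch):04X}  predicted={predicted}  actual={actual}")

predicted_widths = [utf8_length(ch) for ch in samples]
print(predicted_widths)  # [1, 2, 2, 2, 3, 3]
```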