Quick start manual

Data types, variables, and constants
String types
Long strings
AnsiString, also called a long string, represents a dynamically allocated string whose
maximum length is limited only by available memory.
A long-string variable is a pointer occupying four bytes of memory. When the
variable is empty—that is, when it contains a zero-length string—the pointer is nil
and the string uses no additional storage. When the variable is nonempty, it points to a
dynamically allocated block of memory that contains the string value. The eight
bytes before the location contain a 32-bit length indicator and a 32-bit reference
count. This memory is allocated on the heap, but its management is entirely
automatic and requires no user code.
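The pointer representation described above can be observed with a short console program. This is only a sketch: the program name is made up for illustration, and the cast Pointer(S) is used purely for inspection, not as a recommended technique. The layout of the bytes before the string data differs between compiler versions, so the sketch does not read them.

```pascal
program LongStringLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  S: AnsiString;
begin
  // The variable itself is only a pointer (four bytes on a 32-bit target).
  Writeln('Variable is pointer-sized: ', SizeOf(S) = SizeOf(Pointer));

  // An empty long string is represented by a nil pointer.
  S := '';
  Writeln('Empty string is nil: ', Pointer(S) = nil);

  // A nonempty long string points to a block holding the characters.
  S := 'hello';
  Writeln('Nonempty string is nil: ', Pointer(S) = nil);
  Writeln('Length: ', Length(S));
end.
```

Comparing SizeOf(S) with SizeOf(Pointer), rather than with the literal 4, keeps the check valid on targets where pointers are wider than four bytes.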
Because long-string variables are pointers, two or more of them can reference the
same value without consuming additional memory. The compiler exploits this to
conserve resources and execute assignments faster. Whenever a long-string variable
is destroyed or assigned a new value, the reference count of the old string (the
variable’s previous value) is decremented and the reference count of the new value
(if there is one) is incremented; if the reference count of a string reaches zero, its
memory is deallocated. This process is called reference-counting. When indexing is
used to change the value of a single character in a string, a copy of the string is made
if—but only if—its reference count is greater than one. This is called copy-on-write
semantics.
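Sharing and copy-on-write can be made visible by comparing the pointers behind two variables before and after an indexed assignment. The following is a minimal sketch under the same assumptions as any pointer inspection of strings: the casts are for demonstration only, and the variable and program names are illustrative.

```pascal
program CopyOnWriteDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  A, B: AnsiString;
begin
  // Build the value at run time so that it lives in a heap block.
  A := 'long';
  A := A + ' string';

  // Assignment copies only the pointer; both variables share one block.
  B := A;
  Writeln('Shared after assignment: ', Pointer(A) = Pointer(B));

  // Changing one character through indexing forces a private copy of B,
  // because the block is referenced by more than one variable.
  B[1] := 'L';
  Writeln('Shared after indexed write: ', Pointer(A) = Pointer(B));

  Writeln(A);  // long string
  Writeln(B);  // Long string
end.
```

The first comparison is expected to report TRUE and the second FALSE, and A keeps its original value because the write went to B's own copy.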
WideString
The WideString type represents a dynamically allocated string of 16-bit Unicode
characters. In most respects it is similar to AnsiString. On Win32, WideString is
compatible with the COM BSTR type.
Note
Under Win32, WideString values are not reference-counted. Under Linux, they are.
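As a small illustration of the 16-bit character size, the sketch below reports the length of a WideString in characters and the number of bytes its character data occupies. The program name is illustrative, and the string is kept to ASCII so that no code-page conversion is involved.

```pascal
program WideStringDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  W: WideString;
begin
  W := 'Hello';

  // Length counts 16-bit characters, not bytes.
  Writeln('Characters: ', Length(W));
  Writeln('Bytes per character: ', SizeOf(WideChar));
  Writeln('Bytes of character data: ', Length(W) * SizeOf(WideChar));
end.
```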
About extended character sets
Windows and Linux both support single-byte and multibyte character sets as well as
Unicode. With a single-byte character set (SBCS), each byte in a string represents one
character.
In a multibyte character set (MBCS), some characters are represented by one byte and
others by more than one byte. The first byte of a multibyte character is called the lead
byte. In general, the lower 128 characters of a multibyte character set map to the 7-bit
ASCII characters, and any byte whose ordinal value is greater than 127 is the lead
byte of a multibyte character. The null value (#0) is always a single-byte character.
Multibyte character sets—especially double-byte character sets (DBCS)—are widely
used for Asian languages.
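Because a multibyte character can span more than one byte, the byte length of a string and its character count can differ. The sketch below counts characters under two simplifying assumptions taken from the general rule above: every byte greater than 127 is a lead byte, and every lead byte is followed by exactly one trail byte (a double-byte character set). The function CountDbcsChars is hypothetical and written for this illustration; real code pages define their own lead-byte ranges, which the run-time library exposes through facilities such as the LeadBytes set in SysUtils.

```pascal
program MbcsCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: counts characters in a DBCS string, assuming that
// any byte above 127 is a lead byte followed by exactly one trail byte.
function CountDbcsChars(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if Ord(S[I]) > 127 then
      Inc(I, 2)   // lead byte plus trail byte form one character
    else
      Inc(I);     // single-byte (7-bit ASCII) character
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  // Four bytes: 'A', one double-byte character ($82 $A0), 'B'.
  SetLength(S, 4);
  S[1] := 'A';
  S[2] := AnsiChar($82);
  S[3] := AnsiChar($A0);
  S[4] := 'B';

  Writeln('Bytes: ', Length(S));                 // 4
  Writeln('Characters: ', CountDbcsChars(S));    // 3
end.
```

The string is filled byte by byte instead of with a literal so that the compiler performs no code-page conversion on the bytes above 127.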
In the Unicode character set, each character is represented by two bytes. Thus a
Unicode string is a sequence not of individual bytes but of two-byte words. Unicode
characters and strings are also called wide characters and wide character strings. The
first 256 Unicode characters map to the ANSI character set. The Windows operating