Quick start manual
Data types, variables, and constants
String types
Long strings
AnsiString, also called a long string, represents a dynamically allocated string whose 
maximum length is limited only by available memory. 
A long-string variable is a pointer occupying four bytes of memory. When the 
variable is empty—that is, when it contains a zero-length string—the pointer is nil 
and the string uses no additional storage. When the variable is nonempty, it points to a 
dynamically allocated block of memory that contains the string value. The eight 
bytes immediately before that location contain a 32-bit length indicator and a 32-bit reference 
count. This memory is allocated on the heap, but its management is entirely 
automatic and requires no user code.
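The pointer representation described above can be observed directly by casting a long-string variable to Pointer. The following is a minimal sketch, not a definitive test; the program name is hypothetical, and the size check compares against SizeOf(Pointer) rather than the literal four bytes so that it also holds on compilers with wider pointers.

```pascal
program LongStringLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  S: AnsiString;
begin
  // The variable itself is only a pointer (four bytes on a 32-bit target).
  Writeln(SizeOf(S) = SizeOf(Pointer));   // TRUE

  // An empty long string is represented by a nil pointer.
  S := '';
  Writeln(Pointer(S) = nil);              // TRUE

  // A nonempty long string points at its character data.
  S := S + 'abc';
  Writeln(Pointer(S) <> nil);             // TRUE
  Writeln(Length(S));                     // 3
end.
```

The length and reference count stored before the character data are maintained by the run-time library, so the sketch reads the length through Length rather than through pointer arithmetic.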
Because long-string variables are pointers, two or more of them can reference the 
same value without consuming additional memory. The compiler exploits this to 
conserve resources and execute assignments faster. Whenever a long-string variable 
is destroyed or assigned a new value, the reference count of the old string (the 
variable’s previous value) is decremented and the reference count of the new value 
(if there is one) is incremented; if the reference count of a string reaches zero, its 
memory is deallocated. This process is called reference-counting. When indexing is 
used to change the value of a single character in a string, a copy of the string is made 
if—but only if—its reference count is greater than one. This is called copy-on-write 
semantics.
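Sharing and copy-on-write can be made visible by comparing the pointers behind two long-string variables before and after a single-character change. This is an illustrative sketch under stated assumptions: the program and variable names are hypothetical, and the string is built by a run-time concatenation so that it lives in a heap block rather than being a compiled-in constant.

```pascal
program CopyOnWriteDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  A, B: AnsiString;
begin
  A := 'long';
  A := A + ' string';                 // run-time concatenation allocates a heap block

  B := A;                             // no copy: B references the same block
  Writeln(Pointer(A) = Pointer(B));   // TRUE

  B[1] := 'L';                        // block is shared, so B is copied before the write
  Writeln(Pointer(A) = Pointer(B));   // FALSE

  Writeln(A);                         // long string
  Writeln(B);                         // Long string
end.
```

The assignment B := A costs only a pointer copy and a reference-count update; the character data is duplicated only when B is modified while A still refers to the same block.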
WideString
The WideString type represents a dynamically allocated string of 16-bit Unicode 
characters. In most respects it is similar to AnsiString. On Win32, WideString is 
compatible with the COM BSTR type. 
Note
Under Win32, WideString values are not reference-counted. Under Linux, they are.
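As a small illustration of the 16-bit character size, the sketch below (hypothetical program name) shows that Length on a WideString counts characters, while the storage used by the character data is twice that number of bytes.

```pascal
program WideStringDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

var
  W: WideString;
begin
  W := 'Hello';
  Writeln(SizeOf(WideChar));               // 2: each character is 16 bits wide
  Writeln(Length(W));                      // 5: Length counts characters
  Writeln(Length(W) * SizeOf(WideChar));   // 10: bytes of character data
  Writeln(Ord(W[1]));                      // 72: ordinal value of 'H'
end.
```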
About extended character sets
Windows and Linux both support single-byte and multibyte character sets as well as 
Unicode. With a single-byte character set (SBCS), each byte in a string represents one 
character. 
In a multibyte character set (MBCS), some characters are represented by one byte and 
others by more than one byte. The first byte of a multibyte character is called the lead 
byte. In general, the lower 128 characters of a multibyte character set map to the 7-bit 
ASCII characters, and any byte whose ordinal value is greater than 127 is the lead 
byte of a multibyte character. The null value (#0) is always a single-byte character. 
Multibyte character sets—especially double-byte character sets (DBCS)—are widely 
used for Asian languages.
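The difference between bytes and characters in a double-byte string can be sketched as follows. This is a simplification, not a general MBCS routine: it assumes a DBCS encoding, applies the general rule above that any byte greater than 127 is a lead byte, and assumes that every lead byte is followed by exactly one trail byte. Real code pages define narrower lead-byte ranges. The function name CountDbcsChars is hypothetical, and the sample bytes $82 $A0 are one two-byte character in the Shift-JIS encoding.

```pascal
program DbcsCountDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: counts characters in a DBCS string, assuming that
// any byte above 127 is a lead byte followed by exactly one trail byte.
function CountDbcsChars(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if Ord(S[I]) > 127 then
      Inc(I, 2)        // lead byte plus its trail byte form one character
    else
      Inc(I);          // single-byte (7-bit ASCII) character
    Inc(Result);
  end;
end;

var
  S: AnsiString;
begin
  // Build 'A', one double-byte character, 'B' byte by byte.
  SetLength(S, 4);
  S[1] := 'A';
  S[2] := AnsiChar($82);
  S[3] := AnsiChar($A0);
  S[4] := 'B';

  Writeln(Length(S));           // 4: Length counts bytes
  Writeln(CountDbcsChars(S));   // 3: characters under the stated assumptions
end.
```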
In the Unicode character set, each character is represented by two bytes. Thus a 
Unicode string is a sequence not of individual bytes but of two-byte words. Unicode 
characters and strings are also called wide characters and wide character strings. The 
first 256 Unicode characters map to the ANSI character set. The Windows operating 










