user manual

Appendix

Unicode Support

Unicode Support in IBM SPSS Modeler

IBM® SPSS® Modeler is fully Unicode-enabled for both IBM® SPSS® Modeler and IBM® SPSS® Modeler Server. This makes it possible to exchange data with other applications that support Unicode, including multi-language databases, without any loss of information that might be caused by conversion to or from a locale-specific encoding scheme.

• SPSS Modeler stores Unicode data internally and can read and write multi-language data stored as Unicode in databases without loss.

• SPSS Modeler can read and write UTF-8 encoded text files. Text file import and export will default to the locale-encoding but support UTF-8 as an alternative. This setting can be specified in the file import and export nodes, or the default encoding can be changed in the stream properties dialog box. For more information, see the topic Setting general options for streams in Chapter 5 on p. 55.

• Statistics, SAS, and text data files stored in the locale-encoding will be converted to UTF-8 on import and back again on export. When writing to any file, if there are Unicode characters that do not exist in the locale character set, they will be substituted and a warning will be displayed. This should occur only where the data has been imported from a data source that supports Unicode (a database or UTF-8 text file) and that contains characters from a different locale or from multiple locales or character sets.

• IBM® SPSS® Modeler Solution Publisher images are UTF-8 encoded and are truly portable between platforms and locales.
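The export behavior described in the list above can be illustrated outside the product with a minimal Python sketch. This is not SPSS Modeler code and does not reflect its internals. The function name encode_for_locale, the choice of cp1252 as the locale encoding, and the "?" substitution character (Python's own default for errors="replace") are all assumptions made for illustration. The manual does not state which substitute character SPSS Modeler writes.

```python
import warnings


def encode_for_locale(text: str, locale_encoding: str = "cp1252") -> bytes:
    """Encode text in a locale encoding (hypothetical helper).

    Characters missing from the locale character set are substituted
    and a warning is raised, mirroring the behavior the manual describes.
    """
    try:
        return text.encode(locale_encoding)
    except UnicodeEncodeError:
        warnings.warn(
            f"Some characters do not exist in {locale_encoding} and were substituted"
        )
        # Python substitutes "?" here; the real product may use something else.
        return text.encode(locale_encoding, errors="replace")


# Data containing characters from more than one character set.
mixed = "caf\u00e9 \u65e5\u672c\u8a9e"

# UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = mixed.encode("utf-8")
utf8_round_trip = utf8_bytes.decode("utf-8")

# The locale encoding cannot represent the CJK characters, so they are
# substituted and a warning is recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    locale_bytes = encode_for_locale(mixed)

locale_round_trip = locale_bytes.decode("cp1252")
print(utf8_round_trip == mixed)
print(locale_round_trip)
print(len(caught))
```

In this sketch the accented Latin character survives the locale export because cp1252 contains it, while the three CJK characters are replaced. Selecting UTF-8 for the export avoids the substitution entirely.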

About Unicode

The goal of the Unicode standard is to provide a consistent way to encode multilingual text so that it can be easily shared across borders, locales, and applications. The Unicode Standard, now at version 4.0.1, defines a character set that is a superset of all of the character sets in common use in the world today and assigns to each character a unique name and code point. The characters and their code points are identical to those of the Universal Character Set (UCS) defined by ISO-10646. For more information, see the Unicode Home Page (http://www.unicode.org).
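The idea that each character has a unique name and code point can be checked with Python's standard unicodedata module. The sketch below is independent of SPSS Modeler, and the helper name describe is an assumption made for illustration. The names returned come from the Unicode database bundled with the Python interpreter, which may be a later version of the standard than the one cited above.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and Unicode name of one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch))


# One Latin letter, one accented letter, one CJK ideograph.
for ch in ("A", "\u00e9", "\u65e5"):
    code_point, name = describe(ch)
    print(code_point, name)

# The same code point is stored as different byte sequences depending on
# the encoding scheme; UTF-8 uses two bytes for U+00E9.
e_acute_utf8 = "\u00e9".encode("utf-8")
print(e_acute_utf8)
```

The code point identifies the character itself, while an encoding such as UTF-8 determines only how that code point is written as bytes.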
