user manual

113

Building CLEM Expressions

 Determining the length (number of c haracters) for a string variable—length(STRING).

 Checking the alphabe tical ordering of string values—alphabefore(STRING1, STRING2).

 Removing leading o r trailing white s pace fr om values—trim(STRING), trim_start(STRING),

or trimend(STRING).

 Extract the ﬁrst or last n characters from a string—startstring(LENGTH, STRING) or

endstring(LENGTH, STRING). For examp le, suppose you have a ﬁeld named item that combines

a product name with a four-digit ID code (ACME CAMERA-D109). To create a new ﬁeld that

contains only the four-digit cod e, specify the f ollowing formula i n a Derive node :

endstring(4, item)

 Matching a s peciﬁc pattern—STRING matches PATTERN. For exa mple, to sele ct persons with

“market” anywhere in their job title, you could specify the following in a Select node:

job_title matches "*market*"

 Replacing all instances of a substring within a s tring—replace(SUBSTRING, NEWSUBSTRING,

STRING). For e xample, to replace all instances o f an unsupported character, such as a vertical

pipe ( | ), with a semic olon prior to text mining, use the replace function in a Filler node.

Under Fill in ﬁelds:, select all ﬁelds where the character may occur. For the Replace: co ndition,

select Always, and specify the following condition under Replace with:

replace('|',';',@FIELD)

 Deriving a ﬂag ﬁeld based on the presen ce of a speciﬁc substring. For example, you could

use a string function in a Derive node to generate a s eparate ﬂag ﬁeld for each response

with an expression such as:

hassubstring(museums,"museum_of_design")

For more informati on, see th e topic String Func tions in Chapter 8 on p. 141.

Handling Blanks and Missing Values

Replacing blanks or missing v alues is a co mmon data preparation task for data miner s. CLEM

provides you with a number of tools to automate blank handling. The Filler node is the most

common place to work with blanks; however, the following functions can be used in any node that

accepts CLEM expressions:

 @BLANK(FIELD) can be used to determine records whose values are blank for a particular

ﬁeld, such as Age.

 @NULL(FIELD) can be used to determine records whose values are system-missing for the

speciﬁed ﬁeld(s). In IB M® SPSS® Modeler, system-missing values are displayed as $null$

values.