user manual

113
Building CLEM Expressions
Determining the length (number of c haracters) for a string variable—length(STRING).
Checking the alphabe tical ordering of string values—alphabefore(STRING1, STRING2).
Removing leading o r trailing white s pace fr om values—trim(STRING), trim_start(STRING),
or trimend(STRING).
Extract the rst or last n characters from a string—startstring(LENGTH, STRING) or
endstring(LENGTH, STRING). For examp le, suppose you have a eld named item that combines
a product name with a four-digit ID code (ACME CAMERA-D109). To create a new eld that
contains only the four-digit cod e, specify the f ollowing formula i n a Derive node :
endstring(4, item)
Matching a s pecic pattern—STRING matches PATTERN. For exa mple, to sele ct persons with
“market anywhere in their job title, you could specify the following in a Select node:
job_title matches "*market*"
Replacing all instances of a substring within a s tring—replace(SUBSTRING, NEWSUBSTRING,
STRING). For e xample, to replace all instances o f an unsupported character, such as a vertical
pipe ( | ), with a semic olon prior to text mining, use the replace function in a Filler node.
Under Fill in elds:, select all elds where the character may occur. For the Replace: co ndition,
select Always, and specify the following condition under Replace with:
replace('|',';',@FIELD)
Deriving a ag eld based on the presen ce of a specic substring. For example, you could
use a string function in a Derive node to generate a s eparate ag eld for each response
with an expression such as:
hassubstring(museums,"museum_of_design")
For more informati on, see th e topic String Func tions in Chapter 8 on p. 141.
Handling Blanks and Missing Values
Replacing blanks or missing v alues is a co mmon data preparation task for data miner s. CLEM
provides you with a number of tools to automate blank handling. The Filler node is the most
common place to work with blanks; however, the following functions can be used in any node that
accepts CLEM expressions:
@BLANK(FIELD) can be used to determine records whose values are blank for a particular
eld, such as Age.
@NULL(FIELD) can be used to determine records whose values are system-missing for the
specied eld(s). In IB M® SPSS® Modeler, system-missing values are displayed as $null$
values.