user manual

Chapter 6
Screening or Removing Fields
To screen out fields with too many missing values, you have several options:
You can use a Data Audit node to filter fields based on quality.
You can use a Feature Selection node to screen out fields with more than a specified percentage of missing values and to rank fields based on importance relative to a specified target.
Instead of removing the fields, you can use a Type node to set the field role to None. This will keep the fields in the data set but exclude them from the modeling processes.
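The idea of screening fields by their percentage of missing values can be illustrated outside the product with a minimal Python sketch. This is not how the Feature Selection node is implemented; the function names, the sample records, the 50% threshold, and the use of None to stand for a missing value are all assumptions made for illustration.

```python
# Illustrative only: a plain-Python analogy for screening fields by the
# share of missing values. None stands in for a missing value (assumption).

def missing_fraction(records, field):
    """Return the fraction of records in which `field` is None or absent."""
    if not records:
        return 0.0
    missing = sum(1 for r in records if r.get(field) is None)
    return missing / len(records)


def screen_fields(records, fields, max_missing=0.5):
    """Keep only fields whose missing fraction does not exceed max_missing."""
    return [f for f in fields if missing_fraction(records, f) <= max_missing]


# Hypothetical sample data.
records = [
    {"age": 34, "income": None, "region": "N"},
    {"age": None, "income": None, "region": "S"},
    {"age": 51, "income": None, "region": None},
    {"age": 29, "income": 42000, "region": "E"},
]

kept = screen_fields(records, ["age", "income", "region"], max_missing=0.5)
print(kept)
```

In this sample, income is missing in three of four records, so it is the only field screened out.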
Imputing or Filling Missing Values
In cases where there are only a few missing values, it may be useful to insert values to replace
the blanks. You can do this from the Data Audit report, which allows you to specify options for
specific fields as appropriate and then generate a SuperNode that imputes values using a number
of methods. This is the most flexible method, and it also allows you to specify handling for large
numbers of fields in a single node.
The following methods are available for imputing missing values:
Fixed. Substitutes a fixed value (either the field mean, midpoint of the range, or a constant that
you specify).
Random. Substitutes a random value based on a normal or uniform distribution.
Expression. Allows you to specify a custom expression. For example, you could replace values
with a global variable created by the Set Globals node.
Algorithm. Substitutes a value predicted by a model based on the C&RT algorithm. For each field
imputed using this method, there will be a separate C&RT model, along with a Filler node that
replaces blanks and nulls with the value predicted by the model. A Filter node is then used to
remove the prediction fields generated by the model.
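To make the Fixed and Random methods concrete, the following Python sketch shows one way such substitutions could be computed. It is an analogy, not the logic of the generated SuperNode: the function names are hypothetical, None stands in for a missing value, and estimating the distribution parameters from the observed values is an assumption, since the text above does not say how they are chosen.

```python
import random
import statistics

# Illustrative only: plain-Python analogies for the Fixed and Random methods.
# None stands in for a missing value (assumption).

def impute_fixed(values, method="mean", constant=None):
    """Replace None with the mean, the midpoint of the range, or a constant."""
    present = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(present)
    elif method == "midrange":
        fill = (min(present) + max(present)) / 2
    elif method == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


def impute_random(values, distribution="uniform", rng=None):
    """Replace None with a draw from a uniform or normal distribution.

    Assumption: parameters are estimated from the observed values
    (range for uniform; mean and standard deviation for normal).
    """
    rng = rng or random.Random()
    present = [v for v in values if v is not None]
    if distribution == "uniform":
        low, high = min(present), max(present)
        draw = lambda: rng.uniform(low, high)
    elif distribution == "normal":
        mu = statistics.mean(present)
        sigma = statistics.stdev(present)  # needs at least two observed values
        draw = lambda: rng.gauss(mu, sigma)
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return [draw() if v is None else v for v in values]


ages = [10, None, 20, 60]  # hypothetical field values
print(impute_fixed(ages, "mean"))
print(impute_fixed(ages, "midrange"))
print(impute_fixed(ages, "constant", constant=0))
print(impute_random(ages, "uniform", rng=random.Random(1)))
```

The mean and the midpoint of the range differ whenever the observed values are skewed, which is why the two Fixed options can produce different fills for the same field.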
Alternatively, to coerce values for specific fields, you can use a Type node to ensure that the
field types cover only legal values and then set the Check column to Coerce for the fields whose
blank values need replacing.
CLEM Functions for Missing Values
There are several functions used to handle missing values. The following functions are often used
in Select and Filler nodes to discard or fill missing values:
count_nulls(LIST)
@BLANK(FIELD)
@NULL(FIELD)
undef
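As a rough analogy for the discard-or-fill pattern described above, the following Python sketch counts missing values per record, discards incomplete records, and fills missing values with a replacement. It is not CLEM and does not reproduce the exact semantics of the functions listed; the helper name mirrors count_nulls only for readability, and None is assumed to stand in for a null value.

```python
# Illustrative only: a Python analogy, not CLEM syntax.
# None stands in for a null value (assumption).

def count_nulls(values):
    """Return how many items in `values` are None."""
    return sum(1 for v in values if v is None)


# Hypothetical records.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": None, "income": None},
]

# Discard: keep only records with no missing values (Select-style).
complete = [r for r in records if count_nulls(r.values()) == 0]

# Fill: replace each missing value with a chosen constant (Filler-style).
filled = [{k: (0 if v is None else v) for k, v in r.items()} for r in records]

print(complete)
print(filled)
```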