user manual

Chapter 4

preconditions. Apriori requires that input and output fields all be categorical but

delivers better performance because it is optimized for this type of data.

The CARMA model extracts a set of rules from the data without requiring you to

specify input or target fields. In contrast to Apriori, the CARMA node offers build

settings for rule support (support for both antecedent and consequent) rather than just

antecedent support. This means that the rules generated can be used for a wider variety

of applications, for example, to find a list of products or services (antecedents)

whose consequent is the item that you want to promote this holiday season.
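
The difference between antecedent support and rule support can be illustrated with a minimal Python sketch. The transactions, the item names, and the `support` helper are hypothetical and are not part of the product; the sketch only shows how the two measures are counted, not how the CARMA node is implemented.

```python
# Hypothetical market-basket data; item names are illustrative only.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Rule under consideration: bread -> butter
antecedent = {"bread"}
consequent = {"butter"}

# Antecedent support counts transactions containing the antecedent only.
antecedent_support = support(antecedent, transactions)

# Rule support counts transactions containing antecedent AND consequent.
rule_support = support(antecedent | consequent, transactions)

# Confidence is the share of antecedent transactions that also
# contain the consequent.
confidence = rule_support / antecedent_support

print(f"antecedent support: {antecedent_support:.2f}")
print(f"rule support:       {rule_support:.2f}")
print(f"confidence:         {confidence:.2f}")
```

Because rule support counts only the transactions in which the whole rule holds, it can never exceed antecedent support.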

The Sequence node discovers association rules in sequential or time-oriented data. A

sequence is a list of item sets that tends to occur in a predictable order. For example, a

customer who purchases a razor and aftershave lotion may purchase shaving cream

the next time he shops. The Sequence node is based on the CARMA association rules

algorithm, which uses an efficient two-pass method for finding sequences.
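
To make the idea of a sequence concrete, the following Python sketch checks whether an ordered list of item sets occurs in a customer's purchase history and computes the fraction of customers for which it does. The customer histories and the `contains_sequence` helper are hypothetical. This is a simple containment check for illustration, not the two-pass method used by the Sequence node.

```python
def contains_sequence(history, pattern):
    """Return True if the item sets in pattern occur in history in the
    same order (not necessarily in consecutive visits)."""
    pos = 0
    for basket in history:
        # Advance to the next pattern element once the current one
        # is fully contained in this visit's basket.
        if pos < len(pattern) and pattern[pos] <= basket:
            pos += 1
    return pos == len(pattern)

# Hypothetical purchase histories: one list of baskets per customer,
# ordered by visit.
histories = {
    "cust_a": [{"razor", "aftershave"}, {"shaving cream"}],
    "cust_b": [{"razor", "aftershave", "soap"}, {"soap"},
               {"shaving cream", "towel"}],
    "cust_c": [{"shaving cream"}, {"razor", "aftershave"}],
    "cust_d": [{"razor"}, {"shaving cream"}],
}

# Sequence of interest: razor and aftershave, then shaving cream.
pattern = [{"razor", "aftershave"}, {"shaving cream"}]

matches = [c for c, h in histories.items() if contains_sequence(h, pattern)]
sequence_support = len(matches) / len(histories)

print(matches)
print(sequence_support)
```

Here only the first two customers match: the third bought the items in the opposite order, and the fourth never bought the complete first item set.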

Segmentation Models

Segmentation models divide the data into segments, or clusters, of records that have similar

patterns of input fields. As they are only interested in the input fields, segmentation models have

no concept of output or target fields. Examples of segmentation models are Kohonen networks,

K-Means clustering, two-step clustering and anomaly detection.

Segmentation models (also known as “clustering models”) are useful in cases where the specific

result is unknown (for example, when identifying new patterns of fraud, or when identifying

groups of interest in your customer base). Clustering models focus on identifying groups of

similar records and labeling the records according to the group to which they belong. This is

done without the benefit of prior knowledge about the groups and their characteristics, and it

distinguishes clustering models from the other modeling techniques in that there is no predefined

output or target field for the model to predict. There are no right or wrong answers for these

models. Their value is determined by their ability to capture interesting groupings in the data and

provide useful descriptions of those groupings. Clustering models are often used to create clusters

or segments that are then used as inputs in subsequent analyses (for example, by segmenting

potential customers into homogeneous subgroups).
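
As an illustration of how a clustering model labels records without a target field, the following Python sketch runs a plain k-means procedure on a handful of records. The customer data, the starting centroids, and the `kmeans` function are hypothetical and written for this example only. The sketch is not the implementation used by the K-Means node.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical records: (annual spend in thousands, visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
             (8.0, 9.0), (8.5, 9.5), (9.0, 8.8)]

# Assumed starting centroids: one record from each end of the data.
initial = [customers[0], customers[3]]

labels, centroids = kmeans(customers, initial)
print(labels)
print(centroids)
```

Only input fields are supplied. The cluster label assigned to each record is the model's output and can be carried forward as an input field in a later analysis.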