user manual

Understanding Data Mining

The Self-Learning Response Model (SLRM) node enables you to build a model in which a single new case, or a small number of new cases, can be used to re-estimate the model without having to retrain it on all of the data.
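The idea of folding one new case into an existing model can be illustrated with a model whose parameters are simple counts. The sketch below uses a categorical naive Bayes classifier purely as an illustrative stand-in. The text above does not state which algorithm the SLRM node uses, and the class and method names (`IncrementalNaiveBayes`, `update`, `predict_proba`) are hypothetical.

```python
from collections import defaultdict


class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts can absorb one case at a time.

    Hypothetical illustration only; not the algorithm used by the SLRM node.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.n = 0                              # total cases seen so far
        self.class_counts = defaultdict(int)    # label -> count
        self.feature_counts = defaultdict(int)  # (feature index, value, label) -> count
        self.feature_values = defaultdict(set)  # feature index -> distinct values seen

    def update(self, x, y):
        """Fold a single new case (x, y) into the counts; earlier cases are not revisited."""
        self.n += 1
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(i, v, y)] += 1
            self.feature_values[i].add(v)

    def predict_proba(self, x):
        """Return normalised class probabilities for the predictor values x."""
        k = len(self.class_counts)
        scores = {}
        for y, cy in self.class_counts.items():
            # Smoothed prior for this class
            p = (cy + self.alpha) / (self.n + self.alpha * k)
            for i, v in enumerate(x):
                # .get() avoids inserting empty entries while predicting
                n_vals = len(self.feature_values.get(i, set()) | {v})
                count = self.feature_counts.get((i, v, y), 0)
                p *= (count + self.alpha) / (cy + self.alpha * n_vals)
            scores[y] = p
        total = sum(scores.values())
        return {y: s / total for y, s in scores.items()}
```

Because every parameter is a running count, calling `update` once with a new response changes the predictions for similar cases immediately, without refitting on the earlier cases.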

The Time Series node estimates exponential smoothing, univariate Autoregressive Integrated Moving Average (ARIMA), and multivariate ARIMA (or transfer function) models for time series data and produces forecasts of future performance. A Time Series node must always be preceded by a Time Intervals node.
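Of the model families listed above, simple exponential smoothing is the easiest to show in a few lines. The sketch below has no trend or seasonal terms and takes the smoothing weight `alpha` as a fixed, hypothetical input instead of estimating it from the data. ARIMA and transfer function models are not covered, and the function names are illustrative only.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # initialise the level with the first observation
    levels = [level]
    for y in series[1:]:
        # New level is a weighted average of the latest value and the old level
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def forecast(series, alpha, horizon):
    """Simple exponential smoothing projects the final level forward unchanged."""
    final_level = simple_exponential_smoothing(series, alpha)[-1]
    return [final_level] * horizon
```

For example, `forecast([10, 12, 11, 13], alpha=0.5, horizon=3)` returns `[12.0, 12.0, 12.0]`: the levels run 10, 11, 11, 12, and the last level becomes the flat forecast.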

The k-Nearest Neighbor (KNN) node associates a new case with the category or value of the k objects nearest to it in the predictor space, where k is an integer. Similar cases are near each other and dissimilar cases are distant from each other.
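A minimal sketch of the classification case follows. It assumes Euclidean distance as the measure of nearness and a simple majority vote among the k neighbors. The function name `knn_predict` is hypothetical, and the node's own distance options, feature weighting, and tie handling may differ.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training cases.

    `train` is a list of (feature_vector, label) pairs. Euclidean distance
    (math.dist) is assumed; ties in the vote go to the label counted first.
    """
    if k < 1 or k > len(train):
        raise ValueError("k must be between 1 and the number of training cases")
    # Sort training cases by distance to the query and keep the k closest
    neighbours = sorted(train, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With two well-separated clusters labeled "A" and "B", a query point near the "A" cluster is assigned "A". For a numeric target, averaging the neighbors' values instead of voting gives the corresponding prediction.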

Association Models

Association models find patterns in your data where one or more entities (such as events, purchases, or attributes) are associated with one or more other entities. The models construct rule sets that define these relationships. Here the fields within the data can act as both inputs and targets. You could find these associations manually, but association rule algorithms do so much more quickly and can explore more complex patterns. Apriori and Carma models are examples of the use of such algorithms. One other type of association model is a sequence detection model, which finds sequential patterns in time-structured data.
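To make the idea of an association algorithm concrete, the sketch below implements the textbook level-wise Apriori search for frequent itemsets, where support is the fraction of transactions that contain an itemset. It is a generic illustration, not the Apriori node's implementation: the node's rule selection methods and indexing scheme are not reproduced, and the function name `apriori` and the `min_support` threshold are assumptions for this example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset
        return sum(1 for t in transactions if itemset <= t) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})  # frequent 1-itemsets
    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
    return frequent
```

For the baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, and `{milk}` with `min_support=0.5`, the result holds five itemsets: the three single items plus `{bread, milk}` and `{bread, butter}`. The pair `{milk, butter}` appears in only one basket, so it is dropped, and the three-item candidate is then pruned without being counted.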

Association models are most useful when predicting multiple outcomes: for example, customers who bought product X also bought Y and Z. Association models associate a particular conclusion (such as the decision to buy something) with a set of conditions. The advantage of association rule algorithms over the more standard decision tree algorithms (C5.0 and C&RT) is that associations can exist between any of the attributes. A decision tree algorithm will build rules with only a single conclusion, whereas association algorithms attempt to find many rules, each of which may have a different conclusion.
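The contrast with decision trees can be seen by enumerating rules directly. The brute-force sketch below uses the standard support and confidence measures (confidence is support of the whole itemset divided by support of the conditions). Any item may be a condition in one rule and the conclusion in another. It is exponential in the number of items and intended only as an illustration; the function name and thresholds are hypothetical, and real association nodes use more efficient searches and additional rule evaluation measures.

```python
from itertools import combinations


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples, brute force."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # skip rare itemsets (and avoid dividing by zero below)
            # Split the itemset every way into conditions -> conclusion
            for r in range(1, size):
                for ante in combinations(combo, r):
                    antecedent = frozenset(ante)
                    consequent = itemset - antecedent
                    confidence = s / support(antecedent)
                    if confidence >= min_confidence:
                        rules.append((antecedent, consequent, s, confidence))
    return rules
```

With the baskets `{X, Y, Z}`, `{X, Y, Z}`, `{X, Y}`, and `{Y, Z}`, a support threshold of 0.5, and a confidence threshold of 0.6, the output includes the rule X -> {Y, Z} with confidence 2/3 alongside rules concluding X, Y, or Z individually, which a single-target decision tree would not produce.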

Association nodes

The Apriori node extracts a set of rules from the data, pulling out the rules with the highest information content. Apriori offers five different methods of selecting rules and uses a sophisticated indexing scheme to process large data sets efficiently. For large problems, Apriori is generally faster to train; it has no arbitrary limit on the number of rules that can be retained, and it can handle rules with up to 32