User's Manual

Understanding Data Mining

Segmentation nodes

The Auto Cluster node estimates and compares clustering models, which identify groups of records that have similar characteristics. The node works in the same manner as other automated modeling nodes, allowing you to experiment with multiple combinations of options in a single modeling pass. Models can be compared using basic measures that help filter and rank the cluster models by usefulness, and the node also provides a measure based on the importance of particular fields.
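The measures Auto Cluster uses to rank models are not spelled out here. As a minimal sketch, assuming the mean silhouette coefficient as the "basic measure" (a common choice for comparing clusterings, not necessarily the one the node uses), the snippet below scores several candidate clusterings of the same records and ranks them best first. The helper names `silhouette` and `rank_models` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means tighter, better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        # Group the distances from record i to every other record by cluster label.
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i], [])
        others = [sum(d) / len(d) for lbl, d in by_cluster.items() if lbl != labels[i]]
        if not own or not others:
            scores.append(0.0)  # singleton cluster or only one cluster: score 0 by convention
            continue
        a = sum(own) / len(own)   # mean distance within the record's own cluster
        b = min(others)           # mean distance to the nearest other cluster
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def rank_models(points: Sequence[Point],
                candidates: Dict[str, Sequence[int]]) -> List[Tuple[str, float]]:
    """Score each candidate labeling and return (name, score) pairs, best first."""
    scored = [(name, silhouette(points, labels)) for name, labels in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    candidates = {"two-groups": [0, 0, 1, 1], "interleaved": [0, 1, 0, 1]}
    for name, score in rank_models(records, candidates):
        print(f"{name}: {score:.3f}")
```

The labeling that keeps nearby records together ranks first, because each record is much closer to its own cluster than to the other one.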

The K-Means node clusters the data set into distinct groups (or clusters). The method defines a fixed number of clusters, iteratively assigns records to clusters, and adjusts the cluster centers until further refinement can no longer improve the model. Instead of trying to predict an outcome, k-means uses a process known as unsupervised learning to uncover patterns in the set of input fields.
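To make the assign-and-adjust loop concrete, here is a minimal pure-Python sketch of the standard (Lloyd-style) k-means procedure. It is not SPSS Modeler's implementation: the random choice of initial centers, the `max_iter` cap, and the function names `kmeans` and `mean` are assumptions made for illustration.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of records."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    # The number of clusters k is fixed up front; start from k randomly chosen records.
    centers = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # no record moved, so further refinement cannot change the model
        labels = new_labels
        # Update step: move each center to the mean of the records assigned to it.
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:  # keep the previous center if a cluster ends up empty
                centers[c] = mean(members)
    return centers, labels


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centers, labels = kmeans(records, k=2)
    print(labels)
    print(centers)
```

No target field is involved anywhere: the loop only looks at distances between input fields, which is what makes this unsupervised.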

The Kohonen node generates a type of neural network that can be used to cluster the data set into distinct groups. When the network is fully trained, records that are similar should be close together on the output map, while records that are different will be far apart. You can look at the number of observations captured by each unit in the model nugget to identify the strong units. This may give you a sense of the appropriate number of clusters.
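The following sketch trains a tiny self-organizing (Kohonen) map and then counts how many records each unit captures, mirroring the idea of inspecting unit counts to spot strong units. The grid size, Gaussian neighborhood, linear decay schedules, and helper names (`train_som`, `best_unit`, `unit_counts`) are illustrative assumptions, not the Kohonen node's actual settings.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Unit = Tuple[int, int]


def best_unit(weights: Dict[Unit, List[float]], p: Point) -> Unit:
    """The map unit whose weight vector is closest to record p."""
    return min(weights, key=lambda u: math.dist(weights[u], p))


def train_som(points: Sequence[Point], rows: int, cols: int, epochs: int = 50,
              lr0: float = 0.5, seed: int = 0) -> Dict[Unit, List[float]]:
    rng = random.Random(seed)
    # Every unit on the rows x cols output map starts at a randomly chosen record.
    weights = {(r, c): list(rng.choice(points)) for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    order = list(points)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks over time
        rng.shuffle(order)
        for p in order:
            winner = best_unit(weights, p)
            for unit, w in weights.items():
                # Units near the winner on the map are also pulled toward p,
                # so similar records end up on neighboring units.
                h = math.exp(-math.dist(unit, winner) ** 2 / (2 * radius ** 2))
                for i, x in enumerate(p):
                    w[i] += lr * h * (x - w[i])
    return weights


def unit_counts(weights: Dict[Unit, List[float]], points: Sequence[Point]) -> Counter:
    """Number of records captured by each unit; large counts suggest strong units."""
    return Counter(best_unit(weights, p) for p in points)


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.0, 0.2), (0.2, 0.0), (5.0, 5.0), (5.0, 5.2), (5.2, 5.0)]
    som = train_som(records, rows=2, cols=2)
    print(unit_counts(som, records))
```

With two well-separated groups, records from each group should land on different units, and units that capture few or no records are candidates to ignore when judging how many clusters the data supports.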

The TwoStep node uses a two-step clustering method. The first step makes a single pass through the data to compress the raw input data into a manageable set of subclusters. The second step uses a hierarchical clustering method to progressively merge the subclusters into larger and larger clusters. TwoStep has the advantage of automatically estimating the optimal number of clusters for the training data. It can handle mixed field types and large data sets efficiently.
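A rough sketch of the two steps, assuming Euclidean distance and numeric fields only: a single pass assigns each record to the nearest subcluster whose centroid lies within a distance `threshold` (or starts a new one), and then subclusters are merged pairwise, closest centroids first, until `k` clusters remain. The TwoStep node's own distance measure and its automatic choice of the number of clusters are not reproduced here; this sketch simply takes `k` as an input. The names `precluster` and `merge` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Cluster = Tuple[Point, int]  # (centroid, number of records)


def precluster(points: Sequence[Point], threshold: float) -> List[Cluster]:
    """Step 1: a single pass that compresses records into small subclusters."""
    sums: List[List[float]] = []  # running coordinate sums per subcluster
    counts: List[int] = []
    for p in points:
        best, best_d = None, threshold
        for i, (s, n) in enumerate(zip(sums, counts)):
            d = math.dist(p, [v / n for v in s])
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            # No subcluster is close enough: start a new one.
            sums.append(list(p))
            counts.append(1)
        else:
            counts[best] += 1
            for j, v in enumerate(p):
                sums[best][j] += v
    return [(tuple(v / n for v in s), n) for s, n in zip(sums, counts)]


def merge(subclusters: Sequence[Cluster], k: int) -> List[Cluster]:
    """Step 2: hierarchical (agglomerative) merging until k clusters remain."""
    clusters = list(subclusters)
    while len(clusters) > k:
        # Find the pair of clusters whose centroids are closest.
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        i, j = min(pairs, key=lambda ab: math.dist(clusters[ab[0]][0], clusters[ab[1]][0]))
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = (tuple((x * ni + y * nj) / (ni + nj) for x, y in zip(ci, cj)), ni + nj)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (3.0, 3.0), (3.0, 3.5),
               (10.0, 10.0), (10.0, 10.5), (10.5, 10.0)]
    subs = precluster(records, threshold=1.0)
    print(subs)
    print(merge(subs, k=2))
```

Only the first step touches the raw records; the second step works on the much smaller list of subcluster summaries, which is why this style of method scales to large data sets.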

The Anomaly Detection node identifies unusual cases, or outliers, that do not conform to patterns of “normal” data. With this node, it is possible to identify outliers even if they do not fit any previously known patterns and even if you are not exactly sure what you are looking for.
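One simple way to score how far a record sits from "normal" data is sketched below. This is a swapped-in technique, not the Anomaly Detection node's algorithm: each record's score is its distance from the data centroid divided by the median of those distances, and records above a hypothetical cutoff of 2.0 are flagged. The function names and the cutoff are assumptions.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def anomaly_scores(points: Sequence[Point]) -> List[float]:
    """Distance of each record from the centroid, relative to the median distance."""
    centroid = tuple(statistics.fmean(col) for col in zip(*points))
    dists = [math.dist(p, centroid) for p in points]
    typical = statistics.median(dists)  # the median is less distorted by outliers than the mean
    if typical == 0:
        # Most records coincide with the centroid; anything off it is maximally unusual.
        return [0.0 if d == 0 else math.inf for d in dists]
    return [d / typical for d in dists]


def flag_anomalies(points: Sequence[Point], cutoff: float = 2.0) -> List[int]:
    """Indices of records whose score exceeds the (hypothetical) cutoff."""
    return [i for i, s in enumerate(anomaly_scores(points)) if s > cutoff]


if __name__ == "__main__":
    records = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0), (1.5, 1.5), (10.0, 10.0)]
    print(flag_anomalies(records))  # → [5]
```

No labeled examples of "bad" records are needed: the score is defined purely relative to the bulk of the data, which matches the idea of finding outliers without knowing in advance what they look like.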

In-Database Mining Models

SPSS Modeler supports integration with data mining and modeling tools that are available from database vendors, including Oracle Data Miner, IBM DB2 InfoSphere Warehouse, and Microsoft Analysis Services. You can build, score, and store models inside the database, all from within the SPSS Modeler application. For full details, see the SPSS Modeler In-Database Mining Guide, available on the product DVD.

IBM SPSS Statistics Models

If you have a copy of IBM® SPSS® Statistics installed and licensed on your computer, you can access and run certain SPSS Statistics routines from within SPSS Modeler to build and score models.

Further Information

Detailed documentation on the modeling algorithms is also available. For more information, see the SPSS Modeler Algorithms Guide, available on the product DVD.