user manual

Chapter

Understanding Data Mining

Data Mining Overview

Through a variety of techniques, data mining identifies nuggets of information in bodies of data. Data mining extracts information in such a way that it can be used in areas such as decision support, prediction, forecasting, and estimation. Data is often voluminous but of low value and with little direct usefulness in its raw form. It is the hidden information in the data that has value.

In data mining, success comes from combining your (or your expert's) knowledge of the data with advanced, active analysis techniques in which the computer identifies the underlying relationships and features in the data. The process of data mining generates models from historical data that are later used for predictions, pattern detection, and more. The technique for building these models is called machine learning or modeling.
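
The idea of generating a model from historical data and using it later for prediction can be sketched in a few lines of Python. This is a minimal illustration, not IBM SPSS Modeler scripting. The data values and the names history, fit_line, and predict are assumptions made up for the example, and a simple least-squares line stands in for whichever modeling technique is actually chosen.

```python
# Illustrative sketch only: plain Python, not SPSS Modeler scripting.
# Hypothetical historical data: (advertising spend, units sold).
history = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Score a new case with a previously fitted model."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(history)       # modeling: learn from historical data
forecast = predict(model, 5.0)  # later use: predict for an unseen case
print(model, forecast)
```

The model is built once from the historical records and then reused to score new cases, which is the separation between model building and prediction described above.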

Modeling Techniques

IBM® SPSS® Modeler includes a number of machine-learning and modeling technologies, which can be roughly grouped according to the types of problems they are intended to solve.

• Predictive modeling methods include decision trees, neural networks, and statistical models.

• Clustering models focus on identifying groups of similar records and labeling the records according to the group to which they belong. Clustering methods include Kohonen, k-means, and TwoStep.

• Association rules associate a particular conclusion (such as the purchase of a particular product) with a set of conditions (the purchase of several other products).

• Screening models can be used to screen data to locate fields and records that are most likely to be of interest in modeling and to identify outliers that may not fit known patterns. Available methods include feature selection and anomaly detection.
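
As one concrete illustration of the groups above, an association rule links a set of conditions to a conclusion, and it is commonly judged by its support and confidence. The following is a minimal sketch in plain Python, not SPSS Modeler code. The transactions and the names support and confidence are hypothetical, chosen only to show how a rule such as "bread and butter implies milk" can be evaluated.

```python
# Illustrative sketch only: hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(conditions, conclusion, baskets):
    """Of the baskets meeting the conditions, the fraction that also
    contain the conclusion."""
    return support(conditions | conclusion, baskets) / support(conditions, baskets)


conditions = {"bread", "butter"}  # purchase of several other products
conclusion = {"milk"}             # purchase of a particular product

rule_support = support(conditions | conclusion, transactions)
rule_confidence = confidence(conditions, conclusion, transactions)
print(rule_support, rule_confidence)
```

Here two of the five baskets contain all three items, and two of the three baskets that meet the conditions also contain the conclusion, so the rule has a support of 0.4 and a confidence of about 0.67.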

Data Manipulation and Discovery

SPSS Modeler also includes many facilities that let you apply your expertise to the data:

• Data manipulation. Constructs new data items derived from existing ones and breaks down the data into meaningful subsets. Data from a variety of sources can be merged and filtered.

• Browsing and visualization. Displays aspects of the data using the Data Audit node to perform an initial audit including graphs and statistics. Advanced visualization includes interactive graphics, which can be exported for inclusion in project reports.

• Statistics. Confirms suspected relationships between variables in the data. Statistics from IBM® SPSS® Statistics can also be used within SPSS Modeler.

• Hypothesis testing. Constructs models of how the data behaves and verifies these models.
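
The data manipulation and initial audit steps described above can be sketched generically in Python. This is only an illustration under stated assumptions, not the behavior of any SPSS Modeler node. The records, the field names (id, income, total_spend, spend_ratio), and the 0.02 threshold are all hypothetical.

```python
from statistics import mean

# Illustrative sketch only: hypothetical records from two sources,
# both keyed by a customer id.
customers = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 51, "income": 87000},
    {"id": 3, "age": 27, "income": 31000},
]
orders = [
    {"id": 1, "total_spend": 1300},
    {"id": 2, "total_spend": 870},
    {"id": 3, "total_spend": 1550},
]

# Merge: combine the two sources on the shared key.
spend_by_id = {order["id"]: order["total_spend"] for order in orders}
merged = [
    {**customer, "total_spend": spend_by_id[customer["id"]]}
    for customer in customers
    if customer["id"] in spend_by_id
]

# Derive: construct a new data item from existing ones.
for row in merged:
    row["spend_ratio"] = row["total_spend"] / row["income"]

# Filter: break the data down into a subset of interest.
high_ratio = [row for row in merged if row["spend_ratio"] > 0.02]

# Audit: basic summary statistics for one field.
spend = [row["total_spend"] for row in merged]
audit = {"min": min(spend), "max": max(spend), "mean": mean(spend)}

print([row["id"] for row in high_ratio], audit)
```

Each step corresponds to one of the operations named in the list: merging sources, deriving a new field, selecting a subset, and summarizing a field before modeling.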