user manual
Chapter 4
Understanding Data Mining
Data Mining Overview
Through a variety of techniques, data mining identifies nuggets of information in bodies of data.
Data mining extracts information in such a way that it can be used in areas such as decision
support, prediction, forecasts, and estimation. Data is often voluminous but of low value and with
little direct usefulness in its raw form. It is the hidden information in the data that has value.
In data mining, success comes from combining your (or your expert’s) knowledge of the
data with advanced, active analysis techniques in which the computer identifies the underlying
relationships and features in the data. The process of data mining generates models from historical
data that are later used for predictions, pattern detection, and more. The technique for building
these models is called machine learning or modeling.
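As a rough illustration of building a model from historical data and then using it for prediction, the following Python sketch fits a straight line by least squares and applies it to a new input. It is not SPSS Modeler code: the function names `fit_line` and `predict` and the sample values are assumptions made up for this example.

```python
# Minimal sketch: "train" on historical data, then predict a new value.
# All names and numbers here are hypothetical and for illustration only.

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new input."""
    slope, intercept = model
    return slope * x + intercept

# Invented historical records: period number and observed value.
history_x = [1, 2, 3, 4, 5]
history_y = [12.0, 14.0, 16.0, 18.0, 20.0]

model = fit_line(history_x, history_y)   # model built from historical data
forecast = predict(model, 6)             # model later used for prediction
print(forecast)
```

The split into a fitting step and a prediction step mirrors the description above: the model is generated once from past data and reused on records it has not seen.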
Modeling Techniques
IBM® SPSS® Modeler includes a number of machine-learning and modeling technologies, which
can be roughly grouped according to the types of problems they are intended to solve.
Predictive modeling methods include decision trees, neural networks, and statistical models.
Clustering models focus on identifying groups of similar records and labeling the records
according to the group to which they belong. Clustering methods include Kohonen, k-means,
and TwoStep.
Association rules associate a particular conclusion (such as the purchase of a particular
product) with a set of conditions (the purchase of several other products).
Screening models can be used to screen data to locate fields and records that are most likely to
be of interest in modeling and identify outliers that may not fit known patterns. Available
methods include feature selection and anomaly detection.
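To make the idea of an association rule more concrete, the following Python sketch scores one rule of the form "if a basket contains these conditions, it also contains this conclusion". It is not SPSS Modeler code. The function name `rule_metrics`, the basket data, and the exact definitions of support and confidence used here are assumptions for illustration; individual tools may define these measures differently.

```python
# Minimal sketch: score the rule {conditions} -> conclusion over a set of
# transactions. Data and names are hypothetical.

def rule_metrics(transactions, conditions, conclusion):
    """Return (support, confidence) for the rule conditions -> conclusion.

    support    = fraction of all transactions containing the conditions
                 AND the conclusion
    confidence = among transactions containing the conditions, the
                 fraction that also contain the conclusion
    """
    conditions = set(conditions)
    n_total = len(transactions)
    n_cond = sum(1 for t in transactions if conditions <= t)
    n_both = sum(
        1 for t in transactions if conditions <= t and conclusion in t
    )
    support = n_both / n_total if n_total else 0.0
    confidence = n_both / n_cond if n_cond else 0.0
    return support, confidence

# Invented purchase records, one set of products per transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "jam"},
    {"bread", "butter", "jam"},
]

# Rule: customers who buy bread and butter also buy jam.
support, confidence = rule_metrics(baskets, {"bread", "butter"}, "jam")
print(support, confidence)
```

In this toy data, three baskets meet the conditions and two of those also contain the conclusion, so the rule holds for two of five baskets overall and for two of the three baskets where it applies.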
Data Manipulation and Discovery
SPSS Modeler also includes many facilities that let you apply your expertise to the data:
Data manipulation. Constructs new data items derived from existing ones and breaks down the
data into meaningful subsets. Data from a variety of sources can be merged and filtered.
Browsing and visualization. Displays aspects of the data using the Data Audit node to perform
an initial audit including graphs and statistics. Advanced visualization includes interactive
graphics, which can be exported for inclusion in project reports.
Statistics. Confirms suspected relationships between variables in the data. Statistics from
IBM® SPSS® Statistics can also be used within SPSS Modeler.
Hypothesis testing. Constructs models of how the data behaves and verifies these models.
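The data manipulation operations described above (deriving new items, selecting subsets, and merging sources) can be sketched in plain Python as follows. This is only an illustration of the concepts, not SPSS Modeler code. The field names, the threshold, and the sample records are all assumptions invented for the example.

```python
# Minimal sketch of derive, filter, and merge on small in-memory records.
# All field names and values are hypothetical.

customers = [
    {"id": 1, "income": 52000, "dependents": 2},
    {"id": 2, "income": 30000, "dependents": 0},
    {"id": 3, "income": 75000, "dependents": 3},
]
purchases = [
    {"id": 1, "total": 120.0},
    {"id": 3, "total": 310.5},
]

# Derive: construct a new field from existing ones.
derived = [
    {**c, "income_per_person": c["income"] / (c["dependents"] + 1)}
    for c in customers
]

# Filter: keep only the subset of records above an assumed threshold.
subset = [c for c in derived if c["income_per_person"] >= 18000]

# Merge: inner join with a second source on the shared "id" key.
totals_by_id = {p["id"]: p["total"] for p in purchases}
merged = [
    {**c, "total": totals_by_id[c["id"]]}
    for c in subset
    if c["id"] in totals_by_id
]

print(merged)
```

Each step produces a new list rather than modifying the input, which keeps the intermediate results available for inspection.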
© Copyright IBM Corporation 1994, 2012.