user manual

Chapter 4
A Strategy for Data Mining
As with most business endeavors, data mining is much more effective if done in a planned,
systematic way. Even with cutting-edge data mining tools, such as IBM® SPSS® Modeler, the
majority of the work in data mining requires a knowledgeable business analyst to keep the process
on track. To guide your planning, answer the following questions:
What substantive problem do you want to solve?
What data sources are available, and what parts of the data are relevant to the current problem?
What kind of preprocessing and data cleaning do you need to do before you start mining
the data?
What data mining technique(s) will you use?
How will you evaluate the results of the data mining analysis?
How will you get the most out of the information you obtained from data mining?
The typical data mining process can become complicated very quickly. There is a lot to keep track
of: complex business problems, multiple data sources, varying data quality across data sources,
an array of data mining techniques, different ways of measuring data mining success, and so on.
To stay on track, it helps to have an explicitly defined process model for data mining. The
process model helps you answer the questions listed earlier in this section, and makes sure the
important points are addressed. It serves as a data mining road map so that you will not lose your
way as you dig into the complexities of your data.
The data mining process suggested for use with SPSS Modeler is the Cross-Industry Standard
Process for Data Mining (CRISP-DM). As you can tell from the name, this model is designed as a
general model that can be applied to a wide variety of industries and business problems.
The CRISP-DM Process Model
The general CRISP-DM process model includes six phases that address the main issues in data
mining. The six phases fit together in a cyclical process designed to incorporate data mining
into your larger business practices.
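
As a rough illustration of the cyclical design, the six phases can be sketched as a small Python enumeration. The phase names follow the published CRISP-DM model. The names Phase and next_phase are hypothetical and are not part of SPSS Modeler, and wrapping from the last phase back to the first is a simplification; in practice you may also move back and forth between individual phases.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases in their usual order (illustrative only)."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(current: Phase) -> Phase:
    """Return the phase that follows `current`.

    Assumption: the cycle is modeled as a simple loop, so the last phase
    wraps around to the first one.
    """
    phases = list(Phase)  # Enum members iterate in definition order
    return phases[(phases.index(current) + 1) % len(phases)]


if __name__ == "__main__":
    phase = Phase.BUSINESS_UNDERSTANDING
    for _ in range(len(Phase)):
        print(phase.name)
        phase = next_phase(phase)
```

Running the script prints each phase name once, starting from the first phase of the cycle.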