User's Manual

Understanding Data Mining
Figure 4-1
CRISP-DM process model
The six phases include:
Business understanding. This is perhaps the most important phase of data mining. Business
understanding includes determining business objectives, assessing the situation, determining
data mining goals, and producing a project plan.
Data understanding. Data provides the “raw materials” of data mining. This phase addresses
the need to understand what your data resources are and the characteristics of those resources.
It includes collecting initial data, describing data, exploring data, and verifying data quality.
The Data Audit node available from the Output nodes palette is an indispensable tool for
data understanding.
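The "describing data" and "verifying data quality" tasks can be pictured with a small plain-Python sketch. It is a hypothetical illustration only, not the Data Audit node or its output: the `audit` function, the field names, and the convention that `None` marks a missing value are all assumptions. For each field it counts valid and missing values, counts distinct values, and reports a range when the field is numeric.

```python
def audit(records, fields):
    """Summarize each field: valid and missing counts, distinct values, and a numeric range."""
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        present = [v for v in values if v is not None]  # assumption: None marks a missing value
        summary = {
            "valid": len(present),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
        }
        # Report min/max only when every non-missing value is numeric.
        if present and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
            summary["min"] = min(present)
            summary["max"] = max(present)
        report[field] = summary
    return report


# Hypothetical sample records; None stands for a missing value.
customers = [
    {"age": 34, "region": "north", "income": 52000},
    {"age": None, "region": "south", "income": 61000},
    {"age": 51, "region": "north", "income": None},
]

for field, stats in audit(customers, ["age", "region", "income"]).items():
    print(field, stats)
```

A summary like this is where quality problems first surface: here, one record lacks an age and another lacks an income, which the data preparation phase will have to handle.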
Data preparation. After cataloging your data resources, you will need to prepare your data for
mining. Preparations include selecting, cleaning, constructing, integrating, and formatting
data.
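As a rough sketch of how the five preparation tasks fit together, the hypothetical `prepare` function below works on two made-up sources, customer records and order records. The field names, the rule of dropping records with a missing age, and the `high_value_threshold` of 100.0 are illustrative assumptions, not Modeler behavior. Each step is labeled with the task it represents.

```python
def prepare(customers, orders, high_value_threshold=100.0):
    """Turn raw customer and order records into one modeling-ready row per customer."""
    # Integrate: total order amount per customer id, taken from a second source.
    totals = {}
    for order in orders:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]

    rows = []
    for customer in customers:
        # Clean: drop records missing a required field (an assumed rule).
        if customer.get("age") is None:
            continue
        row = {
            # Select: keep only the fields the model needs (the phone number is dropped).
            "id": customer["id"],
            "age": customer["age"],
            # Format: normalize inconsistent text values.
            "region": customer["region"].strip().lower(),
            "total_spend": totals.get(customer["id"], 0.0),
        }
        # Construct: derive a new field from existing ones.
        row["high_value"] = row["total_spend"] >= high_value_threshold
        rows.append(row)
    return rows


# Hypothetical raw data from two sources.
raw_customers = [
    {"id": 1, "age": 34, "region": " North ", "phone": "555-0100"},
    {"id": 2, "age": None, "region": "South", "phone": "555-0101"},
    {"id": 3, "age": 51, "region": "north", "phone": "555-0102"},
]
raw_orders = [
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 1, "amount": 40.0},
    {"customer_id": 3, "amount": 15.0},
]

for row in prepare(raw_customers, raw_orders):
    print(row)
```

Dropping incomplete records is only one cleaning choice; imputing a value instead is another, and which one is appropriate often becomes clear only after modeling begins.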
Modeling. This is, of course, the flashy part of data mining, where sophisticated analysis
methods are used to extract information from the data. This phase involves selecting modeling
techniques, generating test designs, and building and assessing models.
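The modeling tasks of generating a test design, building a model, and assessing it can be sketched as follows. This is a hypothetical example, not one of Modeler's modeling nodes: the test design is a simple seeded holdout split (30% held out), and the "model" is a single-threshold rule chosen to maximize training accuracy. The function names, field names, and synthetic data are assumptions made for illustration.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Test design: shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def build_threshold_model(train, field, target):
    """Build: pick the threshold on `field` that best predicts the boolean `target` on training data."""
    best_threshold, best_correct = None, -1
    for candidate in sorted({r[field] for r in train}):
        correct = sum((r[field] >= candidate) == r[target] for r in train)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return lambda r: r[field] >= best_threshold


def assess(model, records, target):
    """Assess: fraction of records the model classifies correctly."""
    if not records:
        return float("nan")
    return sum(model(r) == r[target] for r in records) / len(records)


# Hypothetical data: customers who spent 50 or more responded to an offer.
data = [{"spend": s, "responded": s >= 50} for s in range(0, 100, 5)]
train, test = holdout_split(data)
model = build_threshold_model(train, "spend", "responded")
print(f"train accuracy: {assess(model, train, 'responded'):.2f}")
print(f"test accuracy:  {assess(model, test, 'responded'):.2f}")
```

Assessing on held-out records rather than on the training data is the point of the test design: a model can fit its training data perfectly and still misclassify new cases.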
Evaluation. Once you have chosen your models, you are ready to evaluate how the data mining
results can help you to achieve your business objectives. Elements of this phase include
evaluating results, reviewing the data mining process, and determining the next steps.
Deployment. Now that you have invested all of this effort, it is time to reap the benefits. This
phase focuses on integrating your new knowledge into your everyday business processes to
solve your original business problem. This phase includes plan deployment, monitoring and
maintenance, producing a final report, and reviewing the project.
There are some key points in this process model. First, while there is a general tendency for the
process to flow through the steps in the order outlined in the previous paragraphs, there are also a
number of places where the phases influence each other in a nonlinear way. For example, data
preparation usually precedes modeling. However, decisions made and information gathered
during the modeling phase can often lead you to rethink parts of the data preparation phase, which
can then present new modeling issues. The two phases feed back on each other until both phases