User's Manual

Chapter 4
Typically, you will use these facilities to identify a promising set of attributes in the data. These
attributes can then be fed to the modeling techniques, which will attempt to identify underlying
rules and relationships.
Typical Applications
Typical applications of data mining techniques include the following:
Direct mail. Determine which demographic groups have the highest response rate. Use this
information to maximize the response to future mailings.
Credit scoring. Use an individual’s credit history to make credit decisions.
Human resources. Understand past hiring practices and create decision rules to streamline the
hiring process.
Medical research. Create decision rules that suggest appropriate procedures based on medical
evidence.
Market analysis. Determine which variables, such as geography, price, and customer
characteristics, are associated with sales.
Quality control. Analyze data from product manufacturing and identify variables determining
product defects.
Policy studies. Use survey data to formulate policy by applying decision rules to select the most
important variables.
Health care. User surveys and clinical data can be combined to discover variables that contribute
to health.
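As an illustration of the direct mail application listed above, the following Python sketch computes a response rate for each demographic group. It is not part of IBM SPSS Modeler; the function name response_rate_by_group, the field names age_band and responded, and the sample records are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def response_rate_by_group(records, group_field, response_field):
    """Return {group value: fraction of records in that group that responded}."""
    totals = defaultdict(int)
    responses = defaultdict(int)
    for record in records:
        group = record[group_field]
        totals[group] += 1
        if record[response_field]:
            responses[group] += 1
    return {group: responses[group] / totals[group] for group in totals}


# Hypothetical mailing results: one record per person mailed.
mailing = [
    {"age_band": "18-34", "responded": True},
    {"age_band": "18-34", "responded": False},
    {"age_band": "35-54", "responded": True},
    {"age_band": "35-54", "responded": True},
    {"age_band": "35-54", "responded": False},
    {"age_band": "55+", "responded": False},
]

rates = response_rate_by_group(mailing, "age_band", "responded")

# The group with the highest observed response rate in this sample.
best_group = max(rates, key=rates.get)
```

With real data, a result like this would only be a starting point: small groups can show high rates by chance, so group sizes should be checked before targeting future mailings.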
Terminology
The terms attribute, field, and variable refer to a single data item common to all cases under
consideration. A collection of attribute values that refers to a specific case is called a record, an
example, or a case.
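To make these terms concrete, here is a minimal Python sketch in which each dictionary stands for one record and each key stands for one field. The field names and values are hypothetical and are not taken from any IBM SPSS Modeler data set.

```python
# Each dictionary is one record (also called an example or a case).
# Each key is one field (also called an attribute or a variable).
records = [
    {"age": 34, "region": "North", "responded": True},
    {"age": 51, "region": "South", "responded": False},
    {"age": 27, "region": "North", "responded": False},
]

# The fields are the data items common to all cases.
fields = sorted(records[0].keys())

# The values of a single field, taken across every record.
ages = [record["age"] for record in records]
```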
Assessing the Data
Data mining is not likely to be fruitful unless the data you want to use meets certain criteria. The
following sections present some of the aspects of the data and its application that you should
consider.
Ensure that the data is available
This may seem obvious, but be aware that although data might be available, it may not be in a
form that can be used easily. IBM® SPSS® Modeler can import data from databases (through
ODBC) or from files. The data, however, might be held in some other form on a machine that
cannot be directly accessed. It will need to be downloaded or dumped in a suitable form before it
can be used. It might be scattered among different databases and sources and need to be pulled