User Guide
Chapter 32
Hierarchical Cluster Analysis
This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left. You can analyze raw variables or you can choose from a variety of standardizing transformations. Distance or similarity measures are generated by the Proximities procedure. Statistics are displayed at each stage to help you select the best solution.
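The agglomerative process described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's implementation. It assumes Euclidean distance as the proximity measure and single linkage (nearest neighbor) as the rule for measuring the distance between two clusters, both chosen here for simplicity. The function name `agglomerate` and the sample data are hypothetical. Each pass merges the closest pair of clusters and records one row of an agglomeration schedule.

```python
import math


def agglomerate(cases):
    """Single-linkage agglomerative clustering (illustrative sketch).

    Returns an agglomeration schedule: one (stage, cluster_1, cluster_2,
    coefficient) row per merge. Clusters are named by their lowest
    1-based case number; the coefficient is the merge distance.
    """
    n = len(cases)
    # Proximity matrix of pairwise Euclidean distances (an assumption here).
    dist = [[math.dist(cases[i], cases[j]) for j in range(n)] for i in range(n)]
    # Start with every case in its own cluster, keyed by its 0-based index.
    clusters = {i: [i] for i in range(n)}
    schedule = []
    for stage in range(1, n):
        best = None
        keys = sorted(clusters)
        for pos, a in enumerate(keys):
            for b in keys[pos + 1:]:
                # Single linkage: distance between the two closest members.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # Merge b into a; a < b, so the key stays the lowest case index.
        clusters[a].extend(clusters.pop(b))
        schedule.append((stage, a + 1, b + 1, d))
    return schedule


# Hypothetical two-variable data for five cases.
cases = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
schedule = agglomerate(cases)
for stage, c1, c2, coef in schedule:
    print(f"stage {stage}: join {c1} and {c2} at distance {coef:.3f}")
```

With this data, cases 1 and 2 join first, then 3 and 4, then case 5 joins the cluster containing case 3, and the final stage combines everything into one cluster. Because the comparison uses a strict `<`, a tie is resolved in favor of the first pair examined.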
Example. Are there identifiable groups of television shows that attract similar audiences within each group? With hierarchical cluster analysis, you could cluster television shows (cases) into homogeneous groups based on viewer characteristics. The resulting groups can be used to identify market segments. Alternatively, you could cluster cities (cases) into homogeneous groups so that comparable cities can be selected to test various marketing strategies.
Statistics. Agglomeration schedule, distance (or similarity) matrix, and cluster membership for a single solution or a range of solutions. Plots: dendrograms and icicle plots.
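Cluster membership for a single solution, or for a range of solutions, can be read off an agglomeration schedule by replaying the merges in order until the desired number of clusters remains. The sketch below assumes the schedule is available as an ordered list of (case, case) pairs using 1-based case numbers; the function `membership` and the merge list are hypothetical illustrations, not output from the procedure.

```python
def membership(n_cases, schedule, k):
    """Cluster number for each case when the tree is cut at k clusters.

    `schedule` lists merges in the order they were made, as (case_i, case_j)
    pairs of 1-based case numbers. Replaying the first n_cases - k merges
    leaves exactly k clusters.
    """
    parent = list(range(n_cases + 1))  # union-find; index 0 is unused

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in schedule[: n_cases - k]:
        ri, rj = find(i), find(j)
        parent[max(ri, rj)] = min(ri, rj)

    # Number the clusters 1..k in order of first appearance.
    labels, result = {}, []
    for case in range(1, n_cases + 1):
        root = find(case)
        labels.setdefault(root, len(labels) + 1)
        result.append(labels[root])
    return result


# Hypothetical merge order for five cases.
schedule = [(1, 2), (3, 4), (3, 5), (1, 3)]
# A range of solutions: from 2 to 4 clusters.
for k in range(2, 5):
    print(k, membership(5, schedule, k))
```

Cutting at 2 clusters puts cases 1 and 2 in one cluster and cases 3 to 5 in the other; a single solution is just one call with a fixed `k`.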
Data. The variables can be quantitative, binary, or count data. Scaling of variables is an important issue: differences in scaling may affect your cluster solution(s). If your variables have large differences in scaling (for example, one variable is measured in dollars and the other is measured in years), you should consider standardizing them (this can be done automatically by the Hierarchical Cluster Analysis procedure).
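To see why scaling matters, consider a hypothetical pair of variables measured in dollars and years. The sketch below standardizes each variable to z scores (mean 0, standard deviation 1), which is one common standardizing transformation; it uses the sample standard deviation, and the data values and the helper `z_scores` are assumptions for illustration.

```python
import math
import statistics


def z_scores(values):
    """Standardize a variable to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical data: income in dollars and age in years for four cases.
income = [32000.0, 45000.0, 51000.0, 78000.0]
age = [25.0, 34.0, 41.0, 58.0]

raw = list(zip(income, age))
standardized = list(zip(z_scores(income), z_scores(age)))

# On the raw scale, the 13,000-dollar gap swamps the 9-year gap.
print(f"raw distance, cases 1-2:          {math.dist(raw[0], raw[1]):.3f}")
# After standardizing, both variables contribute on comparable scales.
print(f"standardized distance, cases 1-2: {math.dist(standardized[0], standardized[1]):.3f}")
```

The raw distance between cases 1 and 2 is almost exactly the income difference (about 13,000), so age has practically no influence on which cases are joined until the variables are standardized.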
Case Order. If tied distances or similarities exist in the input data or occur among updated clusters during joining, the resulting cluster solution may depend on the order of cases in the file. You may want to obtain several different solutions with cases sorted in different random orders to verify the stability of a given solution.
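The stability check suggested above can be sketched as follows: cluster the same cases in several random orders and count the distinct solutions. This is a hypothetical illustration that uses single linkage with Euclidean distance and breaks ties in favor of the first pair found in file order; the procedure's own tie handling may differ, and the helper names and data are assumptions.

```python
import itertools
import math
import random


def cluster_partition(cases, k):
    """Single-linkage clustering of (label, point) pairs, stopped at k clusters.

    Ties are broken by file order: the first closest pair found is joined.
    Returns the solution as a frozenset of frozensets of case labels.
    """
    clusters = [[case] for case in cases]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = min(math.dist(p, q) for _, p in clusters[a] for _, q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # a < b, so index a is unaffected
    return frozenset(frozenset(label for label, _ in c) for c in clusters)


def distinct_solutions(cases, k, n_orders=20, seed=0):
    """Cluster the cases in n_orders random orders; return the distinct solutions."""
    rng = random.Random(seed)
    solutions = set()
    for _ in range(n_orders):
        order = list(cases)
        rng.shuffle(order)
        solutions.add(cluster_partition(order, k))
    return solutions


# Hypothetical one-variable data: B is exactly 1 unit from both A and C (a tie).
tied = [("A", (0.0,)), ("B", (1.0,)), ("C", (2.0,))]
untied = [("A", (0.0,)), ("B", (1.0,)), ("C", (5.0,))]

print(len(distinct_solutions(tied, k=2)))    # may exceed 1: order-dependent
print(len(distinct_solutions(untied, k=2)))  # 1: the same solution every time
```

With the tied data, listing the cases as A, B, C joins A with B, while listing them as C, B, A joins C with B, so the two-cluster solution changes with case order. Without ties, every order gives the same solution.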