User Guide

Chapter

Hierarchical Cluster Analysis

This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left.
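The joining process described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure's own implementation. It assumes Euclidean distance and average (between-groups) linkage, and the function names `euclidean` and `agglomerate` are hypothetical. Each pass merges the two closest clusters and records the merge, so the returned list plays the role of an agglomeration schedule.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two cases (equal-length sequences of numbers)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(cases):
    """Join clusters until one is left; return the merge history.

    Each history entry is (cluster_a, cluster_b, distance), where a cluster is a
    sorted tuple of 0-based case indices. Average (between-groups) linkage is
    assumed: the distance between two clusters is the mean of all pairwise
    case distances across them.
    """
    clusters = [(i,) for i in range(len(cases))]  # every case starts alone
    history = []

    def linkage(c1, c2):
        total = sum(euclidean(cases[i], cases[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Closest pair of clusters; min() keeps the first pair on ties.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        history.append((clusters[a], clusters[b], linkage(clusters[a], clusters[b])))
        clusters[a] = tuple(sorted(clusters[a] + clusters[b]))
        del clusters[b]  # b > a, so index a is unaffected
    return history

for step in agglomerate([(0.0,), (1.0,), (5.0,), (6.0,)]):
    print(step)
```

For the four one-variable cases 0, 1, 5, and 6, the sketch joins cases 0 and 1, then cases 2 and 3 (both pairs at distance 1), and finally the two pairs at an average distance of 5, leaving a single cluster.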

You can analyze raw variables or you can choose from a variety of standardizing transformations. Distance or similarity measures are generated by the Proximities procedure. Statistics are displayed at each stage to help you select the best solution.
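To make the difference between distance and similarity measures concrete, here is a sketch of one measure of each kind and a helper that builds a proximity matrix. Squared Euclidean distance and Pearson correlation are used only as examples, and the helper names are hypothetical; they are not Proximities procedure options. For a distance, smaller values mean the cases are more alike; for a similarity, larger values do.

```python
import math

def squared_euclidean(a, b):
    """Distance: sum of squared differences (0 for identical cases)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def pearson(a, b):
    """Similarity: Pearson correlation of two value profiles (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def proximity_matrix(cases, measure):
    """Square matrix of measure(case_i, case_j) for every pair of cases."""
    n = len(cases)
    return [[measure(cases[i], cases[j]) for j in range(n)] for i in range(n)]

cases = [(1, 2, 3), (2, 4, 6), (3, 2, 1)]
print(proximity_matrix(cases, squared_euclidean))
print(proximity_matrix(cases, pearson))
```

Note that the two measures can disagree: cases 0 and 1 are far apart by squared Euclidean distance but perfectly correlated, because correlation compares the shape of the profiles rather than their size.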

Example. Are there identifiable groups of television shows that attract similar audiences within each group? With hierarchical cluster analysis, you could cluster television shows (cases) into homogeneous groups based on viewer characteristics. This can be used to identify segments for marketing. Or you can cluster cities (cases) into homogeneous groups so that comparable cities can be selected to test various marketing strategies.

Statistics. Agglomeration schedule, distance (or similarity) matrix, and cluster membership for a single solution or a range of solutions. Plots: dendrograms and icicle plots.
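Cluster membership for a given number of clusters can be read off the agglomeration schedule by replaying merges until that many clusters remain. The sketch below assumes the schedule is a list of merges in order, each giving the 0-based case indices of the two clusters joined. The function name `membership`, the example schedule, and the rule of numbering clusters by their lowest case index are assumptions for illustration.

```python
def membership(n_cases, schedule, k):
    """Return a cluster number (1..k) for each case in the k-cluster solution.

    schedule: merges in the order they occurred; each entry starts with the two
    joined clusters as tuples of 0-based case indices (extra fields, such as the
    joining distance, are ignored).
    """
    if not 1 <= k <= n_cases:
        raise ValueError("k must be between 1 and the number of cases")
    label = list(range(n_cases))  # every case starts in its own cluster
    # n_cases - k merges reduce n_cases singletons to k clusters.
    for a, b, *_ in schedule[: n_cases - k]:
        old, new = label[b[0]], label[a[0]]
        label = [new if lab == old else lab for lab in label]
    # Renumber 1..k in order of each cluster's first (lowest-index) case.
    numbers = {}
    for lab in label:
        numbers.setdefault(lab, len(numbers) + 1)
    return [numbers[lab] for lab in label]


# Hypothetical schedule for four cases: 0+1, then 2+3, then the two pairs.
schedule = [((0,), (1,), 1.0), ((2,), (3,), 1.0), ((0, 1), (2, 3), 5.0)]

# Membership for a range of solutions (2 to 3 clusters).
for k in range(2, 4):
    print(k, membership(4, schedule, k))
```

Requesting a range of solutions is just a loop over k; each solution is obtained by stopping the same merge sequence at a different point, which is why the solutions are nested.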

Data. The variables can be quantitative, binary, or count data. Scaling of variables is an important issue: differences in scaling may affect your cluster solution(s). If your variables have large differences in scaling (for example, one variable is measured in dollars and the other is measured in years), you should consider standardizing them (this can be done automatically by the Hierarchical Cluster Analysis procedure).
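The effect of scaling can be checked numerically. The sketch below standardizes each variable to z-scores (mean 0, standard deviation 1), one common standardizing transformation, and compares squared Euclidean distances before and after. The three cases, with income in dollars and age in years, are made-up values chosen for illustration, and the function names are hypothetical.

```python
import statistics

def z_scores(column):
    """Standardize one variable: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1 divisor)
    return [(x - mean) / sd for x in column]

def standardize(cases):
    """Apply z_scores to every variable (column) of a list of cases (rows)."""
    columns = [z_scores(list(col)) for col in zip(*cases)]
    return [tuple(row) for row in zip(*columns)]

def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Made-up cases: (income in dollars, age in years)
cases = [(30000, 25), (30500, 60), (31000, 26)]
z = standardize(cases)

print("raw:", sq_dist(cases[0], cases[1]), sq_dist(cases[0], cases[2]))
print("z:  ", round(sq_dist(z[0], z[1]), 3), round(sq_dist(z[0], z[2]), 3))
```

On the raw values, case 0 looks far closer to case 1 because the dollar differences swamp the 35-year age gap. After standardization, both variables contribute on a comparable scale and case 2 becomes the nearer neighbor.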

Case Order. If tied distances or similarities exist in the input data or occur among updated clusters during joining, the resulting cluster solution may depend on the order of cases in the file. You may want to obtain several different solutions with cases sorted in different random orders to verify the stability of a given solution.
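The dependence on case order can be reproduced with a tiny example. In the sketch below, ties are broken by taking the first closest pair in file order. Single linkage on one variable is assumed only to keep the example short, and the function names are hypothetical. Three equally spaced values produce tied distances, so which neighbors are joined first, and therefore the two-cluster solution, changes with the order of the cases.

```python
import random
from itertools import combinations

def two_cluster_solution(values):
    """Single-linkage joining of 1-D values, stopped when two clusters remain.

    On tied distances the first closest pair in file order is joined, so the
    result can depend on how the cases are sorted. Returns a set of frozensets
    of values, one frozenset per cluster.
    """
    clusters = [[v] for v in values]
    while len(clusters) > 2:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: min(abs(x - y)
                                     for x in clusters[p[0]]
                                     for y in clusters[p[1]]))
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return {frozenset(c) for c in clusters}

def solutions_under_shuffles(values, n_orders, seed=0):
    """Collect the distinct two-cluster solutions over random case orders."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_orders):
        order = list(values)
        rng.shuffle(order)
        seen.add(frozenset(two_cluster_solution(order)))
    return seen

print(two_cluster_solution([0.0, 1.0, 2.0]))
print(two_cluster_solution([2.0, 1.0, 0.0]))
print(solutions_under_shuffles([0.0, 1.0, 2.0], 20))
```

If the set returned by `solutions_under_shuffles` contains more than one solution, the result is sensitive to case order and should be interpreted with care.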
