User Guide


Chapter

K-Means Cluster Analysis

This procedure attempts to identify relatively homogeneous groups of cases based on selected characteristics, using an algorithm that can handle large numbers of cases. However, the algorithm requires you to specify the number of clusters. You can specify initial cluster centers if you know this information. You can select one of two methods for classifying cases, either updating cluster centers iteratively or classifying only. You can save cluster membership, distance information, and final cluster centers. Optionally, you can specify a variable whose values are used to label casewise output. You can also request analysis of variance F statistics. While these statistics are opportunistic (the procedure tries to form groups that do differ), the relative size of the statistics provides information about each variable’s contribution to the separation of the groups.
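The two classification methods described above can be illustrated with a minimal Python sketch. It is not the procedure's actual implementation: the function names (nearest, kmeans), the iterate flag, the max_iter default, and the use of Euclidean distance are all assumptions made for illustration.

```python
import math

def nearest(point, centers):
    """Return (index, distance) of the center closest to point (Euclidean)."""
    dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, c)))
             for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]

def kmeans(cases, centers, iterate=True, max_iter=10):
    """Assign cases to clusters, optionally updating the centers first.

    iterate=False mimics "classifying only": the supplied centers are
    used as given. max_iter=10 is an illustrative default (assumption).
    """
    centers = [list(c) for c in centers]
    for _ in range(max_iter if iterate else 0):
        labels = [nearest(p, centers)[0] for p in cases]
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(cases, labels) if lab == k]
            if members:
                # New center is the per-variable mean of the cluster's cases.
                new_centers.append([sum(col) / len(members)
                                    for col in zip(*members)])
            else:
                new_centers.append(old)  # empty cluster: keep its center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    assigned = [nearest(p, centers) for p in cases]
    labels = [a[0] for a in assigned]
    distances = [a[1] for a in assigned]
    return labels, distances, centers
```

The three returned values correspond to the items the text says can be saved: cluster membership, distance from the cluster center, and final cluster centers. The number of clusters is implied by the number of initial centers supplied.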

Example. What are some identifiable groups of television shows that attract similar audiences within each group? With k-means cluster analysis, you could cluster television shows (cases) into k homogeneous groups based on viewer characteristics. This can be used to identify segments for marketing. Or you can cluster cities (cases) into homogeneous groups so that comparable cities can be selected to test various marketing strategies.

Statistics. Complete solution: initial cluster centers, ANOVA table. Each case: cluster information, distance from cluster center.
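As a rough illustration of what the ANOVA table reports, the sketch below computes a one-way analysis of variance F ratio for a single variable, grouped by cluster membership. The function name anova_f is hypothetical, and the formula is the standard between-groups mean square divided by the within-groups mean square. The procedure's own table layout is not reproduced here.

```python
def anova_f(values, labels):
    """One-way ANOVA F ratio for one variable, grouped by cluster label.

    Assumes at least two clusters, more cases than clusters, and some
    within-cluster variation (otherwise the ratio is undefined).
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-groups sum of squares: spread of cluster means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Within-groups sum of squares: spread of cases around their own cluster mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

Computing this ratio for each clustering variable and comparing the results gives a sense of which variables separate the clusters most. Because the clusters were formed to differ on these same variables, the ratios are descriptive only and should not be read as significance tests.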

Data. Variables should be quantitative at the interval or ratio level. If your variables are binary or counts, use the Hierarchical Cluster Analysis procedure.

Case and Initial Cluster Center Order. The default algorithm for choosing initial cluster centers is not invariant to case ordering. The Use running means option on the Iterate dialog box makes the resulting solution potentially dependent upon case order regardless of how initial cluster centers are chosen. If you are using either of these methods, you may want to obtain several different solutions with cases sorted in different random orders to verify the stability of a given solution. Specifying
