User Guide

Chapter

Distances

This procedure calculates any of a wide variety of statistics measuring either
similarities or dissimilarities (distances), either between pairs of variables or between
pairs of cases. These similarity or distance measures can then be used with other
procedures, such as factor analysis, cluster analysis, or multidimensional scaling, to
help analyze complex data sets.
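
The difference between comparing cases and comparing variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure itself: the data values and the helper name `euclidean` are hypothetical, rows are treated as cases, and columns are treated as variables.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical data: 3 cases (rows) measured on 2 variables (columns).
data = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 3.0],
]

# Distances between pairs of cases: compare rows with each other.
case_dist = [[euclidean(r, s) for s in data] for r in data]

# Distances between pairs of variables: transpose, then compare columns.
columns = list(zip(*data))
var_dist = [[euclidean(c, d) for d in columns] for c in columns]
```

Here `case_dist` is a 3 x 3 matrix and `var_dist` is a 2 x 2 matrix. Both are symmetric with zeros on the diagonal, which is the shape of input that clustering or scaling methods typically expect.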

Example. Is it possible to measure similarities between pairs of automobiles based on
certain characteristics, such as engine size, MPG, and horsepower? By computing
similarities between autos, you can gain a sense of which autos are similar to each
other and which are different from each other. For a more formal analysis, you might
consider applying a hierarchical cluster analysis or multidimensional scaling to the
similarities to explore the underlying structure.
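
The automobile example might look like the following in Python. Everything here is assumed for illustration: the three cars and their values are made up, and because engine size, MPG, and horsepower are on very different scales, the sketch converts each variable to z scores before measuring Euclidean distance. That rescaling is a choice made for this sketch, not a step described above.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical cars: (engine size in liters, MPG, horsepower).
autos = {
    "Car A": (1.6, 35.0, 110.0),
    "Car B": (1.8, 33.0, 120.0),
    "Car C": (5.0, 15.0, 300.0),
}

# Mean and standard deviation of each variable (column) across cars.
columns = list(zip(*autos.values()))
col_stats = [(mean(c), pstdev(c)) for c in columns]

# Convert each value to a z score so no single variable dominates.
scaled = {
    name: [(v - m) / s for v, (m, s) in zip(values, col_stats)]
    for name, values in autos.items()
}

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Distance for every unordered pair of cars.
pair_dist = {
    (a, b): euclidean(scaled[a], scaled[b])
    for a, b in combinations(autos, 2)
}

# The pair with the smallest distance is the most similar pair.
closest = min(pair_dist, key=pair_dist.get)
print(closest)
```

With these made-up values, the two small cars form the closest pair and the large car is far from both.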

Statistics. Dissimilarity (distance) measures for interval data are Euclidean distance,
squared Euclidean distance, Chebychev, block, Minkowski, or customized; for count
data, chi-square or phi-square; for binary data, Euclidean distance, squared Euclidean
distance, size difference, pattern difference, variance, shape, or Lance and Williams.
Similarity measures for interval data are Pearson correlation or cosine; for binary
data, Russel and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal
and Sneath 1, Sokal and Sneath 2, Sokal and Sneath 3, Kulczynski 1, Kulczynski 2,
Sokal and Sneath 4, Hamann, Lambda, Anderberg’s D, Yule’s Y, Yule’s Q, Ochiai,
Sokal and Sneath 5, phi 4-point correlation, or dispersion.
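
A few of the measures named above can be sketched in Python using their common textbook definitions. The function names are hypothetical, and since this section does not give the procedure's own formulas, the definitions below are assumptions that may differ in detail from what the procedure computes.

```python
import math

# --- Dissimilarity measures for interval data ---

def squared_euclidean(x, y):
    """Sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared differences."""
    return math.sqrt(squared_euclidean(x, y))

def chebychev(x, y):
    """Largest absolute difference on any single variable."""
    return max(abs(a - b) for a, b in zip(x, y))

def block(x, y):
    """Sum of absolute differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def minkowski(x, y, p):
    """p-th root of the sum of absolute differences raised to the power p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

# --- Similarity measure for interval data ---

def cosine(x, y):
    """Cosine of the angle between two vectors (both must be nonzero)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

# --- Similarity measures for binary data ---

def binary_counts(x, y):
    """Return (a, b, c, d): both present, x only, y only, both absent."""
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if not u and v)
    d = sum(1 for u, v in zip(x, y) if not u and not v)
    return a, b, c, d

def simple_matching(x, y):
    """Share of all positions on which the two items agree."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Joint presences over positions where either is present.
    Undefined (division by zero) when neither item has any presence."""
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c)
```

Under these definitions, Minkowski distance with `p = 1` equals block distance and with `p = 2` equals Euclidean distance. Simple matching counts joint absences as agreement, while Jaccard ignores them, so the two can rank the same pairs differently.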

To Obtain Distance Matrices

From the menus choose:

Analyze
  Correlate
    Distances...
