User Guide
Chapter 25
Distances
This procedure calculates any of a wide variety of statistics measuring either
similarities or dissimilarities (distances), either between pairs of variables or
between pairs of cases. These similarity or distance measures can then be used with
other procedures, such as factor analysis, cluster analysis, or multidimensional
scaling, to help analyze complex data sets.
Example. Is it possible to measure similarities between pairs of automobiles based on
certain characteristics, such as engine size, MPG, and horsepower? By computing
similarities between autos, you can gain a sense of which autos are similar to each
other and which are different from each other. For a more formal analysis, you might
consider applying a hierarchical cluster analysis or multidimensional scaling to the
similarities to explore the underlying structure.
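The automobile example can be sketched in a few lines of Python. This is only an illustration of a case-by-case distance matrix, not the procedure's own implementation: the three autos and their values are hypothetical, and the helper names `euclidean` and `distance_matrix` are assumptions made for this sketch.

```python
import math

# Hypothetical cases: (engine size in liters, MPG, horsepower).
# These values are illustrative only and do not come from the guide.
autos = {
    "compact": (2.0, 30.0, 100.0),
    "sedan": (2.0, 26.0, 103.0),
    "truck": (5.0, 18.0, 300.0),
}


def euclidean(x, y):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def distance_matrix(cases):
    """Return a nested dict of pairwise Euclidean distances between cases."""
    return {
        row: {col: euclidean(cases[row], cases[col]) for col in cases}
        for row in cases
    }


matrix = distance_matrix(autos)
for name, row in matrix.items():
    # One row per auto; small values mean the two autos are similar.
    print(name, [round(d, 2) for d in row.values()])
```

In this sketch the distances are computed on raw values, so the variable with the largest range (horsepower here) dominates the result. Variables measured on different scales are often standardized before distances are computed.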
Statistics. Dissimilarity (distance) measures for interval data are Euclidean distance,
squared Euclidean distance, Chebychev, block, Minkowski, or customized; for count
data, chi-square or phi-square; for binary data, Euclidean distance, squared Euclidean
distance, size difference, pattern difference, variance, shape, or Lance and Williams.
Similarity measures for interval data are Pearson correlation or cosine; for binary
data, Russel and Rao, simple matching, Jaccard, Dice, Rogers and Tanimoto, Sokal
and Sneath 1, Sokal and Sneath 2, Sokal and Sneath 3, Kulczynski 1, Kulczynski 2,
Sokal and Sneath 4, Hamann, Lambda, Anderberg’s D, Yule’s Y, Yule’s Q, Ochiai,
Sokal and Sneath 5, phi 4-point correlation, or dispersion.
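To make a few of these measures concrete, the following Python sketch computes several interval distances and binary similarities for a single pair of profiles. It uses the common textbook definitions of each measure, which are assumed here rather than taken from the procedure itself. The function names and sample vectors are hypothetical.

```python
import math


# Interval-data dissimilarity measures (textbook definitions).
def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def euclidean(x, y):
    return math.sqrt(squared_euclidean(x, y))


def block(x, y):
    """Block (city-block, Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def chebychev(x, y):
    """Chebychev distance: largest absolute difference on any variable."""
    return max(abs(a - b) for a, b in zip(x, y))


def minkowski(x, y, p):
    """Minkowski distance of order p (p=1 is block, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


# Binary-data similarity measures, built from the 2x2 table of matches.
def binary_counts(x, y):
    """Return (a, b, c, d): both present, x only, y only, both absent."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = sum(1 for i, j in zip(x, y) if not i and not j)
    return a, b, c, d


def simple_matching(x, y):
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


def russel_rao(x, y):
    a, b, c, d = binary_counts(x, y)
    return a / (a + b + c + d)


def jaccard(x, y):
    # Undefined when neither profile has any present value (a + b + c == 0).
    a, b, c, _ = binary_counts(x, y)
    return a / (a + b + c)


def dice(x, y):
    # Like Jaccard, but joint presences are counted twice.
    a, b, c, _ = binary_counts(x, y)
    return 2 * a / (2 * a + b + c)


# Hypothetical sample profiles, chosen only for illustration.
x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
u, v = (1, 1, 0, 0, 1), (1, 0, 0, 1, 1)

print(euclidean(x, y), block(x, y), chebychev(x, y))
print(simple_matching(u, v), jaccard(u, v), dice(u, v))
```

Note that simple matching counts joint absences as agreement, whereas Jaccard and Dice ignore them, so the choice among binary measures depends on whether a shared absence is meaningful in the data.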
To Obtain Distance Matrices
From the menus choose:
Analyze
Correlate
Distances...