User Guide
TwoStep Cluster Analysis
Determine automatically. The procedure will automatically determine the “best”
number of clusters, using the criterion specified in the Clustering Criterion group.
Optionally, enter a positive integer specifying the maximum number of clusters
that the procedure should consider.
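As a rough illustration of automatic selection with an optional maximum, the following Python sketch assumes that each candidate number of clusters has already been scored with BIC or AIC (lower is better) and simply picks the lowest-scoring candidate that does not exceed the maximum. The function name `choose_cluster_count` is hypothetical, and the plain "take the minimum" rule is a simplifying assumption; the procedure's own selection logic may be more refined.

```python
def choose_cluster_count(criterion_by_k, max_clusters=None):
    """Return the candidate cluster count with the smallest criterion value.

    criterion_by_k maps a candidate number of clusters to its BIC or AIC
    value (lower is better). max_clusters, if given, caps the candidates,
    mirroring the optional maximum entered in the dialog box.
    Simplifying assumption: the real procedure may use a more refined rule.
    """
    if max_clusters is not None and max_clusters < 1:
        raise ValueError("max_clusters must be a positive integer")
    # Keep only candidates within the optional maximum.
    candidates = {
        k: value
        for k, value in criterion_by_k.items()
        if max_clusters is None or k <= max_clusters
    }
    if not candidates:
        raise ValueError("no candidate cluster counts within the maximum")
    # Lowest criterion wins; ties resolve to the smaller number of clusters.
    return min(candidates, key=lambda k: (candidates[k], k))
```

For example, with hypothetical scores {1: 500.0, 2: 420.0, 3: 410.0, 4: 415.0}, the sketch returns 3, or 2 when the maximum is set to 2.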
Specify fixed. Allows you to fix the number of clusters in the solution. Enter a
positive integer.
Count of Continuous Variables. This group provides a summary of the continuous
variable standardization specifications made in the Options dialog box. For more
information, see “TwoStep Cluster Analysis Options” on p. 459.
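For readers unfamiliar with standardization, the Python sketch below assumes it means the usual z-score transformation: subtract the variable's mean and divide by its standard deviation, so every continuous variable ends up on a comparable scale. The function name `standardize` is hypothetical, and the use of the sample standard deviation (divisor n - 1) is an assumption rather than a documented detail of the procedure.

```python
import statistics


def standardize(values):
    """Rescale a continuous variable to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1), an assumption
    if sd == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mean) / sd for x in values]
```

For instance, the values 2, 4, and 6 have mean 4 and sample standard deviation 2, so they standardize to -1, 0, and 1.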
Clustering Criterion. This selection determines how the automatic clustering algorithm
determines the number of clusters. Either the Bayesian Information Criterion (BIC)
or the Akaike Information Criterion (AIC) can be specified.
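The two criteria trade off goodness of fit (the log-likelihood ln L) against model complexity (the number of estimated parameters k). The Python sketch below uses the generic textbook definitions, BIC = -2 ln L + k ln n and AIC = -2 ln L + 2k, where n is the number of cases; how the procedure counts parameters for a given cluster solution is not shown here, and the function names `bic` and `aic` are hypothetical.

```python
import math


def bic(log_likelihood, n_params, n_cases):
    """Bayesian Information Criterion: -2 ln L + k ln n (lower is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_cases)


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 ln L + 2k (lower is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params
```

Because ln n exceeds 2 once n is 8 or more, BIC penalizes each additional parameter more heavily than AIC for all but tiny samples, so for the same likelihoods it tends to favor solutions with fewer clusters.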
Data. This procedure works with both continuous and categorical variables. Cases
represent objects to be clustered, and the variables represent attributes upon which
the clustering is based.
Case Order. Note that the cluster features tree and the final solution may depend on
the order of cases. To minimize order effects, randomly order the cases. You may
want to obtain several different solutions with cases sorted in different random orders
to verify the stability of a given solution. In situations where this is difficult due to
extremely large file sizes, multiple runs with a sample of cases sorted in different
random orders might be substituted.
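One way to prepare such stability checks is to generate several random orderings of the case indices up front, as in the Python sketch below, and then run the clustering once per ordering and compare the solutions. The function `shuffled_orders` and its parameters are hypothetical; the optional `sample_size` models the suggestion of using a random sample of cases when the file is very large.

```python
import random


def shuffled_orders(n_cases, n_runs, sample_size=None, seed=0):
    """Return n_runs random orderings of case indices 0..n_cases-1.

    With sample_size=None every ordering is a full permutation of all cases;
    otherwise each run draws a random sample of sample_size cases, already in
    random order, as a stand-in for very large files.
    """
    k = n_cases if sample_size is None else sample_size
    if not 0 < k <= n_cases:
        raise ValueError("sample_size must be between 1 and n_cases")
    rng = random.Random(seed)  # fixed seed makes the set of orders reproducible
    # Random.sample returns k distinct indices in random selection order.
    return [rng.sample(range(n_cases), k) for _ in range(n_runs)]
```

Recording the seed alongside each run makes it possible to reproduce any ordering that yields an unusual solution.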
Assumptions. The likelihood distance measure assumes that variables in the cluster
model are independent. Further, each continuous variable is assumed to have a
normal (Gaussian) distribution, and each categorical variable is assumed to have a
multinomial distribution. Empirical internal testing indicates that the procedure
is fairly robust to violations of both the assumption of independence and the
distributional assumptions, but you should try to be aware of how well these
assumptions are met.
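To make these assumptions concrete, the Python sketch below computes the log-likelihood of the cases in a single cluster under exactly this model: each continuous variable is normal with the cluster's own mean and maximum-likelihood variance, each categorical variable is multinomial with the cluster's observed category proportions, and, because the variables are assumed independent, the per-variable terms simply add. This is not the procedure's distance computation; it only illustrates the model the assumptions describe, and the function name `cluster_log_likelihood` is hypothetical.

```python
import math
from collections import Counter


def cluster_log_likelihood(continuous_columns, categorical_columns):
    """Log-likelihood of one cluster's cases under the independence model.

    continuous_columns: list of lists of floats, one list per continuous variable.
    categorical_columns: list of lists of labels, one list per categorical variable.
    """
    total = 0.0
    for col in continuous_columns:
        n = len(col)
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n  # maximum-likelihood variance
        if var == 0:
            raise ValueError("zero variance: normal log-likelihood is undefined")
        # Normal log-likelihood evaluated at the MLE: -n/2 * (ln(2*pi*var) + 1)
        total += -0.5 * n * (math.log(2 * math.pi * var) + 1)
    for col in categorical_columns:
        n = len(col)
        # Multinomial log-likelihood at the observed proportions: sum of c * ln(c / n)
        total += sum(c * math.log(c / n) for c in Counter(col).values())
    # Independence means the per-variable contributions are summed.
    return total
```

If the independence assumption is badly violated, for example two continuous variables that are strongly correlated, this additive form double-counts shared information, which is why the checks in the next paragraph are worth running.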
Use the Bivariate Correlations procedure to test the independence of two
continuous variables. Use the Crosstabs procedure to test the independence of two
categorical variables. Use the Means procedure to test the independence between a
continuous variable and a categorical variable. Use the Explore procedure to test the
normality of a continuous variable. Use the Chi-Square Test procedure to test whether
a categorical variable has a specified multinomial distribution.
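Two of these checks rest on the Pearson chi-square statistic. The Python sketch below computes that statistic and its degrees of freedom for a two-way table of counts (the independence question answered by Crosstabs) and for observed counts against specified category proportions (the multinomial question answered by the Chi-Square Test procedure). The function names are hypothetical, and the sketch stops at the statistic; a p-value would come from the chi-square distribution, for instance via `scipy.stats.chi2.sf(stat, df)` if SciPy is available. It assumes every row total, column total, and proportion is nonzero.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df


def chi_square_goodness_of_fit(observed, proportions):
    """Pearson chi-square statistic and degrees of freedom for counts vs. specified proportions."""
    if len(observed) != len(proportions):
        raise ValueError("observed and proportions must have the same length")
    n = sum(observed)
    stat = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, proportions))
    return stat, len(observed) - 1
```

For example, the table [[10, 20], [20, 10]] has an expected count of 15 in every cell, giving a statistic of 20/3 (about 6.67) with 1 degree of freedom.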