User Guide

Chapter

Explore

The Explore procedure produces summary statistics and graphical displays, either for

all of your cases or separately for groups of cases. There are many reasons for using

the Explore procedure: data screening, outlier identification, description, assumption

checking, and characterizing differences among subpopulations (groups of cases).

Data screening may show that you have unusual values, extreme values, gaps in the

data, or other peculiarities. Exploring the data can help to determine whether the

statistical techniques that you are considering for data analysis are appropriate. The

exploration may indicate that you need to transform the data if the technique requires

a normal distribution. Alternatively, you may decide to use nonparametric tests.
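
As a rough illustration of why a transformation can help, the following Python sketch compares the skewness of a small right-skewed sample before and after a log transformation. The data values and the function name skewness are hypothetical, and the simple moment-based formula used here is an assumption; it is not necessarily the adjusted formula that the Explore procedure reports.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (no small-sample adjustment)."""
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - center) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical right-skewed measurements (all positive, so the log is defined).
raw_times = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]
log_times = [math.log(v) for v in raw_times]

raw_skew = skewness(raw_times)
log_skew = skewness(log_times)
print(f"skewness before: {raw_skew:.2f}, after log transform: {log_skew:.2f}")
```

A skewness closer to zero after the transformation suggests a more symmetric distribution, although it does not by itself establish normality.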

Example. Look at the distribution of maze-learning times for rats under four different

reinforcement schedules. For each of the four groups, you can see if the distribution

of times is approximately normal and whether the four variances are equal. You can

also identify the cases with the five largest and five smallest times. The boxplots

and stem-and-leaf plots graphically summarize the distribution of learning times

for each of the groups.
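
The per-group summary described in this example can be sketched outside the procedure as well. The Python code below is a minimal sketch, assuming made-up learning times and hypothetical group labels; the function name describe_groups is an illustrative choice and is not part of the product.

```python
from statistics import mean, median, variance

def describe_groups(times_by_group, k=5):
    """Return basic statistics and the k smallest and k largest values per group."""
    summary = {}
    for group, times in times_by_group.items():
        ordered = sorted(times)
        summary[group] = {
            "n": len(ordered),
            "mean": mean(ordered),
            "median": median(ordered),
            "variance": variance(ordered),   # sample variance (n - 1 denominator)
            "smallest": ordered[:k],         # ascending order
            "largest": ordered[-k:][::-1],   # descending order
        }
    return summary

# Hypothetical maze-learning times for two reinforcement schedules.
maze_times = {
    "schedule_a": [12, 15, 11, 30, 14, 13, 16],
    "schedule_b": [20, 22, 19, 25, 21, 23, 24],
}

summary = describe_groups(maze_times)
for group, stats in summary.items():
    print(group, stats)
```

Comparing the variance entries across groups gives only an informal impression of whether the spreads are similar; a formal check would use a test such as Levene's.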

Statistics and plots. Mean, median, 5% trimmed mean, standard error, variance,

standard deviation, minimum, maximum, range, interquartile range, skewness and

kurtosis and their standard errors, confidence interval for the mean (at a specified

confidence level), percentiles, Huber’s M-estimator, Andrews’ wave estimator,

Hampel’s redescending M-estimator, Tukey’s biweight estimator, the five largest and

five smallest values, the Kolmogorov-Smirnov statistic with a Lilliefors significance

level for testing normality, and the Shapiro-Wilk statistic. Boxplots, stem-and-leaf

plots, histograms, normality plots, and spread-versus-level plots with Levene tests

and transformations.
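
To make a few of these statistics concrete, here is a minimal Python sketch of a 5% trimmed mean, the interquartile range, and the standard error of the mean. The definitions are assumptions made for illustration: the trimmed mean simply drops a whole number of cases from each end, and the quartiles use the default method of Python's statistics.quantiles. The Explore procedure may use different conventions, so its output can differ slightly.

```python
import math
from statistics import mean, quantiles, stdev

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping int(n * proportion) cases from each end of the sorted data."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

def interquartile_range(values):
    """Third quartile minus first quartile (default 'exclusive' quantile method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))

# Hypothetical data: 1 through 19 plus one extreme value.
values = list(range(1, 20)) + [100]

plain = mean(values)
trimmed = trimmed_mean(values)
iqr = interquartile_range(values)
se = standard_error(values)
print(f"mean={plain}, 5% trimmed mean={trimmed}, IQR={iqr}, SE={se:.3f}")
```

The extreme value pulls the ordinary mean upward, while the trimmed mean is unaffected because that case is dropped before averaging.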

Data. The Explore procedure can be used for quantitative variables (interval- or

ratio-level measurements). A factor variable (used to break the data into groups of

cases) should have a reasonable number of distinct values (categories). These values
