IBM SPSS Complex Samples 19
Note: Before using this information and the product it supports, read the general information under Notices (Appendix B). This document contains proprietary information of SPSS Inc., an IBM Company. It is provided under a license agreement and is protected by copyright law. The information contained in this publication does not include any product warranties, and any statements provided in this manual should not be interpreted as such.
Preface

IBM® SPSS® Statistics is a comprehensive system for analyzing data. The Complex Samples optional add-on module provides the additional analytic techniques described in this manual. The Complex Samples add-on module must be used with the SPSS Statistics Core system and is completely integrated into that system.

About SPSS Inc., an IBM Company

SPSS Inc., an IBM Company, is a leading global provider of predictive analytic software and solutions.
Additional Publications

The SPSS Statistics: Guide to Data Analysis, SPSS Statistics: Statistical Procedures Companion, and SPSS Statistics: Advanced Statistical Procedures Companion, written by Marija Norušis and published by Prentice Hall, are available as suggested supplemental material. These publications cover statistical procedures in the SPSS Statistics Base module, Advanced Statistics module, and Regression module.
Contents

Part I: User’s Guide

1 Introduction to Complex Samples Procedures
    Properties of Complex Samples
    Usage of Complex Samples Procedures
    Plan Files
    Further Readings
    Analysis Preparation Wizard: Estimation Method
    Analysis Preparation Wizard: Size
    Define Unequal Sizes
    Analysis Preparation Wizard: Plan Summary
9 Complex Samples General Linear Model
    Complex Samples General Linear Model Statistics
    Complex Samples Hypothesis Tests
    Complex Samples General Linear Model Estimated Means
    Complex Samples General Linear Model Save
    Predictors
    Define Time-Dependent Predictor
    Subgroups
    Model
    Preparing for Analysis When Sampling Weights Are Not in the Data File
    Computing Inclusion Probabilities and Sampling Weights
    Using the Wizard
    Summary
    Related Procedures
18 Complex Samples Ratios
    Using Complex Samples Ratios to Aid Property Value Assessment
    Running the Analysis
    Ratios
    Pivoted Ratios Table
    Summary
    Related Procedures
    Parameter Estimates
    Classification
    Odds Ratios
    Generalized Cumulative Model
    Dropping Non-Significant Predictors
    Warnings
    Comparing Models
    Summary
    Related Procedures
Appendices

A Sample Files
B Notices

Bibliography
Index
Part I: User’s Guide
Chapter 1: Introduction to Complex Samples Procedures

An inherent assumption of analytical procedures in traditional software packages is that the observations in a data file represent a simple random sample from the population of interest. This assumption is untenable for an increasing number of companies and researchers who find it both cost-effective and convenient to obtain samples in a more structured way.
Unequal selection probabilities. When sampling clusters that contain unequal numbers of units, you can use probability-proportional-to-size (PPS) sampling to make a cluster’s selection probability equal to the proportion of units it contains. PPS sampling can also use more general weighting schemes to select units.

Unrestricted sampling. Unrestricted sampling selects units with replacement (WR). Thus, an individual unit can be selected for the sample more than once.

Sampling weights.
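To make the two definitions above concrete, here is a minimal Python sketch. It assumes the simplest PPS scheme, in which a cluster's probability on a single draw is its share of all units, and it draws clusters with replacement so that repeats can occur. The cluster names, sizes, and function names are hypothetical and are not part of the product.

```python
import random


def pps_selection_probabilities(cluster_sizes):
    """Single-draw selection probability of each cluster under PPS.

    cluster_sizes maps a cluster name to the number of units it contains.
    """
    total = sum(cluster_sizes.values())
    return {name: size / total for name, size in cluster_sizes.items()}


def draw_with_replacement(probabilities, n_draws, seed=0):
    """Draw n_draws clusters with replacement (WR); a cluster may repeat."""
    rng = random.Random(seed)
    names = list(probabilities)
    weights = [probabilities[name] for name in names]
    return rng.choices(names, weights=weights, k=n_draws)


# Hypothetical clusters and unit counts, for illustration only.
sizes = {"A": 100, "B": 300, "C": 600}
probs = pps_selection_probabilities(sizes)
sample = draw_with_replacement(probs, n_draws=4)
print(probs)
print(sample)
```

Because the draws are made with replacement, the same cluster name can appear more than once in `sample`.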
An analyst who doesn’t have access to the sampling plan file can specify an analysis plan and refer to that plan from each Complex Samples analysis procedure. A designer of large-scale public use samples can publish the sampling plan file, which simplifies the instructions for analysts and avoids the need for each analyst to specify his or her own analysis plans.
Chapter 2: Sampling from a Complex Design

Figure 2-1: Sampling Wizard, Welcome step

The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the Wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind.

Creating a New Sample Plan

► From the menus choose:
  Analyze > Complex Samples > Select a Sample...
► Click Next to continue through the Wizard.
► Optionally, in the Design Variables step, you can define strata, clusters, and input sample weights. After you define these, click Next.
► Optionally, in the Sampling Method step, you can choose a method for selecting items. If you select PPS Brewer or PPS Murthy, you can click Finish to draw the sample. Otherwise, click Next and then:
► In the Sample Size step, specify the number or proportion of units to sample.
Sampling Wizard: Design Variables

Figure 2-2: Sampling Wizard, Design Variables step

This step allows you to select stratification and clustering variables and to define input sample weights. You can also specify a label for the stage.

Stratify By. The cross-classification of stratification variables defines distinct subpopulations, or strata. Separate samples are obtained for each stratum.
Input Sample Weight. If the current sample design is part of a larger sample design, you may have sample weights from a previous stage of the larger design. You can specify a numeric variable containing these weights in the first stage of the current design. Sample weights are computed automatically for subsequent stages of the current design.

Stage Label. You can specify an optional string label for each stage.
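As a rough illustration of how weights carry across stages, a common convention is that a unit's overall sampling weight is the input weight multiplied by the inverse of its inclusion probability at each stage. The Python sketch below assumes that convention; the function name and the numbers are made up, and the Wizard's own computation may differ in detail.

```python
def overall_weight(stage_inclusion_probs, input_weight=1.0):
    """Multiply an input weight by 1/p for each stage's inclusion probability p."""
    weight = input_weight
    for p in stage_inclusion_probs:
        if not 0 < p <= 1:
            raise ValueError("inclusion probabilities must be in (0, 1]")
        weight /= p
    return weight


# Hypothetical two-stage design: 1 in 10 clusters, then half of the units.
w = overall_weight([0.1, 0.5])
# Same design, but each case already carries a weight of 2 from an earlier design.
w_with_input = overall_weight([0.1, 0.5], input_weight=2.0)
print(w, w_with_input)
```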
Sampling Wizard: Sampling Method

Figure 2-3: Sampling Wizard, Sampling Method step

This step allows you to specify how to select cases from the active dataset.

Method. Controls in this group are used to choose a selection method. Some sampling types allow you to choose whether to sample with replacement (WR) or without replacement (WOR). See the type descriptions for more information.
PPS Systematic. This is a first-stage method that systematically selects units with probability proportional to size. Units are selected without replacement.

PPS Sequential. This is a first-stage method that sequentially selects units with probability proportional to cluster size and without replacement.

PPS Brewer. This is a first-stage method that selects two clusters from each stratum with probability proportional to cluster size and without replacement.
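The idea behind systematic PPS selection can be sketched in Python. This is a generic cumulative-size method with a random start and a fixed interval, shown only to illustrate the concept; it is not necessarily the algorithm the Sampling Wizard uses. The function name `pps_systematic`, the frame, and the sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(frame, n, seed=0):
    """Select n units from frame with probability proportional to size.

    frame is a list of (unit_id, size) pairs in frame order. Every size must
    be at most total/n so that no unit can be selected twice (WOR).
    """
    total = sum(size for _, size in frame)
    interval = total / n
    if any(size > interval for _, size in frame):
        raise ValueError("a unit is larger than the sampling interval")
    rng = random.Random(seed)
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + k * interval for k in range(n)]
    bounds = list(accumulate(size for _, size in frame))
    selected = []
    i = 0
    for point in points:
        # Advance to the unit whose cumulative-size range contains the point.
        while i < len(bounds) - 1 and bounds[i] <= point:
            i += 1
        selected.append(frame[i][0])
    return selected


# Hypothetical frame of six clusters with their sizes.
frame = [("c1", 10), ("c2", 20), ("c3", 30), ("c4", 40), ("c5", 50), ("c6", 50)]
chosen = pps_systematic(frame, n=4)
print(chosen)
```

Under this scheme a unit's inclusion probability is n times its size divided by the total size, which is why units larger than the interval are rejected here.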
Sampling Wizard: Sample Size

Figure 2-4: Sampling Wizard, Sample Size step

This step allows you to specify the number or proportion of units to sample within the current stage. The sample size can be fixed or it can vary across strata. For the purpose of specifying sample size, clusters chosen in previous stages can be used to define strata.

Units. You can specify an exact sample size or a proportion of units to sample.

Value. A single value is applied to all strata.
Define Unequal Sizes

Figure 2-5: Define Unequal Sizes dialog box

The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.

Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, with one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages.
Sampling Wizard: Output Variables

Figure 2-6: Sampling Wizard, Output Variables step

This step allows you to choose variables to save when the sample is drawn.

Population size. The estimated number of units in the population for a given stage. The rootname for the saved variable is PopulationSize_.

Sample proportion. The sampling rate at a given stage. The rootname for the saved variable is SamplingRate_.

Sample size. The number of units drawn at a given stage.
Index. Identifies units selected multiple times within a given stage. The rootname for the saved variable is Index_.

Note: Saved variable rootnames include an integer suffix that reflects the stage number. For example, PopulationSize_1_ is the saved population size for stage 1.
Sampling Wizard: Draw Sample Selection Options

Figure 2-8: Sampling Wizard, Draw Sample Selection Options step

This step allows you to choose whether to draw a sample. You can also control other sampling options, such as the random seed and missing-value handling.

Draw sample. In addition to choosing whether to draw a sample, you can also choose to execute part of the sampling design. Stages must be drawn in order; that is, stage 2 cannot be drawn unless stage 1 is also drawn.
Sampling Wizard: Draw Sample Output Files

Figure 2-9: Sampling Wizard, Draw Sample Output Files step

This step allows you to choose where to direct sampled cases, weight variables, joint probabilities, and case selection rules.

Sample data. These options let you determine where sample output is written. It can be added to the active dataset, written to a new dataset, or saved to an external IBM® SPSS® Statistics data file.
Sampling Wizard: Finish

Figure 2-10: Sampling Wizard, Finish step

This is the final step. You can save the plan file and draw the sample now or paste your selections into a syntax window.

When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file.
► Review the sampling plan in the Plan Summary step, and then click Next.
  Subsequent steps are largely the same as for a new design. See the Help for individual steps for more information.
► Navigate to the Finish step, and specify a new name for the edited plan file or choose to overwrite the existing plan file.

Optionally, you can:
  Specify stages that have already been sampled.
  Remove stages from the plan.
Remove stages. You can remove stages 2 and 3 from a multistage design.

Running an Existing Sample Plan

► From the menus choose:
  Analyze > Complex Samples > Select a Sample...
► Select Draw a sample and choose a plan file to run.
► Click Next to continue through the Wizard.
► Review the sampling plan in the Plan Summary step, and then click Next.
► The individual steps containing stage information are skipped when executing a sample plan. You can now go on to the Finish step at any time.
Chapter 3: Preparing a Complex Sample for Analysis

Figure 3-1: Analysis Preparation Wizard, Welcome step

The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. Before using the Wizard, you should have a sample drawn according to a complex design.
Creating a New Analysis Plan

► From the menus choose:
  Analyze > Complex Samples > Prepare for Analysis...
► Select Create a plan file, and choose a plan filename to which you will save the analysis plan.
► Click Next to continue through the Wizard.
► Specify the variable containing sample weights in the Design Variables step, optionally defining strata and clusters.
► You can now click Finish to save the plan.
This step allows you to identify the stratification and clustering variables and define sample weights. You can also provide a label for the stage.

Strata. The cross-classification of stratification variables defines distinct subpopulations, or strata. Your total sample represents the combination of independent samples from each stratum.

Clusters. Cluster variables define groups of observational units, or clusters.
Analysis Preparation Wizard: Estimation Method

Figure 3-3: Analysis Preparation Wizard, Estimation Method step

This step allows you to specify an estimation method for the stage.

WR (sampling with replacement). WR estimation does not include a correction for sampling from a finite population (FPC) when estimating the variance under the complex sampling design. You can choose to include or exclude the FPC when estimating the variance under simple random sampling (SRS).
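To show what including or excluding the FPC means numerically, the following Python sketch uses the textbook variance of a sample mean under simple random sampling, s²/n, optionally multiplied by the correction factor (1 − n/N). The data values and the population size are invented, and the function name is hypothetical; this is not the module's internal code.

```python
from statistics import variance


def srs_variance_of_mean(values, population_size=None):
    """Estimated variance of the sample mean under simple random sampling.

    If population_size (N) is given, the finite population correction
    (1 - n/N) is applied; otherwise it is omitted.
    """
    n = len(values)
    var = variance(values) / n  # s^2 / n, with s^2 the sample variance
    if population_size is not None:
        var *= 1 - n / population_size
    return var


values = [2, 4, 6, 8]                                  # made-up measurements
without_fpc = srs_variance_of_mean(values)
with_fpc = srs_variance_of_mean(values, population_size=40)
print(without_fpc, with_fpc)
```

The correction shrinks the variance only slightly when the sample is a small fraction of the population, and more as n approaches N.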
Analysis Preparation Wizard: Size

Figure 3-4: Analysis Preparation Wizard, Size step

This step is used to specify inclusion probabilities or population sizes for the current stage. Sizes can be fixed or can vary across strata. For the purpose of specifying sizes, clusters specified in previous stages can be used to define strata. Note that this step is necessary only when Equal WOR is chosen as the Estimation Method.

Units.
Define Unequal Sizes

Figure 3-5: Define Unequal Sizes dialog box

The Define Unequal Sizes dialog box allows you to enter sizes on a per-stratum basis.

Size Specifications grid. The grid displays the cross-classifications of up to five strata or cluster variables, with one stratum/cluster combination per row. Eligible grid variables include all stratification variables from the current and previous stages and all cluster variables from previous stages.
Analysis Preparation Wizard: Plan Summary

Figure 3-6: Analysis Preparation Wizard, Plan Summary step

This is the last step within each stage, providing a summary of the analysis design specifications through the current stage. From here, you can either proceed to the next stage (creating it if necessary) or save the analysis specifications.

If you cannot add another stage, it is likely because:
  No cluster variable was specified in the Design Variables step.
Analysis Preparation Wizard: Finish

Figure 3-7: Analysis Preparation Wizard, Finish step

This is the final step. You can save the plan file now or paste your selections to a syntax window.

When making changes to stages in the existing plan file, you can save the edited plan to a new file or overwrite the existing file. When adding stages without making changes to existing stages, the Wizard automatically overwrites the existing plan file.
► Review the analysis plan in the Plan Summary step, and then click Next.
  Subsequent steps are largely the same as for a new design. For more information, see the Help for individual steps.
► Navigate to the Finish step, and specify a new name for the edited plan file, or choose to overwrite the existing plan file.

Optionally, you can remove stages from the plan.
Chapter 4: Complex Samples Plan

Complex Samples analysis procedures require analysis specifications from an analysis or sample plan file in order to provide valid results.

Figure 4-1: Complex Samples Plan dialog box

Plan. Specify the path of an analysis or sample plan file.

Joint Probabilities. In order to use Unequal WOR estimation for clusters drawn using a PPS WOR method, you need to specify a separate file or an open dataset containing the joint probabilities.
Chapter 5: Complex Samples Frequencies

The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Frequencies procedure, you can obtain univariate tabular statistics for vitamin usage among U.S.
Figure 5-1: Frequencies dialog box

► Select at least one frequency variable.

Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation.

Complex Samples Frequencies Statistics

Figure 5-2: Frequencies Statistics dialog box

Cells. This group allows you to request estimates of the cell population sizes and table percentages.

Statistics. This group produces statistics associated with the population size or table percentage.
Confidence interval. A confidence interval for the estimate, using the specified level.

Coefficient of variation. The ratio of the standard error of the estimate to the estimate.

Unweighted count. The number of units used to compute the estimate.

Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample.
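The two ratio definitions above translate directly into arithmetic. The following Python sketch simply restates them; the function names and the input numbers are illustrative assumptions, not output from the procedure.

```python
def coefficient_of_variation(estimate, standard_error):
    """Standard error of the estimate divided by the estimate."""
    return standard_error / estimate


def design_effect(complex_variance, srs_variance):
    """Variance under the complex design divided by the variance under SRS."""
    return complex_variance / srs_variance


# Made-up values: an estimate of 250 with standard error 12.5, and
# variances of 156.25 (complex design) and 100.0 (simple random sample).
cv = coefficient_of_variation(250.0, 12.5)
deff = design_effect(156.25, 100.0)
print(cv, deff)
```

A design effect above 1, as in this made-up case, means the complex design gives a larger variance than a simple random sample of the same size would.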
Complex Samples Options

Figure 5-4: Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.
Chapter 6: Complex Samples Descriptives

The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Descriptives procedure, you can obtain univariate descriptive statistics for the activity levels of U.S.
Figure 6-1: Descriptives dialog box

► Select at least one measure variable.

Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation.
Summaries. This group allows you to request estimates of the means and sums of the measure variables. Additionally, you can request t tests of the estimates against a specified value.

Statistics. This group produces statistics associated with the mean or sum.

Standard error. The standard error of the estimate.

Confidence interval. A confidence interval for the estimate, using the specified level.

Coefficient of variation.
Complex Samples Options

Figure 6-4: Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.
Chapter 7: Complex Samples Crosstabs

The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example. Using the Complex Samples Crosstabs procedure, you can obtain cross-classification statistics for smoking frequency by vitamin usage of U.S.
Figure 7-1: Crosstabs dialog box

► Select at least one row variable and one column variable.

Optionally, you can specify variables to define subpopulations. Statistics are computed separately for each subpopulation.
Complex Samples Crosstabs Statistics

Figure 7-2: Crosstabs Statistics dialog box

Cells. This group allows you to request estimates of the cell population size and row, column, and table percentages.

Statistics. This group produces statistics associated with the population size and row, column, and table percentages.

Standard error. The standard error of the estimate.

Confidence interval. A confidence interval for the estimate, using the specified level.
Summaries for 2-by-2 Tables. This group produces statistics for tables in which the row and column variable each have two categories. Each is a measure of the strength of the association between the presence of a factor and the occurrence of an event.

Odds ratio. The odds ratio can be used as an estimate of relative risk when the occurrence of the factor is rare.

Relative risk.
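For readers who want to see the two 2-by-2 summaries side by side, here is a small Python sketch using the standard textbook definitions. The cell values are invented population-size estimates, the function names are hypothetical, and the layout (rows for factor present/absent, columns for event/no event) is an assumption made for the example.

```python
def odds_ratio(a, b, c, d):
    """Odds of the event with the factor divided by odds without it.

    a, b: event / no event when the factor is present
    c, d: event / no event when the factor is absent
    """
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Event proportion with the factor divided by the proportion without it."""
    return (a / (a + b)) / (c / (c + d))


# Made-up table: 10 of 1000 with the factor had the event, 5 of 1000 without.
cells = (10, 990, 5, 995)
table_or = odds_ratio(*cells)
table_rr = relative_risk(*cells)
print(table_or, table_rr)
```

In this made-up table the event is uncommon in both rows, so the two measures come out close to each other.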
Complex Samples Options

Figure 7-4: Options dialog box

Subpopulation Display. You can choose to have subpopulations displayed in the same table or in separate tables.
Chapter 8: Complex Samples Ratios

The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Example.
Figure 8-1: Ratios dialog box

► Select at least one numerator variable and one denominator variable.

Optionally, you can specify variables to define subgroups for which statistics are produced.

Complex Samples Ratios Statistics

Figure 8-2: Ratios Statistics dialog box

Statistics. This group produces statistics associated with the ratio estimate.

Standard error. The standard error of the estimate.

Confidence interval.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

Square root of design effect. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.

T test. You can request t tests of the estimates against a specified value.
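A minimal Python sketch of a ratio estimate and its t statistic follows. It assumes the usual survey form of the estimator, the weighted total of the numerator variable divided by the weighted total of the denominator variable, and it takes the standard error as given rather than deriving it from the design. All names and values are hypothetical.

```python
def weighted_ratio(numerators, denominators, weights):
    """Weighted total of the numerator divided by weighted total of the denominator."""
    num = sum(w * y for w, y in zip(weights, numerators))
    den = sum(w * x for w, x in zip(weights, denominators))
    return num / den


def t_statistic(estimate, standard_error, test_value):
    """Distance of the estimate from the test value, in standard-error units."""
    return (estimate - test_value) / standard_error


# Made-up data: two sampled units with sampling weights 2 and 3.
ratio = weighted_ratio(numerators=[10, 20], denominators=[5, 10], weights=[2, 3])
# The standard error 0.25 is supplied, not computed; the test value is 1.5.
t = t_statistic(ratio, standard_error=0.25, test_value=1.5)
print(ratio, t)
```

Turning the t statistic into a significance value also needs the sampling design degrees of freedom, which this sketch leaves out.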
Chapter 9: Complex Samples General Linear Model

The Complex Samples General Linear Model (CSGLM) procedure performs linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Example. A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design.
Figure 9-1: General Linear Model dialog box

► Select a dependent variable.

Optionally, you can:
  Select variables for factors and covariates, as appropriate for your data.
  Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable.
Figure 9-2: Model dialog box

Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms.

Non-Nested Terms

For the selected factors and covariates:

Interaction. Creates the highest-level interaction term for all selected variables.

Main effects.
Nested Terms

You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.
Covariances of parameter estimates. Displays an estimate of the covariance matrix for the model coefficients.

Correlations of parameter estimates. Displays an estimate of the correlation matrix for the model coefficients.

Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample.
first stage of sampling. Alternatively, you can set a custom degrees of freedom by specifying a positive integer.

Adjustment for Multiple Comparisons. When performing hypothesis tests with multiple contrasts, the overall significance level can be adjusted from the significance levels for the included contrasts. This group allows you to choose the adjustment method.

Least significant difference.
The Estimated Means dialog box allows you to display the model-estimated marginal means for levels of factors and factor interactions specified in the Model subdialog box. You can also request that the overall population mean be displayed.

Term. Estimated means are computed for the selected factors and factor interactions.

Contrast. The contrast determines how hypothesis tests are set up to compare the estimated means.

Simple.
Export model as SPSS Statistics data. Writes a dataset in IBM® SPSS® Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom. The order of variables in the matrix file is as follows.

rowtype_. Takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom).
CSGLM Command Additional Features

The command syntax language also allows you to:
  Specify custom tests of effects versus a linear combination of effects or a value (using the CUSTOM subcommand).
  Fix covariates at values other than their means when computing estimated marginal means (using the EMMEANS subcommand).
  Specify a metric for polynomial contrasts (using the EMMEANS subcommand).
Chapter 10: Complex Samples Logistic Regression

The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Example. A loan officer has collected past records of customers given loans at several different branches, according to a complex design.
Figure 10-1: Logistic Regression dialog box

► Select a dependent variable.

Optionally, you can:
  Select variables for factors and covariates, as appropriate for your data.
  Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable.
By default, the Complex Samples Logistic Regression procedure makes the highest-valued category the reference category. This dialog box allows you to specify the highest value, the lowest value, or a custom category as the reference category.

Complex Samples Logistic Regression Model

Figure 10-3: Logistic Regression Model dialog box

Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box.
All 3-way. Creates all possible three-way interactions of the selected variables.

All 4-way. Creates all possible four-way interactions of the selected variables.

All 5-way. Creates all possible five-way interactions of the selected variables.

Nested Terms

You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor.
Pseudo R-square. The R² statistic from linear regression does not have an exact counterpart among logistic regression models. There are, instead, multiple measures that attempt to mimic the properties of the R² statistic.

Classification table. Displays the tabulated cross-classifications of the observed category by the model-predicted category on the dependent variable.

Parameters. This group allows you to control the display of statistics related to the model parameters.

Estimate.
Complex Samples Hypothesis Tests

Figure 10-5: Hypothesis Tests dialog box

Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.

Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics.
Complex Samples Logistic Regression Odds Ratios

Figure 10-6: Logistic Regression Odds Ratios dialog box

The Odds Ratios dialog box allows you to display the model-estimated odds ratios for specified factors and covariates. A separate set of odds ratios is computed for each category of the dependent variable except the reference category.

Factors. For each selected factor, displays the ratio of the odds at each category of the factor to the odds at the specified reference category.
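The relationship between logistic regression coefficients and factor odds ratios can be sketched in Python. The sketch relies on the general property that, in a logit model, the odds ratio between two factor categories is the exponential of the difference of their coefficients. The coefficient values, category labels, and function name are hypothetical, and the procedure's own output may be organized differently.

```python
import math


def factor_odds_ratios(coefficients, reference):
    """Odds ratio of each factor category relative to the reference category.

    coefficients maps a category label to its logit-scale coefficient.
    """
    ref = coefficients[reference]
    return {
        category: math.exp(b - ref)
        for category, b in coefficients.items()
        if category != reference
    }


# Made-up coefficients for a three-category factor; "low" is the reference.
coefs = {"low": 0.0, "mid": math.log(2), "high": math.log(3)}
ratios = factor_odds_ratios(coefs, reference="low")
print(ratios)
```

With a multinomial dependent variable, one such set would exist for each non-reference category of the dependent variable, as the text above notes.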
Complex Samples Logistic Regression Save

Figure 10-7: Logistic Regression Save dialog box

Save Variables. This group allows you to save the model-predicted category and predicted probabilities as new variables in the active dataset.

Export model as SPSS Statistics data. Writes a dataset in IBM® SPSS® Statistics format containing the parameter correlation or covariance matrix with parameter estimates, standard errors, significance values, and degrees of freedom.
Complex Samples Logistic Regression Options

Figure 10-8: Logistic Regression Options dialog box

Estimation. This group gives you control of various criteria used in the model estimation.

Maximum Iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.

Maximum Step-Halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step-halving is reached. Specify a positive integer.
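Step-halving as described above can be illustrated with a one-parameter example in Python. This is a generic Newton iteration with step-halving applied to a simple binomial log-likelihood, written for illustration under assumed names and defaults; it is not the procedure's estimation code.

```python
import math


def maximize_with_step_halving(f, grad, hess, x0,
                               max_iterations=100, max_step_halving=5,
                               tol=1e-6):
    """Newton iterations in one dimension; halve the step until f increases."""
    x = x0
    for _ in range(max_iterations):
        step = -grad(x) / hess(x)
        if abs(step) < tol:
            return x                      # converged
        current = f(x)
        candidate = x + step
        halvings = 0
        while f(candidate) <= current and halvings < max_step_halving:
            step *= 0.5                   # reduce the step by a factor of 0.5
            halvings += 1
            candidate = x + step
        if f(candidate) <= current:
            return x                      # no improvement within the limit
        x = candidate
    return x


# Made-up data: 30 events in 100 trials; b is the intercept on the logit scale.
k, n = 30, 100
loglik = lambda b: k * b - n * math.log1p(math.exp(b))
gradient = lambda b: k - n / (1 + math.exp(-b))
hessian = lambda b: -n * math.exp(-b) / (1 + math.exp(-b)) ** 2

b_hat = maximize_with_step_halving(loglik, gradient, hessian, x0=0.0)
print(b_hat)
```

The estimate should land close to log(0.3/0.7), the logit of the observed proportion.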
Confidence Interval. This is the confidence interval level for coefficient estimates, exponentiated coefficient estimates, and odds ratios. Specify a value greater than or equal to 50 and less than 100.

CSLOGISTIC Command Additional Features

The command syntax language also allows you to:
  Specify custom tests of effects versus a linear combination of effects or a value (using the CUSTOM subcommand).
Chapter 11: Complex Samples Ordinal Regression

The Complex Samples Ordinal Regression procedure performs regression analysis on a binary or ordinal dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Example. Representatives considering a bill before the legislature are interested in whether there is public support for the bill and how support for the bill is related to voter demographics.
65 Complex Samples Ordinal Regression Figure 11-1 Ordinal Regression dialog box E Select a dependent variable. Optionally, you can: Select variables for factors and covariates, as appropriate for your data. Specify a variable to define a subpopulation. The analysis is performed only for the selected category of the subpopulation variable, although variances are still properly estimated based on the entire dataset. Select a link function. Link function.
66 Chapter 11 Complex Samples Ordinal Regression Response Probabilities Figure 11-2 Ordinal Regression Response Probabilities dialog box The Response Probabilities dialog box allows you to specify whether the cumulative probability of a response (that is, the probability of belonging up to and including a particular category of the dependent variable) increases with increasing or decreasing values of the dependent variable.
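The two directions of accumulation can be shown with a short Python sketch. It is only an illustration: the function name and the three category probabilities are hypothetical, and the mapping of the `ascending` flag onto the dialog box choice is an assumption.

```python
from itertools import accumulate


def cumulative_probabilities(category_probs, ascending=True):
    """Accumulate category probabilities from the lowest or the highest category.

    ascending=True  gives P(Y <= j) for each category j.
    ascending=False gives P(Y >= j) for each category j.
    """
    probs = list(category_probs)
    if ascending:
        return list(accumulate(probs))
    # Accumulate from the highest category down, then restore category order.
    return list(reversed(list(accumulate(reversed(probs)))))


# Hypothetical probabilities for an ordinal response with three categories.
probs = [0.2, 0.5, 0.3]
print(cumulative_probabilities(probs, ascending=True))
print(cumulative_probabilities(probs, ascending=False))
```

In the first case the cumulative probability reaches 1 at the highest category; in the second it reaches 1 at the lowest category.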
Non-Nested Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term for all selected variables.
Main effects. Creates a main-effects term for each variable selected.
All 2-way. Creates all possible two-way interactions of the selected variables.
All 3-way. Creates all possible three-way interactions of the selected variables.
All 4-way. Creates all possible four-way interactions of the selected variables.
All 5-way.
Complex Samples Ordinal Regression Statistics
Figure 11-4 Ordinal Regression Statistics dialog box
Model Fit. Controls the display of statistics that measure the overall model performance.
Pseudo R-square. The R² statistic from linear regression does not have an exact counterpart among ordinal regression models. There are, instead, multiple measures that attempt to mimic the properties of the R² statistic.
Classification table.
Design effect. The ratio of the variance of the estimate to the variance obtained by assuming that the sample is a simple random sample. This is a measure of the effect of specifying a complex design, where values further from 1 indicate greater effects.
Square root of design effect. This is a measure, expressed in units comparable to those of the standard error, of the effect of specifying a complex design, where values further from 1 indicate greater effects.
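As a minimal sketch of these two definitions, the following Python lines compute a design effect and its square root. The function name and the two variance values are hypothetical and serve only to show the arithmetic.

```python
import math


def design_effect(var_complex: float, var_srs: float) -> float:
    """Variance under the complex design divided by the variance under
    the assumption of a simple random sample."""
    if var_srs <= 0:
        raise ValueError("the simple random sample variance must be positive")
    return var_complex / var_srs


# Hypothetical variances of the same estimate under the two assumptions.
deff = design_effect(var_complex=0.0009, var_srs=0.0004)
deft = math.sqrt(deff)  # comparable to a ratio of standard errors
print(f"design effect = {deff:.2f}, square root = {deft:.2f}")
```

With these made-up values the standard error under the complex design is about 1.5 times the standard error that a simple random sample assumption would give.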
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics. If based on the sampling design, the value is the difference between the number of primary sampling units and the number of strata in the first stage of sampling.
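The design-based degrees of freedom can be illustrated with a short Python sketch. The function name and the example mapping of primary sampling units to strata are hypothetical.

```python
def sampling_degrees_of_freedom(psu_to_stratum: dict) -> int:
    """Number of first-stage primary sampling units minus number of first-stage strata."""
    n_psus = len(psu_to_stratum)
    n_strata = len(set(psu_to_stratum.values()))
    return n_psus - n_strata


# Hypothetical first-stage design: five PSUs spread over two strata.
design = {
    "psu_1": "stratum_A",
    "psu_2": "stratum_A",
    "psu_3": "stratum_A",
    "psu_4": "stratum_B",
    "psu_5": "stratum_B",
}
print(sampling_degrees_of_freedom(design))
```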
Factors. For each selected factor, displays the ratio of the cumulative odds at each category of the factor to the odds at the specified reference category.
Covariates. For each selected covariate, displays the ratio of the cumulative odds at the covariate’s mean value plus the specified units of change to the odds at the mean.
rowtype_. Takes values (and value labels) COV (Covariances), CORR (Correlations), EST (Parameter estimates), SE (Standard errors), SIG (Significance levels), and DF (Sampling design degrees of freedom). There is a separate case with row type COV (or CORR) for each model parameter, plus a separate case for each of the other row types.
varname_. Takes values P1, P2, ...
Estimation Method. You can select a parameter estimation method; choose between Newton-Raphson, Fisher scoring, or a hybrid method in which Fisher scoring iterations are performed before switching to the Newton-Raphson method. If convergence is achieved during the Fisher scoring phase of the hybrid method before the maximum number of Fisher iterations is reached, the algorithm continues with the Newton-Raphson method.
Estimation.
Chapter 12 Complex Samples Cox Regression
The Complex Samples Cox Regression procedure performs survival analysis for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.
Examples. A government law enforcement agency is concerned about recidivism rates in their area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders.
Subject Identifier. You can easily incorporate piecewise-constant, time-dependent predictors by splitting the observations for a single subject across multiple cases. For example, if you are analyzing survival times for patients post-stroke, variables representing their medical history should be useful as predictors. Over time, they may experience major medical events that alter their medical history.
Figure 12-1 Cox Regression dialog box, Time and Event tab
► Specify the survival time by selecting the entry and exit times from the study.
► Select an event status variable.
► Click Define Event and define at least one event value.
Optionally, you can select a subject identifier.
Define Event
Figure 12-2 Define Event dialog box
Specify the values that indicate a terminal event has occurred.
Individual value(s). Specify one or more values by entering them into the grid or selecting them from a list of values with defined value labels.
Range of values. Specify a range of values by entering the minimum and maximum values or selecting values from a list with defined value labels.
Predictors
Figure 12-3 Cox Regression dialog box, Predictors tab
The Predictors tab allows you to specify the factors and covariates used to build model effects.
Factors. Factors are categorical predictors; they can be numeric or string.
Covariates. Covariates are scale predictors; they must be numeric.
Time-Dependent Predictors. There are certain situations in which the proportional hazards assumption does not hold.
Define Time-Dependent Predictor
Figure 12-4 Cox Regression Define Time-Dependent Predictor dialog box
The Define Time-Dependent Predictor dialog box allows you to create a predictor that is dependent upon the built-in time variable, T_.
Note: If your segmented, time-dependent predictor is constant within segments, as in the blood pressure example given above, it may be easier for you to specify the piecewise-constant, time-dependent predictor by splitting subjects across multiple cases. See the discussion on Subject Identifiers in Complex Samples Cox Regression on p. 74 for more information.
Model
Figure 12-6 Cox Regression dialog box, Model tab
Specify Model Effects. By default, the procedure builds a main-effects model using the factors and covariates specified in the main dialog box. Alternatively, you can build a custom model that includes interaction effects and nested terms.
Non-Nested Terms
For the selected factors and covariates:
Interaction. Creates the highest-level interaction term for all selected variables.
Main effects.
Nested Terms
You can build nested terms for your model in this procedure. Nested terms are useful for modeling the effect of a factor or covariate whose values do not interact with the levels of another factor. For example, a grocery store chain may follow the spending habits of its customers at several store locations. Since each customer frequents only one of these locations, the Customer effect can be said to be nested within the Store location effect.
Sample design information. Displays summary information about the sample, including the unweighted count and the population size.
Event and censoring summary. Displays summary information about the number and percentage of censored cases.
Risk set at event times. Displays number of events and number at risk for each event time in each baseline stratum.
Parameters. This group allows you to control the display of statistics related to the model parameters.
Estimate.
Plots
Figure 12-8 Cox Regression dialog box, Plots tab
The Plots tab allows you to request plots of the hazard function, survival function, log-minus-log of the survival function, and one minus the survival function. You can also choose to plot confidence intervals along the specified functions; the confidence level is set on the Options tab.
Predictor patterns. You can specify a pattern of predictor values to be used for the requested plots and the exported survival file on the Export tab.
Hypothesis Tests
Figure 12-9 Cox Regression dialog box, Hypothesis Tests tab
Test Statistic. This group allows you to select the type of statistic used for testing hypotheses. You can choose between F, adjusted F, chi-square, and adjusted chi-square.
Sampling Degrees of Freedom. This group gives you control over the sampling design degrees of freedom used to compute p values for all test statistics.
Sequential Bonferroni. This is a sequentially step-down rejective Bonferroni procedure that is much less conservative in terms of rejecting individual hypotheses but maintains the same overall significance level.
Sidak. This method provides tighter bounds than the Bonferroni approach.
Bonferroni. This method adjusts the observed significance level for the fact that multiple contrasts are being tested.
Save
Figure 12-10 Cox Regression dialog box, Save tab
Save Variables.
Upper bound of confidence interval for survival function. Saves the upper bound of the confidence interval for the survival function at the observed time and predictor values for each case.
Cumulative hazard function. Saves the cumulative hazard, or −ln(survival), at the observed time and predictor values for each case.
Lower bound of confidence interval for cumulative hazard function.
Names of Saved Variables. Automatic name generation ensures that you keep all your work. Custom names allow you to discard/replace results from previous runs without first deleting the saved variables in the Data Editor.
Export
Figure 12-11 Cox Regression dialog box, Export tab
Export model as SPSS Statistics data.
varname_. Takes values P1, P2, ..., corresponding to an ordered list of all model parameters, for row types COV or CORR, with value labels corresponding to the parameter strings shown in the parameter estimates table. The cells are blank for other row types.
P1, P2, ...
Options
Figure 12-12 Cox Regression dialog box, Options tab
Estimation. These controls specify criteria for estimation of regression coefficients.
Maximum Iterations. The maximum number of iterations the algorithm will execute. Specify a non-negative integer.
Maximum Step-Halving. At each iteration, the step size is reduced by a factor of 0.5 until the log-likelihood increases or maximum step-halving is reached. Specify a positive integer.
iteration (the initial estimates), where n is the value of the increment. If the iteration history is requested, then the last iteration is always displayed regardless of n.
Tie breaking method for parameter estimation. When there are tied observed failure times, one of these methods is used to break the ties. The Efron method is more computationally expensive.
Survival Functions. These controls specify criteria for computations involving the survival function.
Part II: Examples
Chapter 13 Complex Samples Sampling Wizard
The Sampling Wizard guides you through the steps for creating, modifying, or executing a sampling plan file. Before using the wizard, you should have a well-defined target population, a list of sampling units, and an appropriate sample design in mind.
Obtaining a Sample from a Full Sampling Frame
A state agency is charged with ensuring fair property taxes from county to county.
Figure 13-1 Sampling Wizard, Welcome step
► Select Design a sample, browse to where you want to save the file, and type property_assess.csplan as the name of the plan file.
► Click Next.
Figure 13-2 Sampling Wizard, Design Variables step (stage 1)
► Select County as a stratification variable.
► Select Township as a cluster variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each county. In this stage, townships are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-3 Sampling Wizard, Sample Size step (stage 1)
► Select Counts from the Units drop-down list.
► Type 4 as the value for the number of units to select in this stage.
► Click Next, and then click Next in the Output Variables step.
Figure 13-4 Sampling Wizard, Plan Summary step (stage 1)
► Select Yes, add stage 2 now.
► Click Next.
Figure 13-5 Sampling Wizard, Design Variables step (stage 2)
► Select Neighborhood as a stratification variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each neighborhood of the townships drawn in stage 1. In this stage, properties are drawn as the primary sampling unit using simple random sampling.
Figure 13-6 Sampling Wizard, Sample Size step (stage 2)
► Select Proportions from the Units drop-down list.
► Type 0.2 as the value of the proportion of units to sample from each stratum.
► Click Next, and then click Next in the Output Variables step.
Figure 13-7 Sampling Wizard, Plan Summary step (stage 2)
► Look over the sampling design, and then click Next.
Figure 13-8 Sampling Wizard, Draw Sample, Selection Options step
► Select Custom value for the type of random seed to use, and type 241972 as the value.
Using a custom value allows you to replicate the results of this example exactly.
► Click Next, and then click Next in the Draw Sample Output Files step.
Figure 13-9 Sampling Wizard, Finish step
► Click Finish.
These selections produce the sampling plan file property_assess.csplan and draw a sample according to that plan.
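The logic of this two-stage plan can be sketched outside the wizard in Python. The sketch makes several assumptions: the frame is a made-up list of records, the field names are hypothetical, and the rounding of the stage 2 proportion is a guess at a reasonable rule. Python's random number generator is unrelated to the one used by the Sampling Wizard, so the same seed does not reproduce the wizard's sample; the seed only makes this sketch repeatable.

```python
import random
from collections import defaultdict


def draw_two_stage_sample(frame, n_townships=4, proportion=0.2, seed=241972):
    """Stage 1: simple random sample of townships within each county.
    Stage 2: simple random sample of properties within each neighborhood
    of the selected townships."""
    rng = random.Random(seed)

    # Stage 1: County is the stratum, Township is the cluster.
    townships_by_county = defaultdict(set)
    for row in frame:
        townships_by_county[row["county"]].add(row["township"])
    selected = set()
    for county in sorted(townships_by_county):
        townships = sorted(townships_by_county[county])
        for township in rng.sample(townships, min(n_townships, len(townships))):
            selected.add((county, township))

    # Stage 2: Neighborhood is the stratum within each selected township.
    properties = defaultdict(list)
    for row in frame:
        if (row["county"], row["township"]) in selected:
            key = (row["county"], row["township"], row["neighborhood"])
            properties[key].append(row)
    sample = []
    for key in sorted(properties):
        units = properties[key]
        n_units = round(proportion * len(units))  # rounding rule is an assumption
        sample.extend(rng.sample(units, n_units))
    return sample


# Hypothetical frame: 2 counties x 6 townships x 2 neighborhoods x 10 properties.
frame = [
    {"county": c, "township": t, "neighborhood": n, "property": p}
    for c in ("East", "West")
    for t in range(6)
    for n in range(2)
    for p in range(10)
]
sample = draw_two_stage_sample(frame)
print(len(sample), "properties selected")
```

Fixing the seed plays the same role here as the custom seed value in the wizard: running the function again on the same frame returns the same sample.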
Plan Summary
Figure 13-10 Plan summary
The summary table reviews your sampling plan and is useful for making sure that the plan represents your intentions.
Sampling Summary
Figure 13-11 Stage summary
This summary table reviews the first stage of sampling and is useful for checking that the sampling went according to plan. Four townships were sampled from each county, as requested.
Figure 13-12 Stage summary
This summary table (the top part of which is shown here) reviews the second stage of sampling. It is also useful for checking that the sampling went according to plan. Approximately 20% of the properties were sampled from each neighborhood from each township sampled in the first stage, as requested.
Sample Results
Figure 13-13 Data Editor with sample results
You can see the sampling results in the Data Editor.
The agency will now use its resources to collect current valuations for the properties selected in the sample. Once those valuations are available, you can process the sample with Complex Samples analysis procedures, using the sampling plan property_assess.csplan to provide the sampling specifications.
Obtaining a Sample from a Partial Sampling Frame
A company is interested in compiling and selling a database of high-quality survey information.
Figure 13-14 Sampling Wizard, Welcome step
► Select Design a sample, browse to where you want to save the file, and type demo.csplan as the name of the plan file.
► Click Next.
Figure 13-15 Sampling Wizard, Design Variables step (stage 1)
► Select Region as a stratification variable.
► Select Province as a cluster variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each region. In this stage, provinces are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-16 Sampling Wizard, Sample Size step (stage 1)
► Select Counts from the Units drop-down list.
► Type 3 as the value for the number of units to select in this stage.
► Click Next, and then click Next in the Output Variables step.
Figure 13-17 Sampling Wizard, Plan Summary step (stage 1)
► Select Yes, add stage 2 now.
► Click Next.
Figure 13-18 Sampling Wizard, Design Variables step (stage 2)
► Select District as a stratification variable.
► Select City as a cluster variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each district. In this stage, cities are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-19 Sampling Wizard, Sample Size step (stage 2)
► Select Proportions from the Units drop-down list.
► Type 0.1 as the value of the proportion of units to sample from each stratum.
► Click Next, and then click Next in the Output Variables step.
Figure 13-20 Sampling Wizard, Plan Summary step (stage 2)
► Select Yes, add stage 3 now.
► Click Next.
Figure 13-21 Sampling Wizard, Design Variables step (stage 3)
► Select Subdivision as a stratification variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each subdivision. In this stage, household units are drawn as the primary sampling unit using the default method, simple random sampling.
Figure 13-22 Sampling Wizard, Sample Size step (stage 3)
► Select Proportions from the Units drop-down list.
► Type 0.2 as the value for the proportion of units to select in this stage.
► Click Next, and then click Next in the Output Variables step.
Figure 13-23 Sampling Wizard, Plan Summary step (stage 3)
► Look over the sampling design, and then click Next.
Figure 13-24 Sampling Wizard, Draw Sample Selection Options step
► Select 1, 2 as the stages to sample now.
► Select Custom value for the type of random seed to use, and type 241972 as the value.
Using a custom value allows you to replicate the results of this example exactly.
► Click Next, and then click Next in the Draw Sample Output Files step.
Figure 13-25 Sampling Wizard, Finish step
► Click Finish.
These selections produce the sampling plan file demo.csplan and draw a sample according to the first two stages of that plan.
Sample Results
Figure 13-26 Data Editor with sample results
You can see the sampling results in the Data Editor. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the “final” sampling weights for the first two stages. Cities with values for these variables were selected to the sample. Cities with system-missing values for the variables were not selected.
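The relationship between stage-wise inclusion probabilities and cumulative sampling weights can be sketched in Python. This is an illustration under assumptions: the function name is hypothetical, the two probabilities are made-up values, and the cumulative weight is taken to be the product of the inverse inclusion probabilities of all stages sampled so far.

```python
def cumulative_weights(inclusion_probs):
    """Return the cumulative sampling weight after each stage, given the
    inclusion probability of a unit at each stage."""
    weights = []
    cumulative = 1.0
    for prob in inclusion_probs:
        if not 0.0 < prob <= 1.0:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        cumulative /= prob  # multiply by the stage weight, 1 / probability
        weights.append(cumulative)
    return weights


# Hypothetical unit: probability 0.25 at stage 1 and 0.1 at stage 2.
stage_1_weight, stage_2_weight = cumulative_weights([0.25, 0.1])
print(stage_1_weight, stage_2_weight)
```

The last cumulative weight corresponds to the "final" weight for the stages sampled so far; a later stage would divide it by that stage's inclusion probability.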
Figure 13-27 Sampling Wizard, Welcome step
► Select Draw a sample, browse to where you saved the plan file, and select the demo.csplan plan file that you created.
► Click Next.
Figure 13-28 Sampling Wizard, Plan Summary step (stage 3)
► Select 1, 2 as stages already sampled.
► Click Next.
Figure 13-29 Sampling Wizard, Draw Sample Selection Options step
► Select Custom value for the type of random seed to use and type 4231946 as the value.
► Click Next, and then click Next in the Draw Sample Output Files step.
Figure 13-30 Sampling Wizard, Finish step
► Select Paste the syntax generated by the Wizard into a syntax window.
► Click Finish.
The following syntax is generated:
* Sampling Wizard.
CSSELECT
  /PLAN FILE='demo.csplan'
  /CRITERIA STAGES = 3 SEED = 4231946
  /CLASSMISSING EXCLUDE
  /DATA RENAMEVARS
  /PRINT SELECTION.
Printing the sampling summary in this case produces a cumbersome table that causes problems in the Output Viewer.
Sample Results
Figure 13-31 Data Editor with sample results
You can see the sampling results in the Data Editor. Three new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for the third stage, plus the final sampling weights. These new weights take into account the weights computed during the sampling of the first two stages. Units with values for these variables were selected to the sample.
Figure 13-32 Sampling Wizard, Welcome step
► Select Design a sample, browse to where you want to save the file, and type poll.csplan as the name of the plan file.
► Click Next.
Figure 13-33 Sampling Wizard, Design Variables step (stage 1)
► Select County as a stratification variable.
► Select Township as a cluster variable.
► Click Next.
This design structure means that independent samples are drawn for each county. In this stage, townships are drawn as the primary sampling unit.
Figure 13-34 Sampling Wizard, Sampling Method step (stage 1)
► Select PPS as the sampling method.
► Select Count data records as the measure of size.
► Click Next.
Within each county, townships are drawn without replacement with probability proportional to the number of records for each township. Using a PPS method generates joint sampling probabilities for the townships; you will specify where to save these values in the Output Files step.
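To give a feel for what "probability proportional to size" means, the following Python sketch computes the usual target inclusion probability for a fixed-size PPS design, n times a unit's size divided by the total size in its stratum. It is an assumption that the wizard's PPS method aims at exactly these values; the function name and the township sizes are hypothetical, and the sketch does not compute joint probabilities.

```python
def pps_inclusion_probabilities(sizes: dict, n: int) -> dict:
    """First-order inclusion probabilities proportional to size for a
    sample of n units drawn without replacement from one stratum."""
    total = sum(sizes.values())
    probs = {unit: n * size / total for unit, size in sizes.items()}
    if any(p > 1.0 for p in probs.values()):
        raise ValueError("a unit is too large for strict proportionality")
    return probs


# Hypothetical record counts for the townships of one county.
records = {"Township A": 100, "Township B": 200, "Township C": 300, "Township D": 400}
probs = pps_inclusion_probabilities(records, n=2)
for township, p in probs.items():
    print(f"{township}: {p:.2f}")
```

Larger townships are more likely to be drawn, and the probabilities within a stratum add up to the number of units selected.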
Figure 13-35 Sampling Wizard, Sample Size step (stage 1)
► Select Proportions from the Units drop-down list.
► Type 0.3 as the value for the proportion of townships to select per county in this stage.
Legislators from the Western county point out that there are fewer townships in their county than in others. In order to ensure adequate representation, they would like to establish a minimum of 3 townships sampled from each county.
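The effect of combining a sampling proportion with a minimum count can be sketched in Python. The function name is hypothetical, and both the rounding of the proportional count and the cap at the number of available townships are assumptions made for illustration; the wizard's own rounding rule is not stated here.

```python
def townships_to_sample(n_townships: int, proportion: float = 0.3, minimum: int = 3) -> int:
    """Number of townships to draw from a county: the proportional count,
    raised to the minimum, but never more than the county contains."""
    proportional = round(proportion * n_townships)  # rounding rule is an assumption
    return min(n_townships, max(minimum, proportional))


# Hypothetical county sizes: a small county is lifted to the minimum of 3.
for size in (6, 20):
    print(size, "townships ->", townships_to_sample(size), "sampled")
```

With a proportion of 0.3, a county with only a handful of townships would otherwise contribute one or two townships; the minimum guarantees at least three where that many exist.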
Figure 13-36 Sampling Wizard, Plan Summary step (stage 1)
► Select Yes, add stage 2 now.
► Click Next.
Figure 13-37 Sampling Wizard, Design Variables step (stage 2)
► Select Neighborhood as a stratification variable.
► Click Next, and then click Next in the Sampling Method step.
This design structure means that independent samples are drawn for each neighborhood of the townships drawn in stage 1. In this stage, voters are drawn as the primary sampling unit using simple random sampling without replacement.
Figure 13-38 Sampling Wizard, Sample Size step (stage 2)
► Select Proportions from the Units drop-down list.
► Type 0.2 as the value of the proportion of units to sample from each stratum.
► Click Next, and then click Next in the Output Variables step.
Figure 13-39 Sampling Wizard, Plan Summary step (stage 2)
► Look over the sampling design, and then click Next.
Figure 13-40 Sampling Wizard, Draw Sample Selection Options step
► Select Custom value for the type of random seed to use, and type 592004 as the value.
Using a custom value allows you to replicate the results of this example exactly.
► Click Next.
Figure 13-41 Sampling Wizard, Draw Sample Output Files step
► Choose to save the sample to a new dataset, and type poll_cs_sample as the name of the dataset.
► Browse to where you want to save the joint probabilities and type poll_jointprob.sav as the name of the joint probabilities file.
► Click Next.
Figure 13-42 Sampling Wizard, Finish step
► Click Finish.
These selections produce the sampling plan file poll.csplan and draw a sample according to that plan, save the sample results to the new dataset poll_cs_sample, and save the joint probabilities file to the external data file poll_jointprob.sav.
Plan Summary
Figure 13-43 Plan summary
The summary table reviews your sampling plan and is useful for making sure that the plan represents your intentions.
Sampling Summary
Figure 13-44 Stage summary
This summary table reviews the first stage of sampling and is useful for checking that the sampling went according to plan.
Figure 13-45 Stage summary
This summary table (the top part of which is shown here) reviews the second stage of sampling. It is also useful for checking that the sampling went according to plan. Approximately 20% of the voters were sampled from each neighborhood from each township sampled in the first stage, as requested.
Sample Results
Figure 13-46 Data Editor with sample results
You can see the sampling results in the newly created dataset. Five new variables were saved to the working file, representing the inclusion probabilities and cumulative sampling weights for each stage, plus the final sampling weights. Voters who were not selected to the sample are excluded from this dataset.
Figure 13-47 Data Editor with sample results
Unlike the weights for voters in the second stage, the first-stage sampling weights are not identical for townships within the same county, because the townships are selected with probability proportional to size.
Figure 13-48 Joint probabilities file
The file poll_jointprob.sav contains first-stage joint probabilities for selected townships within counties. County is a first-stage stratification variable, and Township is a cluster variable.
inclusion probability matrices are 4×4 for these strata, and the Joint_Prob_5_ column is left empty for these rows. Similarly, strata 3 and 5 have 3×3 joint inclusion probability matrices, and stratum 4 has a 5×5 joint inclusion probability matrix. The need for a joint probabilities file is seen by perusing the values of the joint inclusion probability matrices.
Chapter 14 Complex Samples Analysis Preparation Wizard
The Analysis Preparation Wizard guides you through the steps for creating or modifying an analysis plan for use with the various Complex Samples analysis procedures. It is most useful when you do not have access to the sampling plan file used to draw the sample.
Using the Complex Samples Analysis Preparation Wizard to Ready NHIS Public Data
The National Health Interview Survey (NHIS) is a large, population-based survey of the U.S.
Figure 14-1 Analysis Preparation Wizard, Welcome step
► Browse to where you want to save the plan file and type nhis2000_subset.csaplan as the name for the analysis plan file.
► Click Next.
Figure 14-2 Analysis Preparation Wizard, Design Variables step (stage 1)
The data are obtained using a complex multistage sample. However, for end users, the original NHIS design variables were transformed to a simplified set of design and weight variables whose results approximate those of the original design structures.
► Select Stratum for variance estimation as a strata variable.
► Select PSU for variance estimation as a cluster variable.
Summary
Figure 14-3 Summary
The summary table reviews your analysis plan. The plan consists of one stage with a design of one stratification variable and one cluster variable. With-replacement (WR) estimation is used, and the plan is saved to c:\nhis2000_subset.csaplan. You can now use this plan file to process nhis2000_subset.sav with Complex Samples analysis procedures.
Figure 14-4 Compute Variable dialog box
Fifteen out of one hundred bank branches were selected without replacement in the first stage; thus, the probability that a given bank was selected is 15/100 = 0.15.
► Type inclprob_s1 as the target variable.
► Type 0.15 as the numeric expression.
► Click OK.
Figure 14-5 Compute Variable dialog box
One hundred customers were selected from each branch in the second stage; thus, the stage 2 inclusion probability for a given customer at a given bank is 100/the number of customers at that bank.
► Recall the Compute Variable dialog box.
► Type inclprob_s2 as the target variable.
► Type 100/ncust as the numeric expression.
► Click OK.
Figure 14-6 Compute Variable dialog box
Now that you have the inclusion probabilities for each stage, it’s easy to compute the final sampling weights.
► Recall the Compute Variable dialog box.
► Type finalweight as the target variable.
► Type 1/(inclprob_s1 * inclprob_s2) as the numeric expression.
► Click OK.
You are now ready to create the analysis plan.
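The three Compute Variable steps above amount to the following arithmetic, shown here as a Python sketch rather than SPSS syntax. The variable names inclprob_s1, inclprob_s2, and ncust come from the example; the function name and the branch size of 2000 customers are hypothetical.

```python
def final_weight(ncust: int, branches_sampled: int = 15, branches_total: int = 100,
                 customers_per_branch: int = 100) -> float:
    """Final sampling weight for a customer at a branch with ncust customers."""
    if ncust < customers_per_branch:
        raise ValueError("a branch cannot supply more customers than it has")
    inclprob_s1 = branches_sampled / branches_total  # 15/100 = 0.15
    inclprob_s2 = customers_per_branch / ncust       # 100/ncust
    return 1.0 / (inclprob_s1 * inclprob_s2)


# Hypothetical branch with 2000 customers.
weight = final_weight(ncust=2000)
print(f"finalweight = {weight:.2f}")
```

Each sampled customer at such a branch would stand for roughly 133 customers in the population, which is the inverse of the overall inclusion probability 0.15 × 0.05.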
Figure 14-7 Analysis Preparation Wizard, Welcome step
► Browse to where you want to save the plan file and type bankloan.csaplan as the name for the analysis plan file.
► Click Next.
Figure 14-8 Analysis Preparation Wizard, Design Variables step (stage 1)
► Select Branch as a cluster variable.
► Select finalweight as the sample weight variable.
► Click Next.
Figure 14-9 Analysis Preparation Wizard, Estimation Method step (stage 1)
► Select Equal WOR as the first-stage estimation method.
► Click Next.
Figure 14-10 Analysis Preparation Wizard, Size step (stage 1)
► Select Read values from variable and select inclprob_s1 as the variable containing the first-stage inclusion probabilities.
► Click Next.
Figure 14-11 Analysis Preparation Wizard, Plan Summary step (stage 1)
► Select Yes, add stage 2 now.
► Click Next, and then click Next in the Design Variables step.
Figure 14-12 Analysis Preparation Wizard, Estimation Method step (stage 2)
► Select Equal WOR as the second-stage estimation method.
► Click Next.
Figure 14-13 Analysis Preparation Wizard, Size step (stage 2)
► Select Read values from variable and select inclprob_s2 as the variable containing the second-stage inclusion probabilities.
► Click Finish.
Summary
Figure 14-14 Summary table
The summary table reviews your analysis plan. The plan consists of two stages with a design of one cluster variable. Equal probability without replacement (WOR) estimation is used, and the plan is saved to c:\bankloan.csaplan. You can now use this plan file to process bankloan_noweights.sav (with the inclusion probabilities and sampling weights you’ve computed) with Complex Samples analysis procedures.
Chapter 15 Complex Samples Frequencies
The Complex Samples Frequencies procedure produces frequency tables for selected variables and displays univariate statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.
Using Complex Samples Frequencies to Analyze Nutritional Supplement Usage
A researcher wants to study the use of nutritional supplements among U.S.
Figure 15-1 Complex Samples Plan dialog box
► Browse to and select nhis2000_subset.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.
Figure 15-2 Frequencies dialog box
► Select Vitamin/mineral supplmnts-past 12 m as a frequency variable.
► Select Age category as a subpopulation variable.
► Click Statistics.
Figure 15-3 Frequencies Statistics dialog box
► Select Table percent in the Cells group.
► Select Confidence interval in the Statistics group.
► Click Continue.
► Click OK in the Frequencies dialog box.
Frequency Table
Figure 15-4 Frequency table for variable/situation
Each selected statistic is computed for each selected cell measure. The first column contains estimates of the number and percentage of the population that do or do not take vitamin/mineral supplements. The confidence intervals are non-overlapping; thus, you can conclude that, overall, more Americans take vitamin/mineral supplements than not.
Summary
Using the Complex Samples Frequencies procedure, you have obtained statistics for the use of nutritional supplements among U.S. citizens. Overall, more Americans take vitamin/mineral supplements than not. When broken down by age category, greater proportions of Americans take vitamin/mineral supplements with increasing age.
Chapter 16 Complex Samples Descriptives

The Complex Samples Descriptives procedure displays univariate summary statistics for several variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Using Complex Samples Descriptives to Analyze Activity Levels
A researcher wants to study the activity levels of U.S. citizens, using the results of the National Health Interview Survey (NHIS) and a previously created analysis plan.

Figure 16-1 Complex Samples Plan dialog box
► Browse to and select nhis2000_subset.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.

Figure 16-2 Descriptives dialog box
► Select Freq vigorous activity (times per wk) through Freq strength activity (times per wk) as measure variables.
► Select Age category as a subpopulation variable.
► Click Statistics.

Figure 16-3 Descriptives Statistics dialog box
► Select Confidence interval in the Statistics group.
► Click Continue.
► Click OK in the Complex Samples Descriptives dialog box.

Univariate Statistics
Figure 16-4 Univariate statistics
Each selected statistic is computed for each measure variable. The first column contains estimates of the average number of times per week that a person engages in a particular type of activity. The confidence intervals for the means are non-overlapping.

Each selected statistic is computed for each measure variable by values of Age category. The first column contains estimates of the average number of times per week that people of each category engage in a particular type of activity. The confidence intervals for the means allow you to make some interesting conclusions. In terms of vigorous and moderate activities, 25–44-year-olds are less active than those 18–24 and 45–64, and 45–64-year-olds are less active than those 65 or older.
Chapter 17 Complex Samples Crosstabs

The Complex Samples Crosstabs procedure produces crosstabulation tables for pairs of selected variables and displays two-way statistics. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Using Complex Samples Crosstabs to Measure the Relative Risk of an Event
A company that sells magazine subscriptions traditionally sends monthly mailings to a purchased database of names.

Figure 17-1 Complex Samples Plan dialog box
► Browse to and select demo.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.

Figure 17-2 Crosstabs dialog box
► Select Newspaper subscription as a row variable.
► Select Response as a column variable.
► There is also some interest in seeing the results broken down by income categories, so select Income category in thousands as a subpopulation variable.
► Click Statistics.

Figure 17-3 Crosstabs Statistics dialog box
► Deselect Population size and select Row percent in the Cells group.
► Select Odds ratio and Relative risk in the Summaries for 2-by-2 Tables group.
► Click Continue.
► Click OK in the Complex Samples Crosstabs dialog box.
These selections produce a crosstabulation table and risk estimate for Newspaper subscription by Response. Separate tables with results split by Income category in thousands are also created.

Risk Estimate
Figure 17-5 Risk estimate for newspaper subscription by response
The relative risk is a ratio of event probabilities. The relative risk of a response to the mailing is the ratio of the probability that a newspaper subscriber responds to the probability that a nonsubscriber responds. Thus, the estimate of the relative risk is simply 17.2%/10.3% = 1.673.
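The arithmetic behind the risk estimate can be checked with a short sketch. The helper names `relative_risk` and `odds_ratio` are hypothetical (they are not SPSS functions), and the inputs are the rounded row percentages quoted above, so the result is about 1.670 rather than the 1.673 shown in the table, which is presumably computed from unrounded estimates.

```python
def relative_risk(p_group: float, p_reference: float) -> float:
    """Ratio of two event probabilities (hypothetical helper)."""
    if p_reference <= 0:
        raise ValueError("the reference probability must be positive")
    return p_group / p_reference


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Ratio of the odds p / (1 - p) in the two groups (hypothetical helper)."""
    odds_group = p_group / (1.0 - p_group)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_group / odds_reference


# Rounded response rates from the text: subscribers 17.2%, nonsubscribers 10.3%.
rr = relative_risk(0.172, 0.103)
orr = odds_ratio(0.172, 0.103)
print(f"relative risk from rounded percentages: {rr:.3f}")
print(f"odds ratio from rounded percentages:    {orr:.3f}")
```

Because the response is fairly uncommon in both groups, the odds ratio is somewhat larger than the relative risk but of similar magnitude.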
Risk Estimate by Subpopulation
Figure 17-6 Risk estimate for newspaper subscription by response, controlling for income category
Relative risk estimates are computed separately for each income category. Note that the relative risk of a positive response for newspaper subscribers appears to gradually decrease with increasing income, which indicates that you may be able to further target the mailings.
Chapter 18 Complex Samples Ratios

The Complex Samples Ratios procedure displays univariate summary statistics for ratios of variables. Optionally, you can request statistics by subgroups, defined by one or more categorical variables.

Using Complex Samples Ratios to Aid Property Value Assessment
A state agency is charged with ensuring that property taxes are fairly assessed from county to county.

Figure 18-1 Complex Samples Plan dialog box
► Browse to and select property_assess.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.

Figure 18-2 Ratios dialog box
► Select Current value as a numerator variable.
► Select Value at last appraisal as the denominator variable.
► Select County as a subpopulation variable.
► Click Statistics.

Figure 18-3 Ratios Statistics dialog box
► Select Confidence interval, Unweighted count, and Population size in the Statistics group.
► Select t-test and enter 1.3 as the test value.
► Click Continue.
► Click OK in the Complex Samples Ratios dialog box.

Ratios
Figure 18-4 Ratios table
The default display of the table is very wide, so you will need to pivot it for a better view.

Pivoting the Ratios Table
► Double-click the table to activate it.
► From the Viewer menus choose:
Pivot > Pivoting Trays
► Drag Numerator and then Denominator from the row to the layer.
► Drag County from the row to the column.
► Drag Statistics from the column to the row.
► Close the pivoting trays window.

Some of the confidence intervals do not overlap; thus, you can conclude that the ratios for the Western county are higher than the ratios for the Northern and Southern counties. Finally, as a more objective measure, note that the significance values of the t tests for the Western and Southern counties are less than 0.05. Thus, you can conclude that the ratio for the Western county is greater than 1.3 and the ratio for the Southern county is less than 1.3.
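The direction of each conclusion comes from the sign of the t statistic, which compares the estimated ratio with the test value of 1.3. A minimal sketch follows, assuming the usual form (estimate minus test value, divided by the standard error). The function name and the numeric estimates and standard errors are hypothetical illustrations, not values from the Ratios table.

```python
def ratio_t_statistic(estimate: float, standard_error: float, test_value: float) -> float:
    """t statistic for testing a ratio estimate against a fixed value.

    Hypothetical helper: assumes t = (estimate - test_value) / standard_error.
    """
    if standard_error <= 0:
        raise ValueError("the standard error must be positive")
    return (estimate - test_value) / standard_error


TEST_VALUE = 1.3  # the test value entered in the Ratios Statistics dialog box

# Hypothetical estimates and standard errors, for illustration only.
t_high = ratio_t_statistic(estimate=1.36, standard_error=0.02, test_value=TEST_VALUE)
t_low = ratio_t_statistic(estimate=1.24, standard_error=0.02, test_value=TEST_VALUE)

print(f"t for a ratio above the test value: {t_high:.2f}")
print(f"t for a ratio below the test value: {t_low:.2f}")
```

A significant test with a positive statistic supports a ratio greater than 1.3, and a significant test with a negative statistic supports a ratio less than 1.3. The significance value itself depends on the sampling design degrees of freedom and is not computed here.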
Chapter 19 Complex Samples General Linear Model

The Complex Samples General Linear Model (CSGLM) procedure performs linear regression analysis, as well as analysis of variance and covariance, for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Using Complex Samples General Linear Model to Fit a Two-Factor ANOVA
A grocery store chain surveyed a set of customers concerning their purchasing habits, according to a complex design.

Figure 19-1 Complex Samples Plan dialog box
► Browse to and select grocery.csplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.

Figure 19-2 General Linear Model dialog box
► Select Amount spent as the dependent variable.
► Select Who shopping for and Use coupons as factors.
► Click Model.

Figure 19-3 Model dialog box
► Choose to build a Custom model.
► Select Main effects as the type of term to build and select shopfor and usecoup as model terms.
► Select Interaction as the type of term to build and add the shopfor*usecoup interaction as a model term.
► Click Continue.
► Click Statistics in the General Linear Model dialog box.

Figure 19-4 General Linear Model Statistics dialog box
► Select Estimate, Standard error, Confidence interval, and Design effect in the Model Parameters group.
► Click Continue.
► Click Estimated Means in the General Linear Model dialog box.

Figure 19-5 General Linear Model Estimated Means dialog box
► Choose to display means for shopfor, usecoup, and the shopfor*usecoup interaction.
► Select a Simple contrast and 3 Self and family as the reference category for shopfor. Note that, once selected, the category appears as “3” in the dialog box.
► Select a Simple contrast and 1 No as the reference category for usecoup.
► Click Continue.
► Click OK in the General Linear Model dialog box.

Model Summary
Figure 19-6 R-square statistic
R-square, the coefficient of determination, is a measure of the strength of the model fit.

Parameter Estimates
Figure 19-8 Parameter estimates
The parameter estimates show the effect of each predictor on Amount spent. The value of 518.249 for the intercept term indicates that the grocery chain can expect a shopper with a family who uses coupons from the newspaper and targeted mailings to spend $518.25, on average. You can tell that the intercept is associated with these factor levels because those are the factor levels whose parameters are redundant.

The parameter estimates are useful for quantifying the effect of each model term, but the estimated marginal means tables can make it easier to interpret the model results.

Estimated Marginal Means
Figure 19-9 Estimated marginal means by levels of Who shopping for
This table displays the model-estimated marginal means and standard errors of Amount spent at the factor levels of Who shopping for.
Figure 19-11 Overall test results for estimated marginal means of Who shopping for
The overall test table reports the results of a test of all of the contrasts in the individual test table. Its significance value of less than 0.05 confirms that there is a difference in spending among the levels of Who shopping for.

Figure 19-15 Estimated marginal means by levels of Who shopping for by Use coupons
This table displays the model-estimated marginal means, standard errors, and confidence intervals of Amount spent at the factor combinations of Who shopping for and Use coupons. This table is useful for exploring the interaction effect between these two factors that was found in the tests of model effects.
Chapter 20 Complex Samples Logistic Regression

The Complex Samples Logistic Regression procedure performs logistic regression analysis on a binary or multinomial dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Figure 20-1 Complex Samples Plan dialog box
► Browse to and select bankloan.csaplan. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Click Continue.

Figure 20-2 Logistic Regression dialog box
► Select Previously defaulted as the dependent variable.
► Select Level of education as a factor.
► Select Age in years through Other debt in thousands as covariates.
► Select Previously defaulted and click Reference Category.

Figure 20-3 Logistic Regression Reference Category dialog box
► Select Lowest value as the reference category.
This sets the “did not default” category as the reference category; thus, the odds ratios reported in the output will have the property that increasing odds ratios correspond to increasing probability of default.
► Click Continue.
► Click Statistics in the Logistic Regression dialog box.

Figure 20-5 Logistic Regression Odds Ratios dialog box
► Choose to create odds ratios for the factor ed and the covariates employ and debtinc.
► Click Continue.
► Click OK in the Logistic Regression dialog box.

What constitutes a “good” R² value varies between different areas of application. While these statistics can be suggestive on their own, they are most useful when comparing competing models for the same data. The model with the largest R² statistic is “best” according to this measure.

Classification
Figure 20-7 Classification table
The classification table shows the practical results of using the logistic regression model.

Each term in the model, plus the model as a whole, is tested for whether its effect equals 0. Terms with significance values less than 0.05 have some discernible effect. Thus, age, employ, debtinc, and creddebt contribute to the model, while the other main effects do not. In a further analysis of the data, you would probably remove ed, address, income, and othdebt from model consideration.

Odds Ratios
Figure 20-10 Odds ratios for level of education
This table displays the odds ratios of Previously defaulted at the factor levels of Level of education. The reported values are the ratios of the odds of default for Did not complete high school through College degree, compared to the odds of default for Post-undergraduate degree. Thus, the odds ratio of 2.

This table displays the odds ratio of Previously defaulted for a unit change in the covariate Debt to income ratio. The reported value is the ratio of the odds of default for a person with a debt/income ratio of 10.9341 compared to the odds of default for a person with 9.9341 (the mean). Note that because none of these predictors are part of interaction terms, the values of the odds ratios reported in these tables are equal to the values of the exponentiated parameter estimates.
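The equality between a unit-change odds ratio and the exponentiated parameter estimate can be verified numerically. The sketch below assumes a logistic model with a single covariate and no interaction terms. The intercept and slope are hypothetical values chosen for illustration, not estimates from the bankloan analysis. Only the covariate values 9.9341 and 10.9341 come from the text.

```python
import math


def odds_of_event(intercept: float, slope: float, x: float) -> float:
    """Odds p / (1 - p) implied by a logistic model with linear predictor
    intercept + slope * x (hypothetical helper)."""
    return math.exp(intercept + slope * x)


# Hypothetical coefficients, for illustration only.
INTERCEPT = -1.5
SLOPE = 0.1

# Covariate values from the text: the mean and the mean plus one unit.
x_mean = 9.9341
x_plus_one = 10.9341

unit_odds_ratio = odds_of_event(INTERCEPT, SLOPE, x_plus_one) / odds_of_event(INTERCEPT, SLOPE, x_mean)

print(f"odds ratio for a unit change: {unit_odds_ratio:.6f}")
print(f"exp(slope):                   {math.exp(SLOPE):.6f}")
```

The intercept and the starting value of the covariate cancel in the ratio, which is why the odds ratio for a unit change does not depend on where the change is evaluated when the predictor appears in no interaction terms.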
Chapter 21 Complex Samples Ordinal Regression

The Complex Samples Ordinal Regression procedure creates a predictive model for an ordinal dependent variable for samples drawn by complex sampling methods. Optionally, you can request analyses for a subpopulation.

Figure 21-1 Complex Samples Plan dialog box
► Browse to and select poll.csplan as the plan file. For more information, see the topic Sample Files in Appendix A in IBM SPSS Complex Samples 19.
► Select poll_jointprob.sav as the joint probabilities file.
► Click Continue.

Figure 21-2 Ordinal Regression dialog box
► Select The legislature should enact a gas tax as the dependent variable.
► Select Age category through Driving frequency as factors.
► Click Statistics.

Figure 21-3 Ordinal Regression Statistics dialog box
► Select Classification table in the Model Fit group.
► Select Estimate, Exponentiated estimate, Standard error, Confidence interval, and Design effect in the Parameters group.
► Select Wald test of equal slopes and Parameter estimates for generalized (unequal slopes) model.
► Click Continue.
► Click Hypothesis Tests in the Complex Samples Ordinal Regression dialog box.

Figure 21-4 Hypothesis Tests dialog box
Even for a moderate number of predictors and response categories, the Wald F test statistic can be inestimable for the test of parallel lines.
► Select Adjusted F in the Test Statistic group.
► Select Sequential Sidak as the adjustment method for multiple comparisons.
► Click Continue.
► Click Odds Ratios in the Complex Samples Ordinal Regression dialog box.
► Click OK in the Complex Samples Ordinal Regression dialog box.

Pseudo R-Squares
Figure 21-6 Pseudo R-Squares
In the linear regression model, the coefficient of determination, R², summarizes the proportion of variance in the dependent variable associated with the predictor (independent) variables, with larger R² values indicating that more of the variation is explained by the model, to a maximum of 1.
Each term in the model is tested for whether its effect equals 0. Terms with significance values less than 0.05 have some discernible effect. Thus, agecat and drivefreq contribute to the model, while the other main effects do not. In a further analysis of the data, you would consider removing gender and votelast from the model.

Parameter Estimates
The parameter estimates table summarizes the effect of each predictor.
Those who drive less frequently show greater support for the bill than those who drive more frequently. The coefficients for the variables gender and votelast, in addition to not being statistically significant, appear to be small compared to other coefficients. The design effects indicate that some of the standard errors computed for these parameter estimates are larger than those you would obtain if you used a simple random sample, while others are smaller.

Figure 21-10 Classification table
The classification table shows the practical results of using the model. For each case, the predicted response is the response category with the highest model-predicted probability. Cases are weighted by Final Sampling Weight, so that the classification table reports the expected model performance in the population. Cells on the diagonal are correct predictions. Cells off the diagonal are incorrect predictions.
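The construction of a weighted classification table can be sketched in a few lines. This is an illustration of the rule described above (predict the category with the highest probability, then accumulate sampling weights by observed and predicted category), not the procedure's internal implementation. The function names, the two-category response, the probabilities, and the weights are all hypothetical.

```python
from collections import defaultdict


def predicted_category(probabilities: dict) -> str:
    """Return the response category with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def weighted_classification_table(cases) -> dict:
    """Sum sampling weights in each (observed, predicted) cell.

    Each case is a tuple: (observed category, predicted probabilities, weight).
    """
    table = defaultdict(float)
    for observed, probabilities, weight in cases:
        table[(observed, predicted_category(probabilities))] += weight
    return dict(table)


def percent_correct(table: dict) -> float:
    """Weighted percentage of cases on the diagonal (observed == predicted)."""
    total = sum(table.values())
    correct = sum(w for (observed, predicted), w in table.items() if observed == predicted)
    return 100.0 * correct / total


# Hypothetical cases, for illustration only.
cases = [
    ("Agree", {"Agree": 0.6, "Disagree": 0.4}, 10.0),
    ("Disagree", {"Agree": 0.7, "Disagree": 0.3}, 30.0),
    ("Disagree", {"Agree": 0.2, "Disagree": 0.8}, 60.0),
]
table = weighted_classification_table(cases)
print(table)
print(f"weighted percent correct: {percent_correct(table):.1f}")
```

Weighting each case by its sampling weight means that a single misclassified case with a large weight can outweigh several correctly classified cases with small weights.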
merely the ratios of the exponentiated parameter estimates. For example, the cumulative odds ratio for 18–30 vs. >60 is 1.00/0.723 = 1.383.

Figure 21-12 Odds ratios for driving frequency
This table displays the cumulative odds ratios for the factor levels of Driving frequency, using 10–14,999 miles/year as the reference category. Since Driving frequency is not involved in any interaction terms, the odds ratios are merely the ratios of the exponentiated parameter estimates.
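The age-category example quoted above (1.00/0.723) can be reproduced with a one-line ratio. The function name is hypothetical. The two exponentiated estimates are the values given in the text.

```python
def cumulative_odds_ratio(exp_estimate_a: float, exp_estimate_b: float) -> float:
    """Ratio of two exponentiated parameter estimates (hypothetical helper).

    Valid as an odds ratio only when the factor is not part of any
    interaction term, as the text notes.
    """
    return exp_estimate_a / exp_estimate_b


# Exponentiated estimates quoted in the text for the two age categories.
ratio = cumulative_odds_ratio(1.00, 0.723)
print(f"cumulative odds ratio: {ratio:.3f}")
```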
Figure 21-14 Parameter estimates for generalized cumulative model (shown in part)
Moreover, the estimated values of the generalized model coefficients don’t appear to differ much from the estimates under the parallel lines assumption.

Dropping Non-Significant Predictors
The tests of model effects showed that the model coefficients for Gender and Voted in last election are not statistically significantly different from 0.
► Click Continue in the Plan dialog box.

Figure 21-15 Ordinal Regression dialog box
► Deselect Gender and Voted in last election as factors.
► Click Options.

Figure 21-16 Ordinal Regression Options dialog box
► Select Display iteration history.
The iteration history is useful for diagnosing problems encountered by the estimation algorithm.
► Click Continue.
► Click OK in the Complex Samples Ordinal Regression dialog box.

Figure 21-18 Warnings for reduced model
Looking at the iteration history, the changes in the parameter estimates over the last few iterations are slight enough that you’re not terribly concerned about the warning message.

Comparing Models
Figure 21-19 Pseudo R-Squares for reduced model
The R² values for the reduced model are identical to those for the original model. This is evidence in favor of the reduced model.

from Disagree to Agree, more than half of whom were observed to respond Disagree or Strongly disagree. This is a very important distinction that deserves careful consideration before choosing the reduced model.

Summary
Using the Complex Samples Ordinal Regression Procedure, you have constructed competing models for the level of support for the proposed bill based on voter demographics.
Chapter 22 Complex Samples Cox Regression

The Complex Samples Cox Regression procedure performs survival analysis for samples drawn by complex sampling methods.

Using a Time-Dependent Predictor in Complex Samples Cox Regression
A government law enforcement agency is concerned about recidivism rates in their area of jurisdiction. One of the measures of recidivism is the time until second arrest for offenders.

Figure 22-1 Compute Variable dialog box
► Type date2 as the target variable.
► Type DATE.DMY(30,6,2006) as the expression.
► Click If.

Figure 22-2 Compute Variable If Cases dialog box
► Select Include if case satisfies condition.
► Type MISSING(date2) as the expression.
► Click Continue.
► Click OK in the Compute Variable dialog box.
► Next, to compute the time between first and second arrest, from the menus choose:
Transform > Date and Time Wizard...

Figure 22-3 Date and Time Wizard, Welcome step
► Select Calculate with dates and times.
► Click Next.

Figure 22-4 Date and Time Wizard, Do Calculations on Dates step
► Select Calculate the number of time units between two dates.
► Click Next.

Figure 22-5 Date and Time Wizard, Calculate the number of time units between two dates step
► Select Date of second arrest [date2] as the first date.
► Select Date of release from first arrest [date1] as the date to subtract from the first date.
► Select Days as the unit.
► Click Next.
Figure 22-6 Date and Time Wizard, Calculation step
► Type time_to_event as the name of the variable representing the time between the two dates.
► Type Time to second arrest as the variable label.
► Click Finish.

Running the Analysis
► To run a Complex Samples Cox Regression analysis, from the menus choose:
Analyze > Complex Samples > Cox Regression...

Figure 22-7 Complex Samples Plan for Cox Regression dialog box
► Browse to the sample files directory and select recidivism_cs.csplan as the plan file.
► Select Custom file in the Joint Probabilities group, browse to the sample files directory, and select recidivism_cs_jointprob.sav.
► Click Continue.

Figure 22-8 Cox Regression dialog box, Time and Event tab
► Select Time to second arrest [time_to_event] as the variable defining the end of the interval.
► Select Second arrest [arrest2] as the variable defining whether the event has occurred.
► Click Define Event.

Figure 22-9 Define Event dialog box
► Select 1 Yes as the value indicating the event of interest (rearrest) has occurred.
► Click Continue.
► Click the Predictors tab.

Figure 22-10 Cox Regression dialog box, Predictors tab
► Select Age in years [age] as a covariate.
► Click the Statistics tab.

Figure 22-11 Cox Regression dialog box, Statistics tab
► Select Test of proportional hazards and then select Log as the time function in the Model Assumptions group.
► Select Parameter estimates for alternative model.
► Click OK.
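The data preparation performed before this analysis (assigning 30 June 2006 to date2 for offenders with no recorded second arrest, then computing the number of days from date1 to date2) can be sketched outside SPSS as follows. This is an illustration of the logic only. The function name, the `CENSOR_DATE` constant, and the example dates are hypothetical, and `None` stands in for a missing date2.

```python
from datetime import date
from typing import Optional

# Same calendar date as the expression DATE.DMY(30,6,2006) used in the steps above.
CENSOR_DATE = date(2006, 6, 30)


def days_to_second_arrest(date1: date, date2: Optional[date]) -> int:
    """Days from release after the first arrest (date1) to the second arrest (date2).

    When date2 is missing, CENSOR_DATE is used, mirroring the
    Compute Variable step with the MISSING(date2) condition.
    """
    end = date2 if date2 is not None else CENSOR_DATE
    return (end - date1).days


# Hypothetical offenders, for illustration only.
rearrested = days_to_second_arrest(date(2005, 1, 1), date(2005, 1, 31))
not_rearrested = days_to_second_arrest(date(2006, 6, 1), None)
print(rearrested, not_rearrested)
```

For offenders with no second arrest, the resulting value is a follow-up time rather than an event time, which is why the analysis also needs the Second arrest [arrest2] variable to indicate whether the event occurred.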
This table contains information on the sample design pertinent to the estimation of the model. There is one case per subject, and all 5,687 cases are used in the analysis. The sample represents less than 2% of the entire estimated population. The design requested 4 strata and 5 units per stratum for a total of 20 units in the first stage of the design. The sampling design degrees of freedom are estimated by 20−4=16.
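The degrees-of-freedom arithmetic is simply the number of first-stage units minus the number of strata. A minimal sketch, with a hypothetical function name:

```python
def sampling_design_df(n_units: int, n_strata: int) -> int:
    """Sampling design degrees of freedom: first-stage units minus strata."""
    if n_strata < 1 or n_units < n_strata:
        raise ValueError("need at least one stratum and at least one unit per stratum")
    return n_units - n_strata


# 4 strata with 5 units each, as described above.
df_recidivism = sampling_design_df(n_units=4 * 5, n_strata=4)
print(df_recidivism)
```

The same rule applies to the simple random sampling example later in this chapter, where a single stratum with 2,421 units gives 2,420 degrees of freedom.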
► Click New.

Figure 22-16 Cox Regression Define Time-Dependent Predictor dialog box
► Type t_age as the name of the time-dependent predictor you want to define.
► Type ln(T_)*age as the numeric expression.
► Click Continue.

Figure 22-17 Cox Regression dialog box, Predictors tab
► Select t_age as a covariate.
► Click the Statistics tab.
Figure 22-18 Cox Regression dialog box, Statistics tab
► Select Estimate, Standard error, Confidence interval, and Design effect in the Parameters group.
► Deselect Test of proportional hazards and Parameter estimates for alternative model in the Model Assumptions group.
► Click OK.

Tests of Model Effects
Figure 22-19 Tests of model effects
With the addition of the time-dependent predictor, the significance value for age is 0.
Parameter Estimates
Figure 22-20 Parameter estimates
Looking at the parameter estimates and standard errors, you can see that you have replicated the alternative model from the test of proportional hazards. By explicitly specifying the model, you can request additional parameter statistics and plots.

Preparing the Data for Analysis
Before restructuring the data, you will need to create two ancillary variables to help with the restructuring.
► To compute a new variable, from the menus choose:
Transform > Compute Variable...

Figure 22-21 Compute Variable dialog box
► Type start_time2 as the target variable.
► Type time1 as the numeric expression.
► Click OK.
► Recall the Compute Variable dialog box.

Figure 22-22 Compute Variable dialog box
► Type start_time3 as the target variable.
► Type time2 as the numeric expression.
► Click OK.
► To restructure the data from variables to cases, from the menus choose:
Data > Restructure...

Figure 22-23 Restructure Data Wizard, Welcome step
► Make sure Restructure selected variables into cases is selected.
► Click Next.

Figure 22-24 Restructure Data Wizard, Variables to Cases Number of Variable Groups step
► Select More than one variable group to restructure.
► Type 6 as the number of groups.
► Click Next.

Figure 22-25 Restructure Data Wizard, Variables to Cases Select Variables step
► In the Case Group Identification group, select Use selected variable and select Patient ID [patid] as the subject identifier.
► Type event as the first target variable.
► Select First event post-attack [event1], Second event post-attack [event2], and Third event post-attack [event3] as variables to be transposed.
► Select trans2 from the target variable list.
Figure 22-26 Restructure Data Wizard, Variables to Cases Select Variables step
► Type start_time as the target variable.
► Select Length of stay for rehabilitation [los_rehab], start_time2, and start_time3 as variables to be transposed.
Time to first event post-attack [time1] and Time to second event post-attack [time2] will be used to create the end times, and each variable can only appear in one list of variables to be transposed, thus start_time2 and start_time3 were necessary.

Figure 22-27 Restructure Data Wizard, Variables to Cases Select Variables step
► Type time_to_event as the target variable.
► Select Time to first event post-attack [time1], Time to second event post-attack [time2], and Time to third event post-attack [time3] as variables to be transposed.
► Select trans4 from the target variable list.

Figure 22-28 Restructure Data Wizard, Variables to Cases Select Variables step
► Type mi as the target variable.
► Select History of myocardial infarction [mi], History of myocardial infarction [mi1], and History of myocardial infarction [mi2] as variables to be transposed.
► Select trans5 from the target variable list.

Figure 22-29 Restructure Data Wizard, Variables to Cases Select Variables step
► Type is as the target variable.
► Select History of ischemic stroke [is], History of ischemic stroke [is1], and History of ischemic stroke [is2] as variables to be transposed.
► Select trans6 from the target variable list.

Figure 22-30 Restructure Data Wizard, Variables to Cases Select Variables step
► Type hs as the target variable.
► Select History of hemorrhagic stroke [hs], History of hemorrhagic stroke [hs1], and History of hemorrhagic stroke [hs2] as variables to be transposed.
► Click Next, then click Next in the Create Index Variables step.

Figure 22-31 Restructure Data Wizard, Variables to Cases Create One Index Variable step
► Type event_index as the name of the index variable and type Event index as the variable label.
► Click Next.

Figure 22-32 Restructure Data Wizard, Variables to Cases Create One Index Variable step
► Make sure Keep and treat as fixed variable(s) is selected.
► Click Finish.
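The variables-to-cases restructuring specified in the wizard steps above can be illustrated with a small sketch. It is not what the Restructure Data Wizard runs internally. It only shows how one wide record per patient becomes three long records, using the six variable groups named in the steps. The function name and the patient's values are hypothetical, and the index is assumed to run from 1 to 3.

```python
# The six variable groups from the wizard steps: target variable -> source variables.
GROUPS = {
    "event": ["event1", "event2", "event3"],
    "start_time": ["los_rehab", "start_time2", "start_time3"],
    "time_to_event": ["time1", "time2", "time3"],
    "mi": ["mi", "mi1", "mi2"],
    "is": ["is", "is1", "is2"],
    "hs": ["hs", "hs1", "hs2"],
}


def variables_to_cases(record: dict, groups: dict, id_field: str = "patid") -> list:
    """Turn one wide record into one long record per position in the groups."""
    n_cases = len(next(iter(groups.values())))
    cases = []
    for i in range(n_cases):
        case = {id_field: record[id_field], "event_index": i + 1}
        for target, sources in groups.items():
            case[target] = record[sources[i]]
        cases.append(case)
    return cases


# Hypothetical patient, for illustration only (negative values mark missing data).
patient = {
    "patid": 101,
    "event1": 2, "event2": 4, "event3": -1,
    "los_rehab": 10, "time1": 120, "time2": 300, "time3": -1,
    "mi": 0, "mi1": 1, "mi2": 1,
    "is": 1, "is1": 1, "is2": 1,
    "hs": 0, "hs1": 0, "hs2": 0,
}
# The two ancillary variables computed before restructuring.
patient["start_time2"] = patient["time1"]
patient["start_time3"] = patient["time2"]

long_cases = variables_to_cases(patient, GROUPS)
for case in long_cases:
    print(case)
```

Each long record starts where the previous one ended, which is the reason the ancillary variables start_time2 and start_time3 are copied from time1 and time2 before restructuring.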
Figure 22-33 Restructured data
The restructured data contains three cases for every patient; however, many patients experienced fewer than three events, so there are many cases with negative (missing) values for event. You can simply filter these from the dataset.
► To filter these cases, from the menus choose:
Data > Select Cases...

Figure 22-34 Select Cases dialog box
► Select If condition is satisfied.
► Click If.

Figure 22-35 Select Cases If dialog box
► Type event >= 0 as the conditional expression.
► Click Continue.

Figure 22-36 Select Cases dialog box
► Select Delete unselected cases.
► Click OK.

Creating a Simple Random Sampling Analysis Plan
Now you are ready to create the simple random sampling analysis plan.
► First, you need to create a sampling weight variable. From the menus choose:
Transform > Compute Variable...
Figure 22-37 Compute Variable dialog box
► Type sampleweight as the target variable.
► Type 1 as the numeric expression.
► Click OK.
You are now ready to create the analysis plan.
Note: There is an existing plan file, srs.csaplan, in the sample files directory that you can use if you want to skip the following instructions and proceed to analysis of the data.
► To create the analysis plan, from the menus choose:
Analyze > Complex Samples > Prepare for Analysis...
Figure 22-38 Analysis Preparation Wizard, Welcome step
► Select Create a plan file and type srs.csaplan as the name of the file. Alternatively, browse to the location you want to save it.
► Click Next.

Figure 22-39 Analysis Preparation Wizard, Design Variables
► Select sampleweight as the sample weight variable.
► Click Next.

Figure 22-40 Analysis Preparation Wizard, Estimation Method
► Deselect Use finite population correction.
► Click Finish.
You are now ready to run the analysis.

Running the Analysis
► To run a Complex Samples Cox Regression analysis, from the menus choose:
Analyze > Complex Samples > Cox Regression...

Figure 22-41 Plan for Cox Regression dialog box
► Browse to where you saved the simple random sampling analysis plan, or to the sample files directory, and select srs.csaplan.
► Click Continue.

Figure 22-42 Cox Regression dialog box, Time and Event tab
► Select Varies by subject and select Length of stay for rehabilitation [los_rehab] as the start variable. Note that the restructured variable has taken the variable label from the first variable used to construct it, though the label is not necessarily appropriate for the constructed variable.
► Select Time to first event post-attack [time_to_event] as the end variable.

Figure 22-43 Define Event dialog box
► Select 4 Death as the value indicating the terminal event has occurred.
► Click Continue.

Figure 22-44 Cox Regression dialog box, Time and Event tab
► Select Patient ID [patid] as the subject identifier.
► Click the Predictors tab.

Figure 22-45 Cox Regression dialog box, Predictors tab
► Select History of myocardial infarction [mi] through History of hemorrhagic stroke [hs] as factors.
► Click the Statistics tab.

Figure 22-46 Cox Regression dialog box, Statistics tab
► Select Estimate, Exponentiated estimate, Standard error, and Confidence interval in the Parameters group.
► Click the Plots tab.
Figure 22-47 Cox Regression dialog box, Plots tab
E Select Log-minus-log of survival function.
E Check Separate Lines for History of myocardial infarction.
E Select 1.0 as the level for History of ischemic stroke.
E Select 0.0 as the level for History of hemorrhagic stroke.
E Click the Options tab.
Figure 22-48 Cox Regression dialog box, Options tab
E Select Breslow as the tie-breaking method in the Estimation group.
E Click OK.

Sample Design Information

Figure 22-49 Sample design information
This table contains information on the sample design pertinent to the estimation of the model.
There are multiple cases for some subjects, and all 3,310 cases are used in the analysis. The design has a single stratum and 2,421 units (one for each subject). The sampling design degrees of freedom are estimated by 2421−1=2420.

Tests of Model Effects

Figure 22-50 Tests of model effects
The significance value for each effect is near 0, suggesting that they all contribute to the model.
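The degrees of freedom quoted in the sample design information (2421−1=2420) are consistent with the common rule of subtracting the number of strata from the number of first-stage sampling units. The following is a minimal Python sketch of that arithmetic under this assumed rule; the function name `design_degrees_of_freedom` is hypothetical and is not part of SPSS Statistics.

```python
def design_degrees_of_freedom(n_units: int, n_strata: int) -> int:
    """Return sampling design degrees of freedom as units minus strata.

    Assumption: the rule df = (first-stage units) - (strata), which matches
    the 2421 - 1 = 2420 figure reported for this single-stratum design.
    """
    if n_strata < 1:
        raise ValueError("a design needs at least one stratum")
    if n_units <= n_strata:
        raise ValueError("need more sampling units than strata")
    return n_units - n_strata


# One stratum, 2,421 units (one per subject), as reported in the table.
print(design_degrees_of_freedom(n_units=2421, n_strata=1))  # 2420
```

Note that the count is based on the 2,421 subjects, not on the 3,310 cases, because several cases can belong to the same subject.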
The confidence intervals for [mi=0] and [mi=1] do not overlap with the interval for [mi=2], and none of them include 0. Therefore, it appears that the hazard for patients with one or no prior mi’s is distinguishable from the hazard for patients with two prior mi’s, which in turn is distinguishable from the hazard for patients with three prior mi’s. Similar relationships hold for the levels of is and hs, where increasing the number of prior incidents increases the hazard of death.
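The reasoning above rests on two checks: whether two confidence intervals overlap, and whether an interval includes 0. Because the Exponentiated estimate is the exponential of the Estimate, an interval that excludes 0 on the estimate scale corresponds to a hazard ratio interval that excludes 1. The Python sketch below illustrates both checks. All numeric values and the names `level_a` and `level_b` are hypothetical placeholders, not values from the parameter estimates table.

```python
import math


def to_hazard_ratio(estimate, lower, upper):
    """Exponentiate a log-hazard estimate and its confidence limits."""
    return math.exp(estimate), math.exp(lower), math.exp(upper)


def intervals_overlap(a, b):
    """Return True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def excludes_zero(interval):
    """Return True if the interval lies entirely above or entirely below 0."""
    lo, hi = interval
    return lo > 0 or hi < 0


# Hypothetical confidence intervals on the estimate (log-hazard) scale.
level_a = (-1.6, -0.8)
level_b = (-0.6, -0.1)

print(intervals_overlap(level_a, level_b))  # False: the levels are distinguishable
print(excludes_zero(level_a))               # True: differs from the reference level

# On the exponentiated scale the same interval excludes 1 instead of 0.
hr, hr_lo, hr_hi = to_hazard_ratio(-1.2, *level_a)
print(hr_hi < 1.0)                          # True
```

Non-overlapping intervals are a conservative, informal indication of a difference rather than a formal test.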
Log-Minus-Log Plot

Figure 22-53 Log-minus-log plot
This plot displays the log-minus-log of the survival function, ln(−ln(survival)), versus the survival time. This particular plot displays a separate curve for each category of History of myocardial infarction, with History of ischemic stroke fixed at One and History of hemorrhagic stroke fixed at None, and is a useful visualization of the effect of History of myocardial infarction on the survival function.
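To make the transformation concrete, the Python sketch below computes ln(−ln(S)) for a few survival probabilities. It also illustrates a standard property of proportional hazards models: if one group's survival function is the baseline raised to the power exp(b), its log-minus-log curve is the baseline curve shifted vertically by b, so the curves appear parallel. The survival values and the coefficient `b` are hypothetical, chosen only for illustration.

```python
import math


def log_minus_log(survival):
    """Return ln(-ln(S)) for a survival probability S strictly between 0 and 1."""
    if not 0.0 < survival < 1.0:
        raise ValueError("survival probability must be strictly between 0 and 1")
    return math.log(-math.log(survival))


# Hypothetical baseline survival probabilities at increasing times.
baseline = [0.95, 0.85, 0.70, 0.50]

# Hypothetical log-hazard coefficient for a comparison group.
b = 0.7

# Under proportional hazards, S_group(t) = S_baseline(t) ** exp(b).
group = [s ** math.exp(b) for s in baseline]

# The vertical gap between the two log-minus-log curves equals b at every time.
gaps = [log_minus_log(g) - log_minus_log(s) for g, s in zip(group, baseline)]
print(all(abs(gap - b) < 1e-9 for gap in gaps))  # True
```

Curves that cross or diverge markedly on such a plot suggest that the proportional hazards assumption may not hold for that factor.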
Appendix A
Sample Files

The sample files installed with the product can be found in the Samples subdirectory of the installation directory. There is a separate folder within the Samples subdirectory for each of the following languages: English, French, German, Italian, Japanese, Korean, Polish, Russian, Simplified Chinese, Spanish, and Traditional Chinese. Not all sample files are available in all languages.
loans. The last 150 cases are prospective customers that the bank needs to classify as good or bad credit risks.
bankloan_binning.sav. This is a hypothetical data file containing financial and demographic information on 5,000 past customers.
behavior.sav. In a classic example (Price and Bouffard, 1974), 52 students were asked to rate the combinations of 15 situations and 15 behaviors on a 10-point scale ranging from 0=“extremely appropriate” to 9=“extremely inappropriate.”
carpet_prefs.sav. This data file is based on the same example as described for carpet.sav, but it contains the actual rankings collected from each of the 10 consumers. The consumers were asked to rank the 22 product profiles from the most to the least preferred. The variables PREF1 through PREF22 contain the identifiers of the associated profiles, as defined in carpet_plan.sav.
catalog.sav.
debate.sav. This is a hypothetical data file that concerns paired responses to a survey from attendees of a political debate before and after the debate. Each case corresponds to a separate respondent.
debate_aggregate.sav. This is a hypothetical data file that aggregates the responses in debate.sav. Each case corresponds to a cross-classification of preference before and after the debate.
demo.sav.
guttman.sav. Bell (1961) presented a table to illustrate possible social groups.
nhis2000_subset.sav. The National Health Interview Survey (NHIS) is a large, population-based survey of the U.S. civilian population. Interviews are carried out face-to-face in a nationally representative sample of households. Demographic information and observations about health behaviors and status are obtained for members of each household. This data file contains a subset of information from the 2000 survey. National Center for Health Statistics.
and sample weights. The additional variable Current value was collected and added to the data file after the sample was taken.
recidivism.sav. This is a hypothetical data file that concerns a government law enforcement agency’s efforts to understand recidivism rates in their area of jurisdiction.
stroke_clean.sav. This hypothetical data file contains the state of a medical database after it has been cleaned using procedures in the Data Preparation option.
stroke_invalid.sav. This hypothetical data file contains the initial state of a medical database and contains several data entry errors.
stroke_survival.sav. This hypothetical data file concerns survival times for patients who exit a rehabilitation program after an ischemic stroke and face a number of challenges.
tv-survey.sav. This is a hypothetical data file that concerns a survey conducted by a TV studio that is considering whether to extend the run of a successful program. 906 respondents were asked whether they would watch the program under various conditions. Each row represents a separate respondent; each column is a separate condition.
ulcer_recurrence.sav.
Appendix B
Notices

Licensed Materials – Property of SPSS Inc., an IBM Company. © Copyright SPSS Inc. 1989, 2010. Patent No. 7,023,453
The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: SPSS INC.
using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. SPSS Inc., therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided “AS IS”, without warranty of any kind. SPSS Inc.
Bibliography

Bell, E. H. 1961. Social foundations of human behavior: Introduction to the study of sociology. New York: Harper & Row.
Blake, C. L., and C. J. Merz. 1998. "UCI Repository of machine learning databases." Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
Breiman, L., and J. H. Friedman. 1985. Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580–598.
Cochran, W. G. 1977. Sampling Techniques, 3rd ed.
Rosenberg, S., and M. P. Kim. 1975. The method of sorting as a data-gathering procedure in multivariate research. Multivariate Behavioral Research, 10, 489–502.
Särndal, C., B. Swensson, and J. Wretman. 1992. Model Assisted Survey Sampling. New York: Springer-Verlag.
Van der Ham, T., J. J. Meulman, D. C. Van Strien, and H. Van Engeland. 1997. Empirically based subgrouping of eating disorders in adolescents: A longitudinal perspective. British Journal of Psychiatry, 170, 363–368.