SPSS 13.0 Base User's Guide

Summary of content (744 pages)

For more information about SPSS® software products, please visit our Web site at http://www.spss.com or contact:

SPSS Inc.
233 South Wacker Drive, 11th Floor
Chicago, IL 60606-6412
Tel: (312) 651-3000
Fax: (312) 651-3668

SPSS is a registered trademark and the other product names are the trademarks of SPSS Inc. for its proprietary computer software.

Preface

SPSS 13.0

SPSS 13.0 is a comprehensive system for analyzing data. SPSS can take data from almost any type of file and use them to generate tabulated reports, charts and plots of distributions and trends, descriptive statistics, and complex statistical analyses.

This manual, the SPSS® Base 13.0 User’s Guide, documents the graphical user interface of SPSS for Windows. Examples using the statistical procedures found in SPSS Base 13.0 are provided in the Help system, which is installed with the software.

SPSS Tables™ creates a variety of presentation-quality tabular reports, including complex stub-and-banner tables and displays of multiple response data. SPSS Trends™ performs comprehensive forecasting and time series analyses with multiple curve-fitting models, smoothing models, and methods for estimating autoregressive functions. SPSS Categories® performs optimal scaling procedures, including correspondence analysis. SPSS Conjoint™ performs conjoint analysis.

Compatibility

SPSS is designed to run on many computer systems. See the installation instructions that came with your system for specific information on minimum and recommended requirements.

Serial Numbers

Your serial number is your identification number with SPSS Inc. You will need this serial number when you contact SPSS for information regarding support, payment, or an upgraded system. The serial number was provided with your Base system.

Additional Publications

Additional copies of SPSS product manuals may be purchased directly from SPSS Inc. Visit the SPSS Web Store at http://www.spss.com/estore, or contact your local SPSS office, listed on the SPSS Web site at http://www.spss.com/worldwide. For telephone orders in the United States and Canada, call SPSS Inc. at 800-543-2185. For telephone orders outside of North America, contact your local office, listed on the SPSS Web site.

Contents

1 Overview (p. 1)
    What’s New in SPSS 13.0? (p. 2)
    Windows (p. 4)
    Menus (p. 6)
    Status Bar (p. 7)
    Dialog Boxes

3 Data Files (p. 19)
    Opening a Data File (p. 19)
    To Open Data Files (p. 19)
    Data File Types (p. 20)
    Opening File Options (p. 21)
    Reading Excel Files

4 Distributed Analysis Mode (p. 63)
    Distributed versus Local Analysis (p. 63)
5 Data Editor (p. 75)
    Data View (p. 75)
    Variable View (p. 76)
    Entering Data (p. 88)
    Editing Data

7 Data Transformations (p. 129)
    Computing Variables (p. 129)
    Functions (p. 132)
    Missing Values in Functions (p. 133)
    Random Number Generators (p. 133)
    Count Occurrences of Values within Cases

9 Working with Output (p. 221)
    Viewer (p. 221)
    Using Output in Other Applications (p. 229)
    Pasting Objects into the Viewer (p. 233)
    Paste Special (p. 233)
    Pasting Objects from Other Applications into the Viewer

    To Change Pivot Table Properties (p. 278)
    Table Properties: General (p. 279)
    To Change General Table Properties (p. 279)
    Table Properties: Footnotes (p. 280)
    To Change Footnote Marker Properties

    To Print Hidden Layers of a Pivot Table (p. 294)
    Controlling Table Breaks for Wide and Long Tables (p. 295)
12 Working with Command Syntax (p. 297)
    Syntax Rules (p. 298)
    Pasting Syntax from Dialog Boxes (p. 299)
    Copying Syntax from the Output Log

    Explore Plots (p. 324)
    Explore Options (p. 325)
16 Crosstabs (p. 327)
    Crosstabs Layers (p. 329)
    Crosstabs Clustered Bar Charts (p. 330)
    Crosstabs Statistics

20 T Tests (p. 357)
    Independent-Samples T Test (p. 357)
    Paired-Samples T Test (p. 361)
    One-Sample T Test (p. 364)
21 One-Way ANOVA (p. 367)
    One-Way ANOVA Contrasts (p. 370)
    One-Way ANOVA Post Hoc Tests

24 Partial Correlations (p. 401)
    Partial Correlations Options (p. 404)
25 Distances (p. 405)
    Distances Dissimilarity Measures (p. 407)
    Distances Similarity Measures (p. 408)
26 Linear Regression (p. 409)
    Linear Regression Variable Selection Methods (p. 414)
    Linear Regression Set Rule

    Discriminant Analysis Select Cases (p. 435)
    Discriminant Analysis Statistics (p. 435)
    Discriminant Analysis Stepwise Method (p. 437)
    Discriminant Analysis Classification (p. 438)
    Discriminant Analysis Save

    Hierarchical Cluster Analysis Statistics (p. 470)
    Hierarchical Cluster Analysis Plots (p. 471)
    Hierarchical Cluster Analysis Save New Variables (p. 471)
33 K-Means Cluster Analysis (p. 473)
    K-Means Cluster Analysis Efficiency (p. 477)
    K-Means Cluster Analysis Iterate

    Multiple Response Crosstabs Define Ranges (p. 518)
    Multiple Response Crosstabs Options (p. 518)
    MULT RESPONSE Command Additional Features (p. 519)
36 Reporting Results (p. 521)
    Report Summaries in Rows (p. 521)
    Report Summaries in Columns

40 Overview of the Chart Facility (p. 555)
    Creating and Modifying a Chart (p. 555)
    Chart Definition Options (p. 561)
41 ROC Curves (p. 569)
    ROC Curve Options (p. 572)
42 Utilities (p. 573)
    Variable Information

    Data Options (p. 594)
    Currency Options (p. 595)
    Script Options (p. 597)
44 Customizing Menus and Toolbars (p. 599)
    Menu Editor (p. 599)
    Customizing Toolbars

    Autoscripts (p. 625)
    Creating and Editing Scripts (p. 626)
    To Edit a Script (p. 627)
    Script Window (p. 628)
    Starter Scripts

Appendices
A Database Access Administrator (p. 689)
B Customizing HTML Documents (p. 691)
    To Add Customized HTML Code to Exported Output Documents (p. 691)
    Content and Format of the Text File for Customized HTML (p. 692)
    To Use a Different File or Location for Custom HTML Code

Chapter 1: Overview

SPSS for Windows provides a powerful statistical analysis and data management system in a graphical environment, using descriptive menus and simple dialog boxes to do most of the work for you. Most tasks can be accomplished simply by pointing and clicking the mouse.

In addition to the simple point-and-click interface for statistical analysis, SPSS for Windows provides:

Data Editor. A versatile spreadsheet-like system for defining, entering, editing, and displaying data.

Viewer.

Online Help. Detailed tutorials provide a comprehensive overview; context-sensitive Help topics in dialog boxes guide you through specific tasks; pop-up definitions in pivot table results explain statistical terms; the Statistics Coach helps you find the procedures that you need; and Case Studies provide hands-on examples of how to use statistical procedures and interpret the results.

Command language.

Charts

- 3-D bar charts.
- Population pyramids.
- Dot plots.
- Paneled charts.

Statistical Enhancements

- New Classification Tree option for building tree models.
- New GLM procedure in the Complex Samples option.
- New Logistic Regression procedure in the Complex Samples option.
- New Multiple Correspondence procedure in the Categories option.

- You can control the treatment of error conditions in included command syntax files with the new INSERT command.
- You can use the FILE HANDLE command to define directory paths.

More information about command syntax is available in the SPSS Command Syntax Reference PDF file, which you can access from the Help menu.

SPSS Server

- You can score data based on models built with many SPSS procedures. For more information, see “Scoring Data with Predictive Models” in Chapter 7 on p. 174.

Text Output Editor. Text output not displayed in pivot tables can be modified with the Text Output Editor. You can edit the output and change font characteristics (type, style, color, size).

Syntax Editor. You can paste your dialog box choices into a syntax window, where your selections appear in the form of command syntax. You can then edit the command syntax to use special features of SPSS that are not available through dialog boxes.

Designated versus Active Window

If you have more than one open Viewer window, output is routed to the designated Viewer window. If you have more than one open Syntax Editor window, command syntax is pasted into the designated Syntax Editor window. The designated windows are indicated by an exclamation point (!) in the status bar. You can change the designated windows at any time.

The designated window should not be confused with the active window, which is the currently selected window.

Status Bar

The status bar at the bottom of each SPSS window provides the following information:

Command status. For each procedure or command that you run, a case counter indicates the number of cases processed so far. For statistical procedures that require iterative processing, the number of iterations is displayed.

Filter status.

Variable Names and Variable Labels in Dialog Box Lists

You can display either variable names or variable labels in dialog box lists. To control the display of variable names or labels, choose Options from the Edit menu in any window. To define or modify variable labels, use Variable View in the Data Editor. For data imported from database sources, field names are used as variable labels.

Reset. Deselects any variables in the selected variable list(s) and resets all specifications in the dialog box and any subdialog boxes to the default state.

Cancel. Cancels any changes in the dialog box settings since the last time it was opened and closes the dialog box. Within a session, dialog box settings are persistent: a dialog box retains your last set of specifications until you override them.

Help. Context-sensitive Help.

Getting Information about Variables in Dialog Boxes

- Right-click a variable in the source or target variable list.
- Select Variable Information from the pop-up context menu.

Figure 1-4: Variable information with right mouse button

Getting Information about Dialog Box Controls

- Right-click the control you want to know about.
- Select What’s This? from the pop-up context menu.

A pop-up window displays information about the control.

Figure 1-5: Right mouse button “What’s This?” pop-up Help for dialog box controls

Basic Steps in Data Analysis

Analyzing data with SPSS is easy. All you have to do is:

- Get your data into SPSS. You can open a previously saved SPSS data file; read a spreadsheet, database, or text data file; or enter your data directly in the Data Editor.
- Select a procedure. Select a procedure from the menus to calculate statistics or to create a chart.
- Select the variables for the analysis.

Statistics Coach

If you are unfamiliar with SPSS or with the statistical procedures available in SPSS, the Statistics Coach can help you get started by prompting you with simple questions, nontechnical language, and visual examples that help you select the basic statistical and charting features that are best suited for your data.

Chapter 2: Getting Help

Help is provided in many different forms:

Help menu. The Help menu in most SPSS windows provides access to the main Help system, plus tutorials and technical reference material.

Topics. Provides access to the Contents, Index, and Search tabs, which you can use to find specific Help topics.

Tutorial. Illustrated, step-by-step instructions on how to use many of the basic features in SPSS. You don’t have to view the whole tutorial from start to finish.

Dialog box Help buttons. Most dialog boxes have a Help button that takes you directly to a Help topic for that dialog box. The Help topic provides general information and links to related topics.

Dialog box context menu Help. Many dialog boxes provide context-sensitive Help for individual controls and features. Right-click any control in a dialog box and select What’s This? from the context menu to display a description of the control and directions for its use.

- Click the Contents tab.
- Double-click items with a book icon to expand or collapse the contents.
- Click an item to go to that Help topic.

Figure 2-1: Help window with Contents tab displayed

Using the Help Index

- In any window, from the menus choose: Help > Topics
- Click the Index tab.
- Enter a term to search for in the index.
- Double-click the topic that you want.

The Help index uses incremental search to find the text that you enter and selects the closest match in the index.
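The incremental search described above can be pictured as a prefix lookup in an alphabetically sorted index: after each keystroke, the selection jumps to the first entry that sorts at or after the text typed so far. The Python sketch below is only an illustration under that assumption; the function `closest_index_entry` and the sample entries are hypothetical and do not describe how the Help viewer is actually implemented.

```python
from __future__ import annotations

from bisect import bisect_left


def closest_index_entry(entries: list[str], typed: str) -> str | None:
    """Return the first entry that sorts at or after `typed`, ignoring case.

    Assumes `entries` is already sorted case-insensitively, like a printed index.
    """
    if not entries:
        return None
    keys = [entry.lower() for entry in entries]
    pos = bisect_left(keys, typed.lower())
    # If the typed text sorts after every entry, the last entry is the closest match.
    return entries[min(pos, len(entries) - 1)]


# Hypothetical index entries, sorted case-insensitively.
INDEX = ["Crosstabs", "Data Editor", "Data files", "Database Wizard", "Text Wizard"]

# Simulate typing one character at a time.
for typed in ("d", "da", "datab"):
    print(typed, "->", closest_index_entry(INDEX, typed))
```

Typing d, then da, keeps Data Editor selected; typing datab moves the selection to Database Wizard.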

Figure 2-2: Index tab and incremental search

Getting Help on Dialog Box Controls

- Right-click the dialog box control that you want information about.
- Choose What’s This? from the pop-up context menu.

A description of the control and how to use it is displayed in a pop-up window. General information about a dialog box is available from the Help button in the dialog box.

Figure 2-3: Dialog box control Help with right mouse button

Getting Help on Output Terms

- Double-click the pivot table to activate it.
- Right-click the term that you want explained.
- Choose What’s This? from the context menu.

A definition of the term is displayed in a pop-up window.

Figure 2-4: Activated pivot table glossary Help with right mouse button

Using Case Studies

- Right-click a pivot table in the Viewer window.
- Choose Case Studies from the pop-up context menu.

Copying Help Text from a Pop-Up Window

- Right-click anywhere in the pop-up window.
- Choose Copy from the context menu.

The entire text of the pop-up window is copied.

Chapter 3: Data Files

Data files come in a wide variety of formats, and this software is designed to handle many of them, including:

- Spreadsheets created with Lotus 1-2-3 and Excel
- Database files created with dBASE and various SQL formats
- Tab-delimited and other types of ASCII text files
- Data files in SPSS format created on other operating systems
- SYSTAT data files
- SAS data files

Opening a Data File

In addition to files saved in SPSS format, you can open Excel, Lotus 1-2-3, dBASE,

Optionally, you can:

- Read variable names from the first row for spreadsheet and tab-delimited files.
- Specify a range of cells to read for spreadsheet files.
- Specify a sheet within an Excel file to read (Excel 5 or later).

Data File Types

SPSS. Opens data files saved in SPSS format, including SPSS for Windows, Macintosh, and UNIX, and also the DOS product SPSS/PC+.

SPSS/PC+. Opens SPSS/PC+ data files.

SYSTAT. Opens SYSTAT data files.

SPSS Portable.

Opening File Options

Read variable names. For spreadsheets, you can read variable names from the first row of the file or the first row of the defined range. The values are converted as necessary to create valid variable names, including converting spaces to underscores.

Worksheet. Excel 5 or later files can contain multiple worksheets. By default, the Data Editor reads the first worksheet. To read a different worksheet, select the worksheet from the drop-down list.

Range.

Variable names. If you read the first row of the Excel file (or the first row of the specified range) as variable names, values that don’t conform to variable naming rules are converted to valid variable names, and the original names are used as variable labels. If you do not read variable names from the Excel file, default variable names are assigned.
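The conversion described above can be sketched as a small function that turns header cells into valid names and keeps the original text as a label. This is a simplified approximation, not the actual SPSS 13.0 algorithm: the rules it applies (only letters, digits, and the characters _ . @ # $; a leading letter; at most 64 characters; no trailing period; no reserved keywords; case-insensitive uniqueness) and the V prefix and _1 suffixes it adds are assumptions chosen for illustration.

```python
from __future__ import annotations

import re

# Assumed, simplified naming rules for illustration only.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT", "NE", "NOT", "OR", "TO", "WITH"}
MAX_LEN = 64


def to_variable_names(header: list[str]) -> list[tuple[str, str]]:
    """Map each header cell to (valid_name, label); label is '' if no change was needed."""
    result: list[tuple[str, str]] = []
    seen: set[str] = set()  # names are compared case-insensitively
    for original in header:
        name = original.strip().replace(" ", "_")  # spaces become underscores
        name = re.sub(r"[^A-Za-z0-9_.@#$]", "_", name)  # so do other invalid characters
        if not name or not name[0].isalpha():
            name = "V" + name  # hypothetical prefix so the name starts with a letter
        name = name[:MAX_LEN].rstrip(".")  # enforce the length limit; no trailing period
        if name.upper() in RESERVED:
            name += "_"
        base, k = name, 1
        while name.upper() in seen:  # make duplicates unique: region, region_1, ...
            suffix = f"_{k}"
            name = base[: MAX_LEN - len(suffix)] + suffix
            k += 1
        seen.add(name.upper())
        # The original header text becomes the label whenever the name had to change.
        result.append((name, original if name != original else ""))
    return result


print(to_variable_names(["Customer ID", "2005 sales", "to", "Region", "region"]))
```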

Reading Database Files

You can read data from any database format for which you have a database driver. In local analysis mode, the necessary drivers must be installed on your local computer. In distributed analysis mode (available with the server version), the drivers must be installed on the remote server. For more information, see “Distributed Analysis Mode” in Chapter 4 on p. 63.

To Read Database Files

- From the menus choose: File > Open Database > New Query...
- Select the data source.

- Follow the instructions for creating a new query.

To Read Database Files with Saved Queries

- From the menus choose: File > Open Database > Run Query...
- Select the query file (*.spq) that you want to run.
- Depending on the database file, you may need to enter a login name and password.
- If the query has an embedded prompt, you may need to enter other information (for example, the quarter for which you want to retrieve sales figures).

Figure 3-1: Database Wizard dialog box

Database Login

If your database requires a password, the Database Wizard will prompt you for one before it can open the data source.

Figure 3-2: Login dialog box

Selecting Data Fields

The Select Data step controls which tables and fields are read. Database fields (columns) are read as variables.

If a table has any fields selected, all of its fields will be visible in the following Database Wizard windows, but only the fields selected in this dialog box will be imported as variables. This enables you to create table joins and to specify criteria using fields that you are not importing.

Figure 3-3: Select Data dialog box

Displaying field names. To list the fields in a table, click the plus sign (+) to the left of a table name. To hide the fields, click the minus sign (–) to the left of a table name.

To add a field. Double-click any field in the Available Tables list, or drag it to the Retrieve Fields in This Order list. Fields can be reordered by dragging and dropping them within the selected fields list.

To remove a field.

Creating a Relationship between Tables

The Specify Relationships dialog box allows you to define the relationships between the tables. If fields from more than one table are selected, you must define at least one join.

Figure 3-4: Specify Relationships dialog box

Establishing relationships. To create a relationship, drag a field from any table onto the field to which you want to join it. The Database Wizard will draw a join line between the two fields, indicating their relationship.

Specifying join types. If outer joins are supported by your driver, you can specify inner joins, left outer joins, or right outer joins. To select the type of join, double-click the join line between the fields, and the wizard will display the Relationship Properties dialog box. You can also use the icons in the upper right corner of the dialog box to choose the type of join.

Relationship Properties

This dialog box allows you to specify which type of relationship joins your tables.

Figure 3-6: Creating an inner join

Outer joins. A left outer join includes all records from the table on the left and only those records from the table on the right in which the related fields are equal. In a right outer join, this relationship is switched, so that the join imports all records from the table on the right and only those records from the table on the left in which the related fields are equal.
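The three join types can be illustrated with plain Python lists of records. The sketch below is a hypothetical helper, not the Database Wizard's implementation; the `join` function, the `customers` and `orders` tables, and the choice to fill unmatched fields with `None` (standing in for missing values) are all assumptions for illustration.

```python
from __future__ import annotations


def join(left: list[dict], right: list[dict], key: str, how: str = "inner") -> list[dict]:
    """Join two tables (lists of row dicts) on the field `key`.

    how="inner": only rows whose key values are equal in both tables.
    how="left":  all left rows, plus matching right rows (None where no match).
    how="right": all right rows, plus matching left rows (None where no match).
    """
    if how not in ("inner", "left", "right"):
        raise ValueError(f"unsupported join type: {how!r}")
    if how == "right":
        # A right outer join is a left outer join with the tables switched.
        return join(right, left, key, "left")
    right_fields = {field for row in right for field in row}
    result = []
    for lrow in left:
        matches = [
            rrow for rrow in right
            if lrow.get(key) is not None and rrow.get(key) == lrow.get(key)
        ]
        for rrow in matches:
            result.append({**rrow, **lrow})
        if not matches and how == "left":
            # Keep the unmatched left row; the right table's fields are missing.
            result.append({**{field: None for field in right_fields}, **lrow})
    return result


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
orders = [{"id": 1, "amount": 10}, {"id": 3, "amount": 5}]

print(join(customers, orders, "id", "inner"))
print(join(customers, orders, "id", "left"))
print(join(customers, orders, "id", "right"))
```

The inner join keeps only customer 1; the left outer join adds customer 2 with a missing amount; the right outer join adds order 3 with a missing name.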

Figure 3-7: Creating a right outer join

Limiting Retrieved Cases

The Limit Retrieved Cases dialog box allows you to specify the criteria to select subsets of cases (rows). Limiting cases generally consists of filling the criteria grid with one or more criteria. Criteria consist of two expressions and some relation between them. They return a value of true, false, or missing for each case. If the result is true, the case is selected.

Most criteria use one or more of the six relational operators (<, >, <=, >=, =, and <>). Expressions can include field names, constants, arithmetic operators, numeric and other functions, and logical variables. You can use fields that you do not plan to import as variables.

Figure 3-8: Limit Retrieved Cases dialog box

To build your criteria, you need at least two expressions and a relation to connect them.

- To build an expression, put your cursor in an Expression cell.
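The true/false/missing logic of a criterion can be sketched in a few lines. In this hypothetical Python helper, each criterion compares one field with a constant (the wizard also accepts more general expressions), `None` stands in for a missing value, and only rows whose criterion is true are kept; the function names and sample rows are illustrative assumptions.

```python
import operator

# The six relational operators available in the criteria grid.
RELATIONS = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
    "<>": operator.ne,
}


def evaluate(row, field, relation, constant):
    """Return True, False, or None (missing) for one criterion on one row."""
    value = row.get(field)
    if value is None:
        return None  # a missing value makes the whole comparison missing
    return RELATIONS[relation](value, constant)


def select_cases(rows, field, relation, constant):
    """Keep only rows whose criterion is True; False and missing are both dropped."""
    return [row for row in rows if evaluate(row, field, relation, constant) is True]


rows = [{"sales": 120}, {"sales": 80}, {"sales": None}]
print(select_cases(rows, "sales", ">", 100))
```

The row with a missing sales value is excluded by every criterion, because its result is missing rather than true.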

- The two expressions are usually connected by a relational operator, such as = or >. To choose the relation, put your cursor in the Relation cell and either type the operator or select it from the drop-down menu.

Dates and times in expressions need to be specified in a special manner (including the curly braces shown in the examples):

- Date literals should be specified using the general form: {d 'yyyy-mm-dd'}.
- Time literals should be specified using the general form: {t 'hh:mm:ss'}.
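Building these literals from Python `date` and `time` objects makes the required layout explicit. The sketch assumes the ODBC escape-sequence convention, in which the value is enclosed in single quotes inside the braces (for example, {d '2005-03-31'}); the function name `sql_literal` is hypothetical.

```python
from datetime import date, datetime, time


def sql_literal(value):
    """Format a date or time value as a {d '...'} or {t '...'} literal."""
    if isinstance(value, datetime):
        # datetime is a subclass of date; refuse it rather than silently drop the time part.
        raise TypeError("pass a date or a time, not a datetime")
    if isinstance(value, date):
        return "{d '" + value.strftime("%Y-%m-%d") + "'}"
    if isinstance(value, time):
        return "{t '" + value.strftime("%H:%M:%S") + "'}"
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(sql_literal(date(2005, 3, 31)))
print(sql_literal(time(9, 5)))
```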

you may want to run the same query to see sales figures for different fiscal quarters.

Place your cursor in any Expression cell, and click Prompt for Value to create a prompt.

Note: If you use random sampling, aggregation (available in distributed mode with SPSS Server) is not available.

Creating a Parameter Query

Use the Prompt for Value dialog box to create a dialog box that solicits information from users each time someone runs your query.
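A parameter query works like a prepared statement with a placeholder that is filled in at run time. The sketch below uses Python's built-in `sqlite3` module and a `?` placeholder purely as an analogy for Prompt for Value; the in-memory `sales` table, its columns, and the helper name `sales_for_quarter` are hypothetical, and the Database Wizard does not use this code.

```python
import sqlite3


def sales_for_quarter(conn, quarter):
    """Run the same stored query with a different value supplied each time."""
    # "?" is a placeholder bound at run time, much like a prompted value.
    cur = conn.execute(
        "SELECT region, amount FROM sales WHERE quarter = ? ORDER BY region",
        (quarter,),
    )
    return cur.fetchall()


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 1, 100.0), ("West", 1, 80.0), ("East", 2, 120.0)],
)

print(sales_for_quarter(conn, 1))
print(sales_for_quarter(conn, 2))
```

Binding the value instead of pasting it into the SQL text keeps the query itself unchanged between runs, which is the point of saving a parameter query.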

The final result looks like this:

Figure 3-10: User-defined prompt dialog box

Aggregating Data

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can aggregate the data before reading it into SPSS.

You can also aggregate data after reading it into SPSS, but preaggregating may save time for large data sources.

- Select one or more break variables that define how cases are grouped to create aggregated data.
- Select one or more aggregate variables.
- Select an aggregate function for each aggregate variable.

Optionally, you can create a variable that contains the number of cases in each break group.

Note: If you use random sampling, aggregation is not available.
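The steps above map directly onto a group-by operation. This Python sketch is an illustration only; the `aggregate` function, the small set of aggregate functions it supports, and the `amount_sum` style of naming new variables are assumptions, not the names SPSS generates.

```python
from collections import defaultdict
from statistics import mean

# A few hypothetical aggregate functions, selectable by name.
FUNCTIONS = {"sum": sum, "mean": mean, "min": min, "max": max}


def aggregate(rows, break_vars, aggregates, count_name=None):
    """Group rows by the break variables and summarize each group.

    aggregates: list of (variable, function_name) pairs, one function per variable.
    count_name: if given, also store the number of cases in each break group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[v] for v in break_vars)].append(row)
    result = []
    for key, members in groups.items():  # groups appear in order of first occurrence
        out = dict(zip(break_vars, key))
        for var, func in aggregates:
            out[f"{var}_{func}"] = FUNCTIONS[func]([m[var] for m in members])
        if count_name is not None:
            out[count_name] = len(members)
        result.append(out)
    return result


sales = [
    {"region": "East", "amount": 10},
    {"region": "West", "amount": 5},
    {"region": "East", "amount": 30},
]
print(aggregate(sales, ["region"], [("amount", "sum")], count_name="n"))
```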

Figure 3-12: Define Variables dialog box

Sorting Cases

If you are in distributed mode, connected to a remote server (available with SPSS Server), you can sort the data before reading it into SPSS.

Figure 3-13: Sort Cases dialog box

You can also sort data after reading it into SPSS, but presorting may save time for large data sources.

Results

The Results dialog box displays the SQL Select statement for your query. You can edit the SQL Select statement before you run the query, but if you click the Back button to make changes in previous steps, the changes to the Select statement will be lost.

You can save the query for future use with Save query to a file.

Select Paste it into the syntax editor for further modification to paste complete GET DATA syntax into a syntax window. Copying and pasting the Select statement from the Results window will not paste the necessary command syntax.

Note: The pasted syntax contains a blank space before the closing quote on each line of SQL generated by the wizard. These blanks are not superfluous.
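One way to see why the blanks matter: the SQL is split into several quoted pieces, one per line, that are joined end to end. The Python sketch below mimics that joining, assuming the pieces are concatenated with nothing in between; the sample query and the `join_sql_fragments` name are hypothetical.

```python
def join_sql_fragments(fragments):
    """Concatenate quoted SQL pieces exactly as written, with no added separator."""
    return "".join(fragments)


# Each piece ends with a blank before its closing quote, as in the pasted syntax.
with_blanks = ["SELECT region, amount ", "FROM sales ", "WHERE quarter = 1 "]
# The same pieces with the trailing blanks removed.
without_blanks = ["SELECT region, amount", "FROM sales", "WHERE quarter = 1"]

print(repr(join_sql_fragments(with_blanks)))
print(repr(join_sql_fragments(without_blanks)))  # keywords run into the preceding words
```

Without the trailing blanks, "amount" and "FROM" fuse into one token and the statement is no longer valid SQL.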

Text Wizard

The Text Wizard can read text data files formatted in a variety of ways:

- Tab-delimited files
- Space-delimited files
- Comma-delimited files
- Fixed-field format files

For delimited files, you can also specify other characters as delimiters between values, and you can specify multiple delimiters.

To Read Text Data Files

- From the menus choose: File > Read Text Data
- Select the text file in the Open dialog box.
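Splitting on several delimiter characters at once can be sketched with a regular-expression character class. This is an illustration in Python, not the Text Wizard's parser; the function name and sample lines are hypothetical, and in this sketch two adjacent delimiters produce an empty value.

```python
import re


def split_delimited(line, delimiters=",\t"):
    """Split one line of text wherever any of the delimiter characters occurs."""
    # re.escape keeps characters such as "-" or "]" literal inside the class.
    pattern = "[" + re.escape(delimiters) + "]"
    return re.split(pattern, line)


print(split_delimited("1,Ann\t42"))             # comma and tab both act as delimiters
print(split_delimited("7;;9", delimiters=";"))  # adjacent delimiters leave an empty value
```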

Text Wizard Step 1

Figure 3-15: Text Wizard Step 1

The text file is displayed in a preview window. You can apply a predefined format (previously saved from the Text Wizard) or follow the steps in the Text Wizard to specify how the data should be read.

Text Wizard Step 2

Figure 3-16: Text Wizard Step 2

This step provides information about variables. A variable is similar to a field in a database. For example, each item in a questionnaire is a variable.

How are your variables arranged? To read your data properly, the Text Wizard needs to know how to determine where the data value for one variable ends and the data value for the next variable begins.

values may appear to run together without even spaces separating them. The column location determines which variable is being read.

Are variable names included at the top of your file? If the first row of the data file contains descriptive labels for each variable, you can use these labels as variable names. Values that don’t conform to variable naming rules are converted to valid variable names.
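In a fixed-width file, a value is identified only by where it sits on the line. A minimal Python sketch, assuming a hypothetical layout of three variables defined by 1-based start and end columns, slices each record accordingly; the names `LAYOUT` and `read_fixed_width` are illustrative.

```python
# Hypothetical layout: (variable name, first column, last column), 1-based and inclusive.
LAYOUT = [("id", 1, 3), ("sex", 4, 4), ("age", 5, 6)]


def read_fixed_width(line, layout=LAYOUT):
    """Cut one record into values using column positions alone."""
    return {name: line[start - 1:end] for name, start, end in layout}


print(read_fixed_width("007F42"))  # the values run together with no separators
```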

How are your cases represented? Controls how the Text Wizard determines where each case ends and the next one begins.

Each line represents a case. Each line contains only one case. It is fairly common for each case to be contained on a single line (row), even though this can be a very long line for data files with a large number of variables.

Text Wizard Step 3: Fixed-Width Files

Figure 3-18: Text Wizard Step 3 for fixed-width files

This step provides information about cases. A case is similar to a record in a database. For example, each respondent to a questionnaire is a case.

The first case of data begins on which line number? Indicates the first line of the data file that contains data values. If the top line(s) of the data file contain descriptive labels or other text that does not represent data values, this will not be line 1.

for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage.

Text Wizard Step 4: Delimited Files

Figure 3-19: Text Wizard Step 4 for delimited files

This step displays the Text Wizard’s best guess on how to read the data file and allows you to modify how the Text Wizard will read variables from the data file.
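The note about sampling by percentage follows naturally if an independent random decision is made for each case, which is the assumption behind this Python sketch (the function name `sample_cases` and its seed handling are hypothetical). With few cases the selected share can drift well away from the target; with many cases it settles close to it.

```python
import random


def sample_cases(n_cases, percent, seed=None):
    """Select each case independently with probability percent/100."""
    rng = random.Random(seed)
    return [i for i in range(n_cases) if rng.random() < percent / 100]


for n in (20, 200_000):
    chosen = sample_cases(n, 10, seed=42)
    print(n, "cases:", 100 * len(chosen) / n, "% selected")
```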

What is the text qualifier? Characters used to enclose values that contain delimiter characters. For example, if a comma is the delimiter, values that contain commas will be read incorrectly unless there is a text qualifier enclosing the value, preventing the commas in the value from being interpreted as delimiters between values. CSV-format data files exported from Excel use a double quotation mark (") as a text qualifier.
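Python's standard `csv` module applies the same idea, which makes the effect of a text qualifier easy to see. In this sketch (the sample data and function names are hypothetical), the quoted value Smith, John survives as one value, while a naive split on commas breaks it in two.

```python
import csv
import io


def read_with_qualifier(text):
    """Parse comma-delimited text, honoring double quotes as the text qualifier."""
    return list(csv.reader(io.StringIO(text), delimiter=",", quotechar='"'))


def read_naively(text):
    """Split on every comma, ignoring quotes (what happens without a qualifier)."""
    return [line.split(",") for line in text.splitlines()]


sample = 'id,name\n1,"Smith, John"\n'
print(read_with_qualifier(sample))
print(read_naively(sample))
```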

Insert, move, and delete variable break lines as necessary to separate variables. If multiple lines are used for each case, select each line from the drop-down list and modify the variable break lines as necessary.

Note: For computer-generated data files that produce a continuous stream of data values with no intervening spaces or other distinguishing characteristics, it may be difficult to determine where each variable begins.

49 Data Files Variable name. You can overwrite the default variable names with your own variable names. If you read variable names from the data file, the Text Wizard will automatically modify variable names that don’t conform to variable naming rules. Select a variable in the preview window and then enter a variable name. Data format. Select a variable in the preview window and then select a format from the drop-down list.

Note: Values that contain invalid characters for the selected format will be treated as missing. Values that contain any of the specified delimiters will be treated as multiple values.

Text Wizard Step 6

Figure 3-22 Text Wizard Step 6

This is the final step of the Text Wizard. You can save your specifications in a file for use when importing similar text data files. You can also paste the syntax generated by the Text Wizard into a syntax window.

File Information

A data file contains much more than raw data. It also contains any variable definition information, including:

- Variable names
- Variable formats
- Descriptive variable and value labels

This information is stored in the dictionary portion of the data file. The Data Editor provides one way to view the variable definition information. You can also display complete dictionary information for the working data file or any other data file.

The modified data file is saved, overwriting the previous version of the file.

Saving Data Files in Excel Format

You can save your data in one of three Microsoft Excel file formats. The choice of format depends on the version of Excel that will be used to open the data. Excel cannot open a file created by a newer version of the application. For example, Excel 5.0 cannot open an Excel 2000 document. However, Excel 2000 can read an Excel 5.0 document.

Saving Data Files in SAS Format

Special handling is given to various aspects of your data when saved as a SAS file. These include:

- Certain characters that are allowed in SPSS variable names, such as @, #, and $, are not valid in SAS. These illegal characters are replaced with an underscore when the data are exported.
- SPSS variable labels containing more than 40 characters are truncated when exported to a SAS v6 file.
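The two rules above can be sketched in Python. This is an illustration, not SPSS's export code: the guide lists @, #, and $ only as examples, so the sketch assumes that any character other than a letter, digit, or underscore is illegal in a SAS name. The function names are hypothetical.

```python
import re


def sas_safe_name(name):
    """Replace characters that SAS does not allow in names with underscores.

    Assumption: anything other than a letter, digit, or underscore is
    illegal; the guide itself only lists @, #, and $ as examples.
    """
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


def sas_v6_label(label, limit=40):
    """Truncate a variable label to the 40 characters a SAS v6 file keeps."""
    return label[:limit]


print(sas_safe_name("filter_$"))    # filter__
print(len(sas_v6_label("x" * 55)))  # 40
```

An SPSS filter variable named filter_$ would, under this rule, become filter__, which is presumably where the FILTER__ name in the exported SAS syntax excerpt comes from.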

    8 = '8 Cylinders' ;
  value FILTER__ /* cylrec = 1 | cylrec = 2 (FILTER) */
    0 = 'Not Selected'
    1 = 'Selected' ;
  proc datasets library = library ;
  modify cars;
  format ORIGIN ORIGIN.;
  format CYLINDER CYLINDER.;
  format FILTER__ FILTER__.;
  quit;

This feature is not supported for the SAS transport file.

Saving Data Files in Other Formats

- Make the Data Editor the active window (click anywhere in the window to make it active).
- From the menus choose: File > Save As...
- Select a file type from the drop-down list.
- Enter a filename for the new data file.

To write variable names to the first row of a spreadsheet or tab-delimited data file:

- Click Write variable names to spreadsheet in the Save Data As dialog box.

When using data files with variable names longer than eight bytes in SPSS 10.x or 11.x, unique eight-byte versions of the variable names are used, but the original variable names are preserved for use in release 12.0 or later. In releases prior to SPSS 10, the original long variable names are lost if you save the data file. When using data files with string variables longer than 255 bytes in versions of SPSS prior to release 13.

- SYLK (*.slk). Symbolic link format for Microsoft Excel and Multiplan spreadsheet files. The maximum number of variables that you can save is 256.
- dBASE IV (*.dbf). dBASE IV format.
- dBASE III (*.dbf). dBASE III format.
- dBASE II (*.dbf). dBASE II format.
- SAS v6 for Windows (*.sd2). SAS v6 file format for Windows/OS2.
- SAS v6 for UNIX (*.ssd01). SAS v6 file format for UNIX (Sun, HP, IBM).
- SAS v6 for Alpha/OSF (*.ssd04). SAS v6 file format for Alpha/OSF (DEC UNIX).

For data saved as an SPSS data file, the Save Data As Variables dialog box allows you to select the variables that you want to be saved in the new data file. By default, all variables will be saved. Deselect the variables that you don’t want to save, or click Drop All and then select the variables that you want to save.

To Save a Subset of Variables

- Make the Data Editor the active window (click anywhere in the window to make it active).
- From the menus choose: File > Save As...

You can change the file permissions back to read/write by selecting Mark File Read Write from the File menu.

Virtual Active File

The virtual active file enables you to work with large data files without requiring equally large (or larger) amounts of temporary disk space. For most analysis and charting procedures, the original data source is reread each time you run a different procedure.

Actions that create one or more columns of data in temporary disk space include:

- Computing new variables
- Recoding existing variables
- Running procedures that create or modify variables (for example, saving predicted values in Linear Regression)

Actions that create an entire copy of the data file in temporary disk space include:

- Reading Excel files
- Running procedures that sort data (for example, Sort Cases, Split File)
- Reading data with GET TRANSLATE or DATA LIST commands

Creating a Data Cache

Although the virtual active file can vastly reduce the amount of temporary disk space required, the absence of a temporary copy of the “active” file means that the original data source has to be reread for each procedure. For large data files read from an external source, creating a temporary copy of the data may improve performance.

which shouldn’t be necessary under most circumstances. Cache Now is useful primarily for two reasons:

- A data source is “locked” and can’t be updated by anyone until you end your session, open a different data source, or cache the data.
- For large data sources, scrolling through the contents of the Data View tab in the Data Editor will be much faster if you cache the data.

Chapter 4 Distributed Analysis Mode

Distributed analysis mode allows you to use a computer other than your local (or desktop) computer for memory-intensive work. Since remote servers used for distributed analysis are typically more powerful and faster than your local computer, appropriate use of distributed analysis mode can significantly reduce computer processing time.

Ratio of computation to output. Commands that perform a lot of computation but produce little output (for example, a few small pivot tables, brief text results, or a few simple charts) have the most to gain from running in distributed mode. The degree of improvement depends largely on the computing power of the remote server.

Small jobs. Jobs that run quickly in local mode will almost always run slower in distributed mode because of inherent client/server overhead.

Charts.

Figure 4-1 Server Login dialog box

You can add, modify, or delete remote servers in the list. Remote servers usually require a user ID and a password, and a domain name may also be necessary. Contact your system administrator for information about available servers, a user ID and password, domain names, and other connection information. You can select a default server and save the user ID, domain name, and password associated with any server.

Figure 4-2 Server Login Settings dialog box

Contact your system administrator for a list of available servers, port numbers for the servers, and additional connection information. Do not use the Secure Socket Layer unless instructed to do so by your administrator.

Server Name. A server “name” can be an alphanumeric name assigned to a computer (for example, hqdev001) or a unique IP address assigned to a computer (for example, 202.123.45.78).

Port Number.

To Select, Switch, or Add Servers

- From the menus choose: File > Switch Server...

To select a default server:

- In the server list, select the box next to the server that you want to use.
- Enter the user ID, domain name, and password provided by your administrator.

Note: You are automatically connected to the default server when you start a new session.

To switch to another server:

- Select the server from the list.

Opening Data Files from a Remote Server

Figure 4-3 Open Remote File dialog box

In distributed analysis mode, the Open Remote File dialog box replaces the standard Open File dialog box. The list of available files, folders, and drives depends on what is available on or from the remote server. The current server name is indicated at the top of the dialog box.

- Depending on the type of data file that you want to open, from the menus choose: File > Open > Data..., File > Open Database, or File > Read Text Data...

Saving Data Files from a Remote Server

Figure 4-4 Save Remote File dialog box

In distributed analysis mode, the Save Remote File dialog box replaces the standard Save File dialog box. The list of available folders and drives depends on what is available on or from the remote server.

analysis mode even if they are in shared folders. Permissions for shared folders must include the ability to write to the folder if you want to save data files in a local folder.

To Save Data Files from a Remote Server

- Make the Data Editor the active window.
- From the menus choose: File > Save (or Save As...

Figure 4-5 Local and remote views (panels: Local View, Remote View)

In distributed analysis mode, you will not have access to data files on your local computer unless you specify the drive as a shared device or the folders containing your data files as shared folders.

If you’re not sure whether you’re using local analysis mode or distributed analysis mode, look at the title bar in the dialog box for accessing data files. If the title of the dialog box contains the word Remote (as in Open Remote File), or if the text Remote Server: [server name] appears at the top of the dialog box, you’re using distributed analysis mode.

Note: This affects only dialog boxes for accessing data files (for example, Open Data, Save Data, Open Database, and Apply Data Dictionary).

Using UNC Path Specifications

With the Windows NT server version of SPSS, relative path specifications for data files are relative to the current server in distributed analysis mode, not relative to your local computer. In practical terms, this means that a path specification such as c:\mydocs\mydata.sav does not point to a directory and file on your C drive; it points to a directory and file on the remote server’s hard drive.
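A UNC path names the computer and shared folder explicitly (\\computer\share\...), so, unlike c:\mydocs\mydata.sav, its meaning does not change with whichever machine is "current." The sketch below uses Python's `pathlib.PureWindowsPath` to tell the two kinds of path apart and to rewrite a drive-letter path as a UNC path. It assumes the drive root is shared; the computer name mypc and share name c are hypothetical.

```python
from pathlib import PureWindowsPath


def is_unc(path):
    """True if path is a UNC path that names a network share."""
    return PureWindowsPath(path).drive.startswith("\\\\")


def to_unc(local_path, computer, share):
    """Rewrite a drive-letter path as a UNC path.

    Assumes the drive root of local_path is shared under the name
    `share` on the machine `computer` (both hypothetical here).
    """
    p = PureWindowsPath(local_path)
    return str(PureWindowsPath(f"\\\\{computer}\\{share}") / p.relative_to(p.anchor))


print(is_unc(r"c:\mydocs\mydata.sav"))               # False
print(to_unc(r"c:\mydocs\mydata.sav", "mypc", "c"))  # \\mypc\c\mydocs\mydata.sav
```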

Chapter 5 Data Editor

The Data Editor provides a convenient, spreadsheet-like method for creating and editing data files. The Data Editor window opens automatically when you start a session.

The Data Editor provides two views of your data:

- Data view. Displays the actual data values or defined value labels.
- Variable view.

Many of the features of the Data view are similar to those found in spreadsheet applications. There are, however, several important distinctions:

- Rows are cases. Each row represents a case or an observation. For example, each individual respondent to a questionnaire is a case.
- Columns are variables. Each column represents a variable or characteristic being measured. For example, each item on a questionnaire is a variable.
- Cells contain values.

The Variable view contains descriptions of the attributes of each variable in the data file. In the Variable view:

- Rows are variables.
- Columns are variable attributes.

- Double-click a variable name at the top of the column in the Data view, or click the Variable View tab.
- To define new variables, enter a variable name in any blank row.
- Select the attribute(s) that you want to define or modify.

Variable Names

The following rules apply to variable names:

- The name must begin with a letter. The remaining characters can be any letter, any digit, a period, or the symbols @, #, _, or $.
- Variable names cannot end with a period.
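The naming rules listed here translate directly into a pattern check. A minimal Python sketch that covers only these rules: the guide's full list is cut off in this excerpt, so limits such as name length or reserved words are not modeled, and the sketch accepts ASCII letters only, which is a simplifying assumption.

```python
import re

# Letter first, then letters, digits, '.', '@', '#', '_', or '$'.
# ASCII letters only: a simplifying assumption for this sketch.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.@#_$]*")


def is_valid_variable_name(name):
    """Check the naming rules listed above (not any others the guide may add)."""
    return _NAME_PATTERN.fullmatch(name) is not None and not name.endswith(".")


for candidate in ["income", "q1.a", "2nd_visit", "score.", "id#"]:
    print(candidate, is_valid_variable_name(candidate))
```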

Variable Measurement Level

You can specify the level of measurement as scale (numeric data on an interval or ratio scale), ordinal, or nominal. Nominal and ordinal data can be either string (alphanumeric) or numeric. Measurement specification is relevant only for:

- The Custom Tables procedure and chart procedures that identify variables as scale or categorical. Nominal and ordinal are both treated as categorical. (Custom Tables is available only in the Tables add-on component.)

Figure 5-3 Scale and categorical variables in a chart procedure

For SPSS-format data files created in earlier versions of SPSS products, the following rules apply:

- String (alphanumeric) variables are set to nominal.
- String and numeric variables with defined value labels are set to ordinal.
- Numeric variables without defined value labels but with fewer than a specified number of unique values are set to ordinal.
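The rules above can be expressed as a small decision function. This is a sketch under two assumptions: the value-label rule is checked first, so that a labeled string variable becomes ordinal (one reading of the overlapping first two rules), and numeric variables that meet none of the rules fall back to scale, since the excerpt is cut off after the third rule. The `threshold` parameter stands in for the "specified number of unique values"; no default is claimed for it.

```python
def default_measurement_level(is_string, has_value_labels, n_unique, threshold):
    """Assign a measurement level following the rules above.

    Value labels are checked first (assumed precedence). The final
    "scale" fallback is an assumption: the excerpt is cut off.
    """
    if has_value_labels:
        return "ordinal"
    if is_string:
        return "nominal"
    if n_unique < threshold:
        return "ordinal"
    return "scale"


print(default_measurement_level(True, False, 3, threshold=10))    # nominal
print(default_measurement_level(False, False, 50, threshold=10))  # scale
```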

Variable Type

Variable Type specifies the data type for each variable. By default, all new variables are assumed to be numeric. You can use Variable Type to change the data type. The contents of the Variable Type dialog box depend on the data type selected. For some data types, there are text boxes for width and number of decimals; for others, you can simply select a format from a scrollable list of examples.

by E or D with an optional sign, or by the sign alone (for example, 123, 1.23E2, 1.23D2, 1.23E+2, and even 1.23+2).

Date. A numeric variable whose values are displayed in one of several calendar-date or clock-time formats. Select a format from the list. You can enter dates with slashes, hyphens, periods, commas, or blank spaces as delimiters. The century range for two-digit year values is determined by your Options settings (from the Edit menu, choose Options and click the Data tab).
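A century range is a 100-year window: a two-digit year is placed in whichever century puts it inside that window. A minimal Python sketch, assuming the window is described by its first year (`range_start`, a hypothetical parameter standing in for the Options setting):

```python
def expand_two_digit_year(yy, range_start):
    """Place a two-digit year inside the 100-year window that begins at range_start."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    year = range_start - range_start % 100 + yy
    if year < range_start:
        year += 100
    return year


# Hypothetical window 1940-2039.
print(expand_two_digit_year(45, 1940))  # 1945
print(expand_two_digit_year(5, 1940))   # 2005
```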

the general format dd-mmm-yy are displayed with dashes as delimiters and three-letter abbreviations for the month. Dates of the general format dd/mm/yy and mm/dd/yy are displayed with slashes for delimiters and numbers for the month. Internally, dates are stored as the number of seconds from October 14, 1582. The century range for dates with two-digit years is determined by your Options settings (from the Edit menu, choose Options and click the Data tab).
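The internal date representation can be reproduced with Python's `datetime`, which uses the proleptic Gregorian calendar throughout. The sketch assumes the count starts at midnight on October 14, 1582; the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed origin: midnight at the start of October 14, 1582.
SPSS_EPOCH = datetime(1582, 10, 14)


def to_spss_seconds(moment):
    """Seconds elapsed since the assumed origin."""
    return (moment - SPSS_EPOCH).total_seconds()


def from_spss_seconds(seconds):
    """Inverse of to_spss_seconds."""
    return SPSS_EPOCH + timedelta(seconds=seconds)


print(to_spss_seconds(datetime(1582, 10, 15)))  # 86400.0
```

Storing dates as a single number of seconds is what allows date arithmetic (differences, sorting) to work like ordinary numeric arithmetic.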

Figure 5-5 Value Labels dialog box

To Specify Value Labels

- Click the button in the Values cell for the variable that you want to define.
- For each value, enter the value and a label.
- Click Add to enter the value label.

The \n is not displayed in pivot tables or charts; it is interpreted as a line break character.

Missing Values

Missing Values defines specified data values as user-missing. It is often useful to know why information is missing. For example, you might want to distinguish between data missing because a respondent refused to answer and data missing because the question didn’t apply to that respondent.
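User-missing definitions let different codes record different reasons for missing data while all of them are excluded from analysis. A minimal Python sketch for numeric values, assuming discrete codes plus an optional inclusive range; the codes 98 (refused) and 99 (did not apply) are hypothetical, and string handling is not modeled.

```python
def is_user_missing(value, discrete=(), value_range=None):
    """Return True if a numeric value matches a user-missing definition.

    discrete: specific codes, e.g. 98 = refused, 99 = did not apply (hypothetical).
    value_range: optional (low, high) pair, treated as inclusive here.
    """
    if value in discrete:
        return True
    if value_range is not None:
        low, high = value_range
        return low <= value <= high
    return False


answers = [3, 98, 5, 99]
valid = [a for a in answers if not is_user_missing(a, discrete=(98, 99))]
print(valid)  # [3, 5]
```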

- Enter the values or range of values that represent missing data.

All string values, including null or blank values, are considered to be valid unless you explicitly define them as missing. To define null or blank values as missing for a string variable, enter a single space in one of the fields under the Discrete missing values selection.

Column Width

You can specify a number of characters for the column width.

Applying Variable Definition Attributes to Other Variables

To apply individual attributes from a defined variable:

- In the Variable view, select the attribute cell that you want to apply to other variables.
- From the menus choose: Edit > Copy
- Select the attribute cell(s) to which you want to apply the attribute. You can select multiple target variables.

- From the menus choose: Edit > Copy
- Click the empty row number beneath the last defined variable in the data file.
- From the menus choose: Edit > Paste Variables...
- Enter the number of variables that you want to create.
- Enter a prefix and starting number for the new variables.

The new variable names will consist of the specified prefix plus a sequential number starting with the specified number.

Entering Data

You can enter data directly in the Data Editor in the Data view.

Figure 5-7 Working data file in the Data view

To Enter Numeric Data

- Select a cell in the Data view.
- Enter the data value. The value is displayed in the cell editor at the top of the Data Editor.
- Press Enter or select another cell to record the value.

To Enter Non-Numeric Data

- Double-click a variable name at the top of the column in the Data view or click the Variable View tab.
- Click the button in the Type cell for the variable.

- Double-click the row number or click the Data View tab.
- Enter the data in the column for the newly defined variable.

To Use Value Labels for Data Entry

- If value labels aren’t currently displayed in the Data view, from the menus choose: View > Value Labels
- Click the cell in which you want to enter the value.
- Select a value label from the drop-down list.

The value is entered and the value label is displayed in the cell.

- Cut, copy, and paste data values.
- Add and delete cases.
- Add and delete variables.
- Change the order of variables.

Replacing or Modifying Data Values

To delete the old value and enter a new value:

- In the Data view, double-click the cell. The cell value is displayed in the cell editor.
- Edit the value directly in the cell or in the cell editor.
- Press Enter (or move to another cell) to record the new value.

a dollar format variable, the displayed dollar sign becomes part of the string value. Values that exceed the defined string variable width are truncated.

String into Numeric or Date. String values that contain acceptable characters for the numeric or date format of the target cell are converted to the equivalent numeric or date value.

Inserting New Variables

Entering data in an empty column in the Data view or in an empty row in the Variable view automatically creates a new variable with a default variable name (the prefix var and a sequential number) and a default data format type (numeric). The Data Editor inserts the system-missing value for all cases for the new variable.
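Because rows are cases and columns are variables, a new variable is a new column with one entry per case. A minimal Python sketch of the default behavior described above, storing the data as a dict of columns. `None` stands in for the system-missing value, and the name format is an assumption: SPSS may zero-pad the sequential number, while this sketch uses a plain counter.

```python
SYSMIS = None  # stand-in for the system-missing value


def insert_variable(columns, n_cases):
    """Add a variable named 'var' plus the next unused number, filled with SYSMIS.

    columns maps each variable name to its list of values (one per case).
    SPSS may zero-pad the number; this sketch uses a plain counter.
    """
    n = 1
    while f"var{n}" in columns:
        n += 1
    name = f"var{n}"
    columns[name] = [SYSMIS] * n_cases
    return name


data = {"id": [1, 2, 3]}
print(insert_variable(data, 3), data)  # var1 {'id': [1, 2, 3], 'var1': [None, None, None]}
```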

To Change Data Type

You can change the data type for a variable at any time using the Variable Type dialog box in the Variable view, and the Data Editor will attempt to convert existing values to the new type. If no conversion is possible, the system-missing value is assigned. The conversion rules are the same as those for pasting data values to a variable with a different format type.
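A simplified model of the conversion rules in Python: a string converts to a number only if it looks like a plain decimal number, and otherwise becomes system-missing. A displayed value pasted into a string variable keeps its formatting, such as a dollar sign, and is truncated to the string width. What counts as an "acceptable character" really depends on the target format; accepting only plain decimals is an assumption of this sketch, and `None` stands in for system-missing.

```python
import re

SYSMIS = None  # stand-in for the system-missing value

# Digits with an optional sign and decimal point; anything else is rejected.
_PLAIN_NUMBER = re.compile(r"\s*[+-]?(?:\d+\.?\d*|\.\d+)\s*")


def string_to_numeric(text):
    """Convert a string value, or return SYSMIS if it cannot be converted."""
    if _PLAIN_NUMBER.fullmatch(text) is None:
        return SYSMIS
    return float(text)


def into_string(displayed, width):
    """Paste a displayed value (e.g. '$1,250.00') into a string of the given width."""
    return displayed[:width]


print(string_to_numeric("42"), string_to_numeric("4x2"))  # 42.0 None
print(into_string("$1,250.00", 6))                        # $1,250
```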

Case Selection Status in the Data Editor

If you have selected a subset of cases but have not discarded unselected cases, unselected cases are marked in the Data Editor with a diagonal line (slash) through the row number.

Figure 5-9 Filtered cases in the Data Editor

Data Editor Display Options

The View menu provides several display options for the Data Editor:

- Fonts. Controls the font characteristics of the data display.
- Grid Lines.

Figure 5-10 Data view pane splitters

You can also use the Window menu to insert and remove pane splitters.

To insert splitters:

- In Data view, from the menus choose: Window > Split

Splitters are inserted above and to the left of the selected cell. If the top left cell is selected, splitters are inserted to divide the current view approximately in half both horizontally and vertically.

- The information in the currently displayed view is printed. In the Data view, the data are printed. In the Variable view, data definition information is printed.
- Grid lines are printed if they are currently displayed in the selected view.
- Value labels are printed in the Data view if they are currently displayed. Otherwise, the actual data values are printed.

Chapter 6 Data Preparation

Once you’ve opened a data file or entered data in the Data Editor, you can start creating reports, charts, and analyses without any additional preliminary work. However, there are some additional data preparation features that you may find useful, including:

- Assign variable properties that describe the data and determine how certain values should be treated.

All of these variable properties (and others) can be assigned in the Variable view of the Data Editor. There are also several utilities that can assist you in this process:

- Define Variable Properties can help you define descriptive value labels and missing values. This is particularly useful for categorical data with numeric codes used for category values.

Figure 6-1 Initial dialog box for selecting variables to define

- Select the numeric or short string variables for which you want to create value labels or define or change other variable properties, such as missing values or descriptive variable labels.

Note: Long string variables (string variables with a defined width of more than eight characters) are not displayed in the variable list. Long string variables cannot have defined value labels or missing values categories.

- Select a variable for which you want to create value labels or define or change other variable properties.
- Enter the label text for any unlabeled values that are displayed in the Value Label grid.
- If there are values for which you want to create value labels but those values are not displayed, you can enter values in the Value column below the last scanned value.
- Repeat this process for each listed variable for which you want to create value labels.

To sort the variable list to display all variables with unlabeled values at the top of the list:

- Click the Unlabeled column heading under Scanned Variable List.

You can also sort by variable name or measurement level by clicking the corresponding column heading under Scanned Variable List.

Value Label Grid

- Label. Displays any value labels that have already been defined. You can add or change labels in this column.
- Value. Unique values for each selected variable.

level. As a result, many variables that are in fact categorical may initially be displayed as scale. If you are unsure which measurement level to assign to a variable, click Suggest.

Copy Properties. You can copy value labels and other variable properties from another variable to the currently selected variable or from the currently selected variable to one or more other variables.

Unlabeled Values. To create labels for unlabeled values automatically, click Automatic Labels.

Assigning the Measurement Level

When you click Suggest for the measurement level in the Define Variable Properties main dialog box, the current variable is evaluated based on the scanned cases and defined value labels, and a measurement level is suggested in the Suggest Measurement Level dialog box that opens. The Explanation area provides a brief description of the criteria used to provide the suggested measurement level.

Copying Variable Properties

The Apply Labels and Level dialog box is displayed when you click From Another Variable or To Other Variables in the Define Variable Properties main dialog box. It displays all of the scanned variables that match the current variable’s type (numeric or string). For string variables, the defined width must also match.

The measurement level for the target variable(s) is always replaced. If either the source or target variable has a defined range of missing values, missing values definitions are not copied.

Copying Data Properties

The Copy Data Properties Wizard provides the ability to use an external SPSS data file as a template for defining file and variable properties in the working data file.

Note: Copy Data Properties replaces Apply Data Dictionary, formerly available on the File menu.

To Copy Data Properties

- From the menus in the Data Editor window choose: Data > Copy Data Properties...

Figure 6-5 Copy Data Properties Wizard: Step 1

- Select the data file with the file and/or variable properties that you want to copy. This can be an external SPSS-format data file or the working data file.

- Follow the step-by-step instructions in the Copy Data Properties Wizard.

Selecting Source and Target Variables

In this step, you can specify the source variables containing the variable properties that you want to copy and the target variables that will receive those variable properties.

Figure 6-6 Copy Data Properties Wizard: Step 2

Apply properties from selected source file variables to matching working file variables.

and type (string or numeric) are the same. For string variables, the defined length must also be the same. By default, only matching variables are displayed in the two variable lists.

Create matching variables in the working data file if they do not already exist. This updates the source list to display all variables in the source data file.

Figure 6-7 Copy Data Properties Wizard: Step 3

Value Labels. Value labels are descriptive labels associated with data values. Value labels are often used when numeric data values are used to represent non-numeric categories (for example, codes of 1 and 2 for Male and Female). You can replace or merge value labels in the target variables.

- Replace deletes any defined value labels for the target variable and replaces them with the defined value labels from the source variable.

Missing Values. Missing values are values identified as representing missing data (for example, 98 for Do not know and 99 for Not applicable). Typically, these values also have defined value labels that describe what the missing value codes stand for. Any existing defined missing values for the target variable are deleted and replaced with the defined missing values from the source variable.

Variable Label.

Figure 6-8 Copy Data Properties Wizard: Step 4

Multiple Response Sets. Applies multiple response set definitions from the source data file to the working data file. (Note: Multiple response sets are currently used only by the Tables add-on component.)

- Replace deletes all multiple response sets in the working data file and replaces them with the multiple response sets from the source data file.
- Merge adds multiple response sets from the source data file to the collection of multiple response sets in the working data file. If a set with the same name exists in both files, the existing set in the working data file is unchanged.

Variable Sets. Variable sets are used to control the list of variables that are displayed in dialog boxes.
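The difference between Replace and Merge for multiple response sets comes down to which file wins. A Python sketch, assuming set definitions are held as dicts keyed by set name; the shape of the definitions and the set names used here are hypothetical, and names are compared case-sensitively.

```python
def apply_response_sets(working, source, mode):
    """Combine set definitions the way Replace and Merge are described.

    working, source: dicts mapping set name -> definition (hypothetical shape).
    """
    if mode == "replace":
        return dict(source)
    if mode == "merge":
        combined = dict(source)
        combined.update(working)  # on a name clash, the working file's set wins
        return combined
    raise ValueError(f"unknown mode: {mode!r}")


working = {"$a": ["q1", "q2"], "$b": ["q3"]}
source = {"$b": ["q9"], "$c": ["q4"]}
print(apply_response_sets(working, source, "merge"))
# {'$b': ['q3'], '$c': ['q4'], '$a': ['q1', 'q2']}
```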

Results

Figure 6-9 Copy Data Properties Wizard: Step 5

The last step in the Copy Data Properties Wizard provides information on the number of variables for which variable properties will be copied from the source data file, the number of new variables that will be created, and the number of dataset (file) properties that will be copied. You can also choose to paste the generated command syntax into a syntax window and save the syntax for later use.

Identifying Duplicate Cases

“Duplicate” cases may occur in your data for many reasons, including:

- Data entry errors in which the same case is accidentally entered more than once.
- Multiple cases share a common primary ID value but have different secondary ID values, such as family members who all live in the same house.

Figure 6-10 Identify Duplicate Cases dialog box

Define matching cases by. Cases are considered duplicates if their values match for all selected variables. If you want to identify only cases that are a 100% match in all respects, select all of the variables.

Sort within matching groups by. Cases are automatically sorted by the variables that define matching cases. You can select additional sorting variables that will determine the sequential order of cases in each matching group.

Use the up and down arrow buttons to the right of the list to change the sort order of the variables. The sort order determines the “first” and “last” case within each matching group, which determines the value of the optional primary indicator variable. For example, if you want to filter out all but the most recent case in each matching group, you could sort cases within the group in ascending order of a date variable, which would make the most recent date the last date in the group.
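The "keep the most recent case" example can be sketched in Python: sort by the matching variables and then the sorting variables, group on the matching variables, and flag the last case in each group. The 1/0 coding of the indicator, the variable names, and the sample cases are assumptions for illustration. ISO-format date strings are used because they sort chronologically; missing values (`None`) are not handled and would break the sort.

```python
from itertools import groupby


def flag_last_as_primary(cases, match_vars, sort_vars):
    """Sort cases, group them by match_vars, and mark the last case in each group.

    cases: list of dicts (one per case). Returns new dicts with a 'primary'
    key: 1 for the last case in its matching group, 0 otherwise (assumed coding).
    """
    def match_key(case):
        return tuple(case[v] for v in match_vars)

    ordered = sorted(cases, key=lambda c: match_key(c) + tuple(c[v] for v in sort_vars))
    flagged = []
    for _, group in groupby(ordered, key=match_key):
        group = list(group)
        for position, case in enumerate(group):
            flagged.append({**case, "primary": int(position == len(group) - 1)})
    return flagged


cases = [
    {"id": 1, "visit": "2005-01-15"},
    {"id": 2, "visit": "2004-07-30"},
    {"id": 1, "visit": "2004-03-01"},
]
for case in flag_last_as_primary(cases, ["id"], ["visit"]):
    print(case)
```

Filtering on primary == 1 then keeps exactly one case, the most recent, per ID.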

Missing Values. For numeric variables, the system-missing value is treated like any other value: cases with the system-missing value for an identifier variable are treated as having matching values for that variable. For string variables, cases with no value for an identifier variable are treated as having matching values for that variable.

Figure 6-11 Initial dialog box for selecting variables to band

Optionally, you can limit the number of cases to scan. For data files with a large number of cases, limiting the number of cases scanned can save time, but you should avoid this if possible because it will affect the distribution of values used in subsequent calculations in the Visual Bander.

Note: String variables and nominal numeric variables are not displayed in the source variable list.

- Select the numeric scale and/or ordinal variables for which you want to create new categorical (banded) variables.
- Select a variable in the Scanned Variable List.
- Enter a name for the new banded variable. Variable names must be unique and must follow SPSS variable naming rules. For more information, see “Variable Names” in Chapter 5 on p. 78.
- Define the banding criteria for the new variable. For more information, see “Banding Variables” on p. 121.
- Click OK.

The Visual Bander main dialog box provides the following information for the scanned variables:

- Scanned Variable List. Displays the variables you selected in the initial dialog box. You can sort the list by measurement level (scale or ordinal) or by variable label or name by clicking the column headings.
- Cases Scanned. Indicates the number of cases scanned.

You can click and drag the cutpoint lines to different locations on the histogram, changing the band ranges. You can remove bands by dragging cutpoint lines off the histogram.

Note: The histogram (displaying nonmissing values), the minimum, and the maximum are based on the scanned values. If you do not include all cases in the scan, the true distribution may not be accurately reflected, particularly if the data file has been sorted by the selected variable.

Upper Endpoints. Controls treatment of upper endpoint values entered in the Value column of the grid.

- Included (<=). Cases with the value specified in the Value cell are included in the banded category. For example, if you specify values of 25, 50, and 75, cases with a value of exactly 25 will go in the first band, since this will include all cases with values less than or equal to 25.
- Excluded (<).
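Band assignment with ascending cutpoints is a binary search, and the choice between Included and Excluded is the choice between searching to the left or to the right of an equal value. The description of Excluded (<) is cut off in this excerpt, so the sketch below reads the symbol literally: a value equal to a cutpoint moves up to the next band. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def band_number(value, cutpoints, upper_included=True):
    """Return the 1-based band for value, given ascending cutpoints.

    upper_included=True models Included (<=): a value equal to a cutpoint
    stays in the lower band. upper_included=False models Excluded (<) as
    this sketch interprets it: such a value moves up to the next band.
    """
    if upper_included:
        return bisect_left(cutpoints, value) + 1
    return bisect_right(cutpoints, value) + 1


cuts = [25, 50, 75]
print(band_number(25, cuts))                        # 1
print(band_number(25, cuts, upper_included=False))  # 2
print(band_number(80, cuts))                        # 4
```

Three cutpoints produce four bands; the last band holds everything above the highest cutpoint.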

- Click Make Cutpoints.
- Select the criteria for generating cutpoints that will define the banded categories.
- Click Apply.

Figure 6-13 Make Cutpoints dialog box

Note: The Make Cutpoints dialog box is not available if you scanned zero cases.

Equal Width Intervals. Generates banded categories of equal width (for example, 1–10, 11–20, 21–30, etc.), based on any two of the following three criteria:

- First Cutpoint Location.
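Equal-width cutpoints are an arithmetic sequence. The list of criteria is cut off in this excerpt after First Cutpoint Location, so the sketch below simply takes a first cutpoint, a width, and a count as hypothetical inputs; it does not reproduce how the dialog box derives a missing criterion.

```python
def equal_width_cutpoints(first_cutpoint, width, count):
    """Return `count` cutpoints starting at first_cutpoint, spaced `width` apart."""
    if width <= 0 or count < 1:
        raise ValueError("width must be positive and count at least 1")
    return [first_cutpoint + i * width for i in range(count)]


print(equal_width_cutpoints(10, 10, 3))  # [10, 20, 30]
```

For integer data starting at 1, with upper endpoints included, cutpoints 10, 20, and 30 give the bands 1–10, 11–20, and 21–30 from the example, plus one band for values above 30.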
PAGE 150

126 Chapter 6 Equal Percentiles Based on Scanned Cases. Generates banded categories with an equal number of cases in each band (using the aempirical algorithm for percentiles), based on either of the following criteria: Number of Cutpoints. The number of banded categories is the number of cutpoints plus one. For example, three cutpoints generate four percentile bands (quartiles), each containing 25% of the cases. Width (%).
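The percentile cutpoints can be sketched with the empirical-distribution-with-averaging rule (the "aempirical" algorithm named above). The definition used here is our understanding of that rule, not a transcription of SPSS code: with w sorted values and i the integer part of w·p, the percentile is the average of the i-th and (i+1)-th values when w·p is a whole number, and the (i+1)-th value otherwise. Function names are hypothetical.

```python
import math
from fractions import Fraction

def aempirical_percentile(sorted_values, p):
    """Percentile by the empirical distribution function with averaging (assumed definition)."""
    p = Fraction(p)
    if not sorted_values or not 0 < p < 1:
        raise ValueError("need data and 0 < p < 1")
    wp = len(sorted_values) * p          # exact arithmetic avoids float surprises
    i = math.floor(wp)
    if wp == i:
        # 1-based positions i and i + 1 are list indices i - 1 and i
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    return sorted_values[i]              # the (i+1)-th value

def equal_percentile_cutpoints(values, n_cutpoints):
    """n_cutpoints cutpoints give n_cutpoints + 1 bands of roughly equal size."""
    data = sorted(values)
    bands = n_cutpoints + 1
    return [aempirical_percentile(data, Fraction(k, bands)) for k in range(1, bands)]

print(equal_percentile_cutpoints(range(1, 9), 3))   # [2.5, 4.5, 6.5]
print(equal_percentile_cutpoints(range(1, 11), 3))  # [3, 5.5, 8]
```

Three cutpoints give quartiles here, matching the example in the text; with many tied values the bands can be far from equal, as the next page describes.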
PAGE 151

127 Data Preparation you may find that the first three bands each contain only about 3.3% of the cases, and the last band contains 90% of the cases. Copying Banded Categories When creating banded categories for one or more variables, you can copy the banding specifications from another variable to the currently selected variable or from the selected variable to multiple other variables.
PAGE 152

128 Chapter 6
▶ Click Copy.
or
▶ Select (click) a variable in the Scanned Variable List to which you want to copy defined banded categories.
▶ Click From Another Variable.
▶ Select the variable with the defined banded categories that you want to copy.
▶ Click Copy.
If you have specified value labels for the variable from which you are copying the banding specifications, the value labels are also copied.
PAGE 153

Chapter 7 Data Transformations
In an ideal situation, your raw data are perfectly suitable for the type of analysis you want to perform, and any relationships between variables are either conveniently linear or neatly orthogonal. Unfortunately, this is rarely the case. Preliminary analysis may reveal inconvenient coding schemes or coding errors, or data transformations may be required in order to expose the true relationship between variables.
PAGE 154

130 Chapter 7
Figure 7-1 Compute Variable dialog box
To Compute Variables
▶ From the menus choose:
Transform > Compute...
▶ Type the name of a single target variable. It can be an existing variable or a new variable to be added to the working data file.
▶ To build an expression, either paste components into the Expression field or type directly in the Expression field.
PAGE 155

131 Data Transformations indicated by question marks (only applies to functions). The function group labeled All provides a listing of all available functions and system variables. A brief description of the currently selected function or variable is displayed in a reserved area in the dialog box. String constants must be enclosed in quotation marks or apostrophes. If values contain decimals, a period (.) must be used as the decimal indicator.
PAGE 156

132 Chapter 7 If the result of a conditional expression is false or missing, the case is not included in the selected subset. Most conditional expressions use one or more of the six relational operators (<, >, <=, >=, =, and ~=) on the calculator pad. Conditional expressions can include variable names, constants, arithmetic operators, numeric and other functions, logical variables, and relational operators. Compute Variable: Type and Label By default, new computed variables are numeric.
PAGE 157

133 Data Transformations
Date and time functions
Distribution functions
Random variable functions
Missing value functions
Scoring functions (SPSS Server only)
Search for functions in the online Help system index for a complete list of functions.
Missing Values in Functions
Functions and simple arithmetic expressions treat missing values in different ways. In the expression:
(var1+var2+var3)/3
the result is missing if a case has a missing value for any of the three variables.
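The difference is easiest to see side by side. In this sketch `None` stands in for a missing value; `arithmetic_mean` mirrors the expression above, and `mean_function` mirrors a function such as MEAN, which to our understanding averages whichever arguments are valid and is missing only when all of them are. Both helpers are hypothetical.

```python
def arithmetic_mean(*args):
    """Like (var1+var2+var3)/3: any missing operand makes the result missing."""
    if any(a is None for a in args):
        return None
    return sum(args) / len(args)

def mean_function(*args):
    """Like MEAN(var1, var2, var3): average only the non-missing values."""
    valid = [a for a in args if a is not None]
    return sum(valid) / len(valid) if valid else None

print(arithmetic_mean(4, None, 8))  # None
print(mean_function(4, None, 8))    # 6.0
```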
PAGE 158

134 Chapter 7 Active Generator. Two different random number generators are available: SPSS 12 Compatible. The random number generator used in SPSS 12 and previous releases. If you need to reproduce randomized results generated in previous releases based on a specified seed value, use this random number generator. Mersenne Twister. A newer random number generator that is more reliable for simulation purposes.
PAGE 159

135 Data Transformations Count Occurrences of Values within Cases This dialog box creates a variable that counts the occurrences of the same value(s) in a list of variables for each case. For example, a survey might contain a list of magazines with yes/no check boxes to indicate which magazines each respondent reads. You could count the number of yes responses for each respondent to create a new variable that contains the total number of magazines read.
PAGE 160

136 Chapter 7 Count Values within Cases: Values to Count The value of the target variable (on the main dialog box) is incremented by 1 each time one of the selected variables matches a specification in the Values to Count list here. If a case matches several specifications for any variable, the target variable is incremented several times for that variable. Value specifications can include individual values, missing or system-missing values, and ranges.
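The counting rule can be sketched as a loop over variables and value specifications. Following the text literally, each specification a variable matches adds 1, so one variable can add more than 1. Cases are dictionaries, `None` marks a missing value, and the spec format is a hypothetical convention; missing-value specifications are not modeled.

```python
def count_values(case, variables, specs):
    """Count matches for one case.

    specs: list of ("value", v) or ("range", low, high); ranges include endpoints.
    Every (variable, specification) pair that matches increments the count.
    """
    total = 0
    for name in variables:
        x = case.get(name)
        for spec in specs:
            if spec[0] == "value" and x == spec[1]:
                total += 1
            elif spec[0] == "range" and x is not None and spec[1] <= x <= spec[2]:
                total += 1
    return total

# Magazines read: 1 = yes, 0 = no
respondent = {"mag1": 1, "mag2": 0, "mag3": 1}
print(count_values(respondent, ["mag1", "mag2", "mag3"], [("value", 1)]))  # 2
```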
PAGE 161

137 Data Transformations Figure 7-7 Count Occurrences If Cases dialog box For general considerations on using an If Cases dialog box, see “Compute Variable: If Cases” on p. 131. Recoding Values You can modify data values by recoding them. This is particularly useful for collapsing or combining categories. You can recode the values within existing variables, or you can create new variables based on the recoded values of existing variables.
PAGE 162

138 Chapter 7
Figure 7-8 Recode into Same Variables dialog box
To Recode Values of a Variable
▶ From the menus choose:
Transform > Recode > Into Same Variables...
▶ Select the variables you want to recode. If you select multiple variables, they must be the same type (numeric or string).
▶ Click Old and New Values and specify how to recode values.
Optionally, you can define a subset of cases to recode. The If Cases dialog box for doing this is the same as the one described for Count Occurrences.
PAGE 163

139 Data Transformations Old Value. The value(s) to be recoded. You can recode single values, ranges of values, and missing values. System-missing values and ranges cannot be selected for string variables because neither concept applies to string variables. Ranges include their endpoints and any user-missing values that fall within the range. Value. Individual old value to be recoded into a new value. The value must be the same data type (numeric or string) as the variable(s) being recoded.
PAGE 164

140 Chapter 7 Figure 7-9 Old and New Values dialog box Recode into Different Variables The Recode into Different Variables dialog box allows you to reassign the values of existing variables or collapse ranges of existing values into new values for a new variable. For example, you could collapse salaries into a new variable containing salary-range categories. You can recode numeric and string variables. You can recode numeric variables into string variables and vice versa.
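Collapsing salaries into ranges amounts to checking each value against a list of inclusive ranges and writing the result to a new variable. A minimal sketch; the cut values, the "first matching rule wins" order, and mapping unmatched or missing values to `None` (standing in for system-missing) are assumptions for illustration.

```python
def recode_into_new(values, rules):
    """Recode old values into a new list, leaving the original untouched.

    rules: list of ((low, high), new_code); ranges include both endpoints.
    """
    new = []
    for v in values:
        out = None                    # unmatched or missing -> missing
        if v is not None:
            for (low, high), code in rules:
                if low <= v <= high:
                    out = code
                    break             # first matching rule wins (assumption)
        new.append(out)
    return new

salary = [18000, 25000, 25001, 61000, None]
rules = [((0, 25000), 1), ((25001, 50000), 2), ((50001, float("inf")), 3)]
print(recode_into_new(salary, rules))  # [1, 1, 2, 3, None]
```

Because endpoints are inclusive, adjacent ranges such as 0–25000 and 25001–50000 leave a non-integer salary like 25000.50 unmatched in this sketch.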
PAGE 165

141 Data Transformations
Figure 7-10 Recode into Different Variables dialog box
To Recode Values of a Variable into a New Variable
▶ From the menus choose:
Transform > Recode > Into Different Variables...
▶ Select the variables you want to recode. If you select multiple variables, they must be the same type (numeric or string).
▶ Enter an output (new) variable name for each new variable and click Change.
▶ Click Old and New Values and specify how to recode values.
PAGE 166

142 Chapter 7 Old Value. The value(s) to be recoded. You can recode single values, ranges of values, and missing values. System-missing values and ranges cannot be selected for string variables because neither concept applies to string variables. Old values must be the same data type (numeric or string) as the original variable. Ranges include their endpoints and any user-missing values that fall within the range. Value. Individual old value to be recoded into a new value.
PAGE 167

143 Data Transformations Convert numeric strings to numbers. Converts string values containing numbers to numeric values. Strings containing anything other than numbers and an optional sign (+ or -) are assigned the system-missing value. Old–>New. The list of specifications that will be used to recode the variable(s). You can add, change, and remove specifications from the list.
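The numeric-string rule can be sketched with a regular expression. Assumptions: a decimal point counts as part of a number, and surrounding blanks (string values are often blank-padded) are ignored; `numeric_string_to_number` is hypothetical and `None` stands in for system-missing.

```python
import re

# Optional sign, then digits with an optional decimal part (assumption)
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")

def numeric_string_to_number(s):
    """Convert a numeric string to float; anything else becomes None."""
    text = s.strip()
    if NUMERIC.fullmatch(text):
        return float(text)
    return None

print([numeric_string_to_number(s) for s in ["42", "-3.5", " +7 ", "12a", ""]])
# [42.0, -3.5, 7.0, None, None]
```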
PAGE 168

144 Chapter 7 Optionally, you can: Rank cases in ascending or descending order. Organize rankings into subgroups by selecting one or more grouping variables for the By list. Ranks are computed within each group. Groups are defined by the combination of values of the grouping variables. For example, if you select gender and minority as grouping variables, ranks are computed for each combination of gender and minority.
PAGE 169

145 Data Transformations Rank Cases: Types You can select multiple ranking methods. A separate ranking variable is created for each method. Ranking methods include simple ranks, Savage scores, fractional ranks, and percentiles. You can also create rankings based on proportion estimates and normal scores. Rank. Simple rank. The value of the new variable equals its rank. Savage score. The new variable contains Savage scores based on an exponential distribution. Fractional rank.
PAGE 170

146 Chapter 7 Rankit. Uses the formula (r-1/2) / w, where w is the number of observations and r is the rank, ranging from 1 to w. Van der Waerden. Van der Waerden's transformation, defined by the formula r/(w+1), where w is the sum of the case weights and r is the rank, ranging from 1 to w. Figure 7-13 Rank Cases Types dialog box Rank Cases: Ties This dialog box controls the method for assigning rankings to cases with the same value on the original variable.
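The formulas above turn a rank r into a proportion estimate; a normal score is, to our understanding, the standard normal quantile of that proportion. The rank itself depends on the tie rule. This sketch assumes the four tie options are mean, low, high, and sequential ranks to unique values, and applies them to a small made-up data set; all names are hypothetical.

```python
from statistics import NormalDist

def rank_with_ties(values, ties="mean"):
    """Ascending ranks 1..n; ties: "mean", "low", "high" or "sequential"."""
    ranks_by_value = {}
    position = 0
    for seq, v in enumerate(sorted(set(values)), start=1):
        k = values.count(v)                  # size of this tie group
        low, high = position + 1, position + k
        ranks_by_value[v] = {"mean": (low + high) / 2, "low": low,
                             "high": high, "sequential": seq}[ties]
        position += k
    return [ranks_by_value[v] for v in values]

def rankit(r, w):
    """Rankit proportion estimate (r - 1/2) / w."""
    return (r - 0.5) / w

def van_der_waerden(r, w):
    """Van der Waerden proportion estimate r / (w + 1)."""
    return r / (w + 1)

def normal_score(p):
    """Standard normal quantile of a proportion estimate."""
    return NormalDist().inv_cdf(p)

data = [10, 15, 15, 15, 16, 20]
for method in ("mean", "low", "high", "sequential"):
    print(method, rank_with_ties(data, method))
# mean [1.0, 3.0, 3.0, 3.0, 5.0, 6.0]
# low [1, 2, 2, 2, 5, 6]
# high [1, 4, 4, 4, 5, 6]
# sequential [1, 2, 2, 2, 3, 4]
```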
PAGE 171

147 Data Transformations The following table shows how the different methods assign ranks to tied values: Automatic Recode The Automatic Recode dialog box allows you to convert string and numeric values into consecutive integers. When category codes are not sequential, the resulting empty cells reduce performance and increase memory requirements for many procedures. Additionally, some procedures cannot use string variables, and some require consecutive integer values for factor levels.
PAGE 172

148 Chapter 7 The new variable(s) created by Automatic Recode retain any defined variable and value labels from the old variable. For any values without a defined value label, the original value is used as the label for the recoded value. A table displays the old and new values and value labels. String values are recoded in alphabetical order, with uppercase letters preceding their lowercase counterparts.
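A minimal sketch of the consecutive-integer mapping. It reads the ordering rule literally (alphabetical, with an uppercase letter before its lowercase counterpart) by sorting on the lowercased string and breaking ties on the original; SPSS's actual collation may differ. `autorecode` is hypothetical and `None` stands in for a missing value.

```python
def autorecode(values):
    """Map distinct non-missing values to 1, 2, 3, ... in sorted order."""
    distinct = {v for v in values if v is not None}
    if all(isinstance(v, str) for v in distinct):
        # "A" before "a", both before "B" (literal reading of the rule)
        order = sorted(distinct, key=lambda s: (s.lower(), s))
    else:
        order = sorted(distinct)
    mapping = {v: code for code, v in enumerate(order, start=1)}
    return [mapping.get(v) for v in values], mapping

codes, mapping = autorecode(["b", "A", "a", "B"])
print(codes)    # [4, 1, 2, 3]
print(mapping)  # {'A': 1, 'a': 2, 'B': 3, 'b': 4}
```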
PAGE 173

149 Data Transformations codes encountered in the data are autorecoded into values higher than the last value in the template, preserving the autorecode scheme for the original product codes. Save Template. Saves the autorecode scheme for the selected variables in an external template file. The template contains information that maps the original non-missing values to the recoded values. Only information for non-missing values is saved in the template.
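Applying a saved template can be sketched as: keep every mapping already in the template, then give values not seen before codes above the template's highest code. Assigning the new codes in sorted order is an assumption; the product codes are made up and `apply_template` is hypothetical.

```python
def apply_template(values, template):
    """Recode with a saved scheme; unseen values get codes above the template's maximum."""
    mapping = dict(template)                  # leave the saved template unchanged
    new_values = sorted({v for v in values if v is not None and v not in mapping})
    next_code = max(mapping.values(), default=0) + 1
    for offset, v in enumerate(new_values):
        mapping[v] = next_code + offset
    # Missing values are not in the template and stay missing (None)
    return [mapping.get(v) for v in values], mapping

template = {"P100": 1, "P200": 2}
codes, scheme = apply_template(["P200", "P300", "P100", "P050"], template)
print(codes)  # [2, 4, 1, 3]
```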
PAGE 174

150 Chapter 7
To Recode String or Numeric Values into Consecutive Integers
▶ From the menus choose:
Transform > Automatic Recode...
▶ Select one or more variables to recode.
▶ For each selected variable, enter a name for the new variable and click New Name.
Date and Time Wizard
The Date and Time Wizard simplifies a number of common tasks associated with date and time variables.
To Use the Date and Time Wizard
▶ From the menus choose:
Transform > Date/Time...
PAGE 175

151 Data Transformations Figure 7-16 Date and Time Wizard introduction screen Learn how dates and times are represented in SPSS. This choice leads to a screen that provides a brief overview of date/time variables in SPSS. Clicking the Help button on that screen also provides a link to more detailed information. Create a date/time variable from a string containing a date or time. Use this option to create a date/time variable from a string variable.
PAGE 176

152 Chapter 7 Extract a part of a date or time variable. This option allows you to extract part of a date/time variable, such as the day of the month from a date/time variable with the form mm/dd/yyyy. Assign periodicity to a data set. This choice takes you to the Define Dates dialog box, used to create date/time variables that consist of a set of sequential dates. This feature is typically used to associate dates with time series data.
PAGE 177

153 Data Transformations Dashes, periods, commas, slashes, or blanks can be used as delimiters in day-month-year formats. Months can be represented in digits, Roman numerals, or three-character abbreviations, or they can be fully spelled out. Three-letter abbreviations and fully spelled-out month names must be in English; month names in other languages are not recognized. Duration Variables. Duration variables have a format representing a time duration, such as hh:mm.
PAGE 178

154 Chapter 7
Select String Variable to Convert to Date/Time Variable
Figure 7-17 Create date/time variable from string variable, step 1
▶ Select the string variable to convert in the Variables list. Note that the list displays only string variables.
▶ Select the pattern from the Patterns list that matches how dates are represented by the string variable. The Sample Values list displays actual values of the selected variable in the data file.
PAGE 179

155 Data Transformations
Specify Result of Converting String Variable to Date/Time Variable
Figure 7-18 Create date/time variable from string variable, step 2
▶ Enter a name for the Result Variable. This cannot be the name of an existing variable.
Optionally, you can:
Select a date/time format for the new variable from the Output Format list.
Assign a descriptive variable label to the new variable.
PAGE 180

156 Chapter 7
Select Variables to Merge into Single Date/Time Variable
Figure 7-19 Create date/time variable from variable set, step 1
▶ Select the variables that represent the different parts of the date/time.
Some combinations of selections are not allowed. For instance, creating a date/time variable from Year and Day of Month is invalid because once Year is chosen, a full date is required.
PAGE 181

157 Data Transformations
Specify Date/Time Variable Created by Merging Variables
Figure 7-20 Create date/time variable from variable set, step 2
▶ Enter a name for the Result Variable. This cannot be the name of an existing variable.
▶ Select a date/time format from the Output Format list.
Optionally, you can:
Assign a descriptive variable label to the new variable.
PAGE 182

158 Chapter 7 Select Type of Calculation to Perform with Date/Time Variables Figure 7-21 Add or subtract values from date/time variables, step 1 Add or subtract a duration from a date. Use this option to add to or subtract from a date-format variable. You can add or subtract durations that are fixed values like 10 days, or the values from a numeric variable—for example, a variable that represents years. Calculate the number of time units between two dates.
PAGE 183

159 Data Transformations
Add or Subtract a Duration from a Date
To add or subtract a duration from a date-format variable:
▶ Select Add or subtract a duration from a date on the screen of the Date and Time Wizard labeled Do Calculations on Dates.
Select Date/Time Variable and Duration to Add or Subtract
Figure 7-22 Add or subtract duration, step 2
▶ Select a date (or time) variable.
▶ Select a duration variable or enter a value for Duration Constant.
PAGE 184

160 Chapter 7
Specify Result of Adding or Subtracting a Duration from a Date/Time Variable
Figure 7-23 Add or subtract duration, step 3
▶ Enter a name for Result Variable. This cannot be the name of an existing variable.
Optionally, you can:
Assign a descriptive variable label to the new variable.
PAGE 185

161 Data Transformations
Select Date-Format Variables to Subtract
Figure 7-24 Subtract dates, step 2
▶ Select the variables to subtract.
▶ Select the unit for the result from the drop-down list.
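Both date calculations reduce to arithmetic on stored numbers: SPSS stores a date value as the number of seconds since 14 October 1582, so adding 10 days adds 10 × 86,400 seconds, and subtracting two dates gives seconds that can be divided into the chosen unit. A minimal sketch with hypothetical helper names; truncating to whole days is an assumption, not necessarily what the wizard does.

```python
from datetime import datetime, timedelta

# Origin of SPSS date values: seconds are counted from 14 October 1582.
SPSS_EPOCH = datetime(1582, 10, 14)
SECONDS_PER_DAY = 86400

def to_spss_seconds(dt):
    return (dt - SPSS_EPOCH).total_seconds()

def from_spss_seconds(seconds):
    return SPSS_EPOCH + timedelta(seconds=seconds)

def add_days(date_seconds, days):
    """Add (or, with a negative count, subtract) a fixed duration in days."""
    return date_seconds + days * SECONDS_PER_DAY

def days_between(later_seconds, earlier_seconds):
    """Whole days between two date values, truncated toward zero (assumption)."""
    return int((later_seconds - earlier_seconds) / SECONDS_PER_DAY)

start = to_spss_seconds(datetime(2005, 3, 1))
due = add_days(start, 10)
print(from_spss_seconds(due).date())  # 2005-03-11
print(days_between(due, start))       # 10
```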
PAGE 186

162 Chapter 7
Specify Result of Subtracting Two Date-Format Variables
Figure 7-25 Subtract dates, step 3
▶ Enter a name for Result Variable. This cannot be the name of an existing variable.
Optionally, you can:
Assign a descriptive variable label to the new variable.
Subtract Duration Variables
To subtract two duration variables:
▶ Select Subtract two durations on the screen of the Date and Time Wizard labeled Do Calculations on Dates.
PAGE 187

163 Data Transformations
Select Duration Variables to Subtract
Figure 7-26 Subtract durations, step 2
▶ Select the variables to subtract.
PAGE 188

164 Chapter 7
Specify Result of Subtracting Two Duration Variables
Figure 7-27 Subtract durations, step 3
▶ Enter a name for Result Variable. This cannot be the name of an existing variable.
▶ Select a duration format from the Output Format list.
Optionally, you can:
Assign a descriptive variable label to the new variable.
PAGE 189

165 Data Transformations
Select Component to Extract from Date/Time Variable
Figure 7-28 Get part of a date/time variable, step 1
▶ Select the variable containing the date or time part to extract.
▶ Select the part of the variable to extract from the drop-down list. You can extract information from dates that is not explicitly part of the display date, such as day of the week.
PAGE 190

166 Chapter 7
Specify Result of Extracting Component from Date/Time Variable
Figure 7-29 Get part of a date/time variable, step 2
▶ Enter a name for Result Variable. This cannot be the name of an existing variable.
▶ If you’re extracting the date or time part of a date/time variable, you must select a format from the Output Format list. When an output format is not required, the Output Format list is disabled.
PAGE 191

167 Data Transformations Create new time series variables as functions of existing time series variables. Replace system- and user-missing values with estimates based on one of several methods. A time series is obtained by measuring a variable (or set of variables) regularly over a period of time. Time series data transformations assume a data file structure in which each case (row) represents a set of observations at a different time, and the length of time between cases is uniform.
PAGE 192

168 Chapter 7 SPSS Command Syntax Reference for information on using the DATE command to create custom date variables.) First Case Is. Defines the starting date value, which is assigned to the first case. Sequential values, based on the time interval, are assigned to subsequent cases. Periodicity at higher level. Indicates the repetitive cyclical variation, such as the number of months in a year or the number of days in a week. The value displayed indicates the maximum value you can enter.
PAGE 193

169 Data Transformations Create Time Series The Create Time Series dialog box allows you to create new variables based on functions of existing numeric time series variables. These transformed values are useful in many time series analysis procedures. Default new variable names consist of the first six characters of the name of the existing variable used to create them, followed by an underscore and a sequential number. For example, for the variable price, the new variable name would be price_1.
PAGE 194

170 Chapter 7
▶ Select the variable(s) from which you want to create new time series variables. Only numeric variables can be used.
Optionally, you can:
Enter variable names to override the default new variable names.
Change the function for a selected variable.
Time Series Transformation Functions
Difference. Nonseasonal difference between successive values in the series. The order is the number of previous values used to calculate the difference.
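A difference of order k can be sketched as applying the first difference k times, which leaves k missing values at the start of the series. `None` stands in for system-missing; `difference` is a hypothetical helper.

```python
def difference(series, order=1):
    """Nonseasonal difference applied `order` times; None marks missing values."""
    result = list(series)
    for _ in range(order):
        result = [None] + [
            None if a is None or b is None else b - a
            for a, b in zip(result, result[1:])
        ]
    return result

print(difference([1, 4, 9, 16, 25]))     # [None, 3, 5, 7, 9]
print(difference([1, 4, 9, 16, 25], 2))  # [None, None, 2, 2, 2]
```

The order-2 result equals x(t) − 2·x(t−1) + x(t−2), so it uses two previous values, in line with the definition of order above.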
PAGE 195

171 Data Transformations Running median. Median of a span of series values surrounding and including the current value. The span is the number of series values used to compute the median. If the span is even, the median is computed by averaging each pair of uncentered medians. The number of cases with the system-missing value at the beginning and at the end of the series for a span of n is equal to n/2 for even span values and (n-1)/2 for odd span values.
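A sketch of the centering rule: an odd span is centered on the current value; an even span produces medians that fall between two cases, so adjacent pairs of these uncentered medians are averaged. The number of missing values at each end then comes out as described. `running_median` is hypothetical and `None` marks system-missing.

```python
from statistics import median

def running_median(series, span):
    """Centered running median; values near the ends are None."""
    n = len(series)
    half = span // 2
    if span % 2 == 1:
        return [median(series[t - half:t + half + 1]) if half <= t < n - half else None
                for t in range(n)]
    # Even span: the window starting at s is centred halfway between two cases.
    uncentered = [median(series[s:s + span]) for s in range(n - span + 1)]
    out = [None] * n
    for t in range(half, n - half):
        out[t] = (uncentered[t - half] + uncentered[t - half + 1]) / 2
    return out

print(running_median([1, 5, 2, 8, 3, 9], 3))  # [None, 2, 5, 3, 8, None]
print(running_median([1, 5, 2, 8, 3, 9], 2))  # [None, 3.25, 4.25, 5.25, 5.75, None]
```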
PAGE 196

172 Chapter 7 If you create new series that contain forecasts beyond the end of the existing series (by clicking a Save button and making suitable choices), the original series and the generated residual series will have missing data for the new observations. Some transformations (for example, the log transformation) produce missing data for certain values of the original series.
PAGE 197

173 Data Transformations
To Replace Missing Values for Time Series Variables
▶ From the menus choose:
Transform > Replace Missing Values...
▶ Select the estimation method you want to use to replace missing values.
▶ Select the variable(s) for which you want to replace missing values.
Optionally, you can:
Enter variable names to override the default new variable names.
Change the estimation method for a selected variable.
Estimation Methods for Replacing Missing Values
Series mean.
PAGE 198

174 Chapter 7 Scoring Data with Predictive Models The process of applying a predictive model to a set of data is referred to as scoring the data. SPSS, Clementine, and AnswerTree have procedures for building predictive models such as regression, clustering, tree, and neural network models. Once a model has been built, the model specifications can be saved as an XML file containing all of the information necessary to reconstruct the model.
PAGE 199

Chapter 8 File Handling and File Transformations
Data files are not always organized in the ideal form for your specific needs. You may want to combine data files, sort the data in a different order, select a subset of cases, or change the unit of analysis by grouping cases together. A wide range of file transformation capabilities is available, including the ability to:
Sort data. You can sort cases based on the value of one or more variables.
Transpose cases and variables.
PAGE 200

176 Chapter 8 If you select multiple sort variables, cases are sorted by each variable within categories of the preceding variable on the Sort list. For example, if you select gender as the first sorting variable and minority as the second sorting variable, cases will be sorted by minority classification within each gender category. For string variables, uppercase letters precede their lowercase counterparts in sort order. For example, the string value “Yes” comes before “yes” in sort order.
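Sorting by several variables is a sort on a composite key: the first variable decides, and the next one breaks ties within each category. In this Python sketch the cases are made up. Python compares strings by code point, which puts "Yes" before "yes" as described, though code-point order also places every uppercase letter before every lowercase one, which the text does not say about SPSS.

```python
cases = [
    {"gender": "m", "minority": 0},
    {"gender": "f", "minority": 1},
    {"gender": "f", "minority": 0},
    {"gender": "m", "minority": 1},
]

# Sort by gender, then by minority within each gender (both ascending).
by_gender_minority = sorted(cases, key=lambda c: (c["gender"], c["minority"]))
print([(c["gender"], c["minority"]) for c in by_gender_minority])
# [('f', 0), ('f', 1), ('m', 0), ('m', 1)]

print(sorted(["yes", "Yes"]))  # ['Yes', 'yes']
```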
PAGE 201

177 File Handling and File Transformations If the working data file contains an ID or name variable with unique values, you can use it as the name variable, and its values will be used as variable names in the transposed data file. If it is a numeric variable, the variable names start with the letter V, followed by the numeric value. User-missing values are converted to the system-missing value in the transposed data file.
PAGE 202

178 Chapter 8 Figure 8-2 Add Cases dialog box Unpaired Variables. Variables to be excluded from the new, merged data file. Variables from the working data file are identified with an asterisk (*). Variables from the external data file are identified with a plus sign (+). By default, this list contains: Variables from either data file that do not match a variable name in the other file. You can create pairs from unpaired variables and include them in the new, merged file.
PAGE 203

179 File Handling and File Transformations Indicate case source as variable. Indicates the source data file for each case. This variable has a value of 0 for cases from the working data file and a value of 1 for cases from the external data file.
To Merge Data Files with the Same Variables and Different Cases
▶ Open one of the data files. The cases from this file will appear first in the new, merged data file.
▶ From the menus choose:
Data > Merge Files > Add Cases...
PAGE 204

180 Chapter 8 Figure 8-3 Selecting pairs of variables with Ctrl-click Add Cases: Rename You can rename variables from either the working data file or the external file before moving them from the unpaired list to the list of variables to be included in the merged data file. Renaming variables enables you to: Use the variable name from the external file rather than the name from the working data file for variable pairs.
PAGE 205

181 File Handling and File Transformations Add Cases: Dictionary Information Any existing dictionary information (variable and value labels, user-missing values, display formats) in the working data file is applied to the merged data file. If any dictionary information for a variable is undefined in the working data file, dictionary information from the external data file is used.
PAGE 206

182 Chapter 8 Figure 8-4 Add Variables dialog box Excluded Variables. Variables to be excluded from the new, merged data file. By default, this list contains any variable names from the external data file that duplicate variable names in the working data file. Variables from the working data file are identified with an asterisk (*). Variables from the external data file are identified with a plus sign (+).
PAGE 207

183 File Handling and File Transformations Both data files must be sorted by ascending order of the key variables, and the order of variables on the Key Variables list must be the same as their sort sequence. Cases that do not match on the key variables are included in the merged file but are not merged with cases from the other file. Unmatched cases contain values for only the variables in the file from which they are taken; variables from the other file contain the system-missing value.
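Because both files are sorted on the key, the match can be done in one pass, much like merging two sorted lists. A minimal sketch, assuming each key value occurs at most once per file and the files share no variable names other than the key; unmatched cases are kept with the other file's variables set to `None` (standing in for system-missing). The data and `match_merge` are hypothetical.

```python
def match_merge(left, right, key):
    """Merge two lists of cases already sorted ascending on `key`."""
    left_vars = {k for case in left for k in case}
    right_vars = {k for case in right for k in case}
    merged, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j == len(right) or (i < len(left) and left[i][key] < right[j][key]):
            merged.append({**dict.fromkeys(right_vars), **left[i]})   # unmatched left case
            i += 1
        elif i == len(left) or right[j][key] < left[i][key]:
            merged.append({**dict.fromkeys(left_vars), **right[j]})   # unmatched right case
            j += 1
        else:
            merged.append({**right[j], **left[i]})                    # matching keys
            i += 1
            j += 1
    return merged

working = [{"id": 1, "age": 30}, {"id": 3, "age": 41}]
external = [{"id": 1, "income": 50}, {"id": 2, "income": 60}]
for case in match_merge(working, external, "id"):
    print(case)
```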
PAGE 208

184 Chapter 8 Add Variables: Rename You can rename variables from either the working data file or the external file before moving them to the list of variables to be included in the merged data file. This is primarily useful if you want to include two variables with the same name that contain different information in the two files.
PAGE 209

185 File Handling and File Transformations Figure 8-5 Aggregate Data dialog box Break Variable(s). Cases are grouped together based on the values of the break variables. Each unique combination of break variable values defines a group. When creating a new, aggregated data file, all break variables are saved in the new file with their existing names and dictionary information. The break variable can be either numeric or string. Aggregate Variable(s).
PAGE 210

186 Chapter 8
To Aggregate a Data File
▶ From the menus choose:
Data > Aggregate...
▶ Select one or more break variables that define how cases are grouped to create aggregated data.
▶ Select one or more aggregate variables.
▶ Select an aggregate function for each aggregate variable.
Saving Aggregated Results
You can add aggregate variables to the working data file or create a new, aggregated data file.
Add aggregated variables to working data file.
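Grouping on the break variables and then applying a function is the core of Aggregate. This sketch shows both saving choices: one result per unique break-value combination (a new, aggregated file) and the group result attached to every case (adding to the working file). Missing values are not handled; the function and variable names, including `salary_mean`, are hypothetical.

```python
from statistics import mean

def aggregate(cases, break_vars, agg_var, func=mean):
    """One result per unique combination of break-variable values."""
    groups = {}
    for case in cases:
        key = tuple(case[v] for v in break_vars)
        groups.setdefault(key, []).append(case[agg_var])
    return {key: func(vals) for key, vals in groups.items()}

def add_aggregated(cases, break_vars, agg_var, new_name, func=mean):
    """Attach each case's group result as a new variable."""
    results = aggregate(cases, break_vars, agg_var, func)
    return [{**c, new_name: results[tuple(c[v] for v in break_vars)]} for c in cases]

staff = [{"gender": "f", "salary": 40000},
         {"gender": "m", "salary": 30000},
         {"gender": "f", "salary": 50000}]
print(aggregate(staff, ["gender"], "salary"))
print(add_aggregated(staff, ["gender"], "salary", "salary_mean")[0])
```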
PAGE 211

187 File Handling and File Transformations File is already sorted on break variable(s). If the data have already been sorted by values of the break variables, this option enables the procedure to run more quickly and use less memory. Use this option with caution. Data must be sorted by values of the break variables in the same order as the break variables specified for the Aggregate Data procedure.
PAGE 212

188 Chapter 8 Figure 8-6 Aggregate Function dialog box Aggregate Data: Variable Name and Label Aggregate Data assigns default variable names for the aggregated variables in the new data file. This dialog box enables you to change the variable name for the selected variable on the Aggregate Variables list and provide a descriptive variable label. For more information, see “Variable Names” in Chapter 5 on p. 78.
PAGE 213

189 File Handling and File Transformations Split File Split File splits the data file into separate groups for analysis based on the values of one or more grouping variables. If you select multiple grouping variables, cases are grouped by each variable within categories of the preceding variable on the Groups Based On list.
PAGE 214

190 Chapter 8 Organize output by groups. All results from each procedure are displayed separately for each split-file group.
To Split a Data File for Analysis
▶ From the menus choose:
Data > Split File...
▶ Select Compare groups or Organize output by groups.
▶ Select one or more grouping variables.
Select Cases
Select Cases provides several methods for selecting a subgroup of cases based on criteria that include variables and complex expressions. You can also select a random sample of cases.
PAGE 215

191 File Handling and File Transformations Figure 8-9 Select Cases dialog box All cases. Turns case filtering off and uses all cases. If condition is satisfied. Use a conditional expression to select cases. If the result of the conditional expression is true, the case is selected. If the result is false or missing, the case is not selected. Random sample of cases. Selects a random sample based on an approximate percentage or an exact number of cases. Based on time or case range.
PAGE 216

192 Chapter 8 Unselected Cases. You can filter or delete cases that do not meet the selection criteria. Filtered cases remain in the data file but are excluded from analysis. Select Cases creates a filter variable, filter_$, to indicate filter status. Selected cases have a value of 1; filtered cases have a value of 0. Filtered cases are also indicated with a slash through the row number in the Data Editor. To turn filtering off and include all cases in your analysis, select All cases.
PAGE 217

193 File Handling and File Transformations Figure 8-10 Select Cases If dialog box If the result of a conditional expression is true, the case is included in the selected subset. If the result of a conditional expression is false or missing, the case is not included in the selected subset. Most conditional expressions use one or more of the six relational operators (<, >, <=, >=, =, and ~=) on the calculator pad.
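A sketch of how a condition becomes filter values like filter_$ (1 = selected, 0 = filtered). The condition returns `True`, `False`, or `None` for a missing result, and only `True` selects the case. The variable `age`, the cutoff, and the helper names are hypothetical.

```python
def make_filter(cases, condition):
    """1 if the condition is True; 0 if it is False or missing (None)."""
    return [1 if condition(c) is True else 0 for c in cases]

def age_over_30(case):
    age = case.get("age")
    return None if age is None else age > 30   # missing age -> missing result

cases = [{"age": 45}, {"age": 22}, {"age": None}]
print(make_filter(cases, age_over_30))  # [1, 0, 0]
```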
PAGE 218

194 Chapter 8 Figure 8-11 Select Cases Random Sample dialog box Approximately. Generates a random sample of approximately the specified percentage of cases. Since this routine makes an independent pseudo-random decision for each case, the percentage of cases selected can only approximate the specified percentage. The more cases there are in the data file, the closer the percentage of cases selected is to the specified percentage. Exactly. A user-specified number of cases.
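The two options differ in how the pseudo-random draws are made. "Approximately" makes an independent decision per case, so the count varies from run to run; "Exactly" draws a fixed number of distinct cases. A Python sketch using the standard `random` module (whose generator is not SPSS's); function names and the seed are hypothetical.

```python
import random

def sample_approximately(n_cases, percent, rng):
    """Independent pseudo-random decision per case; the count only approximates percent."""
    return [i for i in range(n_cases) if rng.random() < percent / 100]

def sample_exactly(n_cases, k, rng):
    """Exactly k distinct cases chosen at random, returned in file order."""
    return sorted(rng.sample(range(n_cases), k))

rng = random.Random(2005)  # fixed seed so the draw is reproducible
print(len(sample_approximately(1000, 10, rng)))  # close to 100, varies with the seed
print(len(sample_exactly(1000, 100, rng)))       # 100
```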
PAGE 219

195 File Handling and File Transformations Figure 8-12 Select Cases Range dialog box for range of cases (no defined date variables) Figure 8-13 Select Cases Range dialog box for time series data with defined date variables Weight Cases Weight Cases gives cases different weights (by simulated replication) for statistical analysis. The values of the weighting variable should indicate the number of observations represented by single cases in your data file.
PAGE 220

196 Chapter 8 Cases with zero, negative, or missing values for the weighting variable are excluded from analysis. Fractional values are valid; they are used exactly where this is meaningful, which is most likely in procedures that tabulate cases. Figure 8-14 Weight Cases dialog box Once you apply a weight variable, it remains in effect until you select another weight variable or turn off weighting. If you save a weighted data file, weighting information is saved with the data file.
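Weighting by simulated replication means a case with weight 3 counts like three identical cases. A minimal sketch of a weighted count and mean that drop cases with zero, negative, or missing weights and use fractional weights as given; the helpers are hypothetical.

```python
def usable(w):
    """A weight counts only if it is present and positive."""
    return w is not None and w > 0

def weighted_count(weights):
    return sum(w for w in weights if usable(w))

def weighted_mean(values, weights):
    total_w = total = 0.0
    for x, w in zip(values, weights):
        if not usable(w) or x is None:
            continue                   # excluded from the analysis
        total_w += w
        total += w * x
    return total / total_w if total_w else None

print(weighted_count([3, 1, 0, None, 0.5]))               # 4.5
print(weighted_mean([10, 20, 30, 40], [3, 1, 0, None]))  # 12.5
```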
PAGE 221

197 File Handling and File Transformations The values of the frequency variable are used as case weights. For example, a case with a value of 3 for the frequency variable will represent three cases in the weighted data file. Restructuring Data Use the Restructure Data Wizard to restructure your data for the SPSS procedure that you want to use. The wizard replaces the current file with a new, restructured file. The wizard can: Restructure selected variables into cases.
PAGE 222

198 Chapter 8 Figure 8-15 Restructure Data Wizard Restructure selected variables into cases. Choose this when you have groups of related columns in your data and you want them to appear in groups of rows in the new data file. If you choose this, the wizard will display the steps for Variables to Cases. Restructure selected cases into variables. Choose this when you have groups of related rows in your data and you want them to appear in groups of columns in the new data file.
PAGE 223

Deciding How to Restructure the Data
A variable contains information that you want to analyze (for example, a measurement or a score). A case is an observation (for example, an individual). In a simple data structure, each variable is a single column in your data and each case is a single row. So, for example, if you were measuring test scores for all students in a class, all score values would appear in only one column, and there would be a row for each student.
PAGE 224

In this example, the first two rows are a case group because they are related. They contain data for the same factor level. In SPSS data analysis, the factor is often referred to as a grouping variable when the data are structured this way.
Groups of columns. Does the current file have variables and conditions recorded in the same column? For example:
var_1  var_2
8      3
9      1
In this example, the two columns are a variable group because they are related.
PAGE 225

Figure 8-16 Current data for variables to cases
You want to do an independent-samples t test. You have a column group consisting of score_a and score_b, but you don’t have the grouping variable that the procedure requires. Select Restructure selected variables into cases in the Restructure Data Wizard, restructure one variable group into a new variable named score, and create an index named group. The new data file is shown in the following figure.
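The Variables to Cases transformation can be pictured as a wide-to-long reshape. The Python sketch below is an assumed illustration, not what SPSS runs internally. Rows are dictionaries, the data values and the `id` fixed variable are made up, and the function name is hypothetical. By default the sketch numbers index values sequentially. Passing the variable names as index values mimics the Variable names option of the Create One Index Variable step.

```python
def variables_to_cases(rows, group_vars, new_var, index_var,
                       fixed_vars=(), index_values=None):
    """Turn one variable group (columns) into rows plus an index variable."""
    if index_values is None:
        # Sequential numbers 1, 2, ... as index values.
        index_values = list(range(1, len(group_vars) + 1))
    new_rows = []
    for row in rows:
        for var, index in zip(group_vars, index_values):
            # Fixed variables are copied, so their values repeat in each new row.
            new_row = {name: row[name] for name in fixed_vars}
            new_row[index_var] = index
            new_row[new_var] = row[var]
            new_rows.append(new_row)
    return new_rows


if __name__ == "__main__":
    current = [
        {"id": 1, "score_a": 8, "score_b": 3},
        {"id": 2, "score_a": 9, "score_b": 1},
    ]
    for r in variables_to_cases(current, ["score_a", "score_b"],
                                "score", "group", fixed_vars=["id"]):
        print(r)
    # {'id': 1, 'group': 1, 'score': 8}
    # {'id': 1, 'group': 2, 'score': 3}
    # {'id': 2, 'group': 1, 'score': 9}
    # {'id': 2, 'group': 2, 'score': 1}
```

Each current row yields one new row per variable in the group. The index (group) then serves as the grouping variable that the independent-samples t test needs.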
PAGE 226

You want to do a paired-samples t test. Your data structure is case groups, but you don’t have the repeated measures for the paired variables that the procedure requires. Select Restructure selected cases into variables in the Restructure Data Wizard, use id to identify the row groups in the current data, and use time to create the variable group in the new file.
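The reverse reshape, Cases to Variables, can be sketched the same way. In this assumed Python illustration, rows that share an `id` value are consolidated into one row, and each value of `time` produces a new column. The value variable `score`, the data, and the `name.index` naming pattern (borrowed from the w.jan example in the Options step) are illustrative assumptions. The rows are sorted by the identifier first, similar to the wizard's Sort Data step.

```python
def cases_to_variables(rows, id_var, index_var, value_vars):
    """Consolidate each case group (same id) into a single wide row."""
    # Sort by the identifier so each case group is contiguous (stable sort).
    ordered = sorted(rows, key=lambda r: r[id_var])
    wide = {}
    for row in ordered:
        new_row = wide.setdefault(row[id_var], {id_var: row[id_var]})
        for var in value_vars:
            # The new variable name combines the original name and the index value.
            new_row[f"{var}.{row[index_var]}"] = row[var]
    return list(wide.values())


if __name__ == "__main__":
    current = [
        {"id": 2, "time": 1, "score": 9},
        {"id": 1, "time": 1, "score": 8},
        {"id": 1, "time": 2, "score": 3},
        {"id": 2, "time": 2, "score": 1},
    ]
    for r in cases_to_variables(current, "id", "time", ["score"]):
        print(r)
    # {'id': 1, 'score.1': 8, 'score.2': 3}
    # {'id': 2, 'score.1': 9, 'score.2': 1}
```

After the reshape, score.1 and score.2 sit side by side in one row per id. That is the paired layout the paired-samples t test expects.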
PAGE 227

Figure 8-20 Restructure Data Wizard: Number of Variable Groups
One. The wizard will create a single restructured variable in the new file from one variable group in the current file.
More than one. The wizard will create multiple restructured variables in the new file. The number that you specify affects the next step, in which the wizard automatically creates the specified number of new variables.
PAGE 228

In this step, provide information about how the variables in the current file should be used in the new file. You can also create a variable that identifies the rows in the new file.
Figure 8-21 Restructure Data Wizard: Select Variables
How should the new rows be identified? You can create a variable in the new data file that identifies the row in the current data file that was used to create a group of new rows.
PAGE 229

variable in the new file. Use the controls in Variables to be Transposed to define the restructured variable in the new file.
To Specify One Restructured Variable
▶ Put the variables that make up the variable group that you want to transform into the Variables to be Transposed list. All of the variables in the group must be of the same type (numeric or string).
PAGE 230

What should be copied into the new file? Variables that aren’t restructured can be copied into the new file. Their values will be propagated in the new rows. Move variables that you want to copy into the new file into the Fixed Variable(s) list.
Restructure Data Wizard (Variables to Cases): Create Index Variables
Note: The wizard presents this step if you choose to restructure variable groups into rows. In this step, decide whether to create index variables.
PAGE 231

How many index variables should be in the new file? Index variables can be used as grouping variables in SPSS procedures. In most cases, a single index variable is sufficient; however, if the variable groups in your current file reflect multiple factor levels, multiple indices may be appropriate.
One. The wizard will create a single index variable.
More than one.
PAGE 232

Example of Two Indices for Variables to Cases
When a variable group records more than one factor, you can create more than one index; however, the current data must be arranged so that the levels of the first factor are a primary index within which the levels of subsequent factors cycle. In the current data, there is one variable group, width, and two factors, A and B. The data are arranged so that levels of factor B cycle within levels of factor A.
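The arrangement rule can be expressed as a mapping from each variable's position in the group to a combination of index values, with the last factor cycling fastest. The Python sketch below is a hypothetical helper, not part of SPSS. It also mirrors the check the wizard applies when you create multiple indices: the product of the levels must equal the number of variables in the variable group. The factor sizes used (A with 2 levels, B with 3) are made up.

```python
from math import prod


def index_combinations(n_variables, levels):
    """Return one tuple of 1-based index values per variable in the group.

    levels lists the number of levels for each factor, first factor first;
    the levels of later factors cycle within the levels of earlier ones.
    """
    if prod(levels) != n_variables:
        raise ValueError(
            f"product of levels {prod(levels)} != {n_variables} variables")
    combos = []
    for position in range(n_variables):
        rest = position
        values = []
        for n in reversed(levels):  # the last factor varies fastest
            values.append(rest % n + 1)
            rest //= n
        combos.append(tuple(reversed(values)))
    return combos


if __name__ == "__main__":
    # A hypothetical width group of six variables: factor A (2) by factor B (3).
    print(index_combinations(6, [2, 3]))
    # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3)]
```

If the data were instead arranged with A cycling within B, this mapping would attach the wrong levels to each variable. That is why the manual insists on the first-factor-as-primary-index layout.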
PAGE 233

Figure 8-27 Restructure Data Wizard: Create One Index Variable
For more information, see “Example of One Index for Variables to Cases” on p. 207.
Sequential numbers. The wizard will automatically assign sequential numbers as index values.
Variable names. The wizard will use the names of the selected variable group as index values. Choose a variable group from the list.
Names and labels.
PAGE 234

In this step, specify the number of levels for each index variable. You can also specify a name and a label for the new index variable.
Figure 8-28 Restructure Data Wizard: Create Multiple Index Variables
For more information, see “Example of Two Indices for Variables to Cases” on p. 208.
How many levels are recorded in the current file? Consider how many factor levels are recorded in the current data. A level defines a group of cases that experienced identical conditions.
PAGE 235

Total combined levels. You cannot create more levels than exist in the current data. Because the restructured data will contain one row for each combination of treatments, the wizard checks the number of levels that you create: it compares the product of the levels that you create with the number of variables in your variable groups, and the two must match.
Names and labels.
PAGE 236

Figure 8-29 Restructure Data Wizard: Options
Drop unselected variables? In the Select Variables step (step 3), you selected variable groups to be restructured, variables to be copied, and an identification variable from the current data. The data from the selected variables will appear in the new file. If there are other variables in the current data, you can choose to discard or keep them.
Keep missing data? The wizard checks each potential new row for null values.
PAGE 237

in the current data. Click a cell to change the default variable name and provide a descriptive variable label for the count variable.
Restructure Data Wizard (Cases to Variables): Select Variables
Note: The wizard presents this step if you choose to restructure case groups into columns. In this step, provide information about how the variables in the current file should be used in the new file.
PAGE 238

current file identify the case groups so that it can consolidate each group into a single row in the new file. Move variables that identify case groups in the current file into the Identifier Variable(s) list. Variables that are used to split the current data file are automatically used to identify case groups.
PAGE 239

Figure 8-31 Restructure Data Wizard: Sort Data
How are the rows ordered in the current file? Consider how the current data are sorted and which variables you are using to identify case groups (specified in the previous step).
Yes. The wizard will automatically sort the current data by the identification variables, in the same order that variables are listed in the Identifier Variable(s) list in the previous step.
PAGE 240

Restructure Data Wizard (Cases to Variables): Options
Note: The wizard presents this step if you choose to restructure case groups into columns. In this step, specify options for the new, restructured file.
Figure 8-32 Restructure Data Wizard: Options
How should the new variable groups be ordered in the new file?
By variable. The wizard groups the new variables created from an original variable together.
By index.
PAGE 241

Example. The variables to be restructured are w and h, and the index is month:
w  h  month
Grouping by variable results in:
w.jan  w.feb  h.jan
Grouping by index results in:
w.jan  h.jan  w.feb
Create a count variable? The wizard can create a count variable in the new file. It contains the number of rows in the current data that were used to create a row in the new data file.
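The two ordering options and the count variable can be illustrated with a short Python sketch. The helpers are hypothetical. The `name.index` pattern follows the w.jan example above. The count is simply the number of current rows that share an identifier value.

```python
from collections import Counter


def new_variable_order(variables, index_values, by="variable"):
    """List new variable names in the order the chosen option places them."""
    if by == "variable":  # all columns built from one original variable together
        return [f"{v}.{i}" for v in variables for i in index_values]
    if by == "index":     # all columns for one index value together
        return [f"{v}.{i}" for i in index_values for v in variables]
    raise ValueError("by must be 'variable' or 'index'")


def count_rows(rows, id_var):
    """Count variable: how many current rows build each new row."""
    return Counter(row[id_var] for row in rows)


if __name__ == "__main__":
    print(new_variable_order(["w", "h"], ["jan", "feb"], by="variable"))
    # ['w.jan', 'w.feb', 'h.jan', 'h.feb']
    print(new_variable_order(["w", "h"], ["jan", "feb"], by="index"))
    # ['w.jan', 'h.jan', 'w.feb', 'h.feb']
    print(dict(count_rows([{"id": 1}, {"id": 1}, {"id": 2}], "id")))
    # {1: 2, 2: 1}
```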
PAGE 242

In this example, the restructured data could be used to get frequency counts of the products that customers buy.
Restructure Data Wizard: Finish
This is the final step of the Restructure Data Wizard. Decide what to do with your specifications.
PAGE 243

Restructure now. The wizard will create the new, restructured file. Choose this if you want to replace the current file immediately.
Note: If original data are weighted, the new data will be weighted unless the variable that is used as the weight is restructured or dropped from the new file.
Paste syntax. The wizard will paste the syntax it generates into a syntax window.
PAGE 244
PAGE 245

Chapter 9
Working with Output
When you run a procedure, the results are displayed in a window called the Viewer. In this window, you can easily navigate to whichever part of the output you want to see. You can also manipulate the output and create a document that contains precisely the output that you want, arranged and formatted appropriately.
Viewer
Results are displayed in the Viewer. You can use the Viewer to:
Browse results.
Show or hide selected tables and charts.
PAGE 246

Figure 9-1 Viewer
The Viewer is divided into two panes:
The left pane of the Viewer contains an outline view of the contents.
The right pane contains statistical tables, charts, and text output.
You can use the scroll bars to browse the results, or you can click an item in the outline to go directly to the corresponding table or chart. You can click and drag the right border of the outline pane to change the width of the outline pane.
PAGE 247

To use the Draft Viewer:
▶ In any window, from the menus choose:
  Edit
    Options...
▶ On the General tab, click Draft for the output viewer type.
▶ To change the format options for Draft Viewer output, click the Draft Viewer tab.
For more information, see “Draft Viewer” in Chapter 10 on p. 253.
You can also search the Help facility to learn more:
▶ In any window, from the menus choose:
  Help
    Topics
▶ Click the Index tab in the Help Topics window.
PAGE 248

▶ From the menus choose:
  View
    Hide
or
▶ Click the closed book (Hide) icon on the Outlining toolbar.
The open book (Show) icon becomes the active icon, indicating that the item is now hidden.
Hiding Procedure Results
▶ Click the box to the left of the procedure name in the outline pane.
This hides all of the results from the procedure and collapses the outline view.
PAGE 249

▶ Press Delete.
or
▶ From the menus choose:
  Edit
    Delete
Copying Output in the Viewer
▶ Click items in the outline or contents pane to select them. (Shift-click to select multiple items, or Ctrl-click to select noncontiguous items.)
▶ Hold down the Ctrl key while you use the mouse to click and drag selected items (hold down the mouse button while dragging).
▶ Release the mouse button to drop the items where you want them.
PAGE 250

Note: All results are displayed left-aligned in the Viewer. Only the alignment of printed results is affected by the alignment settings. Centered and right-aligned items are identified by a small symbol above and to the left of the item.
Viewer Outline
The outline pane provides a table of contents of the Viewer document. You can use the outline pane to navigate through your results and control the display. Most actions in the outline pane have a corresponding effect on the contents pane.
PAGE 251

Controlling the outline display. To control the outline display, you can:
Expand and collapse the outline view.
Change the outline level for selected items.
Change the size of items in the outline display.
Change the font used in the outline display.
Collapsing and Expanding the Outline View
▶ Click the box to the left of the outline item that you want to collapse or expand.
or
▶ Click the item in the outline.
PAGE 252

or
  Edit
    Outline
      Demote
Changing the outline level is particularly useful after you move items in the outline. Moving items can change the outline level of the selected items, and you can use the left- and right-arrow buttons on the Outlining toolbar to restore the original outline level.
Changing the Size of Outline Items
▶ From the menus choose:
  View
    Outline Size
      Small
Other options include Medium and Large. The icons and their associated text change size.
PAGE 253

▶ Click the table, chart, or other object that will precede the title or text.
▶ From the menus choose:
  Insert
    New Title
or
  Insert
    New Text
▶ Double-click the new object.
▶ Enter the text that you want at this location.
Adding a Text File
▶ In either the outline or the contents pane of the Viewer, click the table, chart, or other object that will precede the text.
▶ From the menus choose:
  Insert
    Text File...
▶ Select a text file.
To edit the text, double-click it.
PAGE 254

Picture (metafile). You can paste pivot tables, text output, and charts as metafile pictures. The picture format can be resized in the other application, and sometimes a limited amount of editing can be done with the facilities of the other application. Pivot tables pasted as pictures retain all borders and font characteristics.
RTF (rich text format). Pivot tables can be pasted into other applications in RTF format.
PAGE 255

pivot tables in RTF format, which pastes the pivot table as a table. For spreadsheet applications, Paste will paste pivot tables in BIFF format. Charts are pasted as metafiles.
Paste Special. Results are copied to the clipboard in multiple formats. Paste Special allows you to select the format that you want from the list of formats available to the target application.
PAGE 256

Pasting a Pivot Table or Chart as a Picture (Metafile)
▶ In the Viewer, copy the table or chart.
▶ From the menus in the target application, choose:
  Edit
    Paste Special...
▶ From the list, select Picture.
The item is pasted as a metafile. Only the layer and columns that were visible when the item was copied are available in the metafile. Other layers or hidden columns are not available.
Pasting a Pivot Table as a Table (RTF)
▶ In the Viewer, copy the pivot table.
PAGE 257

Copying and Pasting Multiple Items into Another Application
▶ Select the tables and/or charts to be copied. (Shift-click or Ctrl-click to select multiple items.)
▶ From the menus choose:
  Edit
    Copy Objects
▶ In the target application, from the menus choose:
  Edit
    Paste
Note: Use Copy Objects only to copy multiple items from the Viewer to another application. For copying and pasting within Viewer documents (for example, between two Viewer windows), use Copy on the Edit menu.
PAGE 258

Figure 9-3 Paste Special dialog box
Pasting Objects from Other Applications into the Viewer
▶ Copy the object in the other application.
▶ In either the outline or the contents pane of the Viewer, click the table, chart, or other object that will precede the object.
▶ From the menus choose:
  Edit
    Paste Special...
▶ From the list, select the format for the object.
PAGE 259

For HTML and text formats, charts are exported in the currently selected chart export format. For HTML document format, charts are embedded by reference, and you should export charts in a suitable format for inclusion in HTML documents. For text document format, a line is inserted in the text file for each chart, indicating the filename of the exported chart. For Word/RTF format, charts are exported in Windows metafile format and embedded in the Word document.
PAGE 260

Word/RTF file (*.doc). Pivot tables are exported as Word tables with all formatting attributes intact (for example, cell borders, font styles, background colors, and so on). Text output is exported as formatted RTF. Text output in SPSS is always displayed in a fixed-pitch (monospaced) font and is exported with the same font attributes. A fixed-pitch font is required for proper alignment of space-separated text output.
PowerPoint file (*.ppt).
PAGE 261

Figure 9-4 Export Output dialog box
Figure 9-5 Output exported in Word/RTF format
HTML, Word/RTF, and Excel Options
This dialog box controls the inclusion of footnotes and captions for documents exported in HTML, Word/RTF, and Excel formats, the chart export options for HTML documents, and the handling of multilayer pivot tables.
PAGE 262

Image Format. Controls the chart export format and optional settings, including chart size for HTML documents. For Word/RTF, all charts are exported in Windows metafile (WMF) format. For Excel, charts are not included.
Export Footnotes and Caption. Check this box to include any footnotes and captions along with the export of pivot tables.
Export All Layers. Check this box to export all layers of a multilayer pivot table. If left unchecked, only the top layer is exported.
PAGE 263

Include Title on Slide. Check this box to include a title on each slide created by the export. Each slide contains a single item exported from the Viewer. The title is formed from the outline entry for the item in the outline pane of the Viewer.
Export Footnotes and Caption. Check this box to include any footnotes and captions along with the export of pivot tables.
Export All Layers.
PAGE 264

All text output is exported in space-separated format. All space-separated output requires a fixed-pitch (monospaced) font for proper alignment.
Cell Formatting. For space-separated pivot tables, by default all line wrapping is removed and each column is set to the width of the longest label or value in the column. To limit the width of columns and wrap long labels, specify a number of characters for the column width. This setting affects only pivot tables.
Cell Separators.
PAGE 265

▶ For Charts Only, select the export format, and click Chart Size.
JPEG Chart Export Options
Color Depth. JPEG charts can be exported as true color (24 bit) or 256 grayscale.
Color Space. Color Space refers to the way that colors are encoded in the image. The YUV color model is one form of color encoding, commonly used for digital video and MPEG transmission. The acronym stands for Y-signal, U-signal, V-signal.
PAGE 266

If the number of colors in the chart exceeds the number of colors for that depth, the colors will be dithered to replicate the colors in the chart. Current screen depth is the number of colors currently displayed on your computer monitor.
Color Operations. The following operations are available:
Invert. Each pixel is saved as the inverse of the original color.
Gamma correction.
PAGE 267

Transparency. Allows you to select a color that will appear transparent in the exported chart. Available only with 32-bit true color export. Enter integer values between 0 and 255 for each color. The default value for each color is 255, creating a default transparent color of white.
Format. (TIFF only.) Allows you to set the color space and compress the exported chart. All color depths are available with RGB color. Only 24- and 32-bit true color is available with CMYK.
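Choosing a transparent color works like a color key: pixels that exactly match the chosen RGB value get a fully transparent alpha value. The alpha channel is the extra component in 32-bit (RGB plus alpha) output. The Python sketch below illustrates this general technique on a list of pixels. It is an assumption about how color-key transparency works in general, not the SPSS chart exporter, and the function name is hypothetical.

```python
def apply_transparent_color(pixels, key=(255, 255, 255)):
    """Return RGBA pixels; pixels equal to `key` become fully transparent.

    `key` defaults to (255, 255, 255), i.e. white, matching the default
    value of 255 for each color component.
    """
    key = tuple(key)
    if len(key) != 3 or not all(isinstance(c, int) and 0 <= c <= 255 for c in key):
        raise ValueError("each color component must be an integer from 0 to 255")
    # Alpha 0 = fully transparent, 255 = fully opaque.
    return [(r, g, b, 0 if (r, g, b) == key else 255) for r, g, b in pixels]


if __name__ == "__main__":
    chart = [(255, 255, 255), (0, 0, 128), (255, 255, 255)]
    print(apply_transparent_color(chart))
    # [(255, 255, 255, 0), (0, 0, 128, 255), (255, 255, 255, 0)]
```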
PAGE 268

printer that doesn’t support Type 42 fonts and you need to preserve special TrueType symbols, such as the markers used in interactive scatterplots.
Other Charts
For all other charts, the following EPS options are available:
Include TIFF preview. Saves a preview with the EPS image in TIFF format for display in applications that cannot display EPS images on screen.
Fonts. Controls the treatment of fonts in EPS images.
Replace fonts with curves. Turns fonts into PostScript curve data.
PAGE 269

Viewer Printing
There are two options for printing the contents of the Viewer window:
All visible output. Prints only items currently displayed in the contents pane. Hidden items (items with a closed book icon in the outline pane or hidden in collapsed outline layers) are not printed.
Selection. Prints only items currently selected in the outline and/or contents panes.
PAGE 270

Print Preview
Print Preview shows you what will print on each page for Viewer documents.
PAGE 271

If any output is currently selected in the Viewer, the preview displays only the selected output. To view a preview for all output, make sure nothing is selected in the Viewer.
Viewing a Print Preview
▶ Make the Viewer the active window (click anywhere in the window).
PAGE 272

Figure 9-11 Page Setup dialog box
Page Setup settings are saved with the Viewer document. Page Setup affects settings for printing Viewer documents only. These settings have no effect on printing data from the Data Editor or syntax from a syntax window.
Changing Page Setup
▶ Make the Viewer the active window (click anywhere in the window).
▶ From the menus choose:
  File
    Page Setup...
▶ Change the settings and click OK.
PAGE 273

Page Setup Options: Headers and Footers
Headers and footers are the information that prints at the top and bottom of each page. You can enter any text that you want to use as headers and footers.
PAGE 274

Page titles and subtitles print the current page titles and subtitles. Page titles and subtitles are created with Insert New Page Title on the Viewer Insert menu or the TITLE and SUBTITLE commands in command syntax. If you have not specified any page titles or subtitles, this setting is ignored.
Note: Font characteristics for new page titles and subtitles are controlled on the Viewer tab of the Options dialog box (Edit menu).
PAGE 275

Figure 9-13 Page Setup Options dialog box, Options tab
Saving Output
The contents of the Viewer can be saved to a Viewer document. The saved document includes both panes of the Viewer window (the outline and the contents).
Saving a Viewer Document
▶ From the Viewer window menus choose:
  File
    Save
▶ Enter the name of the document and click Save.
To save results in external formats (for example, HTML or text), use Export on the File menu.
PAGE 276

Save With Password Option
Save With Password allows you to password-protect your Viewer files.
Password. The password is case sensitive and can be up to 16 characters long. If you assign a password, the file cannot be viewed without entering the password.
OEM Code. Leave this field blank unless you have a contractual agreement with SPSS Inc. to redistribute the SmartViewer. The OEM license code is provided with the contract.
PAGE 277

Chapter 10
Draft Viewer
The Draft Viewer provides results in draft form, including:
Simple text output (instead of pivot tables)
Charts as metafile pictures (instead of chart objects)
Text output in the Draft Viewer can be edited, charts can be resized, and both text output and charts can be pasted into other applications. However, charts cannot be edited, and the interactive features of pivot tables and charts are not available.
PAGE 278

Figure 10-1 Draft Viewer window
To Create Draft Output
▶ From the menus choose:
  File
    New
      Draft Output
▶ To make draft output the default output type, from the menus choose:
  Edit
    Options...
▶ Click the General tab.
▶ Select Draft under Viewer Type at Startup.
PAGE 279

Note: New output is always displayed in the designated Viewer window. If you have both a Viewer and a Draft Viewer window open, the designated window is the one opened most recently or the one designated with the Designate Window tool (the exclamation point) on the toolbar.
Controlling Draft Output Format
Output that would be displayed as pivot tables in the Viewer is converted to text output for the Draft Viewer.
PAGE 280

Figure 10-2 Draft Viewer Options
Column width. To reduce the width of tables that contain long labels, select Maximum characters under Column width. Labels longer than the specified width are wrapped to fit the maximum width.
PAGE 281

Figure 10-3 Draft output before and after setting maximum column width
Row and column separators. As an alternative to box characters for row and column borders, you can use the Cell Separators settings to control the row and column separators displayed in new draft output. You can specify different cell separators or enter blank spaces if you don’t want any characters used to mark rows and columns. You must deselect Display Box Character to specify cell separators.
PAGE 282

Figure 10-4 Draft output before and after setting cell separators
Space-separated versus tab-separated columns. The Draft Viewer is designed to display space-separated output in a fixed-pitch (monospaced) font. If you want to paste draft output into another application, you must use a fixed-pitch font to align space-separated columns properly. If you select Tabs for the column separator, you can use any font that you want in the other application and set the tabs to align output properly.
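The column-width and separator settings can be approximated with a small text-layout function. This Python sketch is an assumed illustration, not the Draft Viewer's code, and `layout_rows` is a hypothetical name. With `max_width` set, long labels wrap onto extra lines using the standard `textwrap` module. With `tabs=True`, cells are joined by tab characters instead of being padded with spaces, so alignment no longer depends on a fixed-pitch font.

```python
import textwrap


def layout_rows(rows, max_width=None, tabs=False, gap=2):
    """Render table rows as draft-style text.

    max_width: wrap cell text longer than this many characters.
    tabs: separate columns with tabs instead of padding with spaces.
    """
    # Each cell becomes a list of output lines (one line unless wrapped).
    cells = [
        [(textwrap.wrap(str(c), max_width) if max_width else [str(c)]) or [""]
         for c in row]
        for row in rows
    ]
    n_cols = max(len(row) for row in cells)
    widths = [0] * n_cols
    for row in cells:
        for j, lines in enumerate(row):
            widths[j] = max(widths[j], max(len(s) for s in lines))
    out = []
    for row in cells:
        for k in range(max(len(lines) for lines in row)):
            parts = [lines[k] if k < len(lines) else "" for lines in row]
            if tabs:
                out.append("\t".join(parts))
            else:
                # Pad every column to its widest entry; needs a monospaced font.
                padded = [p.ljust(widths[j]) for j, p in enumerate(parts)]
                out.append((" " * gap).join(padded).rstrip())
    return "\n".join(out)


if __name__ == "__main__":
    table = [["Label", "N"], ["Very long category label", "12"]]
    print(layout_rows(table, max_width=10))
    # Label      N
    # Very long  12
    # category
    # label
    print(layout_rows([["Job", "N"], ["Clerical", "363"]], tabs=True))
```

Space padding only lines up when every character has the same width. Tab-separated output hands alignment to the receiving application's tab stops.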
PAGE 283

Figure 10-5 Tab-separated output in the Draft Viewer and formatted in a word processor
To Set Draft Viewer Options
▶ From the menus choose:
  Edit
    Options...
▶ Click the Draft Viewer tab.
▶ Select the settings that you want.
▶ Click OK or Apply.
PAGE 284

Draft Viewer output display options affect only new output produced after you change the settings. Output already displayed in the Draft Viewer is not affected by changes in these settings.
Fonts in Draft Output
You can modify the font attributes (such as font, size, and style) of text output in the Draft Viewer. However, if you use box characters for row and column borders, proper column alignment for space-separated text requires a fixed-pitch (monospaced) font, such as Courier.
PAGE 285

▶ From the menus choose:
  File
    Print...
▶ Select Selection.
Draft Viewer Print Preview
Print Preview shows you what will print on each page for draft documents.
PAGE 286

To Save Draft Viewer Output
▶ From the Draft Viewer menus choose:
  File
    Save
Draft Viewer output is saved in rich text format (RTF).
To Save Draft Output as Text
▶ From the Draft Viewer menus choose:
  File
    Export...
You can export all text or just the selected text. Only text output (converted pivot table output and text output) is saved in the exported files; charts are not included.
PAGE 287

Chapter 11
Pivot Tables
Many of the results in the Viewer are presented in tables that can be pivoted interactively. That is, you can rearrange the rows, columns, and layers.
PAGE 288

▶ From the context menu choose:
  SPSS Pivot Table Object
    Open
▶ Repeat for each pivot table you want to edit.
Each pivot table is ready to edit in its own separate window.
To Pivot a Table Using Icons
▶ Activate the pivot table.
▶ From the Pivot Table menus choose:
  Pivot
    Pivoting Trays
▶ Hover over each icon with the mouse pointer for a ToolTip pop-up that tells you which table dimension the icon represents.
▶ Drag an icon from one tray to another.
PAGE 289

To Identify Pivot Table Dimensions
▶ Activate the pivot table.
▶ If pivoting trays are not on, from the Pivot Table menus choose:
  Pivot
    Pivoting Trays
▶ Click and hold down the mouse button on an icon. This highlights the dimension labels in the pivot table.
To Transpose Rows and Columns
▶ From the Pivot Table menus choose:
  Pivot
    Transpose Rows and Columns
This has the same effect as dragging all of the row icons into the Column tray and all of the column icons into the Row tray.
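Transposing swaps the row and column labels and flips the grid of data cells. The Python sketch below shows that operation on a simple two-dimensional table. The `SimpleTable` class is a hypothetical stand-in for a pivot table, not an SPSS object. Layers are ignored here.

```python
from dataclasses import dataclass


@dataclass
class SimpleTable:
    row_labels: list
    col_labels: list
    cells: list  # cells[i][j] holds the value for row i, column j


def transpose(table):
    """Swap the row and column dimensions of a two-dimensional table."""
    return SimpleTable(
        row_labels=list(table.col_labels),
        col_labels=list(table.row_labels),
        cells=[list(column) for column in zip(*table.cells)],
    )


if __name__ == "__main__":
    t = SimpleTable(["Clerical", "Custodial"], ["Frequency", "Percent"],
                    [[363, 76.6], [27, 5.7]])
    print(transpose(t))
    # SimpleTable(row_labels=['Frequency', 'Percent'], col_labels=['Clerical', 'Custodial'], cells=[[363, 27], [76.6, 5.7]])
```

In this sketch, applying `transpose` twice restores the original table.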
PAGE 290

▶ Click and drag the label to the new position.
▶ From the context menu, choose Insert Before or Swap.
Note: Make sure that Drag to Copy on the Edit menu is not enabled (checked). If Drag to Copy is enabled, deselect it.
To Group Rows or Columns and Insert Group Labels
▶ Activate the pivot table.
▶ Select the labels for the rows or columns you want to group together (click and drag or Shift-click to select multiple labels).
PAGE 291

▶ From the menus choose:
  Edit
    Ungroup
Ungrouping automatically deletes the group label.
To Rotate Pivot Table Labels
▶ Activate the pivot table.
▶ From the menus choose:
  Format
    Rotate Inner Column Labels
or
  Format
    Rotate Outer Row Labels
Frequency table used to illustrate rotated labels (shown before and after rotation):
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid  Clerical        363      76.6            76.6                 76.6
       Custodial        27       5.7             5.7                 82.3
       Manager          84      17.7            17.7                100.0
       Total           474     100.0           100.0
PAGE 292

▶ From the Pivot menu choose Reset Pivots to Defaults.
This resets only changes that are the result of pivoting row, column, and layer elements between dimensions. It does not affect changes such as grouping or ungrouping or moving rows and columns.
To Find a Definition of a Pivot Table Label
You can obtain context-sensitive Help on cell labels in pivot tables. For example, if Mean appears as a label, you can obtain a definition of the mean.
▶ Click the right mouse button on a label cell.
PAGE 293

Figure 11-4 Moving categories into layers (Minority pivoted from row to layer dimension)
Each layer icon has left and right arrows. The visible table is the table for the top layer.
Figure 11-5 Categories in separate layers (layers: Minority classification: Yes; Minority classification: No)
To Change Layers
▶ Click one of the layer icon arrows.
or
▶ Select a category from the drop-down list of layers.
PAGE 294

Figure 11-6 Selecting layers from drop-down lists
Go to Layer Category
Go to Layer Category allows you to change layers in a pivot table. This dialog box is particularly useful when there are a large number of layers or one layer has many categories.
To Go to a Table Layer
▶ From the Pivot Table menus choose:
  Pivot
    Go to Layer...
PAGE 295

Figure 11-7 Go to Layer Category dialog box
▶ Select a layer dimension in the Visible Category list. The Categories list will display all categories for the selected dimension.
▶ Select the category you want in the Categories list and click OK. This changes the layer and closes the dialog box.
To view another layer without closing the dialog box:
▶ Select the category and click Apply.
PAGE 296

You can also move layers to rows or columns by dragging their icons between the Layer, Row, and Column pivoting trays.
Bookmarks
Bookmarks allow you to save different views of a pivot table. Bookmarks save:
Placement of elements in row, column, and layer dimensions
Display order of elements in each dimension
Currently displayed layer for each layer element
To Bookmark Pivot Table Views
▶ Activate the pivot table.
▶ Pivot the table to the view you want to bookmark.
PAGE 297

▶ Click Go To.
To Rename a Pivot Table Bookmark
▶ Activate the pivot table.
▶ From the menus choose:
  Pivot
    Bookmarks
▶ Click the name of the bookmark in the list.
▶ Click Rename.
▶ Enter the new bookmark name.
▶ Click OK.
PAGE 298

▶ From the context menu choose Hide Category.
To Show Hidden Rows and Columns in a Table
▶ Select another label in the same dimension as the hidden row or column. For example, if the Female category of the Gender dimension is hidden, click the Male category.
▶ From the Pivot Table menus choose:
  View
    Show All Categories in dimension name
For example, choose Show All Categories in Gender.
or
▶ From the Pivot Table menus choose:
  View
    Show All
This displays all hidden cells in the table.
PAGE 299

► From the menus choose:
View > Hide (or Show)

To Hide or Show a Caption or Title in a Table
► Select a caption or title.
► From the menus choose:
View > Hide (or Show)

Editing Results

The appearance and contents of each table or text output item can be edited. You can:
• Apply a TableLook.
• Change the properties of the current table.
• Change the properties of cells in the table.
• Modify text.
• Add footnotes and captions to tables.
• Add items to the Viewer.
PAGE 300

TableLooks

A TableLook is a set of properties that define the appearance of a table. You can select a previously defined TableLook or create your own.
Before or after applying a TableLook, you can change the formats of individual cells or groups of cells by using cell properties. Edited cell formats are retained even when you apply a new TableLook.
PAGE 301

Figure 11-8
TableLooks dialog box
► Select a TableLook from the list of files. To select a file from another directory, click Browse.
► Click OK to apply the TableLook to the selected pivot table.

To Edit or Create a TableLook
► Select a TableLook from the list of files.
► Click Edit Look.
► Adjust the table properties for the attributes you want and click OK.
► Click Save Look to save the edited TableLook, or click Save As to save it as a new TableLook.
PAGE 302

Table Properties

The Table Properties dialog box allows you to set general properties of a table, set cell styles for various parts of a table, and save a set of those properties as a TableLook. Using the tabs on this dialog box, you can:
• Control general properties, such as hiding empty rows or columns and adjusting printing properties.
• Control the format and position of footnote markers.
PAGE 303

Table Properties: General

Several properties apply to the table as a whole. You can:
• Show or hide empty rows and columns. (An empty row or column has nothing in any of the data cells.)
• Control the placement of row labels. They can be in the upper-left corner or nested.
• Control maximum and minimum column width (expressed in points).

Figure 11-9
Table Properties dialog box, General tab

To Change General Table Properties
► Select the General tab.
► Select the options you want.
PAGE 304

Table Properties: Footnotes

The properties of footnote markers include style and position in relation to text.
• The style of footnote markers is either numbers (1, 2, 3, ...) or letters (a, b, c, ...).
• The footnote markers can be attached to text as superscripts or subscripts.

Figure 11-10
Table Properties dialog box, Footnotes tab

To Change Footnote Marker Properties
► Select the Footnotes tab.
► Select a footnote marker format.
► Select a marker position.
► Click OK or Apply.
PAGE 305

Table Properties: Cell Formats

For formatting, a table is divided into areas: title, layers, corner labels, row labels, column labels, data, caption, and footnotes. For each area of a table, you can modify the associated cell formats. Cell formats include text characteristics (such as font, size, color, and style), horizontal and vertical alignment, cell shading, foreground and background colors, and inner cell margins.
PAGE 306

Figure 11-12
Table Properties dialog box, Cell Formats tab

To Change Cell Formats
► Select the Cell Formats tab.
► Select an area from the drop-down list or click an area of the sample.
► Select characteristics for the area. Your selections are reflected in the sample.
► Click OK or Apply.

Table Properties: Borders

For each border location in a table, you can select a line style and a color. If you select None as the style, there will be no line at the selected location.
PAGE 307

Figure 11-13
Table Properties dialog box, Borders tab

To Change Borders in a Table
► Click the Borders tab.
► Select a border location, either by clicking its name in the list or by clicking a line in the Sample area. (Shift-click to select multiple names, or Ctrl-click to select noncontiguous names.)
► Select a line style or None.
► Select a color.
► Click OK or Apply.
PAGE 308

To Display Hidden Borders in a Pivot Table

For tables without many visible borders, you can display the hidden borders. This can make tasks like changing column widths easier. The hidden borders (gridlines) are displayed in the Viewer but are not printed.
► Activate the pivot table (double-click anywhere in the table).
PAGE 309

Font

A TableLook allows you to specify font characteristics for different areas of the table. You can also change the font for any individual cell. Options for the font in a cell include the font face, style, size, and color. You can also hide the text or underline it.
If you specify font properties in a cell, they apply in all of the table layers that have the same cell.
PAGE 310

Data Cell Widths

Set Data Cell Widths is used to set all data cells to the same width.

Figure 11-15
Set Data Cell Width dialog box

To Change Data Cell Widths
► Activate the pivot table.
► From the menus choose:
Format > Set Data Cell Widths...
► Enter a value for the cell width.

To Change the Width of a Pivot Table Column
► Activate the pivot table (double-click anywhere in the table).
PAGE 311

Figure 11-16
Changing the width of a column

You can change vertical category and dimension borders in the row labels area, whether or not they are showing.
► Move the mouse pointer through the row labels until you see the double-pointed arrow.
► Drag it to the new width.

Cell Properties

Cell Properties are applied to a selected cell. You can change the value format, alignment, margins, and shading.
PAGE 312

Cell Properties: Value

This dialog box tab controls the value format for a cell. You can select formats for number, date, time, or currency, and you can adjust the number of decimal digits displayed.

Figure 11-17
Cell Properties dialog box, Value tab

To Change Value Formats in a Cell
► Click the Value tab.
► Select a category and a format.
► Select the number of decimal places.

To Change Value Formats for a Column
► Ctrl-Alt-click the column label.
► Right-click the highlighted column.
PAGE 313

► From the context menu choose Cell Properties.
► Click the Value tab.
► Select the format you want to apply to the column.
You can use this method to suppress or add percentage signs and dollar signs, change the number of decimals displayed, and switch between scientific notation and regular numeric display.

Cell Properties: Alignment

This dialog box tab sets horizontal and vertical alignment and text direction for a cell.
PAGE 314

► From the Pivot Table menus choose:
Format > Cell Properties...
► Click the Alignment tab.
As you select the alignment properties for the cell, they are illustrated in the Sample area.

Cell Properties: Margins

This dialog box tab specifies the inset at each edge of a cell.

Figure 11-19
Cell Properties dialog box, Margins tab

To Change Margins in Cells
► Click the Margins tab.
► Select the inset for each of the four margins.
PAGE 315

Cell Properties: Shading

This dialog box tab specifies the percentage of shading and the foreground and background colors for a selected cell area. It does not change the color of the text.

Figure 11-20
Cell Properties dialog box, Shading tab

To Change Shading in Cells
► Click the Shading tab.
► Select the highlights and colors for the cell.

Footnote Marker

Footnote Marker changes the character(s) used to mark a footnote.
PAGE 316

Figure 11-21
Footnote Marker dialog box

To Change Footnote Marker Characters
► Select a footnote.
► From the Pivot Table menus choose:
Format > Footnote Marker...
► Enter one or two characters.

To Renumber Footnotes
When you have pivoted a table by switching rows, columns, and layers, the footnotes may be out of order. To renumber the footnotes:
► Activate the pivot table.
PAGE 317

To Select a Row or Column in a Pivot Table
► Activate the pivot table (double-click anywhere in the table).
► Click a row or column label.
► From the menus choose:
Edit > Select > Data and Label Cells
or
► Ctrl-Alt-click the row or column label.
If the table contains more than one dimension in the row or column area, the highlighted selection may span multiple noncontiguous cells.

Modifying Pivot Table Results

Text appears in the Viewer in many items. You can edit the text or add new text.
PAGE 318

To Add Captions to a Table
► From the Pivot Table menus choose:
Insert > Caption
The words Table Caption are displayed at the bottom of the table.
► Select the words Table Caption and type your caption text over them.

To Add a Footnote to a Table
A footnote can be attached to any item in a table.
► Click a title, cell, or caption within an activated pivot table.
► From the Pivot Table menus choose:
Insert > Footnote...
► Select the word Footnote and type the footnote text over it.
PAGE 319

► From the menus choose:
Format > Table Properties...
► On the Printing tab, select Print all layers.
You can also print each layer of a pivot table on a separate page.

Controlling Table Breaks for Wide and Long Tables

Pivot tables that are either too wide or too long to print within the defined page size are automatically split and printed in multiple sections. (For wide tables, multiple sections will print on the same page if there is room.)
PAGE 320

► From the menus choose:
Format > Keep Together

To Rescale a Pivot Table to Fit the Page Size
► Activate the pivot table.
► From the menus choose:
Format > Table Properties...
► Click the Printing tab.
► Click Rescale wide table to fit page.
and/or
► Click Rescale long table to fit page.
PAGE 321

Chapter 12
Working with Command Syntax

SPSS provides a powerful command language that allows you to save and automate many common tasks. It also provides some functionality not found in the menus and dialog boxes.
Most commands are accessible from the menus and dialog boxes. However, some commands and options are available only by using the command language.
PAGE 322

Syntax Rules

Keep in mind the following simple rules when editing and writing command syntax:
• Each command must begin on a new line and end with a period (.).
• Most subcommands are separated by slashes (/). The slash before the first subcommand on a command is usually optional.
• Variable names must be spelled out fully.
• Text included within apostrophes or quotation marks must be contained on a single line.
• No line of command syntax can exceed 80 characters.
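Some of the rules above are easy to check mechanically. The following Python sketch uses a hypothetical helper, `check_syntax`, which is not an SPSS feature. It flags lines longer than 80 characters and quoted text still open at the end of a line. It checks the period rule only for the last command, because it does not parse command boundaries, and it ignores the different rules for INCLUDE files.

```python
# Hypothetical helper (not part of SPSS): flags lines that break some of
# the command syntax rules listed above.
MAX_LINE = 80  # maximum characters per line of command syntax


def check_syntax(text: str) -> list[str]:
    """Return human-readable problems found in a block of command syntax."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE:
            problems.append(f"line {number}: longer than {MAX_LINE} characters")
        # Track whether an apostrophe- or quote-delimited string is still
        # open at the end of the line (a doubled '' simply closes and reopens).
        open_quote = None
        for ch in line:
            if open_quote is None and ch in "'\"":
                open_quote = ch
            elif ch == open_quote:
                open_quote = None
        if open_quote is not None:
            problems.append(f"line {number}: quoted text continues past end of line")
    # Naive check: only the last non-blank line is tested for a final period.
    non_blank = [line.rstrip() for line in lines if line.strip()]
    if non_blank and not non_blank[-1].endswith("."):
        problems.append("last command does not end with a period")
    return problems


print(check_syntax("FREQUENCIES VARIABLES=industry\n  /STATISTICS=MEAN MEDIAN.\n"))
print(check_syntax("TITLE 'Sales report\n"))
```

The first call reports no problems. The second reports the unclosed quotation and the missing period.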
PAGE 323

INCLUDE Files

For command files run via the INCLUDE command, the syntax rules are slightly different:
• Each command must begin in the first column of a new line.
• Continuation lines must be indented at least one space.
• The period at the end of the command is optional.
PAGE 324

Figure 12-1
Command syntax pasted from a dialog box

Note: If you open a dialog box from the menus in a script window, code for running syntax from a script is pasted into the script window.

Copying Syntax from the Output Log

You can build a syntax file by copying command syntax from the log that appears in the Viewer. To use this method, you must select Display commands in the log in the Viewer settings (Edit menu, Options, Viewer tab) before running the analysis.
PAGE 325

Figure 12-2
Command syntax in the log

To Copy Syntax from the Output Log
► Before running the analysis, from the menus choose:
Edit > Options...
PAGE 326

► On the Viewer tab, select Display commands in the log.
As you run analyses, the commands for your dialog box selections are recorded in the log.
► Open a previously saved syntax file or create a new one. To create a new syntax file, from the menus choose:
File > New > Syntax
► In the Viewer, double-click a log item to activate it.
► Click and drag the mouse to highlight the syntax that you want to copy.
PAGE 327

Figure 12-3
Editing the journal file
Delete warnings and error messages before saving and running syntax from the journal file.

To Edit Syntax in a Journal File
► To open the journal file, from the menus choose:
File > Open > Other...
► Locate and open the journal file (by default, spss.jnl is located in the temp directory). Select All files (*.*) for Files of Type, or enter *.jnl in the File Name text box, to display journal files in the file list.
PAGE 328

► Click the Run button (the right-pointing triangle) on the Syntax Editor toolbar.
or
► Select one of the choices from the Run menu.
• All. Runs all commands in the syntax window.
• Selection. Runs the currently selected commands, including any partially highlighted commands.
• Current. Runs the command where the cursor is currently located.
• To End. Runs all commands from the current cursor location to the end of the command syntax file.
PAGE 329

Lag Functions

One notable exception is transformation commands that contain lag functions. In a series of transformation commands without any intervening EXECUTE commands or other commands that read the data, lag functions are calculated after all other transformations, regardless of command order.
PAGE 330
PAGE 331

Chapter 13
Frequencies

The Frequencies procedure provides statistics and graphical displays that are useful for describing many types of variables. For a first look at your data, the Frequencies procedure is a good place to start.
For a frequency report and bar chart, you can arrange the distinct values in ascending or descending order, or order the categories by their frequencies. The frequencies report can be suppressed when a variable has many distinct values.
PAGE 332

Figure 13-1
Frequencies output

Industry
                     Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Government       331       37.5         37.5               37.5
        Corporate        220       24.9         24.9               62.5
        Academic         248       28.1         28.1               90.6
        Healthcare        83        9.4          9.4              100.0
        Total            882      100.0        100.0

Statistics
Amount of Product Sale
Mean             $3,576.52
Median           $3,417.50
Std. Deviation   $1,077.836

To Obtain Frequency Tables
► From the menus choose:
Analyze > Descriptive Statistics > Frequencies...
PAGE 333

Figure 13-2
Frequencies dialog box
► Select one or more categorical or quantitative variables.
Optionally, you can:
• Click Statistics for descriptive statistics for quantitative variables.
• Click Charts for bar charts, pie charts, and histograms.
• Click Format for the order in which results are displayed.
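The Percent and Cumulative Percent columns in the Industry table can be reproduced from the counts alone. This Python sketch assumes there are no missing values, so Percent equals Valid Percent. It computes cumulative percentages from running counts rather than by adding rounded percentages. That approach is consistent with the 62.5 shown for Corporate, where 37.5 + 24.9 would give 62.4.

```python
# Counts copied from the Industry frequency table (Figure 13-1).
counts = {"Government": 331, "Corporate": 220, "Academic": 248, "Healthcare": 83}


def frequency_table(counts):
    """Return (category, count, percent, cumulative percent) rows.

    Assumes no missing values, so Percent and Valid Percent are identical.
    Cumulative percent is computed from the running count, then rounded.
    """
    total = sum(counts.values())
    rows, running = [], 0
    for category, n in counts.items():
        running += n
        rows.append((category, n,
                     round(100 * n / total, 1),
                     round(100 * running / total, 1)))
    return rows


for row in frequency_table(counts):
    print(row)
```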
PAGE 334

Frequencies Statistics

Figure 13-3
Frequencies Statistics dialog box

Percentile Values. Values of a quantitative variable that divide the ordered data into groups so that a certain percentage is above and another percentage is below. Quartiles (the 25th, 50th, and 75th percentiles) divide the observations into four groups of equal size. If you want an equal number of groups other than four, select Cut points for n equal groups.
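Quartiles and cut points for equal groups can be sketched with Python's standard `statistics` module. This sketch uses the (n + 1)p interpolation (`method="exclusive"`), which is one common definition. It is an assumption that this matches the values SPSS reports, and other definitions can give slightly different cut points for small samples. The data are illustrative.

```python
import statistics


def percentile_cut_points(values, groups=4):
    """Cut points that divide the ordered data into `groups` equal-sized groups.

    Uses the (n + 1) * p interpolation ("exclusive" method); other percentile
    definitions may give slightly different values.
    """
    return statistics.quantiles(values, n=groups, method="exclusive")


data = [2, 4, 4, 5, 7, 8, 9, 10, 12, 15, 18]
print(percentile_cut_points(data))     # quartiles (25th, 50th, 75th percentiles)
print(percentile_cut_points(data, 5))  # cut points for 5 equal groups
```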
PAGE 335

Dispersion. Statistics that measure the amount of variation or spread in the data include the standard deviation, variance, range, minimum, maximum, and standard error of the mean.
Std. deviation. A measure of dispersion around the mean. In a normal distribution, about 68% of cases fall within one standard deviation of the mean and about 95% of cases fall within two standard deviations.
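The 68% and 95% figures come from the standard normal distribution. This Python sketch computes the exact proportions with `statistics.NormalDist`. It shows that both figures are rounded; 95% corresponds more precisely to 1.96 standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(share_within(1), 4))  # about 0.6827
print(round(share_within(2), 4))  # about 0.9545
```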
PAGE 336

Frequencies Charts

Figure 13-4
Frequencies Charts dialog box

Chart Type. A pie chart displays the contribution of parts to a whole. Each slice of a pie chart corresponds to a group defined by a single grouping variable. A bar chart displays the count for each distinct value or category as a separate bar, allowing you to compare categories visually. A histogram also has bars, but they are plotted along an equal interval scale.
PAGE 337

Order by. The frequency table can be arranged according to the actual values in the data or according to the count (frequency of occurrence) of those values, and in either ascending or descending order. However, if you request a histogram or percentiles, Frequencies assumes that the variable is quantitative and displays its values in ascending order.

Multiple Variables.
PAGE 338
PAGE 339

Chapter 14
Descriptives

The Descriptives procedure displays univariate summary statistics for several variables in a single table and calculates standardized values (z scores). Variables can be ordered by the size of their means (in ascending or descending order), alphabetically, or by the order in which you select the variables (the default).
When z scores are saved, they are added to the data in the Data Editor and are available for charts, data listings, and analyses.
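A z score expresses each value as (value - mean) / standard deviation. This Python sketch assumes the sample standard deviation, with n - 1 in the denominator. The data are illustrative. The resulting scores have a mean of 0 and a standard deviation of 1.

```python
import statistics


def z_scores(values):
    """Standardize values as (x - mean) / standard deviation.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]


z = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
print([round(v, 3) for v in z])
```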
PAGE 340

To Obtain Descriptive Statistics
► From the menus choose:
Analyze > Descriptive Statistics > Descriptives...

Figure 14-1
Descriptives dialog box
► Select one or more variables.
Optionally, you can:
• Select Save standardized values as variables to save z scores as new variables.
• Click Options for optional statistics and display order.
PAGE 341

Descriptives Options

Figure 14-2
Descriptives Options dialog box

Mean and Sum. The mean, or arithmetic average, is displayed by default.
Dispersion. Statistics that measure the spread or variation in the data include the standard deviation, variance, range, minimum, maximum, and standard error of the mean.
Standard deviation. A measure of dispersion around the mean.
PAGE 342

Minimum. The smallest value of a numeric variable.
Maximum. The largest value of a numeric variable.
Standard error of mean. A measure of how much the value of the mean may vary from sample to sample taken from the same distribution. It can be used to roughly compare the observed mean to a hypothesized value (that is, you can conclude the two values are different if the ratio of the difference to the standard error is less than -2 or greater than +2).

Distribution.
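The rough comparison described under Standard error of mean can be sketched in Python. The sample and the hypothesized value are illustrative. The ±2 threshold is the rule of thumb from the text, not a formal test with a stated significance level.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Sample standard deviation divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


def differs_roughly(values, hypothesized):
    """Return the ratio (mean - hypothesized) / SE and whether |ratio| > 2."""
    ratio = (statistics.mean(values) - hypothesized) / standard_error_of_mean(values)
    return ratio, abs(ratio) > 2


sample = [12, 15, 11, 14, 13, 16, 12, 15]
print(differs_roughly(sample, 10))  # ratio well above +2
print(differs_roughly(sample, 13))  # ratio within -2..+2
```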
PAGE 343

Chapter 15
Explore

The Explore procedure produces summary statistics and graphical displays, either for all of your cases or separately for groups of cases. There are many reasons for using the Explore procedure: data screening, outlier identification, description, assumption checking, and characterizing differences among subpopulations (groups of cases). Data screening may show that you have unusual values, extreme values, gaps in the data, or other peculiarities.
PAGE 344

may be short string or numeric. The case label variable, used to label outliers in boxplots, can be short string, long string (first 15 characters), or numeric.

Assumptions. The distribution of your data does not have to be symmetric or normal.
PAGE 345

Extreme Values: Time
                   Case Number   Schedule   Value
Highest   1            31            4       10.5
          2            33            4        9.9
          3            39            4        9.8
          4            32            4        9.5
          5            36            4        9.3
Lowest    1             2            1        2.0
          2             7            1        2.1
          3             1            1        2.3
          4             …            …        2.3
          5             …            …        2.…

Time Stem-and-Leaf Plot
 Frequency    Stem &  Leaf
     7.00        2 .  0133589
     6.00        3 .  014577
     3.00        4 .  568
     5.00        5 .  05779
     4.00        6 .  1379
     3.00        7 .  268
     6.00        8 .  012237
     5.00        9 .  13589
     1.00       10 .  5
 Stem width:  1.0
 Each leaf:   1 case(s)

To Explore Your Data
► From the menus choose:
Analyze > Descriptive Statistics > Explore...
PAGE 346

Figure 15-2
Explore dialog box
► Select one or more dependent variables.
Optionally, you can:
• Select one or more factor variables, whose values will define groups of cases.
• Select an identification variable to label cases.
• Click Statistics for robust estimators, outliers, percentiles, and frequency tables.
• Click Plots for histograms, normal probability plots and tests, and spread-versus-level plots with Levene’s statistics.
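A stem-and-leaf plot with a stem width of 1.0 and one case per leaf, like the Time plot in the Explore output, splits each value into its integer part (the stem) and its tenths digit (the leaf). This Python sketch assumes non-negative values recorded to one decimal place. It does not reproduce other features of the SPSS plot.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group values with stem width 1.0: stem = integer part, leaf = tenths digit.

    Assumes non-negative values recorded to one decimal place.
    """
    plot = defaultdict(str)
    for value in sorted(values):
        tenths = round(value * 10)       # e.g. 2.3 -> 23
        stem, leaf = divmod(tenths, 10)  # 23 -> stem 2, leaf 3
        plot[stem] += str(leaf)
    return dict(plot)


for stem, leaves in stem_and_leaf([2.0, 2.1, 2.3, 2.3, 3.5, 10.5]).items():
    # Frequency column, stem, and leaves, laid out like the Explore output
    print(f"{len(leaves):9.2f} {stem:>8} .  {leaves}")
```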
PAGE 347

Explore Statistics

Figure 15-3
Explore Statistics dialog box

Descriptives. These measures of central tendency and dispersion are displayed by default. Measures of central tendency indicate the location of the distribution; they include the mean, median, and 5% trimmed mean. Measures of dispersion show the dissimilarity of the values; these include standard error, variance, standard deviation, minimum, maximum, range, and interquartile range.
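The 5% trimmed mean drops the most extreme values before averaging, which makes it resistant to outliers. This Python sketch is a simplification that removes floor(0.05 × n) whole cases from each end. SPSS may treat a fractional boundary case differently, so its reported value can differ slightly. The data are illustrative.

```python
import math
import statistics


def trimmed_mean(values, proportion=0.05):
    """Mean after dropping floor(proportion * n) values from each end.

    Simplified: SPSS's 5% trimmed mean may weight a fractional boundary
    case instead of dropping only whole cases.
    """
    ordered = sorted(values)
    k = math.floor(proportion * len(ordered))
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


data = list(range(1, 20)) + [1000]  # 1..19 plus one extreme value
print(statistics.mean(data), trimmed_mean(data))
```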
PAGE 348

Explore Plots

Figure 15-4
Explore Plots dialog box

Boxplots. These alternatives control the display of boxplots when you have more than one dependent variable. Factor levels together generates a separate display for each dependent variable. Within a display, boxplots are shown for each of the groups defined by a factor variable. Dependents together generates a separate display for each group defined by a factor variable.
PAGE 349

Levene’s tests are based on the transformed data. If no factor variable is selected, spread-versus-level plots are not produced. Power estimation produces a plot of the natural logs of the interquartile ranges against the natural logs of the medians for all cells, as well as an estimate of the power transformation for achieving equal variances in the cells. A spread-versus-level plot helps determine the power for a transformation to stabilize (make more equal) variances across groups.
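The power estimate can be sketched as a least-squares line through the points (ln median, ln IQR), one point per group. This Python sketch assumes the usual spread-versus-level rule that the suggested power is 1 minus the slope. By convention, a power of 0 stands for a log transformation. The function takes precomputed medians and interquartile ranges so that it does not depend on a particular quartile definition. The numbers are illustrative.

```python
import math


def spread_vs_level_power(medians, iqrs):
    """Estimate a variance-stabilizing power from group medians and IQRs.

    Fits ln(IQR) = a + b * ln(median) by least squares and returns 1 - b
    (the usual spread-versus-level rule; assumed here, not SPSS's code).
    """
    xs = [math.log(m) for m in medians]
    ys = [math.log(q) for q in iqrs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return 1 - slope


# Spread grows in proportion to level: slope 1, power about 0 (log transform).
print(spread_vs_level_power([10, 20, 40], [2, 4, 8]))
# Constant spread: slope 0, power 1 (no transformation needed).
print(spread_vs_level_power([10, 20, 40], [3, 3, 3]))
```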
PAGE 350

Exclude cases listwise. Cases with missing values for any dependent or factor variable are excluded from all analyses. This is the default.
Exclude cases pairwise. Cases with no missing values for variables in a group (cell) are included in the analysis of that group. The case may have missing values for variables used in other groups.
Report values. Missing values for factor variables are treated as a separate category. All output is produced for this additional category.
PAGE 351

Chapter 16
Crosstabs

The Crosstabs procedure forms two-way and multiway tables and provides a variety of tests and measures of association for two-way tables. The structure of the table and whether categories are ordered determine what test or measure to use.
Crosstabs’ statistics and measures of association are computed for two-way tables only.
PAGE 352

Assumptions. Some statistics and measures assume ordered categories (ordinal data) or quantitative values (interval or ratio data), as discussed in the section on statistics. Others are valid when the table variables have unordered categories (nominal data). For the chi-square-based statistics (phi, Cramér’s V, and contingency coefficient), the data should be a random sample from a multinomial distribution.
PAGE 353

Figure 16-2
Crosstabs dialog box
► Select one or more row variables and one or more column variables.
Optionally, you can:
• Select one or more control variables.
• Click Statistics for tests and measures of association for two-way tables or subtables.
• Click Cells for observed and expected values, percentages, and residuals.
• Click Format for controlling the order of categories.
PAGE 354

on. If statistics and measures of association are requested, they apply to two-way subtables only.

Crosstabs Clustered Bar Charts

Display clustered bar charts. A clustered bar chart helps summarize your data for groups of cases. There is one cluster of bars for each value of the variable you specified under Rows. The variable that defines the bars within each cluster is the variable you specified under Columns.
PAGE 355

chi-square is computed for all other 2 × 2 tables. For tables with any number of rows and columns, select Chi-square to calculate the Pearson chi-square and the likelihood-ratio chi-square. When both table variables are quantitative, Chi-square yields the linear-by-linear association test.

Correlations. For tables in which both rows and columns contain ordered values, Correlations yields Spearman’s correlation coefficient, rho (numeric data only).
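The Pearson and likelihood-ratio chi-square statistics both compare observed counts with the counts expected if the row and column variables were independent. This Python sketch uses the standard textbook formulas on an illustrative 2 × 2 table. It omits the continuity correction, Fisher's exact test, and p-values.

```python
import math


def chi_square_tests(table):
    """Pearson and likelihood-ratio chi-square for a two-way table of counts.

    Expected count for cell (i, j) = row_total[i] * col_total[j] / N.
    Returns (pearson, likelihood_ratio, degrees_of_freedom).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    pearson = likelihood = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # an empty cell contributes nothing to G-squared
                likelihood += 2 * observed * math.log(observed / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return pearson, likelihood, df


print(chi_square_tests([[10, 20], [30, 40]]))
```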
PAGE 356

Ordinal. For tables in which both rows and columns contain ordered values, select Gamma (zero-order for 2-way tables and conditional for 3-way to 10-way tables), Kendall’s tau-b, and Kendall’s tau-c. For predicting column categories from row categories, select Somers’ d.
Gamma. A symmetric measure of association between two ordinal variables that ranges between -1 and 1. Values close to an absolute value of 1 indicate a strong relationship between the two variables.
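Gamma counts concordant pairs of cases (C) and discordant pairs (D) and ignores tied pairs: gamma = (C - D) / (C + D). This Python sketch assumes that rows and columns are already in ascending order. For a 2 × 2 table, the formula reduces to (ad - bc) / (ad + bc).

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and columns.

    C counts concordant pairs (one case higher on both variables);
    D counts discordant pairs (higher on one, lower on the other).
    Tied pairs are ignored.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):  # only rows below cell (i, j)
                for m in range(cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


print(goodman_kruskal_gamma([[10, 20], [30, 40]]))  # (400 - 600) / 1000
```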
PAGE 357

Kappa. Cohen's kappa measures the agreement between the evaluations of two raters when both are rating the same object. A value of 1 indicates perfect agreement. A value of 0 indicates that agreement is no better than chance. Kappa is available only for tables in which both variables use the same category values and both variables have the same number of categories.

Risk.
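Kappa compares the observed proportion of agreement with the agreement expected by chance from the marginal totals: kappa = (p_o - p_e) / (1 - p_e). This Python sketch uses an illustrative square table. It assumes that both raters' categories appear in the same order in the rows and columns.

```python
def cohens_kappa(table):
    """Kappa = (p_observed - p_chance) / (1 - p_chance) for a square table.

    Rows hold rater A's categories and columns rater B's, in the same order.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    p_observed = sum(table[i][i] for i in range(k)) / n  # diagonal = agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)


print(cohens_kappa([[20, 5], [10, 15]]))  # about 0.4
```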
PAGE 358

Crosstabs Cell Display

Figure 16-4
Crosstabs Cell Display dialog box

To help you uncover patterns in the data that contribute to a significant chi-square test, the Crosstabs procedure displays expected frequencies and three types of residuals (deviates) that measure the difference between observed and expected frequencies. Each cell of the table can contain any combination of counts, percentages, and residuals selected.

Counts.
PAGE 359

Standardized. The residual divided by an estimate of its standard deviation. Standardized residuals, which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1.
Adjusted standardized. The residual for a cell (observed minus expected value) divided by an estimate of its standard error. The resulting standardized residual is expressed in standard deviation units above or below the mean.

Noninteger Weights.
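The standardized and adjusted standardized residuals can be sketched in Python. The adjusted formula assumed here is the commonly used (O - E) / sqrt(E × (1 - row total / N) × (1 - column total / N)). The table is illustrative. For a 2 × 2 table, all four adjusted residuals have the same magnitude, and its square equals the Pearson chi-square.

```python
import math


def residuals(table):
    """Standardized (Pearson) and adjusted standardized residuals per cell.

    standardized = (O - E) / sqrt(E)
    adjusted     = (O - E) / sqrt(E * (1 - row_total/N) * (1 - col_total/N))
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    standardized, adjusted = [], []
    for i, row in enumerate(table):
        std_row, adj_row = [], []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = observed - expected
            std_row.append(diff / math.sqrt(expected))
            adj_row.append(diff / math.sqrt(expected
                                            * (1 - row_totals[i] / n)
                                            * (1 - col_totals[j] / n)))
        standardized.append(std_row)
        adjusted.append(adj_row)
    return standardized, adjusted


std, adj = residuals([[10, 20], [30, 40]])
print(std)
print(adj)
```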
PAGE 360
PAGE 361

Chapter 17
Summarize

The Summarize procedure calculates subgroup statistics for variables within categories of one or more grouping variables. All levels of the grouping variable are crosstabulated. You can choose the order in which the statistics are displayed. Summary statistics for each variable across all categories are also displayed. Data values in each category can be listed or suppressed. With large data sets, you can choose to list only the first n cases.

Example.
PAGE 362

Figure 17-1
Summarize output

To Obtain Case Summaries
► From the menus choose:
Analyze > Reports > Case Summaries...
PAGE 363

► Select one or more variables.
Optionally, you can:
• Select one or more grouping variables to divide your data into subgroups.
• Click Options to change the output title, add a caption below the output, or exclude cases with missing values.
• Click Statistics for optional statistics.
• Select Display cases to list the cases in each subgroup. By default, the system lists only the first 100 cases in your file.
PAGE 364

Summarize Statistics

Figure 17-4
Summarize Cases Statistics dialog box

You can choose one or more of the following subgroup statistics for the variables within each category of each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtos
PAGE 365

Harmonic Mean. Used to estimate an average group size when the sample sizes in the groups are not equal. The harmonic mean is the number of groups (samples) divided by the sum of the reciprocals of the sample sizes.
Kurtosis. A measure of the extent to which observations cluster around a central point. For a normal distribution, the value of the kurtosis statistic is 0.
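The harmonic mean definition can be checked against Python's `statistics.harmonic_mean`. The group sizes below are illustrative.

```python
import statistics

group_sizes = [10, 20, 40]  # illustrative, unequal group sizes

# Number of groups divided by the sum of the reciprocals of the group sizes.
by_hand = len(group_sizes) / sum(1 / n for n in group_sizes)
print(by_hand, statistics.harmonic_mean(group_sizes))  # both about 17.14
```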
PAGE 366

are longer than those of a normal distribution; a negative value for kurtosis indicates shorter tails (becoming like those of a box-shaped uniform distribution).
Standard Error of Skewness. The ratio of skewness to its standard error can be used as a test of normality (that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for skewness indicates a long right tail; an extreme negative value, a long left tail.

Sum.
PAGE 367

Chapter 18
Means

The Means procedure calculates subgroup means and related univariate statistics for dependent variables within categories of one or more independent variables. Optionally, you can obtain a one-way analysis of variance, eta, and tests for linearity.

Example. Measure the average amount of fat absorbed by three different types of cooking oil, and perform a one-way analysis of variance to see whether the means differ.

Statistics.
PAGE 368

Figure 18-1
Means output

Report
Absorbed Grams of Fat
Type of Oil      Mean     N    Std. Deviation
Peanut Oil      72.00     6        13.34
Lard            85.00     6         7.77
Corn Oil        62.00     6         8.22
Total           73.00    18        13.…

To Obtain Subgroup Means
► From the menus choose:
Analyze > Compare Means > Means...
PAGE 369

Figure 18-2
Means dialog box
► Select one or more dependent variables.
► Use one of two ways to select categorical independent variables:
• Select one or more independent variables. Separate results are displayed for each independent variable.
• Select one or more layers of independent variables. Each layer further subdivides the sample.
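The optional one-way analysis of variance and eta compare the spread between group means with the spread within groups. This Python sketch uses the standard formulas, F = (SS_between / (k - 1)) / (SS_within / (N - k)) and eta = sqrt(SS_between / SS_total), on illustrative data. It does not compute a p-value or the test for linearity.

```python
import statistics


def one_way_anova(groups):
    """Return (F, eta) for a list of groups of observations.

    F   = (SS_between / (k - 1)) / (SS_within / (N - k))
    eta = sqrt(SS_between / SS_total), with SS_total = SS_between + SS_within
    """
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    ss_between = sum(len(group) * (m - grand_mean) ** 2
                     for group, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for group, m in zip(groups, group_means) for x in group)
    k, n = len(groups), len(all_values)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    eta = (ss_between / (ss_between + ss_within)) ** 0.5
    return f, eta


f, eta = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(f, eta)
```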
PAGE 370

Means Options

Figure 18-3
Means Options dialog box

You can choose one or more of the following subgroup statistics for the variables within each category of each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of kurtosis, skewness, standa
PAGE 371

Grouped Median. Median calculated for data that are coded into groups. For example, with age data, if each value in the 30s is coded 35, each value in the 40s is coded 45, and so on, the grouped median is the median calculated from the coded data.
Harmonic Mean. Used to estimate an average group size when the sample sizes in the groups are not equal. The harmonic mean is the number of groups (samples) divided by the sum of the reciprocals of the sample sizes.

Kurtosis.
PAGE 372

Standard Error of Kurtosis. The ratio of kurtosis to its standard error can be used as a test of normality (that is, you can reject normality if the ratio is less than -2 or greater than +2). A large positive value for kurtosis indicates that the tails of the distribution are longer than those of a normal distribution; a negative value for kurtosis indicates shorter tails (becoming like those of a box-shaped uniform distribution).

Standard Error of Skewness.
PAGE 373

Chapter 19
OLAP Cubes

The OLAP (Online Analytical Processing) Cubes procedure calculates totals, means, and other univariate statistics for continuous summary variables within categories of one or more categorical grouping variables. A separate layer in the table is created for each category of each grouping variable.

Example. Total and average sales for different regions and product lines within regions.

Statistics.
PAGE 374

Figure 19-1
OLAP Cubes output

1996 Sales by Division and Region
Division: Total
Region: Total
Sum              $145,038,250
Mean             $371,893
Median           $307,500
Std. Deviation   $171,311

1996 Sales by Division and Region
Division: Consumer Products
Region: East
Sum              $18,548,100
Mean             $289,814.06
Median           $273,600.00
Std. Deviation   $80,674.66

To Obtain OLAP Cubes
► From the menus choose:
Analyze > Reports > OLAP Cubes...
PAGE 375

Figure 19-2
OLAP Cubes dialog box
► Select one or more continuous summary variables.
► Select one or more categorical grouping variables.
Optionally, you can:
• Select different summary statistics (click Statistics). You must select one or more grouping variables before you can select summary statistics.
• Calculate differences between pairs of variables and pairs of groups defined by a grouping variable (click Differences).
• Create custom table titles (click Title).
PAGE 376
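The layered output in Figure 19-1 comes from computing the same summary statistics for every combination of grouping categories, plus a Total layer that pools all categories. The following minimal Python sketch uses hypothetical sales records, because the data behind Figure 19-1 are not given here, together with the standard library `statistics` module:

```python
from statistics import mean, median, stdev

# Hypothetical records (division, region, sales); not the data behind Figure 19-1.
records = [
    ("Consumer Products", "East", 250_000.0),
    ("Consumer Products", "East", 310_000.0),
    ("Consumer Products", "West", 420_000.0),
    ("Business Products", "East", 380_000.0),
    ("Business Products", "West", 510_000.0),
]


def summarize(values):
    """Sum, mean, median, and sample standard deviation for one layer."""
    return {
        "Sum": sum(values),
        "Mean": mean(values),
        "Median": median(values),
        # A standard deviation needs at least two cases.
        "Std. Deviation": stdev(values) if len(values) > 1 else float("nan"),
    }


def olap_layers(rows):
    """Summaries for every (division, region) layer; "Total" pools all categories."""
    divisions = sorted({r[0] for r in rows}) + ["Total"]
    regions = sorted({r[1] for r in rows}) + ["Total"]
    layers = {}
    for div in divisions:
        for reg in regions:
            values = [amount for d, r, amount in rows
                      if div in ("Total", d) and reg in ("Total", r)]
            if values:  # skip combinations with no cases
                layers[(div, reg)] = summarize(values)
    return layers


layers = olap_layers(records)
print(layers[("Total", "Total")]["Sum"])              # 1870000.0
print(layers[("Consumer Products", "East")]["Mean"])  # 280000.0
```

Each key of `layers` corresponds to one layer of the pivot table. The Statistics dialog box offers further statistics beyond these four.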

OLAP Cubes Statistics
Figure 19-3 OLAP Cubes Statistics dialog box
You can choose one or more of the following subgroup statistics for the summary variables within each category of each grouping variable: sum, number of cases, mean, median, grouped median, standard error of the mean, minimum, maximum, range, variable value of the first category of the grouping variable, variable value of the last category of the grouping variable, standard deviation, variance, kurtosis, standard error of ku

Grouped Median. Median calculated for data that are coded into groups. For example, with age data, if each value in the 30s is coded 35, each value in the 40s is coded 45, and so on, the grouped median is the median calculated from the coded data. Harmonic Mean. Used to estimate an average group size when the sample sizes in the groups are not equal. The harmonic mean is the number of groups (samples) divided by the sum of the reciprocals of the sample sizes. Kurtosis.

Range. The difference between the largest and smallest values of a numeric variable; the maximum minus the minimum. Skewness. A measure of the asymmetry of a distribution. The normal distribution is symmetric and has a skewness value of zero. A distribution with a significant positive skewness has a long right tail. A distribution with a significant negative skewness has a long left tail.
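A common way to compute a median from data coded into groups is linear interpolation within the group that contains the middle case. The sketch below uses that textbook formula. It assumes equal-width groups whose coded values are their midpoints, as in the age example. SPSS's exact grouped-median algorithm may differ in detail.

```python
def grouped_median(counts, width):
    """Interpolated median for data coded to group midpoints.

    counts: maps each coded value (the group midpoint) to its number of cases.
    width:  common width of the groups (10 in the age example).
    Textbook interpolation formula; SPSS's exact algorithm may differ.
    """
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one case")
    half = n / 2.0
    cumulative = 0
    for midpoint in sorted(counts):
        f = counts[midpoint]
        if f > 0 and cumulative + f >= half:
            lower = midpoint - width / 2.0  # lower boundary of the median group
            return lower + (half - cumulative) / f * width
        cumulative += f


# Ages coded 35 (30s), 45 (40s), 55 (50s), with 2, 5, and 3 cases
print(grouped_median({35: 2, 45: 5, 55: 3}, width=10))  # 46.0
```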

OLAP Cubes Differences
Figure 19-4 OLAP Cubes Differences dialog box
This dialog box allows you to calculate percentage and arithmetic differences between summary variables or between groups defined by a grouping variable. Differences are calculated for all measures selected in the OLAP Cubes Statistics dialog box. Differences between Variables. Calculates differences between pairs of variables.

for the Minus category as the denominator. You must select one or more grouping variables in the main dialog box before you can specify differences between groups.
OLAP Cubes Title
Figure 19-5 OLAP Cubes Title dialog box
You can change the title of your output or add a caption that will appear below the output table. You can also control line wrapping of titles and captions by typing \n wherever you want to insert a line break in the text.
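According to the description above, percentage differences use the value for the Minus variable or category as the denominator. The minimal sketch below computes both kinds of difference for a single statistic. The inputs are hypothetical.

```python
def differences(value, minus_value):
    """Arithmetic and percentage difference; the Minus value is the denominator."""
    arithmetic = value - minus_value
    if minus_value == 0:
        percentage = float("nan")  # undefined when the Minus value is zero
    else:
        percentage = 100.0 * arithmetic / minus_value
    return arithmetic, percentage


# Hypothetical mean sales: East (value) minus West (Minus value)
print(differences(300_000.0, 250_000.0))  # (50000.0, 20.0)
```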

Chapter 20 T Tests

Three types of t tests are available:
- Independent-samples t test (two-sample t test). Compares the means of one variable for two groups of cases. Descriptive statistics for each group and Levene’s test for equality of variances are provided, as well as both equal- and unequal-variance t values and a 95%-confidence interval for the difference in means.
- Paired-samples t test (dependent t test). Compares the means of two variables for a single group.

Example. Patients with high blood pressure are randomly assigned to a placebo group and a treatment group. The placebo subjects receive an inactive pill and the treatment subjects receive a new drug that is expected to lower blood pressure. After treating the subjects for two months, the two-sample t test is used to compare the average blood pressures for the placebo group and the treatment group. Each patient is measured once and belongs to one group. Statistics.

Independent Samples Test: Blood pressure
Levene's Test for Equality of Variances: F = .134, Significance = .719
t-test for Equality of Means:
  Equal variances assumed: t = 3.783, df = 18, Significance (2-tailed) = .001, Mean Difference = 26.10, Std. Error Difference = 6.90, 95% Confidence Interval of the Mean: Lower 11.61, Upper 40.59
  Equal variances not assumed: t = 3.783, df = 17.163, Significance (2-tailed) = .001, Mean Difference = 26.10, Std. Error Difference = 6.90, 95% Confidence Interval of the Mean: Lower 11.56, Upper 40.

Optionally, you can click Options to control the treatment of missing data and the level of the confidence interval.
Independent-Samples T Test Define Groups
Figure 20-3 Define Groups dialog box for numeric variables
For numeric grouping variables, define the two groups for the t test by specifying two values or a cut point:
- Use specified values. Enter a value for Group 1 and another for Group 2. Cases with any other values are excluded from the analysis.

Independent-Samples T Test Options
Figure 20-5 Independent-Samples T Test Options dialog box
Confidence Interval. By default, a 95%-confidence interval for the difference in means is displayed. Enter a value between 1 and 99 to request a different confidence level.
Missing Values. When you test several variables and data are missing for one or more variables, you can tell the procedure which cases to include (or exclude): Exclude cases analysis by analysis.
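The independent-samples test reports both an equal-variance and an unequal-variance t value. The sketch below computes those statistics and their degrees of freedom: pooled variance for the first, and the Welch-Satterthwaite approximation for the second, which we assume corresponds to the "Equal variances not assumed" row. It does not compute p-values, which need the t distribution, for example through `scipy.stats.ttest_ind` with `equal_var=False`. The data are made up.

```python
import math
from statistics import mean, variance


def independent_t(a, b):
    """Pooled (equal-variance) and Welch (unequal-variance) t statistics with df."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = mean(a) - mean(b)

    # Equal variances assumed: pooled variance, df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Equal variances not assumed: separate variances, Welch-Satterthwaite df
    q1, q2 = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(q1 + q2)
    df_welch = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return (t_pooled, n1 + n2 - 2), (t_welch, df_welch)


placebo = [1, 2, 3, 4, 5]  # hypothetical values
treatment = [2, 4, 6, 8, 10]
print(independent_t(placebo, treatment))
```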

Statistics. For each variable: mean, sample size, standard deviation, and standard error of the mean. For each pair of variables: correlation, average difference in means, t test, and confidence interval for mean difference (you can specify the confidence level). Standard deviation and standard error of the mean difference. Data. For each paired test, specify two quantitative variables (interval- or ratio-level of measurement).

Figure 20-7 Paired-Samples T Test dialog box
- Select a pair of variables, as follows: Click each of two variables. The first variable appears in the Current Selections group as Variable 1, and the second appears as Variable 2.
- After you have selected a pair of variables, click the arrow button to move the pair into the Paired Variables list. You may select more pairs of variables.

Confidence Interval. By default, a 95%-confidence interval for the difference in means is displayed. Enter a value between 1 and 99 to request a different confidence level. Missing Values. When you test several variables and data are missing for one or more variables, you can tell the procedure which cases to include (or exclude): Exclude cases analysis by analysis. Each t test uses all cases that have valid data for the pair of variables tested. Sample sizes may vary from test to test.
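Each case supplies a value for both variables, so the paired-samples t test is a one-sample test on the case-by-case differences. The minimal sketch below uses hypothetical before-and-after readings.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Paired-samples t: a one-sample t test on the case-by-case differences."""
    if len(first) != len(second):
        raise ValueError("each case needs a value for both variables")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1  # t statistic and degrees of freedom


before = [10, 12, 14, 16]  # hypothetical readings for four cases
after = [9, 11, 12, 13]
print(paired_t(before, after))
```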

Figure 20-9 One-Sample T Test output

One-Sample Statistics: IQ
  N                 15
  Mean              109.33
  Std. Deviation    12.03
  Std. Error Mean   3.11
Rows and columns have been transposed.

One-Sample Test (Test Value = 100): IQ
  t = 3.005, df = 14, Significance (2-tailed) = .009, Mean Difference = 9.33
  95% Confidence Interval of the Difference: Lower 2.67, Upper 15.

To Obtain a One-Sample T Test
- From the menus choose: Analyze > Compare Means > One-Sample T Test...
Figure 20-10 One-Sample T Test dialog box

- Select one or more variables to be tested against the same hypothesized value.
- Enter a numeric test value against which each sample mean is compared.
Optionally, you can click Options to control the treatment of missing data and the level of the confidence interval.
One-Sample T Test Options
Figure 20-11 One-Sample T Test Options dialog box
Confidence Interval. By default, a 95%-confidence interval for the difference between the mean and the hypothesized test value is displayed.
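The one-sample t statistic is the difference between the sample mean and the test value, divided by the standard error of the mean. The sketch below plugs in the rounded summary values shown in Figure 20-9. Because those inputs are rounded, the result only approximates the printed t of 3.005.

```python
import math


def one_sample_t(sample_mean, sd, n, test_value):
    """t = (mean - test value) / (sd / sqrt(n)), with n - 1 degrees of freedom."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return (sample_mean - test_value) / se, n - 1


# Rounded summary values from Figure 20-9 (IQ, test value 100)
t, df = one_sample_t(109.33, 12.03, 15, 100)
print(round(t, 3), df)  # near the printed 3.005; the inputs are rounded
```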

Chapter 21 One-Way ANOVA

The One-Way ANOVA procedure produces a one-way analysis of variance for a quantitative dependent variable by a single factor (independent) variable. Analysis of variance is used to test the hypothesis that several means are equal. This technique is an extension of the two-sample t test. In addition to determining that differences exist among the means, you may want to know which means differ. There are two types of tests for comparing means: a priori contrasts and post hoc tests.

Assumptions. Each group is an independent random sample from a normal population. Analysis of variance is robust to departures from normality, although the data should be symmetric. The groups should come from populations with equal variances. To test this assumption, use Levene’s homogeneity-of-variance test.

Figure 21-1 One-Way ANOVA output

ANOVA: Absorbed Grams of Fat
                  Sum of Squares   df   Mean Square   F
Between Groups    1596.00          2    798.00
Within Groups     1530.00          15   102.00
Total             3126.

Test of Homogeneity of Variances: Absorbed Grams of Fat
Levene Statistic   df1   df2   Significance
.534               2     15

To Obtain a One-Way Analysis of Variance
- From the menus choose: Analyze > Compare Means > One-Way ANOVA...
Figure 21-2 One-Way ANOVA dialog box
- Select one or more dependent variables.
- Select a single independent factor variable.
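The F ratio in Figure 21-1 is the between-groups mean square divided by the within-groups mean square. The sketch below computes that ratio from raw data. It then reuses the same function for the homogeneity-of-variance statistic, assuming the mean-centered form of Levene's test, which is an ANOVA on absolute deviations from each group mean. The groups are made up. The library functions `scipy.stats.f_oneway` and `scipy.stats.levene(..., center="mean")` are alternatives.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA: (SS between, SS within, df between, df within, F)."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


def levene_statistic(groups):
    """Levene's test (mean-centered): ANOVA F on absolute deviations from group means."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    _, _, df1, df2, w = anova_f(deviations)
    return w, df1, df2


fat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # made-up groups
print(anova_f(fat))  # F = 27.0 for these values
print(levene_statistic([[1, 2, 3], [2, 4, 6]]))
```

Applying the same ratio to Figure 21-1 gives F = 798.00 / 102.00 ≈ 7.82.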

One-Way ANOVA Contrasts
Figure 21-3 One-Way ANOVA Contrasts dialog box
You can partition the between-groups sums of squares into trend components or specify a priori contrasts. Polynomial. Partitions the between-groups sums of squares into trend components. You can test for a trend of the dependent variable across the ordered levels of the factor variable. For example, you could test for a linear trend (increasing or decreasing) in salary across the ordered levels of highest degree earned.
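Take a single a priori or polynomial contrast with coefficients c_i, group means m_i, and group sizes n_i. Its share of the between-groups sum of squares is (Σ c_i·m_i)² / Σ (c_i²/n_i). The sketch below uses the linear-trend coefficients (-1, 0, 1) for three equally spaced levels and made-up means.

```python
def contrast_ss(means, sizes, coefficients):
    """Sum of squares for one contrast: (sum c_i*m_i)^2 / sum(c_i^2 / n_i)."""
    if abs(sum(coefficients)) > 1e-12:
        raise ValueError("contrast coefficients should sum to zero")
    estimate = sum(c * m for c, m in zip(coefficients, means))
    return estimate ** 2 / sum(c * c / n for c, n in zip(coefficients, sizes))


# Linear trend across three ordered, equally spaced levels (made-up means, 3 cases each)
print(contrast_ss([2, 5, 8], [3, 3, 3], [-1, 0, 1]))
```

These means lie exactly on a straight line, so the linear component equals the full between-groups sum of squares: 3·((2-5)² + (5-5)² + (8-5)²) = 54. The quadratic component (1, -2, 1) is zero.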

One-Way ANOVA Post Hoc Tests
Figure 21-4 One-Way ANOVA Post Hoc Multiple Comparisons dialog box
Once you have determined that differences exist among the means, post hoc range tests and pairwise multiple comparisons can determine which means differ. Range tests identify homogeneous subsets of means that are not different from each other.

LSD. Uses t tests to perform all pairwise comparisons between group means. No adjustment is made to the error rate for multiple comparisons. Bonferroni. Uses t tests to perform pairwise comparisons between group means, but controls the overall error rate by setting the error rate for each test to the experimentwise error rate divided by the total number of tests. Hence, the observed significance level is adjusted for the fact that multiple comparisons are being made. Sidak.

Gabriel. Pairwise comparison test that uses the Studentized maximum modulus and is generally more powerful than Hochberg's GT2 when the cell sizes are unequal. Gabriel's test may become liberal when the cell sizes vary greatly. Waller-Duncan. Multiple comparison test based on a t statistic; uses a Bayesian approach. Dunnett. Pairwise multiple comparison t test that compares a set of treatments against a single control mean. The last category is the default control category.
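The sketch below computes per-test error rates for the Bonferroni rule described above and for the Sidak adjustment. The Sidak description is cut off in this copy, so its usual form, 1 - (1 - α)^(1/m), is an assumption here. With k groups there are k(k-1)/2 pairwise comparisons.

```python
def bonferroni_alpha(alpha, tests):
    """Per-test error rate: experimentwise rate divided by the number of tests."""
    return alpha / tests


def sidak_alpha(alpha, tests):
    """Per-test error rate in the usual Sidak form (assumed): 1 - (1 - alpha)^(1/tests)."""
    return 1 - (1 - alpha) ** (1 / tests)


def bonferroni_p(p, tests):
    """Equivalent adjustment of an observed significance level, capped at 1."""
    return min(1.0, p * tests)


groups = 4
pairs = groups * (groups - 1) // 2  # 6 pairwise comparisons
print(bonferroni_alpha(0.05, pairs), sidak_alpha(0.05, pairs))
```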

One-Way ANOVA Options
Figure 21-5 One-Way ANOVA Options dialog box
Statistics. Choose one or more of the following: Descriptive. Calculates the number of cases, mean, standard deviation, standard error of the mean, minimum, maximum, and 95%-confidence intervals for each dependent variable for each group. Fixed and random effects.

Exclude cases analysis by analysis. A case with a missing value for either the dependent or the factor variable for a given analysis is not used in that analysis. Also, a case outside the range specified for the factor variable is not used. Exclude cases listwise. Cases with missing values for the factor variable or for any dependent variable included on the dependent list in the main dialog box are excluded from all analyses.

Chapter 22 GLM Univariate Analysis

The GLM Univariate procedure provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables. The factor variables divide the population into groups. Using this General Linear Model procedure, you can test null hypotheses about the effects of other variables on the means of various groupings of a single dependent variable.

might find that gender is a significant effect and that the interaction of gender with weather is significant. Methods. Type I, Type II, Type III, and Type IV sums of squares can be used to evaluate different hypotheses. Type III is the default. Statistics.

Figure 22-1 GLM Univariate output

Tests of Between-Subjects Effects (Dependent Variable: SPVOL)
Source            Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model   22.520                    11   2.047         12.376     .000
                  1016.981                  1    1016.981      6147.938   .000
                  8.691                     3    2.897         17.513     .000
                  10.118                    2    5.059         30.583     .000
                  .997                      2    .499          3.014      .082
                  5.639                     4    1.410         8.522      .001
Error             2.316                     14   .165
Total             1112.960                  26
                  24.

Figure 22-2 Univariate dialog box
- Select a dependent variable.
- Select variables for Fixed Factor(s), Random Factor(s), and Covariate(s), as appropriate for your data.
- Optionally, you can use WLS Weight to specify a weight variable for weighted least-squares analysis. If the value of the weighting variable is zero, negative, or missing, the case is excluded from the analysis. A variable already used in the model cannot be used as a weighting variable.
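The following minimal weighted least-squares fit of a single-predictor line illustrates the exclusion rule for the WLS Weight variable. GLM Univariate handles factors and several covariates. This sketch uses one hypothetical covariate and only shows how weights enter the fit and which cases are dropped.

```python
def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x.

    Cases whose weight is zero, negative, or missing (None) are excluded,
    mirroring the rule described for the WLS Weight variable.
    """
    cases = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if wi is not None and wi > 0]
    sw = sum(wi for _, _, wi in cases)
    xbar = sum(wi * xi for xi, _, wi in cases) / sw  # weighted means
    ybar = sum(wi * yi for _, yi, wi in cases) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in cases)
    sxx = sum(wi * (xi - xbar) ** 2 for xi, _, wi in cases)
    b = sxy / sxx
    return ybar - b * xbar, b


# The fourth case (missing weight) and fifth case (zero weight) are excluded,
# so the outlying y value of 100 does not affect the fit.
print(wls_line([1, 2, 3, 4, 5], [5, 8, 11, 14, 100], [1.0, 2.0, 1.0, None, 0.0]))  # (2.0, 3.0)
```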

GLM Model
Figure 22-3 Univariate Model dialog box
Specify Model. A full factorial model contains all factor main effects, all covariate main effects, and all factor-by-factor interactions. It does not contain covariate interactions. Select Custom to specify only a subset of interactions or to specify factor-by-covariate interactions. You must indicate all of the terms to be included in the model. Factors and Covariates.

Build Terms
For the selected factors and covariates:
- Interaction. Creates the highest-level interaction term of all selected variables. This is the default.
- Main effects. Creates a main-effects term for each variable selected.
- All 2-way. Creates all possible two-way interactions of the selected variables.
- All 3-way. Creates all possible three-way interactions of the selected variables.
- All 4-way. Creates all possible four-way interactions of the selected variables.
- All 5-way.
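Each Build Terms choice produces a set of combinations of the selected variables. The sketch below generates the same term lists with `itertools.combinations`. Writing an interaction as names joined by `*` is only a display convention here, and the variable names are hypothetical.

```python
from itertools import combinations


def all_k_way(variables, k):
    """All k-way interaction terms (k = 1 gives the main effects)."""
    return ["*".join(combo) for combo in combinations(variables, k)]


def highest_interaction(variables):
    """The single highest-level interaction of all selected variables (the default)."""
    return "*".join(variables)


selected = ["gender", "weather", "region"]  # hypothetical factor names
print(all_k_way(selected, 2))  # ['gender*weather', 'gender*region', 'weather*region']
print(highest_interaction(selected))  # gender*weather*region
```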

Any regression model. A purely nested design. (This form of nesting can be specified by using syntax.) Type III. The default. This method calculates the sums of squares of an effect in the design as the sums of squares adjusted for any other effects that do not contain it and orthogonal to any effects (if any) that contain it.

Contrasts are used to test for differences among the levels of a factor. You can specify a contrast for each factor in the model (in a repeated measures model, for each between-subjects factor). Contrasts represent linear combinations of the parameters. Hypothesis testing is based on the null hypothesis LB = 0, where L is the contrast coefficients matrix and B is the parameter vector.
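As a small numeric illustration of LB = 0, the sketch below multiplies a contrast matrix by a parameter vector. For simplicity it treats B as a plain vector of three level means, which is an assumption. In GLM, B also includes the intercept and any other model parameters.

```python
def contrast_values(L, B):
    """Compute LB: each row of the contrast matrix L times the parameter vector B."""
    return [sum(l * b for l, b in zip(row, B)) for row in L]


# Hypothetical contrasts for a three-level factor: level 1 vs 2, and level 2 vs 3
L = [[1, -1, 0],
     [0, 1, -1]]
print(contrast_values(L, [4.0, 4.0, 4.0]))  # [0.0, 0.0]: LB = 0 holds
print(contrast_values(L, [4.0, 6.0, 9.0]))  # [-2.0, -3.0]
```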

GLM Profile Plots
Figure 22-5 Univariate Profile Plots dialog box
Profile plots (interaction plots) are useful for comparing marginal means in your model. A profile plot is a line plot in which each point indicates the estimated marginal mean of a dependent variable (adjusted for any covariates) at one level of a factor. The levels of a second factor can be used to make separate lines. Each level in a third factor can be used to create a separate plot.
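The points of a profile plot are estimated marginal means arranged by two factors. In a full factorial model without covariates, these cell estimates equal the observed cell means. The sketch below relies on that assumption: using made-up data, it averages the dependent variable within each cell and groups the results into one line per level of the second factor.

```python
from collections import defaultdict
from statistics import mean


def profile_points(rows):
    """Cell means grouped into one line per level of the separate-lines factor.

    rows: (horizontal-axis level, separate-lines level, dependent value).
    Observed cell means equal the model's estimates only for a full factorial
    model with no covariates, which this sketch assumes.
    """
    cells = defaultdict(list)
    for axis_level, line_level, value in rows:
        cells[(axis_level, line_level)].append(value)
    lines = defaultdict(dict)
    for (axis_level, line_level), values in cells.items():
        lines[line_level][axis_level] = mean(values)
    return dict(lines)


data = [("low", "male", 10), ("low", "male", 12), ("high", "male", 15),
        ("low", "female", 8), ("high", "female", 13), ("high", "female", 13)]
print(profile_points(data))
```

Here the male line rises by 4 and the female line by 5. The lines are therefore not parallel, which suggests an interaction.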

Figure 22-6 Nonparallel plot (left) and parallel plot (right)
After a plot is specified by selecting factors for the horizontal axis and, optionally, factors for separate lines and separate plots, the plot must be added to the Plots list.
GLM Post Hoc Comparisons
Figure 22-7 Univariate Post Hoc Multiple Comparisons for Observed Means dialog box
Post hoc multiple comparison tests.

multiple comparison tests are performed for the average across the levels of the within-subjects factors. For GLM Multivariate, the post hoc tests are performed for each dependent variable separately. GLM Multivariate and GLM Repeated Measures are available only if you have the Advanced Models option installed. The Bonferroni and Tukey’s honestly significant difference tests are commonly used multiple comparison tests.

test (sometimes liberal), or Dunnett’s C (pairwise comparison test based on the Studentized range). Duncan’s multiple range test, Student-Newman-Keuls (S-N-K), and Tukey’s b are range tests that rank group means and compute a range value. These tests are not used as frequently as the tests previously discussed. The Waller-Duncan t test uses a Bayesian approach. This range test uses the harmonic mean of the sample size when the sample sizes are unequal.

GLM Save
Figure 22-8 Univariate Save dialog box
You can save values predicted by the model, residuals, and related measures as new variables in the Data Editor. Many of these variables can be used for examining assumptions about the data. To save the values for use in another SPSS session, you must save the current data file. Predicted Values. The values that the model predicts for each case. Unstandardized. The value the model predicts for the dependent variable.

Cook's distance. A measure of how much the residuals of all cases would change if a particular case were excluded from the calculation of the regression coefficients. A large Cook's D indicates that excluding a case from computation of the regression statistics changes the coefficients substantially. Leverage values. Uncentered leverage values. The relative influence of each observation on the model's fit. Residuals.
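For a simple regression with an intercept, the uncentered leverage of case i is h_i = 1/n + (x_i - x̄)²/Σ(x - x̄)². Cook's distance has the usual closed form D_i = e_i² / (p·MSE) · h_i / (1 - h_i)². The sketch below applies these textbook formulas to made-up data. A GLM with factors would instead compute leverages from the full design matrix.

```python
from statistics import mean


def leverage_and_cooks(x, y):
    """Uncentered leverage h_i and Cook's distance D_i for y = a + b*x (p = 2 parameters)."""
    n, p = len(x), 2
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in resid) / (n - p)  # residual mean square
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e * e / (p * mse) * hi / (1 - hi) ** 2 for e, hi in zip(resid, h)]
    return h, cooks


h, d = leverage_and_cooks([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(sum(h))  # leverages add up to p = 2, apart from rounding
```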

GLM Options
Figure 22-9 Univariate Options dialog box
Optional statistics are available from this dialog box. Statistics are calculated using a fixed-effects model. Estimated Marginal Means. Select the factors and interactions for which you want estimates of the population marginal means in the cells. These means are adjusted for the covariates, if any. Compare main effects.

Display. Select Descriptive statistics to produce observed means, standard deviations, and counts for all of the dependent variables in all cells. Estimates of effect size gives a partial eta-squared value for each effect and each parameter estimate. The eta-squared statistic describes the proportion of total variability attributable to a factor. Select Observed power to obtain the power of the test when the alternative hypothesis is set based on the observed value.
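Partial eta-squared compares an effect's sum of squares with that effect plus error: SS_effect / (SS_effect + SS_error). The sketch below applies it to the Corrected Model and Error sums of squares from Figure 22-1.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta-squared: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


# Corrected Model (22.520) and Error (2.316) sums of squares from Figure 22-1
print(round(partial_eta_squared(22.520, 2.316), 3))  # 0.907
```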

- Construct a custom L matrix, M matrix, or K matrix (using the LMATRIX, MMATRIX, and KMATRIX subcommands).
- For deviation or simple contrasts, specify an intermediate reference category (using the CONTRAST subcommand).
- Specify metrics for polynomial contrasts (using the CONTRAST subcommand).
- Specify error terms for post hoc comparisons (using the POSTHOC subcommand).

Chapter 23 Bivariate Correlations

The Bivariate Correlations procedure computes Pearson’s correlation coefficient, Spearman’s rho, and Kendall’s tau-b with their significance levels. Correlations measure how variables or rank orders are related. Before calculating a correlation coefficient, screen your data for outliers (which can cause misleading results) and evidence of a linear relationship. Pearson’s correlation coefficient is a measure of linear association.

Figure 23-1 Bivariate Correlations output

Correlations
                                                     Number of    Scoring Points   Defense Points
                                                     Games Won    Per Game         Per Game
Pearson Correlation       Number of Games Won        1.000        .581**           -.401*
                          Scoring Points Per Game    .581**       1.000            .457*
                          Defense Points Per Game    -.401*       .457*            1.000
Significance (2-tailed)   Number of Games Won        .            .001             .038
                          Scoring Points Per Game    .001         .                .017
                          Defense Points Per Game    .038         .017             .

Figure 23-2 Bivariate Correlations dialog box
- Select two or more numeric variables.
The following options are also available:
Correlation Coefficients. For quantitative, normally distributed variables, choose the Pearson correlation coefficient. If your data are not normally distributed or have ordered categories, choose Kendall’s tau-b or Spearman, which measure the association between rank orders.
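The three coefficients can be sketched in plain Python. Spearman's rho is computed here as the Pearson correlation of the ranks, with tied values given average ranks. Kendall's tau-b uses the tie-corrected denominator. These are the standard definitions, but SPSS's internal computation may differ in detail. The sample data are made up.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - ties in x) * (n0 - ties in y))."""
    n = len(x)
    n0 = n * (n - 1) // 2
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            tied_x += dx == 0
            tied_y += dy == 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt((n0 - tied_x) * (n0 - tied_y))


x = [1, 2, 3, 4]
y = [1, 8, 27, 64]  # y = x cubed: monotonic but not linear
print(pearson(x, y), spearman(x, y), kendall_tau_b(x, y))
```

For a monotonic but curved relationship such as y = x³, both rank-based coefficients equal 1. Pearson's r is smaller (about 0.95 here) because the relationship is not linear.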

Bivariate Correlations Options
Figure 23-3 Bivariate Correlations Options dialog box
Statistics. For Pearson correlations, you can choose one or both of the following: Means and standard deviations. Displayed for each variable. The number of cases with nonmissing values is also shown. Missing values are handled on a variable-by-variable basis regardless of your missing values setting. Cross-product deviations and covariances. Displayed for each pair of variables.

CORRELATIONS and NONPAR CORR Command Additional Features
The SPSS command language also allows you to:
- Write a correlation matrix for Pearson correlations that can be used in place of raw data to obtain other analyses such as factor analysis (with the MATRIX subcommand).
- Obtain correlations of each variable on a list with each variable on a second list (using the keyword WITH on the VARIABLES subcommand).

Chapter 24 Partial Correlations

The Partial Correlations procedure computes partial correlation coefficients that describe the linear relationship between two variables while controlling for the effects of one or more additional variables. Correlations are measures of linear association. Two variables can be perfectly related, but if the relationship is not linear, a correlation coefficient is not an appropriate statistic for measuring their association. Example.

Figure 24-1 Partial Correlations output

Correlations (Control Variables: -none-¹)
                                                                      (1)      (2)      (3)
(1) Health care funding (amount per 100)    Correlation               1.000    .737     .964
                                            Significance (2-tailed)   .        .000     .000
                                            df                        0        48       48
(2) Reported diseases (rate per 10,000)     Correlation               .737     1.000    .762
                                            Significance (2-tailed)   .000     .        .000
                                            df                        48       0        48
(3) Visits to health care providers         Correlation               .964     .762     1.000
    (rate per 10,000)                       Significance (2-tailed)   .

Figure 24-2 Partial Correlations dialog box
- Select two or more numeric variables for which partial correlations are to be computed.
- Select one or more numeric control variables.
The following options are also available:
Test of Significance. You can select two-tailed or one-tailed probabilities. If the direction of association is known in advance, select One-tailed. Otherwise, select Two-tailed. Display actual significance level.

Partial Correlations Options
Figure 24-3 Partial Correlations Options dialog box
Statistics. You can choose one or both of the following: Means and standard deviations. Displayed for each variable. The number of cases with nonmissing values is also shown. Zero-order correlations. A matrix of simple correlations between all variables, including control variables, is displayed. Missing Values. You can choose one of the following alternatives: Exclude cases listwise.
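With one control variable, the partial correlation follows directly from the zero-order correlations. The sketch below uses the zero-order values in Figure 24-1 and, purely for illustration, treats visits to health care providers as the control variable.

```python
import math


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y controlling for a single variable z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations from Figure 24-1: funding-diseases .737,
# funding-visits .964, diseases-visits .762; visits is the control here.
print(round(first_order_partial(0.737, 0.964, 0.762), 3))
```

With visits controlled, the funding-diseases correlation drops from .737 to roughly .01.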

Chapter 25 Distances

This procedure calculates any of a wide variety of statistics measuring either similarities or dissimilarities (distances), either between pairs of variables or between pairs of cases. These similarity or distance measures can then be used with other procedures, such as factor analysis, cluster analysis, or multidimensional scaling, to help analyze complex data sets. Example.

Figure 25-1 Distances dialog box
- Select at least one numeric variable to compute distances between cases, or select at least two numeric variables to compute distances between variables.
- Select an alternative in the Compute Distances group to calculate proximities either between cases or between variables.

Distances Dissimilarity Measures
Figure 25-2 Distances Dissimilarity Measures dialog box
From the Measure group, select the alternative that corresponds to your type of data (interval, count, or binary); then, from the drop-down list, select one of the measures that corresponds to that type of data. Available measures, by data type, are: Interval data. Euclidean distance, squared Euclidean distance, Chebychev, block, Minkowski, or customized. Count data.
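The interval-data dissimilarity measures listed above can each be written as a plain function of two cases' values. The sketch uses the standard definitions, and the two cases are made up.

```python
def euclidean(a, b):
    """Square root of the sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def squared_euclidean(a, b):
    """Sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chebychev(a, b):
    """Maximum absolute difference over the variables."""
    return max(abs(x - y) for x, y in zip(a, b))


def block(a, b):
    """City-block distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """p-th root of the sum of absolute differences raised to the power p."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


case1, case2 = (0.0, 0.0), (3.0, 4.0)
print(euclidean(case1, case2), squared_euclidean(case1, case2),
      chebychev(case1, case2), block(case1, case2))  # 5.0 25.0 4.0 7.0
```

Minkowski with p = 2 reproduces the Euclidean distance, and p = 1 reproduces the block distance.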

Distances Similarity Measures
Figure 25-3 Distances Similarity Measures dialog box
From the Measure group, select the alternative that corresponds to your type of data (interval or binary); then, from the drop-down list, select one of the measures that corresponds to that type of data. Available measures, by data type, are: Interval data. Pearson correlation or cosine. Binary data.
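The cosine similarity measure is the normalized dot product of two vectors of values. The Pearson correlation is the same quantity computed after subtracting each vector's mean. A minimal sketch:

```python
import math


def cosine(a, b):
    """Cosine of the angle between two vectors of values."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))


def pearson_via_cosine(a, b):
    """Pearson correlation as the cosine of the mean-centered vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine([x - ma for x in a], [y - mb for y in b])


print(cosine([1, 2], [2, 4]), cosine([1, 0], [0, 1]))  # 1.0 0.0
```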

Chapter 26 Linear Regression

Linear Regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predict the value of the dependent variable. For example, you can try to predict a salesperson’s total yearly sales (the dependent variable) from independent variables such as age, education, and years of experience. Example.

between the dependent variable and each independent variable should be linear, and all observations should be independent.

Model Summary (footnotes 3, 4)
Model 1
  Variables Entered: Defense Points Per Game, Scoring Points Per Game (footnotes 1, 2)
  Variables Removed: .
  R: .947
  R Square: .898
  Adjusted R Square: .889
  Std. Error of the Estimate: 4.40
1. Indep. vars: (constant) Defense Points Per Game, Scoring Points Per Game...
2. All requested variables entered.
3. Dependent Variable: Number of Games Won
4. Method: Enter

ANOVA (footnote 2)
Model 1        Sum of Squares   df   Mean Square
Regression     4080.533         2    2040.266
Residual       465.
Total

[Chart: Standardized Residual (vertical axis, -2.0 to 2.0) plotted against Number of Games Won (horizontal axis, 10 to 50)]

To Obtain a Linear Regression Analysis
- From the menus choose: Analyze > Regression > Linear...

Figure 26-2 Linear Regression dialog box
- In the Linear Regression dialog box, select a numeric dependent variable.
- Select one or more numeric independent variables.
Optionally, you can:
- Group independent variables into blocks and specify different entry methods for different subsets of variables.
- Choose a selection variable to limit the analysis to a subset of cases having a particular value(s) for this variable.

the value of the weighting variable is zero, negative, or missing, the case is excluded from the analysis.
Linear Regression Variable Selection Methods
Method selection allows you to specify how independent variables are entered into the analysis. Using different methods, you can construct a variety of regression models from the same set of variables. Enter (Regression). A procedure for variable selection in which all variables in a block are entered in a single step. Stepwise.

All variables must pass the tolerance criterion to be entered in the equation, regardless of the entry method specified. The default tolerance level is 0.0001. Also, a variable is not entered if it would cause the tolerance of another variable already in the model to drop below the tolerance criterion. All independent variables selected are added to a single regression model. However, you can specify different entry methods for different subsets of variables.
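Under the Enter method, all selected variables go into the equation in one step. The sketch below fits such a model by solving the normal equations in pure Python. It computes a predictor's tolerance as 1 minus the R² from regressing that predictor on the other predictors, which is the standard definition and which we assume matches SPSS's. The data are made up so that y = 1 + 2·x1 + 3·x2 exactly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(predictors, y):
    """Least-squares coefficients [constant, b1, b2, ...] and R squared."""
    rows = [[1.0] + list(case) for case in zip(*predictors)]  # one row per case
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    ybar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return coef, 1 - ss_res / ss_tot


def tolerance(predictors, j):
    """1 - R squared from regressing predictor j on the other predictors."""
    others = [p for i, p in enumerate(predictors) if i != j]
    return 1 - ols(others, predictors[j])[1]


TOLERANCE_CRITERION = 0.0001  # default stated in the text

x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]  # exactly 1 + 2*x1 + 3*x2
coef, r2 = ols([x1, x2], y)
print(coef, r2)
print(tolerance([x1, x2], 1) > TOLERANCE_CRITERION)  # True
```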

Linear Regression Plots
Figure 26-4 Linear Regression Plots dialog box
Plots can aid in the validation of the assumptions of normality, linearity, and equality of variances. Plots are also useful for detecting outliers, unusual observations, and influential cases. After saving them as new variables, predicted values, residuals, and other diagnostics are available in the Data Editor for constructing plots with the independent variables. The following plots are available: Scatterplots.

Standardized Residual Plots. You can obtain histograms of standardized residuals and normal probability plots comparing the distribution of standardized residuals to a normal distribution. If any plots are requested, summary statistics are displayed for standardized predicted values and standardized residuals (*ZPRED and *ZRESID).

Standardized. A transformation of each predicted value into its standardized form. That is, the mean predicted value is subtracted from the predicted value, and the difference is divided by the standard deviation of the predicted values. Standardized predicted values have a mean of 0 and a standard deviation of 1. Adjusted. The predicted value for a case when that case is excluded from the calculation of the regression coefficients. S.E. of mean predictions.

Standardized. The residual divided by an estimate of its standard deviation. Standardized residuals, which are also known as Pearson residuals, have a mean of 0 and a standard deviation of 1. Studentized. The residual divided by an estimate of its standard deviation that varies from case to case, depending on the distance of each case's values on the independent variables from the means of the independent variables. Deleted.

square root of p/N, where p is the number of parameters in the model and N is the number of cases. Covariance ratio. The ratio of the determinant of the covariance matrix with a particular case excluded from the calculation of the regression coefficients to the determinant of the covariance matrix with all cases included. If the ratio is close to 1, the case does not significantly alter the covariance matrix. Save to New File. Saves regression coefficients to a file that you specify.
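The sketch below computes the standardized predicted values, standardized residuals, and studentized residuals described above for a one-predictor model. It assumes the standard error of the estimate as the residual scale, and 1/n + (x_i - x̄)²/Σ(x - x̄)² as the leverage used by the studentized residual. The data are made up.

```python
import math
from statistics import mean, stdev


def regression_diagnostics(x, y):
    """Standardized predicted values, standardized and studentized residuals for y = a + b*x."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    pred = [a + b * xi for xi in x]
    resid = [yi - pi for yi, pi in zip(y, pred)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # standard error of the estimate
    p_mean, p_sd = mean(pred), stdev(pred)
    zpred = [(p - p_mean) / p_sd for p in pred]
    zresid = [e / s for e in resid]
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    sresid = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return zpred, zresid, sresid


zpred, zresid, sresid = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print([round(v, 3) for v in sresid])
```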

matrix of regression coefficients with covariances off the diagonal and variances on the diagonal. A correlation matrix is also displayed. Model fit. The variables entered and removed from the model are listed, and the following goodness-of-fit statistics are displayed: multiple R, R² and adjusted R², standard error of the estimate, and an analysis-of-variance table. R squared change. The change in the R² statistic that is produced by adding or deleting an independent variable.
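Adjusted R² and the F test for an R² change follow the usual formulas. Here n is the number of cases, k the number of independent variables in the larger model, and q the number of variables added. The sketch below uses hypothetical values.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n cases and k independent variables (plus a constant)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_change(r2_full, r2_reduced, q, n, k_full):
    """F for the R squared change when q variables are added to reach k_full variables."""
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / (n - k_full - 1))


# Hypothetical: 23 cases; adding a second predictor raises R squared from .50 to .60
print(adjusted_r2(0.60, 23, 2), f_change(0.60, 0.50, 1, 23, 2))
```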

Linear Regression Options
Figure 26-7 Linear Regression Options dialog box
The following options are available: Stepping Method Criteria. These options apply when either the forward, backward, or stepwise variable selection method has been specified. Variables can be entered or removed from the model depending on either the significance (probability) of the F value or the F value itself. Use Probability of F.

of regression that do include a constant. For example, R² cannot be interpreted in the usual way. Missing Values. You can choose one of the following: Exclude cases listwise. Only cases with valid values for all variables are included in the analyses. Exclude cases pairwise. Cases with complete data for the pair of variables being correlated are used to compute the correlation coefficient on which the regression analysis is based.

Chapter 27 Curve Estimation

The Curve Estimation procedure produces curve estimation regression statistics and related plots for 11 different curve estimation regression models. A separate model is produced for each dependent variable. You can also save predicted values, residuals, and prediction intervals as new variables. Example. An internet service provider tracks the percentage of virus-infected e-mail traffic on its networks over time. A scatterplot reveals that the relationship is nonlinear.

Figure 27-1 Curve Estimation summary table
Figure 27-2 Curve Estimation ANOVA
Figure 27-3 Curve Estimation coefficients

Figure 27-4 Curve Estimation chart

To Obtain a Curve Estimation
- From the menus choose: Analyze > Regression > Curve Estimation...

428 Chapter 27 Figure 27-5 Curve Estimation dialog box E Select one or more dependent variables. A separate model is produced for each dependent variable. E Select an independent variable (either a variable in the working data file or Time). Optionally, you can: Select a variable for labeling cases in scatterplots. For each point in the scatterplot, you can use the Point Selection tool to display the value of the Case Label variable.

Plot models. Plots the values of the dependent variable and each selected model against the independent variable. A separate chart is produced for each dependent variable.
Display ANOVA table. Displays a summary analysis-of-variance table for each selected model.
Curve Estimation Models
You can choose one or more curve estimation regression models. To determine which model to use, plot your data.
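To make model comparison concrete, here is a minimal Python sketch (not SPSS's implementation) that fits two of the model forms by ordinary least squares. The linear model is Y = b0 + b1*t. The exponential model is Y = b0 * e^(b1*t), which the sketch fits as a straight line to ln(Y). The function names and the noise-free data are hypothetical, and each R² is computed on the scale that model is fitted on.

```python
import math


def ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def fit_linear(t, y):
    """Linear model: Y = b0 + b1*t."""
    b0, b1, r2 = ols(t, y)
    return {"b0": b0, "b1": b1, "r2": r2}


def fit_exponential(t, y):
    """Exponential model: Y = b0 * exp(b1*t), fitted as ln(Y) = ln(b0) + b1*t (needs Y > 0)."""
    a, b1, r2 = ols(t, [math.log(v) for v in y])
    return {"b0": math.exp(a), "b1": b1, "r2": r2}  # r2 is on the ln(Y) scale


def residuals(t, y, predict):
    """Observed value of the dependent variable minus the model-predicted value."""
    return [yi - predict(ti) for ti, yi in zip(t, y)]


# Hypothetical, noise-free data that follow an exponential curve.
t = list(range(1, 11))
y = [2.0 * math.exp(0.3 * ti) for ti in t]
lin, expo = fit_linear(t, y), fit_exponential(t, y)
print(f"linear R^2 = {lin['r2']:.3f}, exponential R^2 = {expo['r2']:.3f}")
print(residuals(t, y, lambda ti: expo["b0"] * math.exp(expo["b1"] * ti))[:3])
```

Plotting the data first, as the text recommends, is what suggests trying the exponential form at all. A higher R² then confirms that choice on this data.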

Curve Estimation Save
Figure 27-6 Curve Estimation Save dialog box
Save Variables. For each selected model, you can save predicted values, residuals (observed value of the dependent variable minus the model-predicted value), and prediction intervals (upper and lower bounds). The new variable names and descriptive labels are displayed in a table in the output window.
Predict Cases.

Chapter 28 Discriminant Analysis
Discriminant analysis is useful for situations where you want to build a predictive model of group membership based on observed characteristics of each case. The procedure generates a discriminant function (or, for more than two groups, a set of discriminant functions) based on linear combinations of the predictor variables that provide the best discrimination between the groups.
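For two groups, a classic way to build such a linear combination is Fisher's linear discriminant. Its weights are w = S^-1 (m2 - m1), where m1 and m2 are the group mean vectors and S is the pooled within-group covariance matrix. The Python sketch below handles exactly two predictors. It assigns a case to group 2 when its score falls above the midpoint between the two projected group means, which assumes equal prior probabilities. It illustrates the idea only: SPSS also scales its function coefficients and supports more groups, priors, and stepwise entry. The function names and data are hypothetical.

```python
def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_within_cov(g1, g2):
    """Pooled within-group covariance matrix (2 x 2) for two predictors."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (g1, g2):
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    return [[s[i][j] / df for j in range(2)] for i in range(2)]


def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fisher_discriminant(g1, g2):
    """Weights w = S^-1 (m2 - m1) and the midpoint cutoff between projected group means."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    s_inv = inverse_2x2(pooled_within_cov(g1, g2))
    diff = [m2[0] - m1[0], m2[1] - m1[1]]
    w = [s_inv[0][0] * diff[0] + s_inv[0][1] * diff[1],
         s_inv[1][0] * diff[0] + s_inv[1][1] * diff[1]]

    def score(x):  # the discriminant score D = w . x
        return w[0] * x[0] + w[1] * x[1]

    cutoff = (score(m1) + score(m2)) / 2.0  # equal priors assumed
    return w, cutoff


def classify(x, w, cutoff):
    """Group 2 if the discriminant score is above the cutoff, else group 1."""
    return 2 if w[0] * x[0] + w[1] * x[1] > cutoff else 1


# Hypothetical two-predictor data for two groups.
group1 = [(1, 2), (2, 1), (2, 3), (3, 2)]
group2 = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, cutoff = fisher_discriminant(group1, group2)
print(w, cutoff, classify((4, 4), w, cutoff))
```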

If these variables are useful for discriminating between the two climate zones, the values of D will differ for the temperate and tropical countries. If you use a stepwise variable selection method, you may find that you do not need to include all four variables in the function.
Statistics. For each variable: means, standard deviations, univariate ANOVA.

Structure Matrix
             Function 1
CALORIES     .986
LOG_GDP      .790
URBAN        .488
LOG_POP      .082

Functions at Group Centroids
CLIMATE      Function 1
tropical     -.869
temperate    1.107

To Obtain a Discriminant Analysis
► From the menus choose:
Analyze > Classify > Discriminant...

► Select an integer-valued grouping variable and click Define Range to specify the categories of interest.
► Select the independent, or predictor, variables. (If your grouping variable does not have integer values, Automatic Recode on the Transform menu will create one that does.)
► Select the method for entering the independent variables.
Enter independents together. Forced-entry method. All independent variables that satisfy tolerance criteria are entered simultaneously.

Discriminant Analysis Select Cases
Figure 28-4 Discriminant Analysis Set Value dialog box
To select cases for your analysis, in the main dialog box click Select, choose a selection variable, and click Value to enter an integer as the selection value. Only cases with that value for the selection variable are used to derive the discriminant functions. Statistics and classification results are generated for both selected and unselected cases.

Univariate ANOVAs. Performs a one-way analysis of variance test for equality of group means for each independent variable.
Box's M. A test for the equality of the group covariance matrices. For sufficiently large samples, a nonsignificant p value means there is insufficient evidence that the matrices differ. The test is sensitive to departures from multivariate normality.
Function Coefficients. Available options are Fisher’s classification coefficients and unstandardized coefficients.
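The univariate ANOVA F statistic compares between-group and within-group variation for one variable at a time. The minimal Python sketch below computes it from raw group values. It also returns the single-variable form of Wilks' lambda (within-group sum of squares divided by total sum of squares), which is also one of the stepwise criteria. The function name and data are illustrative only.

```python
def one_way_anova(groups):
    """Return F, its degrees of freedom, and the univariate Wilks' lambda."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group SS: group size times squared deviation of the group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: squared deviations of each case from its own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    wilks_lambda = ss_within / (ss_between + ss_within)
    return {"F": f, "df": (df_between, df_within), "wilks_lambda": wilks_lambda}


result = one_way_anova([[1, 2, 3], [4, 5, 6]])  # hypothetical groups
print(result["F"], result["df"], round(result["wilks_lambda"], 4))
```

Small values of lambda, like large values of F, indicate that the group means differ relative to the spread within groups.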

Discriminant Analysis Stepwise Method
Figure 28-6 Discriminant Analysis Stepwise Method dialog box
Method. Select the statistic to be used for entering or removing variables. Available alternatives are Wilks’ lambda, unexplained variance, Mahalanobis’ distance, smallest F ratio, and Rao’s V. With Rao’s V, you can specify the minimum increase in V for a variable to enter.
Wilks' lambda.

Criteria. Available alternatives are Use F value and Use probability of F. Enter values for entering and removing variables.
Use F value. A variable is entered into the model if its F value is greater than the Entry value and is removed if the F value is less than the Removal value. Entry must be greater than Removal, and both values must be positive. To enter more variables into the model, lower the Entry value. To remove more variables from the model, increase the Removal value.

Casewise results. Codes for actual group, predicted group, posterior probabilities, and discriminant scores are displayed for each case.
Summary table. The number of cases correctly and incorrectly assigned to each of the groups based on the discriminant analysis. Sometimes called the "Confusion Matrix."
Leave-one-out classification. Each case in the analysis is classified by the functions derived from all cases other than that case. It is also known as the "U-method."
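The summary table and the leave-one-out option can be illustrated independently of any particular discriminant model. The Python example below deliberately uses a simple stand-in classifier instead of SPSS's discriminant functions: it assigns a case to the group with the nearest mean by Euclidean distance. What matters is the bookkeeping. Each case is classified by a model built without it, and the results are tallied into an actual-by-predicted table. All names and data are hypothetical.

```python
import math


def nearest_mean_classifier(train):
    """Stand-in classifier: assign a case to the group whose mean is closest.
    train is a list of (features, label) pairs."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    means = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

    def predict(x):
        return min(means, key=lambda lab: math.dist(x, means[lab]))

    return predict


def leave_one_out(cases):
    """Classify each case with a model built from all the other cases."""
    predicted = []
    for i, (x, _) in enumerate(cases):
        model = nearest_mean_classifier(cases[:i] + cases[i + 1:])
        predicted.append(model(x))
    return predicted


def confusion_matrix(actual, predicted, labels):
    """counts[a][p] = number of cases in actual group a assigned to group p."""
    counts = {a: {p: 0 for p in labels} for a in labels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


# Hypothetical cases: (features, actual group).
cases = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
predicted = leave_one_out(cases)
print(confusion_matrix([lab for _, lab in cases], predicted, ["A", "B"]))
```

Because no case helps build the model that classifies it, the leave-one-out table is usually a less optimistic estimate of accuracy than classifying the same cases that were used to fit the model.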

Discriminant Analysis Save
Figure 28-8 Discriminant Analysis Save dialog box
You can add new variables to your active data file. Available options are predicted group membership (a single variable), discriminant scores (one variable for each discriminant function in the solution), and probabilities of group membership given the discriminant scores (one variable for each group). You can also export model information to the specified file in XML (PMML) format.

Chapter 29 Factor Analysis
Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often used in data reduction to identify a small number of factors that explain most of the variance observed in a much larger number of manifest variables.

Kaiser-Meyer-Olkin measure of sampling adequacy and Bartlett’s test of sphericity; unrotated solution, including factor loadings, communalities, and eigenvalues; rotated solution, including rotated pattern matrix and transformation matrix; for oblique rotations: rotated pattern and structure matrices; factor score coefficient matrix and factor covariance matrix.
Plots: scree plot of eigenvalues and loading plot of first two or three factors.
Data.

Communalities
             Initial   Extraction
LIFEEXPF     1.000     .953
BABYMORT     1.000     .949
LITERACY     1.000     .825
BIRTH_RT     1.000     .943
FERTILTY     1.000     .875
URBAN        1.000     .604
LOG_GDP      1.000     .738
POP_INCR     1.000     .945
B_TO_D       1.000     .925
DEATH_RT     1.000     .689
LOG_POP      1.000     .292
Extraction Method: Principal Component Analysis.

Total Variance Explained
Column groups: Initial Eigenvalues; Extraction Sums of Squared Loadings; Rotation Sums of Squared Loadings
Component 1: Total 6.242, % of Variance 56.

Rotated Component Matrix
             Component
             1        2
BIRTH_RT     .969
FERTILTY     .931
LITERACY     -.880    .226
LIFEEXPF     -.856    .469
BABYMORT     .853     -.469
POP_INCR     .847     .476
LOG_GDP      -.794    .327
URBAN        -.561    .539
DEATH_RT              -.827
B_TO_D       .614     .741
LOG_POP               -.520
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.

Component Transformation Matrix Component 1 1 .982 2 -.190 2 .190 .982
Extraction Method: Principal Component Analysis.
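A communality is the sum of a variable's squared loadings across the extracted factors. The short Python sketch below recomputes it for LIFEEXPF from its two printed rotated loadings (-.856 and .469), which gives .953 after rounding. It also checks that an orthogonal rotation by an arbitrary angle leaves the communality unchanged. This is why an orthogonal rotation such as varimax redistributes loadings between factors without changing communalities. The function names are illustrative.

```python
import math


def communality(loadings):
    """Sum of squared loadings of one variable across the retained factors."""
    return sum(x * x for x in loadings)


def rotate_2d(loadings, theta):
    """Rotate a pair of loadings by angle theta (radians): an orthogonal transformation."""
    a, b = loadings
    c, s = math.cos(theta), math.sin(theta)
    return (a * c - b * s, a * s + b * c)


lifeexpf = (-0.856, 0.469)  # rotated loadings on components 1 and 2
print(round(communality(lifeexpf), 3))
rotated = rotate_2d(lifeexpf, 0.7)  # arbitrary angle
print(abs(communality(rotated) - communality(lifeexpf)) < 1e-12)
```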

Component Plot in Rotated Space (chart: Component 1 on the horizontal axis, Component 2 on the vertical axis, both from -1.0 to 1.0; each variable is plotted at its rotated loadings and labeled with its variable label)
To Obtain a Factor Analysis
► From the menus choose:
Analyze > Data Reduction > Factor...
► Select the variables for the factor analysis.

Figure 29-2 Factor Analysis dialog box
Factor Analysis Select Cases
Figure 29-3 Factor Analysis Set Value dialog box
To select cases for your analysis, choose a selection variable, and click Value to enter an integer as the selection value. Only cases with that value for the selection variable are used in the factor analysis.

Factor Analysis Descriptives
Figure 29-4 Factor Analysis Descriptives dialog box
Statistics. Univariate statistics include the mean, standard deviation, and number of valid cases for each variable. Initial solution displays initial communalities, eigenvalues, and the percentage of variance explained.
Correlation Matrix. The available options are coefficients, significance levels, determinant, KMO and Bartlett’s test of sphericity, inverse, reproduced, and anti-image.

Factor Analysis Extraction
Figure 29-5 Factor Analysis Extraction dialog box
Method. Allows you to specify the method of factor extraction. Available methods are principal components, unweighted least squares, generalized least squares, maximum likelihood, principal axis factoring, alpha factoring, and image factoring.
Principal Components Analysis. A factor extraction method used to form uncorrelated linear combinations of the observed variables.
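Principal components extraction amounts to finding the eigenvalues and eigenvectors of the correlation matrix. The loadings are the eigenvector entries multiplied by the square root of the eigenvalue, and the eigenvalue divided by the number of variables is the proportion of variance explained. The minimal pure-Python sketch below finds the first component by power iteration. It assumes a single largest eigenvalue and a start vector that is not orthogonal to that eigenvalue's eigenvector. SPSS uses its own numerical routines, and the function names and example matrix are hypothetical.

```python
import math


def first_component(r, max_iter=10_000, tol=1e-12):
    """Largest eigenvalue and unit eigenvector of a correlation matrix r (power iteration)."""
    n = len(r)
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, deliberately asymmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = [sum(r[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = math.sqrt(sum(x * x for x in w))  # ||R v|| tends to the largest eigenvalue
        new_v = [x / eigenvalue for x in w]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if converged:
            break
    return eigenvalue, v


def loadings(eigenvalue, vector):
    """Component loadings: eigenvector entries scaled by sqrt(eigenvalue)."""
    return [x * math.sqrt(eigenvalue) for x in vector]


r = [[1.0, 0.6], [0.6, 1.0]]  # hypothetical two-variable correlation matrix
lam, vec = first_component(r)
print(round(lam, 6), [round(x, 4) for x in loadings(lam, vec)])
print(f"{100 * lam / len(r):.1f}% of variance")
```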

Principal Axis Factoring. A method of extracting factors from the original correlation matrix with squared multiple correlation coefficients placed in the diagonal as initial estimates of the communalities. These factor loadings are used to estimate new communalities that replace the old communality estimates in the diagonal. Iterations continue until the changes in the communalities from one iteration to the next satisfy the convergence criterion for extraction.
Alpha.

Factor Analysis Rotation
Figure 29-6 Factor Analysis Rotation dialog box
Method. Allows you to select the method of factor rotation. Available methods are varimax, direct oblimin, quartimax, equamax, and promax.
Varimax Method. An orthogonal rotation method that minimizes the number of variables that have high loadings on each factor. It simplifies the interpretation of the factors.
Direct Oblimin Method. A method for oblique (nonorthogonal) rotation.

Rotated Solution. A rotation method must be selected to obtain a rotated solution. For orthogonal rotations, the rotated pattern matrix and factor transformation matrix are displayed. For oblique rotations, the pattern, structure, and factor correlation matrices are displayed.
Factor Loading Plot. Three-dimensional factor loading plot of the first three factors. For a two-factor solution, a two-dimensional plot is shown. The plot is not displayed if only one factor is extracted.

Bartlett Scores. A method of estimating factor score coefficients. The scores produced have a mean of 0. The sum of squares of the unique factors over the range of variables is minimized.
Anderson-Rubin Method. A method of estimating factor score coefficients; a modification of the Bartlett method that ensures orthogonality of the estimated factors. The scores produced have a mean of 0, have a standard deviation of 1, and are uncorrelated.
Display factor score coefficient matrix.

Chapter 30 Choosing a Procedure for Clustering
Cluster analyses can be performed using the TwoStep, Hierarchical, or K-Means Cluster Analysis procedures. Each procedure employs a different algorithm for creating clusters, and each has options not available in the others.
TwoStep Cluster Analysis. For many applications, the TwoStep Cluster Analysis procedure will be the method of choice.

K-Means Cluster Analysis. The K-Means Cluster Analysis procedure is limited to continuous data and requires you to specify the number of clusters in advance, but it has the following unique features:
Ability to save distances from cluster centers for each object.
Ability to read initial cluster centers from and save final cluster centers to an external SPSS file.
Additionally, the K-Means Cluster Analysis procedure can analyze large data files.

Chapter 31 TwoStep Cluster Analysis
The TwoStep Cluster Analysis procedure is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm employed by this procedure has several desirable features that differentiate it from traditional clustering techniques:
Handling of categorical and continuous variables.

Figure 31-1 TwoStep Cluster Analysis dialog box
Distance Measure. This selection determines how the similarity between two clusters is computed.
Log-likelihood. The likelihood measure places a probability distribution on the variables. Continuous variables are assumed to be normally distributed, while categorical variables are assumed to be multinomial. All variables are assumed to be independent.
Euclidean. The Euclidean measure is the “straight line” distance between two clusters.

Determine automatically. The procedure will automatically determine the “best” number of clusters, using the criterion specified in the Clustering Criterion group. Optionally, enter a positive integer specifying the maximum number of clusters that the procedure should consider.
Specify fixed. Allows you to fix the number of clusters in the solution. Enter a positive integer.
Count of Continuous Variables.

To Obtain a TwoStep Cluster Analysis
► From the menus choose:
Analyze > Classify > TwoStep Cluster...
► Select one or more categorical or continuous variables.
Optionally, you can:
Adjust the criteria by which clusters are constructed.
Select settings for noise handling, memory allocation, variable standardization, and cluster model input.
Request optional tables and plots.
Save model results to the working file or to an external XML file.

TwoStep Cluster Analysis Options
Figure 31-2 TwoStep Cluster Analysis Options dialog box
Outlier Treatment. This group allows you to treat outliers specially during clustering if the cluster features (CF) tree fills. The CF tree is full if it cannot accept any more cases in a leaf node, and no leaf node can be split. If you select noise handling and the CF tree fills, it will be regrown after placing cases in sparse leaves into a “noise” leaf.

Memory Allocation. This group allows you to specify the maximum amount of memory in megabytes (MB) that the cluster algorithm should use. If the procedure exceeds this maximum, it will use the disk to store information that will not fit in memory. Specify a number greater than or equal to 4. Consult your system administrator for the largest value that you can specify on your system. The algorithm may fail to find the correct or desired number of clusters if this value is too low.

variable names in the main dialog box in the same order in which they were specified in the prior analysis. The XML file remains unaltered, unless you specifically write the new model information to the same filename. For more information, see “TwoStep Cluster Analysis Output” on p. 463.
If a cluster model update is specified, the options pertaining to generation of the CF tree that were specified for the original model are used.

TwoStep Cluster Analysis Plots
Figure 31-3 TwoStep Cluster Analysis Plots dialog box
Within cluster percentage chart. Displays charts showing the within-cluster variation of each variable. For each categorical variable, a clustered bar chart is produced, showing the category frequency by cluster ID. For each continuous variable, an error bar chart is produced, showing error bars by cluster ID.
Cluster pie chart.

for the test of equality of means for a continuous variable and the expected frequency with the overall data set for a categorical variable.
Confidence level. This option allows you to set the confidence level for the test of equality of a variable’s distribution within a cluster versus the variable’s overall distribution. Specify a number less than 100 and greater than or equal to 50.

Descriptives by cluster. Displays two tables that describe the variables in each cluster. In one table, means and standard deviations are reported for continuous variables by cluster. The other table reports frequencies of categorical variables by cluster.
Cluster frequencies. Displays a table that reports the number of observations in each cluster.
Information criterion (AIC or BIC).

Chapter 32 Hierarchical Cluster Analysis
This procedure attempts to identify relatively homogeneous groups of cases (or variables) based on selected characteristics, using an algorithm that starts with each case (or variable) in a separate cluster and combines clusters until only one is left. You can analyze raw variables or you can choose from a variety of standardizing transformations. Distance or similarity measures are generated by the Proximities procedure.
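The agglomerative idea is to start with every case in its own cluster and repeatedly merge the closest pair. The Python sketch below shows it with three of the linkage rules offered as cluster methods: nearest neighbor, furthest neighbor, and between-groups linkage. It is a naive illustration under several assumptions. It uses plain Euclidean distance, although SPSS offers many other measures. It breaks ties by taking the first closest pair found, and it omits the centroid, median, within-groups, and Ward's methods. Names and data are hypothetical.

```python
import math


def agglomerate(points, linkage="nearest"):
    """Naive agglomerative clustering; returns the merge history as
    (cluster_a, cluster_b, distance), clusters being tuples of case indices.
    linkage: 'nearest' (single), 'furthest' (complete), or 'between' (average)."""
    def cluster_distance(a, b):
        d = [math.dist(points[i], points[j]) for i in a for j in b]
        if linkage == "nearest":
            return min(d)
        if linkage == "furthest":
            return max(d)
        if linkage == "between":
            return sum(d) / len(d)  # average over all between-cluster pairs
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [(i,) for i in range(len(points))]  # every case starts alone
    history = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                dist = cluster_distance(clusters[x], clusters[y])
                if best is None or dist < best[0]:
                    best = (dist, x, y)
        dist, x, y = best
        history.append((clusters[x], clusters[y], dist))
        merged = tuple(sorted(clusters[x] + clusters[y]))
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return history


pts = [(0,), (1,), (5,), (6,), (20,)]  # hypothetical one-variable cases
for step in agglomerate(pts, "furthest"):
    print(step)
```

The merge distances recorded in the history are what a dendrogram draws, which is why large jumps between successive merges can suggest how many clusters to keep.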

Assumptions. The distance or similarity measures used should be appropriate for the data analyzed (see the Proximities procedure for more information on choices of distance and similarity measures). Also, you should include all relevant variables in your analysis. Omission of influential variables can result in a misleading solution. Because hierarchical cluster analysis is an exploratory method, results should be treated as tentative until they are confirmed with an independent sample.

Vertical Icicle (chart: columns are the cases 14:India, 13:Banglades, 10:Japan, 9:Italy, 7:Canada, 8:Denmark, 12:Switzerlan, 11:Norway, 6:Austria, 16:Paraguay, 15:Bolivia, 5:Indonesia, 2:Brazil, 4:Domincan R, 3:Chile, 1:Argentina; rows are the number of clusters)

Figure 32-2 Hierarchical Cluster Analysis dialog box
► If you are clustering cases, select at least one numeric variable. If you are clustering variables, select at least three numeric variables.
Optionally, you can select an identification variable to label cases.

Hierarchical Cluster Analysis Method
Figure 32-3 Hierarchical Cluster Analysis Method dialog box
Cluster Method. Available alternatives are between-groups linkage, within-groups linkage, nearest neighbor, furthest neighbor, centroid clustering, median clustering, and Ward’s method.
Measure. Allows you to specify the distance or similarity measure to be used in clustering. Select the type of data and the appropriate distance or similarity measure:
Interval data.

Transform Values. Allows you to standardize data values for either cases or variables before computing proximities (not available for binary data). Available standardization methods are z scores, range -1 to 1, range 0 to 1, maximum magnitude of 1, mean of 1, and standard deviation of 1.
Transform Measures. Allows you to transform the values generated by the distance measure. They are applied after the distance measure has been computed.
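Several of these standardizations are simple rescalings of one variable's values. The Python sketch below shows common definitions for z scores, range 0 to 1, maximum magnitude of 1, mean of 1, and standard deviation of 1. It assumes the sample standard deviation (n - 1 denominator) and does not handle a zero range, mean, or standard deviation. It illustrates the definitions and is not SPSS's code. Range -1 to 1 is omitted, and the function names are hypothetical.

```python
import statistics


def z_scores(values):
    """Subtract the mean, divide by the sample standard deviation."""
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def range_0_to_1(values):
    """Subtract the minimum, divide by the range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def max_magnitude_1(values):
    """Divide by the largest absolute value."""
    m = max(abs(v) for v in values)
    return [v / m for v in values]


def mean_of_1(values):
    """Divide by the mean."""
    mean = statistics.mean(values)
    return [v / mean for v in values]


def sd_of_1(values):
    """Divide by the sample standard deviation."""
    sd = statistics.stdev(values)
    return [v / sd for v in values]


vals = [2.0, 4.0, 6.0]  # hypothetical values of one variable
for f in (z_scores, range_0_to_1, max_magnitude_1, mean_of_1, sd_of_1):
    print(f.__name__, f(vals))
```

Standardizing matters because distance measures are dominated by variables with large numeric ranges unless the variables are first put on comparable scales.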

Hierarchical Cluster Analysis Plots
Figure 32-5 Hierarchical Cluster Analysis Plots dialog box
Dendrogram. Displays a dendrogram. Dendrograms can be used to assess the cohesiveness of the clusters formed and can provide information about the appropriate number of clusters to keep.
Icicle. Displays an icicle plot, including all clusters or a specified range of clusters.

Chapter 33 K-Means Cluster Analysis
This procedure attempts to identify relatively homogeneous groups of cases based on selected characteristics, using an algorithm that can handle large numbers of cases. However, the algorithm requires you to specify the number of clusters. You can specify initial cluster centers if you know this information. You can select one of two methods for classifying cases, either updating cluster centers iteratively or classifying only.
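The two classification methods can be illustrated with the batch form of the k-means algorithm. Each case is assigned to the nearest center by Euclidean distance. Under iterate and classify, each center is then recomputed as the mean of its cases, and the two steps repeat until the centers stop moving or the iteration limit is reached. Under classify only, the initial centers are used as they are. This Python sketch is a simplification: it does not implement SPSS's running-means option or its convergence criterion, and it numbers clusters from 0 rather than 1. Its names and data are hypothetical.

```python
import math


def k_means(points, initial_centers, max_iter=10, classify_only=False):
    """Batch k-means with Euclidean distance.
    Returns (labels, centers, distance of each case from its own center)."""
    centers = [list(c) for c in initial_centers]

    def assign():
        labels = []
        for p in points:
            d = [math.dist(p, c) for c in centers]
            labels.append(d.index(min(d)))  # nearest center; ties go to the lower index
        return labels

    labels = assign()
    if not classify_only:
        for _ in range(max_iter):
            new_centers = []
            for k, old in enumerate(centers):
                members = [p for p, lab in zip(points, labels) if lab == k]
                # Move each center to the mean of its members; an empty cluster keeps its center.
                new_centers.append([sum(col) / len(members) for col in zip(*members)]
                                   if members else old)
            change = max(math.dist(a, b) for a, b in zip(centers, new_centers))
            centers = new_centers
            labels = assign()
            if change == 0.0:  # centers stopped moving
                break
    distances = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    return labels, centers, distances


pts = [(0, 0), (0, 1), (10, 10), (10, 11)]  # hypothetical cases
print(k_means(pts, [(0, 0), (10, 10)]))
print(k_means(pts, [(0, 0), (10, 10)], classify_only=True))
```

The distances returned correspond to what the Save option "Distance from cluster center" records for each case.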

initial cluster centers and not using the Use running means option will avoid issues related to case order. However, ordering of the initial cluster centers may affect the solution, if there are tied distances from cases to cluster centers. Comparing results from analyses with different permutations of the initial center values may be used to assess the stability of a given solution.
Assumptions. Distances are computed using simple Euclidean distance.

Iteration History
             Change in Cluster Centers
Iteration    1         2        3        4
1            1.932     2.724    3.343    1.596
2            .000      .471     .466     .314
3            .861      .414     .172     .195
4            .604      .337     .000     .150
5            .000      .253     .237     .167
6            .000      .199     .287     .071
7            .623      .160     .000     .000
8            .000      .084     .000     .074
9            .000      .080     .000     .077
10           .000      .097     .185     .000

Final Cluster Centers
             Cluster
             1          2         3        4
ZURBAN       -1.70745   -.30863   .16816   .62767
ZLIFEEXP     -2.52826   -.15939   -.28417  .80611
ZLITERAC     -2.

ANOVA
             Cluster             Error
             Mean Square   df    Mean Square   df    F         Sig.
ZURBAN       10.409        3     .541          68    19.234    .000
ZLIFEEXP     19.410        3     .210          68    92.614    .000
ZLITERAC     18.731        3     .229          68    81.655    .000
ZPOP_INC     18.464        3     .219          68    84.428    .000
ZBABYMOR     18.621        3     .239          68    77.859    .000
ZBIRTH_R     19.599        3     .167          68    117.339   .000
ZDEATH_R     13.628        3     .444          68    30.676    .000
ZLOG_GDP     17.599        3     .287          68    61.313    .000
ZB_TO_D      16.316        3     .288          68    56.682    .000
ZFERTILT     18.829        3     .168          68    112.273   .

Figure 33-2 K-Means Cluster Analysis dialog box
► Select the variables to be used in the cluster analysis.
► Specify the number of clusters. The number of clusters must be at least two and must not be greater than the number of cases in the data file.
► Select either Iterate and classify or Classify only.
Optionally, you can select an identification variable to label cases.

K-Means Cluster Analysis Iterate
Figure 33-3 K-Means Cluster Analysis Iterate dialog box
These options are available only if you select the Iterate and Classify method from the main dialog box.
Maximum Iterations. Limits the number of iterations in the k-means algorithm. Iteration stops after this many iterations even if the convergence criterion is not satisfied. This number must be between 1 and 999. To reproduce the algorithm used by the Quick Cluster command prior to version 5.

You can save information about the solution as new variables to be used in subsequent analyses:
Cluster membership. Creates a new variable indicating the final cluster membership of each case. Values of the new variable range from 1 to the number of clusters.
Distance from cluster center. Creates a new variable indicating the Euclidean distance between each case and its classification center.

Missing Values. Available options are Exclude cases listwise or Exclude cases pairwise.
Exclude cases listwise. Excludes cases with missing values for any clustering variable from the analysis.
Exclude cases pairwise. Assigns cases to clusters based on distances computed from all variables with nonmissing values.

Chapter 34 Nonparametric Tests
The Nonparametric Tests procedure provides several tests that do not require assumptions about the shape of the underlying distribution:
Chi-Square Test. Tabulates a variable into categories and computes a chi-square statistic based on the differences between observed and expected frequencies.
Binomial Test. Compares the observed frequency in each category of a dichotomous variable with expected frequencies from the binomial distribution.
Runs Test.

Chi-Square Test
The Chi-Square Test procedure tabulates a variable into categories and computes a chi-square statistic. This goodness-of-fit test compares the observed and expected frequencies in each category to test either that all categories contain the same proportion of values or that each category contains a user-specified proportion of values.
Examples.

Test Statistics
                               Color of Jelly Bean
Chi-Square(1)                  27.973
df                             5
Asymptotic Significance        .000
1. 0 Cells .0% low freqs 18.8 expected low...

Color of Jelly Bean
          Observed N   Expected N   Residual
Blue      6            5.7          .3
Brown     33           33.9         -.9
Green     9            11.3         -2.3
Yellow    17           17.0         .0
Orange    22           22.6         -.6
Red       26           22.6         3.4
Total     113

Test Statistics
                               Color of Jelly Bean
Chi-Square(1)                  1.041
df                             5
Asymptotic Significance        .959
1. 0 Cells .0% low freqs 5.7 expected low...

To Obtain a Chi-Square Test
► From the menus choose:
Analyze > Nonparametric Tests > Chi-Square...
Figure 34-2 Chi-Square Test dialog box
► Select one or more test variables. Each variable produces a separate test.
Optionally, you can click Options for descriptive statistics, quartiles, and control of the treatment of missing data.
Chi-Square Test Expected Range and Expected Values
Expected range. By default, each distinct value of the variable is defined as a category.

Expected values. By default, all categories have equal expected values. Categories can have user-specified expected proportions. Select Values, enter a value greater than 0 for each category of the test variable, and click Add. Each time you add a value, it appears at the bottom of the value list. The order of the values is important; it corresponds to the ascending order of the category values of the test variable.
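The goodness-of-fit statistic is the sum over categories of (observed - expected)² / expected. Its degrees of freedom equal the number of categories minus 1. In the Python sketch below, user-entered expected values are treated as relative weights, listed in ascending order of category value and scaled to the total count. The jelly bean sample output is read here in the order Blue, Brown, Green, Yellow, Orange, Red, which is an assumption about the category coding, giving observed counts 6, 33, 9, 17, 22, 26. With those counts the sketch reproduces both printed statistics. Equal expected values give about 27.973. The weights 5, 30, 10, 15, 20, 20 give about 1.041; these weights are inferred from the printed expected counts, not stated in the text. The function name is hypothetical.

```python
def chi_square_gof(observed, expected_values=None):
    """Chi-square goodness-of-fit test.
    expected_values are positive relative weights, one per category, in ascending
    order of category value; if omitted, all categories are equally likely.
    Returns (chi_square, df, expected_counts)."""
    k = len(observed)
    weights = [1.0] * k if expected_values is None else list(expected_values)
    if len(weights) != k or any(w <= 0 for w in weights):
        raise ValueError("need one positive expected value per category")
    n, total = sum(observed), sum(weights)
    expected = [n * w / total for w in weights]  # scale weights to the total count
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1, expected


counts = [6, 33, 9, 17, 22, 26]  # Blue, Brown, Green, Yellow, Orange, Red
print(chi_square_gof(counts)[:2])
print(chi_square_gof(counts, [5, 30, 10, 15, 20, 20])[:2])
```

Because only the ratios of the entered values matter in this sketch, entering proportions (.05, .30, ...) or counts (5, 30, ...) gives the same result.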

NPAR TESTS Command Additional Features (Chi-Square Test)
The command language also allows you to:
Specify different minimum and maximum values or expected frequencies for different variables (with the CHISQUARE subcommand).
Test the same variable against different expected frequencies or use different ranges (with the EXPECTED subcommand).
See the SPSS Command Syntax Reference for complete syntax information.

Sample Output
Figure 34-4 Binomial Test output
Binomial Test
                  Category   N    Observed Proportion   Test Proportion   Asymptotic Significance (2-tailed)
Coin   Group 1    Head       30   .75                   .50               .003(1)
       Group 2    Tail       10   .25
       Total                 40   1.00
1. Based on Z Approximation

To Obtain a Binomial Test
► From the menus choose:
Analyze > Nonparametric Tests > Binomial...
Figure 34-5 Binomial Test dialog box
► Select one or more numeric test variables.
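In the coin example (30 heads in 40 tosses, test proportion .50), a normal approximation with a continuity correction gives a two-tailed significance of about .003, in line with the printed value. The exact binomial probability is somewhat smaller, about .002. The Python sketch below computes both. Reading the printed value as a continuity-corrected z approximation is an inference from the footnote and the numbers, not a documented description of SPSS's algorithm. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def binomial_test(successes, n, p=0.5):
    """Two-sided tests of H0: probability of the first category = p.
    Returns (exact_p, z_approx_p)."""
    # Exact: add up the probabilities of all outcomes no more likely than the observed one.
    pmf = [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    cutoff = pmf[successes] * (1 + 1e-9)  # small tolerance for floating-point ties
    exact = min(1.0, sum(q for q in pmf if q <= cutoff))
    # Normal approximation with a continuity correction of 0.5.
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    z = max(abs(successes - mean) - 0.5, 0.0) / sd
    approx = 2.0 * (1.0 - NormalDist().cdf(z))
    return exact, approx


exact, approx = binomial_test(30, 40)  # 30 heads in 40 tosses
print(f"exact {exact:.4f}, z approximation {approx:.4f}")
```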

Binomial Test Options
Figure 34-6 Binomial Test Options dialog box
Statistics. You can choose one or both of the following summary statistics:
Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases.
Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
Exclude cases test-by-test.

Runs Test
The Runs Test procedure tests whether the order of occurrence of two values of a variable is random. A run is a sequence of like observations. A sample with too many or too few runs suggests that the sample is not random.
Examples. Suppose that 20 people are polled to find out if they would purchase a product. The assumed randomness of the sample would be seriously questioned if all 20 people were of the same gender.

To Obtain a Runs Test
► From the menus choose:
Analyze > Nonparametric Tests > Runs...
Figure 34-8 Runs Test dialog box
► Select one or more numeric test variables.
Optionally, you can click Options for descriptive statistics, quartiles, and control of the treatment of missing data.
Runs Test Cut Point
Cut Point. Specifies a cut point to dichotomize the variables that you have chosen. You can use either the observed mean, median, or mode, or a specified value as a cut point.
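The runs count and its large-sample z statistic can be sketched as follows. The sketch assumes that cases below the cut point form one group and cases at or above it form the other. It uses the standard Wald-Wolfowitz mean and variance of the number of runs without any continuity correction, so it may not match SPSS's printed Z exactly for small samples. Names and data are hypothetical.

```python
import math
import statistics


def runs_test(values, cut_point):
    """Runs test about a cut point. Values below the cut point form one group,
    values at or above it form the other. Returns (runs, z), where z uses the
    large-sample normal approximation without a continuity correction."""
    above = [v >= cut_point for v in values]
    n2 = sum(above)
    n1 = len(above) - n2
    if n1 == 0 or n2 == 0:
        raise ValueError("both sides of the cut point need at least one case")
    # A new run starts whenever consecutive cases fall on different sides.
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    return runs, (runs - mean) / math.sqrt(var)


data = [1, 1, 2, 2, 1, 2]  # hypothetical sequence
print(runs_test(data, statistics.median(data)))  # median as the cut point
```

A strongly negative z (too few runs) suggests clustering of like values, and a strongly positive z (too many runs) suggests systematic alternation.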

Runs Test Options
Figure 34-9 Runs Test Options dialog box
Statistics. You can choose one or both of the following summary statistics:
Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases.
Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
Exclude cases test-by-test.

One-Sample Kolmogorov-Smirnov Test
The One-Sample Kolmogorov-Smirnov Test procedure compares the observed cumulative distribution function for a variable with a specified theoretical distribution, which may be normal, uniform, Poisson, or exponential. The Kolmogorov-Smirnov Z is computed from the largest difference (in absolute value) between the observed and theoretical cumulative distribution functions.
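A minimal Python sketch of these quantities for a normal test distribution follows. Positive is the largest value of (empirical CDF - theoretical CDF), Negative is the most negative value, Absolute is the larger of the two in magnitude, and Z = sqrt(N) * Absolute. The mean and sample standard deviation are estimated from the data unless supplied. This illustrates the definitions rather than reproducing SPSS's code. For the Income output that follows (N = 20, Absolute = .170), sqrt(N) * D would be about 0.76. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_normal(values, mu=None, sigma=None):
    """One-sample Kolmogorov-Smirnov differences against a normal distribution.
    Returns (absolute, positive, negative, z) with z = sqrt(n) * absolute."""
    x = sorted(values)
    n = len(x)
    dist = NormalDist(mean(x) if mu is None else mu,
                      stdev(x) if sigma is None else sigma)
    positive, negative = 0.0, 0.0
    for i, v in enumerate(x, start=1):
        f = dist.cdf(v)
        positive = max(positive, i / n - f)        # empirical CDF just after v
        negative = min(negative, (i - 1) / n - f)  # empirical CDF just before v
    absolute = max(positive, -negative)
    return absolute, positive, negative, math.sqrt(n) * absolute


print(ks_normal([-1.0, 0.0, 1.0], mu=0.0, sigma=1.0))
```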

Figure 34-10 One-Sample Kolmogorov-Smirnov Test output
One-Sample Kolmogorov-Smirnov Test
                                                  Income
N                                                 20
Normal Parameters(1,2)     Mean                   56250.00
                           Std. Deviation         45146.40
Most Extreme Differences   Absolute               .170
                           Positive               .170
                           Negative               -.164
Kolmogorov-Smirnov Z
Asymptotic Significance (2-tailed)
1. Test Distribution is Normal
2. Calculated from data

To Obtain a One-Sample Kolmogorov-Smirnov Test
► From the menus choose:
Analyze > Nonparametric Tests > 1-Sample K-S...

► Select one or more numeric test variables. Each variable produces a separate test.
Optionally, you can click Options for descriptive statistics, quartiles, and control of the treatment of missing data.
One-Sample Kolmogorov-Smirnov Test Options
Figure 34-12 One-Sample K-S Options dialog box
Statistics. You can choose one or both of the following summary statistics:
Descriptive. Displays the mean, standard deviation, minimum, maximum, and number of nonmissing cases.
Quartiles.

Two-Independent-Samples Tests
The Two-Independent-Samples Tests procedure compares two groups of cases on one variable.
Example. New dental braces have been developed that are intended to be more comfortable, to look better, and to provide more rapid progress in realigning teeth. To find out if the new braces have to be worn as long as the old braces, 10 children are randomly chosen to wear the old braces, and another 10 are chosen to wear the new braces.

Test Statistics(2)
                                          Time Worn in Days
Mann-Whitney U                            14.000
Wilcoxon W                                69.000
Z                                         -2.721
Asymptotic Significance (2-tailed)        .007
Exact Significance [2*(1-tailed Sig.)]    .005(1)
1. Not corrected for ties.
2. Grouping Variable: Type of Braces

To Obtain Two-Independent-Samples Tests
► From the menus choose:
Analyze > Nonparametric Tests > 2 Independent Samples...

Figure 34-14 Two-Independent-Samples Tests dialog box
► Select one or more numeric variables.
► Select a grouping variable and click Define Groups to split the file into two groups or samples.
Two-Independent-Samples Test Types
Test Type. Four tests are available to test whether two independent samples (groups) come from the same population. The Mann-Whitney U test is the most popular of the two-independent-samples tests.

same number of observations, W is the rank sum of the group named first in the Two-Independent-Samples Define Groups dialog box.
The Kolmogorov-Smirnov Z test and the Wald-Wolfowitz runs test are more general tests that detect differences in both the locations and the shapes of the distributions. The Kolmogorov-Smirnov test is based on the maximum absolute difference between the observed cumulative distribution functions for both samples.
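The rank computations behind U and W can be sketched in Python. Ranks are assigned to the pooled data, and tied values get the average rank. W is the rank sum of the first group when the groups are the same size, as the text describes; using the smaller group otherwise is an assumption of this sketch. U is reported as the smaller of the two possible U values, which is also an assumption. Z uses the normal approximation without a tie correction. With hypothetical values whose first group occupies ranks 1 to 7, 10, 14, and 17, the sketch reproduces the sample output's U = 14, W = 69, and Z = -2.721. The function names are illustrative.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def mann_whitney(first, second):
    """Return (U, W, Z) for two independent samples (Z without tie correction)."""
    n1, n2 = len(first), len(second)
    ranks = average_ranks(list(first) + list(second))
    r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    w = r2 if n2 < n1 else r1  # smaller group, or the first group if sizes are equal
    z = (u - n1 * n2 / 2.0) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return u, w, z


# Hypothetical values chosen so the first group holds ranks 1-7, 10, 14, and 17.
group_a = [1, 2, 3, 4, 5, 6, 7, 10, 14, 17]
group_b = [8, 9, 11, 12, 13, 15, 16, 18, 19, 20]
print(mann_whitney(group_a, group_b))
```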

Two-Independent-Samples Tests Options
Figure 34-16 Two-Independent-Samples Options dialog box
Statistics. You can choose one or both of the following summary statistics:
Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of nonmissing cases.
Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.
Missing Values. Controls the treatment of missing values.
Exclude cases test-by-test.

Example. In general, do families receive the asking price when they sell their homes? By applying the Wilcoxon signed-rank test to data for 10 homes, you might learn that seven families receive less than the asking price, one family receives more than the asking price, and two families receive the asking price.
Statistics. Mean, standard deviation, minimum, maximum, number of nonmissing cases, and quartiles. Tests: Wilcoxon signed rank, sign, McNemar.
Data.

To Obtain Two-Related-Samples Tests

From the menus choose:
Analyze > Nonparametric Tests > 2 Related Samples...

Figure 34-18 Two-Related-Samples Tests dialog box

- Select one or more pairs of variables, as follows:
  - Click each of two variables. The first variable appears in the Current Selections group as Variable 1, and the second appears as Variable 2.
  - After you have selected a pair of variables, click the arrow button to move the pair into the Test Pair(s) list.

If your data are continuous, use the sign test or the Wilcoxon signed-rank test. The sign test computes the differences between the two variables for all cases and classifies each difference as positive, negative, or tied. If the two variables are similarly distributed, the numbers of positive and negative differences will not differ significantly.
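The sign test needs only the counts the homes example already gives. This sketch (hypothetical function name) classifies differences as positive, negative, or tied, drops the ties, and computes an exact two-sided binomial p-value as twice the smaller tail under a 50/50 split. That is a common textbook definition; SPSS may switch to a normal approximation for larger samples, so treat this as an illustration rather than a reproduction of SPSS output.

```python
from math import comb


def sign_test(differences):
    """Return (positive, negative, ties, exact two-sided p-value)."""
    pos = sum(1 for d in differences if d > 0)
    neg = sum(1 for d in differences if d < 0)
    ties = len(differences) - pos - neg
    n = pos + neg  # ties carry no information about direction and are dropped
    k = min(pos, neg)
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 0.5)
    return pos, neg, ties, min(1.0, 2 * one_tail)


# Homes example from the text: 7 sold below asking, 1 above, 2 at asking.
# Only the signs matter, so -1, +1, and 0 stand in for the actual differences.
print(sign_test([-1] * 7 + [1] + [0] * 2))  # (1, 7, 2, 0.0703125)
```

With 8 untied homes, a 7-to-1 split gives p = 2 × 9/256 ≈ 0.07 under this definition.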

Missing Values. Controls the treatment of missing values.
- Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for missing values.
- Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.

NPAR TESTS Command Additional Features (Two Related Samples)

The command language also allows you to:
- Test a variable with each variable on a list.
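The difference between the two missing-value options is easiest to see on a tiny data set. In this sketch, the function, the variable names (pre, post, followup), and the data are all hypothetical: test-by-test exclusion checks only the variables each test uses, while listwise exclusion keeps one common set of cases that are complete on every variable named in any test.

```python
def cases_used(cases, tests, listwise):
    """Return, for each test, the indices of the cases it would analyze.

    cases: list of dicts mapping variable name -> value (None = missing)
    tests: dict mapping test name -> tuple of variable names it uses
    """
    def complete(case, names):
        return all(case.get(name) is not None for name in names)

    if listwise:
        # One common case base: complete on every variable named in any test.
        every_var = {name for names in tests.values() for name in names}
        keep = [i for i, c in enumerate(cases) if complete(c, every_var)]
        return {test: keep for test in tests}
    # Test-by-test: each test checks only its own variables.
    return {test: [i for i, c in enumerate(cases) if complete(c, names)]
            for test, names in tests.items()}


cases = [
    {"pre": 1, "post": 2, "followup": 3},
    {"pre": 4, "post": None, "followup": 5},
    {"pre": 6, "post": 7, "followup": None},
]
tests = {"pre vs post": ("pre", "post"), "pre vs followup": ("pre", "followup")}
print(cases_used(cases, tests, listwise=False))  # {'pre vs post': [0, 2], 'pre vs followup': [0, 1]}
print(cases_used(cases, tests, listwise=True))   # {'pre vs post': [0], 'pre vs followup': [0]}
```

Test-by-test exclusion uses more of the data per test; listwise exclusion guarantees that every test is computed on exactly the same cases.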

Figure 34-20 Tests for Several Independent Samples output

Ranks
         Brand      N     Mean Rank
Hours    Brand A    10    15.20
         Brand B    10    25.50
         Brand C    10    5.80
         Total      30

Test Statistics¹,²
                           Hours
Chi-Square                 25.061
df                         2
Asymptotic Significance    .000

1. Kruskal Wallis Test
2. Grouping Variable: Brand

To Obtain Tests for Several Independent Samples

From the menus choose:
Analyze > Nonparametric Tests > K Independent Samples...
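Before any correction for ties, the Kruskal-Wallis H statistic depends on the data only through the group sizes and mean ranks, so the chi-square in Figure 34-20 can be approximately reproduced from its Ranks table: H = 12 / (N(N+1)) · Σ nᵢ(R̄ᵢ − (N+1)/2)². The function below is a hypothetical sketch of that formula, not SPSS's implementation.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from (n_i, mean_rank_i) pairs, without tie correction."""
    n_total = sum(n for n, _ in groups)
    expected = (n_total + 1) / 2  # mean rank of every group under the null
    spread = sum(n * (mean_rank - expected) ** 2 for n, mean_rank in groups)
    return 12 / (n_total * (n_total + 1)) * spread


# Mean ranks for Brands A, B, and C from Figure 34-20 (10 cases each).
h = kruskal_wallis_h([(10, 15.20), (10, 25.50), (10, 5.80)])
print(round(h, 3))  # 25.055
```

The result, about 25.055, is close to the reported 25.061; the small remaining gap is presumably due to the correction for tied ranks, which this sketch omits. The degrees of freedom are the number of groups minus 1, here 2.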

Figure 34-21 Tests for Several Independent Samples dialog box

- Select one or more numeric variables.
- Select a grouping variable and click Define Range to specify minimum and maximum integer values for the grouping variable.

Tests for Several Independent Samples Test Types

Three tests are available to determine if several independent samples come from the same population.

of the response increases. Here the alternative hypothesis is ordered; therefore, Jonckheere-Terpstra is the most appropriate test to use. The Jonckheere-Terpstra test is available only if you have installed SPSS Exact Tests.

Tests for Several Independent Samples Define Range

Figure 34-22 Several Independent Samples Define dialog box

To define the range, enter integer values for minimum and maximum that correspond to the lowest and highest categories of the grouping variable.
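The ordered alternative that makes Jonckheere-Terpstra appropriate can be seen in how its statistic is built. The sketch below uses the textbook definition (not necessarily SPSS's exact computation): list the groups in their hypothesized order, then, for every pair of groups, count the value pairs in which the later group's value is larger, scoring ties as one half. The function name and data are hypothetical.

```python
def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra statistic for groups listed in their hypothesized
    order (for example, increasing dose). Each cross-group value pair counts
    1 when the later-group value is larger and 0.5 when the two values tie."""
    total = 0.0
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            for a in groups[i]:
                for b in groups[j]:
                    if a < b:
                        total += 1
                    elif a == b:
                        total += 0.5
    return total


# Hypothetical responses at three increasing dose levels.
groups = [[1.1, 2.0, 1.5], [2.4, 1.8], [3.0, 2.9]]
print(jonckheere_terpstra(groups))  # 15.0 out of a possible 3*2 + 3*2 + 2*2 = 16
```

A value near the maximum supports an increasing trend across the ordered groups; a value near the minimum of 0 supports a decreasing one.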

- Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of nonmissing cases.
- Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.

Missing Values. Controls the treatment of missing values.
- Exclude cases test-by-test. When several tests are specified, each test is evaluated separately for missing values.
- Exclude cases listwise. Cases with missing values for any variable are excluded from all analyses.

Figure 34-24 Tests for Several Related Samples output

Ranks
           Mean Rank
Doctor     1.50
Lawyer     2.50
Police     3.40
Teacher    2.60

Test Statistics¹
N                          10
Chi-Square                 10.920
df                         3
Asymptotic Significance    .

1. Friedman Test

To Obtain Tests for Several Related Samples

From the menus choose:
Analyze > Nonparametric Tests > K Related Samples...

Figure 34-25 Tests for Several Related Samples dialog box

- Select two or more numeric test variables.

Tests for Several Related Samples Test Types

Three tests are available to compare the distributions of several related variables. The Friedman test is the nonparametric equivalent of a one-sample repeated measures design or a two-way analysis of variance with one observation per cell. Friedman tests the null hypothesis that k related variables come from the same population.
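In the Friedman test, each case ranks the k variables from 1 to k. Under the null hypothesis, every variable's mean rank should be close to (k+1)/2. The statistic measures how far the mean ranks stray from that value: χ² = 12N / (k(k+1)) · Σ (R̄ⱼ − (k+1)/2)². The hypothetical function below applies this formula to the Ranks table in Figure 34-24 and reproduces the reported chi-square. No tie correction is included.

```python
def friedman_chi_square(n_cases, mean_ranks):
    """Friedman statistic from the number of cases and each variable's mean
    rank (ranks run 1..k within each case), without a tie correction."""
    k = len(mean_ranks)
    expected = (k + 1) / 2
    spread = sum((r - expected) ** 2 for r in mean_ranks)
    return 12 * n_cases / (k * (k + 1)) * spread


# Doctor, Lawyer, Police, Teacher mean ranks from Figure 34-24; N = 10.
print(round(friedman_chi_square(10, [1.50, 2.50, 3.40, 2.60]), 3))  # 10.92
```

With k = 4 variables, the statistic is referred to a chi-square distribution with k − 1 = 3 degrees of freedom, matching the df in the output.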

Tests for Several Related Samples Statistics

Figure 34-26 Several Related Samples Statistics dialog box

- Descriptive. Displays the mean, standard deviation, minimum, maximum, and the number of nonmissing cases.
- Quartiles. Displays values corresponding to the 25th, 50th, and 75th percentiles.

NPAR TESTS Command Additional Features (K Related Samples)

See the SPSS Command Syntax Reference for complete syntax information.

Chapter 35
Multiple Response Analysis

Two procedures are available for analyzing multiple dichotomy and multiple category sets. The Multiple Response Frequencies procedure displays frequency tables. The Multiple Response Crosstabs procedure displays two- and three-dimensional crosstabulations. Before using either procedure, you must define multiple response sets.

Example. This example illustrates the use of multiple response items in a market research survey.

so on. If a given passenger circles American and TWA, the first variable has a code of 1, the second has a code of 3, and the third has a missing-value code. Another passenger might have circled American and entered Delta; in that case, the first variable has a code of 1, the second has a code of 5, and the third has a missing-value code. If you use the multiple dichotomy method, on the other hand, you end up with 14 separate variables.
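The two coding methods hold the same information, and one can be converted into the other. This sketch turns one passenger's multiple-category coding into one 0/1 indicator per airline code, as in the multiple dichotomy method. The codes 1 (American), 3 (TWA), and 5 (Delta) come from the text. The assumption that the 14 airlines are coded 1 through 14, and the function name, are hypothetical.

```python
def to_dichotomies(category_responses, codes):
    """Convert one case's multiple-category coding (several variables that
    hold airline codes, None for missing) into one 0/1 indicator per
    possible code, as in the multiple dichotomy method."""
    chosen = {value for value in category_responses if value is not None}
    return {code: int(code in chosen) for code in codes}


# Assumed for illustration: the 14 airlines are coded 1..14.
airline_codes = range(1, 15)
first = to_dichotomies([1, 3, None], airline_codes)   # American and TWA
second = to_dichotomies([1, 5, None], airline_codes)  # American and Delta
print([code for code, flag in first.items() if flag])   # [1, 3]
print([code for code, flag in second.items() if flag])  # [1, 5]
```

The category method needs only as many variables as the maximum number of answers a respondent may give (three here), whereas the dichotomy method needs one variable per possible answer (14 here).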

To Define Multiple Response Sets

- From the menus choose:
  Analyze > Multiple Response > Define Sets...

Figure 35-1 Define Multiple Response Sets dialog box

- Select two or more variables.
- If your variables are coded as dichotomies, indicate which value you want to have counted. If your variables are coded as categories, define the range of the categories.
- Enter a unique name for each multiple response set.

For multiple dichotomy sets, category names shown in the output come from variable labels defined for elementary variables in the group. If the variable labels are not defined, variable names are used as labels. For multiple category sets, category labels come from the value labels of the first variable in the group. If categories missing for the first variable are present for other variables in the group, define a value label for the missing categories.

Missing Values.

Statistics. Frequency tables displaying counts, percentages of responses, percentages of cases, number of valid cases, and number of missing cases.

Data. Use multiple response sets.

Assumptions. The counts and percentages provide a useful description for data from any distribution.

Related procedures. The Multiple Response Define Sets procedure allows you to define multiple response sets.
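The two percentages in a multiple response frequency table use different denominators: percentage of responses divides by the total number of responses, and percentage of cases divides by the number of valid cases. In this sketch, a case counts as valid if it gave at least one response, which is an assumption. The function name and data are hypothetical.

```python
from collections import Counter


def multiple_response_frequencies(cases):
    """cases: one list of chosen category codes per case (missing codes
    already removed). Returns {code: (count, % of responses, % of cases)}."""
    valid = [c for c in cases if c]  # cases with at least one response
    counts = Counter(code for c in valid for code in c)
    n_responses = sum(counts.values())
    return {
        code: (n, 100 * n / n_responses, 100 * n / len(valid))
        for code, n in sorted(counts.items())
    }


# Hypothetical: four passengers, one of whom named no airline.
table = multiple_response_frequencies([[1, 3], [1, 5], [3], []])
for code, (n, pct_resp, pct_cases) in table.items():
    print(code, n, round(pct_resp, 1), round(pct_cases, 1))
```

The percentages of responses always sum to 100. The percentages of cases can exceed 100 in total, because one case can contribute several responses (here 66.7 + 66.7 + 33.3).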

- Select one or more multiple response sets.

Multiple Response Crosstabs

The Multiple Response Crosstabs procedure crosstabulates defined multiple response sets, elementary variables, or a combination. You can also obtain cell percentages based on cases or responses, modify the handling of missing values, or get paired crosstabulations. You must first define one or more multiple response sets (see “To Define Multiple Response Sets”).

Figure 35-4 Multiple Response Crosstabs output

To Obtain Multiple Response Crosstabs

- From the menus choose:
  Analyze > Multiple Response > Crosstabs...

Figure 35-5 Multiple Response Crosstabs dialog box

- Select one or more numeric variables or multiple response sets for each dimension of the crosstabulation.
- Define the range of each elementary variable.

Optionally, you can obtain a two-way crosstabulation for each category of a control variable or multiple response set.

Multiple Response Crosstabs Define Ranges

Figure 35-6 Multiple Response Crosstabs Define Variable Range dialog box

Value ranges must be defined for any elementary variable in the crosstabulation. Enter the integer minimum and maximum category values that you want to tabulate. Categories outside the range are excluded from analysis. Values within the inclusive range are assumed to be integers (non-integers are truncated).
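The range rule has two parts: exclusion outside the inclusive range and truncation inside it. This sketch shows one reading of the rule. It assumes that the range check is applied to the raw value before truncation, so 3.5 is excluded from a range of 1 to 3. The function name is hypothetical.

```python
import math


def tabulated_category(value, low, high):
    """Category that a value contributes to under an integer range of
    low..high, or None when it falls outside the inclusive range."""
    if value is None or not (low <= value <= high):
        return None  # outside the range: excluded from the crosstabulation
    return math.trunc(value)  # non-integers inside the range are truncated


print([tabulated_category(v, 1, 3) for v in (0, 1, 2.7, 3, 3.5)])  # [None, 1, 2, 3, None]
```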

of responses is equal to the number of counted values across cases. For multiple category sets, the number of responses is the number of values in the defined range.

Missing Values. You can choose one or both of the following:
- Exclude cases listwise within dichotomies. Excludes cases with missing values for any variable from the tabulation of the multiple dichotomy set. This applies only to multiple response sets defined as dichotomy sets.

Chapter 36
Reporting Results

Case listings and descriptive statistics are basic tools for studying and presenting data. You can obtain case listings with the Data Editor or the Summarize procedure, frequency counts and descriptive statistics with the Frequencies procedure, and subpopulation statistics with the Means procedure. Each of these uses a format designed to make information clear.

meaningful categories. Individual values of each break variable appear, sorted, in a separate column to the left of all data columns.

Report. Controls overall report characteristics, including overall summary statistics, display of missing values, page numbering, and titles.

Display cases. Displays the actual values (or value labels) of the data-column variables for every case. This produces a listing report, which can be much longer than a summary report.

Preview.

To Obtain a Summary Report: Summaries in Rows

- From the menus choose:
  Analyze > Reports > Report Summaries in Rows...
- Select one or more variables for Data Columns. One column in the report is generated for each variable selected.
- For reports sorted and displayed by subgroups, select one or more variables for Break Columns.

Report Data Column/Break Format

The Format dialog boxes control column titles, column width, text alignment, and the display of data values or value labels. Data Column Format controls the format of data columns on the right side of the report page. Break Format controls the format of break columns on the left side.

Figure 36-3 Report Data Column Format dialog box

Column Title. For the selected variable, controls the column title. Long titles are automatically wrapped within the column.

Report Summary Lines for/Final Summary Lines

The two Summary Lines dialog boxes control the display of summary statistics for break groups and for the entire report. Summary Lines controls subgroup statistics for each category defined by the break variable(s). Final Summary Lines controls overall statistics, displayed at the end of the report.

Page Control. Controls spacing and pagination for categories of the selected break variable. You can specify a number of blank lines between break categories or start each break category on a new page.

Blank Lines before Summaries. Controls the number of blank lines between break category labels or data and summary statistics.

Figure 36-7 Report Layout dialog box

Page Layout. Controls the page margins, expressed in lines (top and bottom) and characters (left and right), and the alignment of the report within the margins.

Page Titles and Footers. Controls the number of lines that separate page titles and footers from the body of the report.

Break Columns. Controls the display of break columns. If multiple break variables are specified, they can be in separate columns or in the first column.

Report Titles

Report Titles controls the content and placement of report titles and footers. You can specify up to 10 lines of page titles and up to 10 lines of page footers, with left-justified, centered, and right-justified components on each line.

Figure 36-8 Report Titles dialog box

If you insert variables into titles or footers, the current value label or value of the variable is displayed in the title or footer.

Report Summaries in Columns

Report Summaries in Columns produces summary reports in which different summary statistics appear in separate columns.

Example. A company with a chain of retail stores keeps records of employee information, including salary, job tenure, and the division in which each employee works. You could generate a report that provides summary salary statistics (for example, mean, minimum, maximum) for each division.

Data Columns.

Sample Output

Figure 36-9 Summary report with summary statistics in columns

To Obtain a Summary Report: Summaries in Columns

- From the menus choose:
  Analyze > Reports > Report Summaries in Columns...
- Select one or more variables for Data Columns. One column in the report is generated for each variable selected.
- To change the summary measure for a variable, select the variable in the Data Columns list and click Summary.

Figure 36-10 Report Summaries in Columns dialog box

Data Columns Summary Function

Summary Lines controls the summary statistic displayed for the selected data column variable.

Available summary statistics are sum, mean, minimum, maximum, number of cases, percentage of cases above or below a specified value, percentage of cases within a specified range of values, standard deviation, variance, kurtosis, and skewness.

Data Columns Summary for Total Column

Summary Column controls the total summary statistics that summarize two or more data columns.

- 1st column / 2nd column. The total column is the quotient of the columns in the Summary Column list. The Summary Column list must contain exactly two columns.
- % 1st column / 2nd column. The total column is the first column’s percentage of the second column in the Summary Column list. The Summary Column list must contain exactly two columns.
- Product of columns. The total column is the product of the columns in the Summary Column list.
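The three composite total columns listed above are simple arithmetic on the summary values of the columns in the Summary Column list. This sketch mirrors those definitions, including the exactly-two-columns rule for the quotient and percentage forms. The function, the kind names, and the numbers are hypothetical.

```python
from math import prod


def total_column(kind, columns):
    """Composite total column from summary values already computed for the
    data columns in the Summary Column list (one value per column)."""
    if kind in ("quotient", "percent") and len(columns) != 2:
        raise ValueError("the Summary Column list must contain exactly two columns")
    if kind == "quotient":      # 1st column / 2nd column
        return columns[0] / columns[1]
    if kind == "percent":       # % 1st column / 2nd column
        return 100 * columns[0] / columns[1]
    if kind == "product":       # product of columns
        return prod(columns)
    raise ValueError(f"unknown total column kind: {kind}")


# Hypothetical division summaries.
print(total_column("quotient", [450000, 9]))  # 50000.0 (total salary / headcount)
print(total_column("percent", [30, 120]))     # 25.0
print(total_column("product", [2, 3, 4]))     # 24
```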

Blank Lines before Subtotal. Controls the number of blank lines between break category data and subtotals.

Report Summaries in Columns Options

Options controls the display of grand totals, the display of missing values, and pagination in column summary reports.

Figure 36-14 Report Options dialog box

Grand Total. Displays and labels a grand total for each column, shown at the bottom of the column.

Missing values.

- Insert summary lines into data columns for variables other than the data column variable, or for various combinations (composite functions) of summary functions.
- Use Median, Mode, Frequency, and Percent as summary functions.
- Control more precisely the display format of summary statistics.
- Insert blank lines at various points in reports.
- Insert blank lines after every nth case in listing reports.

Chapter 37
Reliability Analysis

Reliability analysis allows you to study the properties of measurement scales and the items that make them up. The Reliability Analysis procedure calculates a number of commonly used measures of scale reliability and also provides information about the relationships between individual items in the scale. Intraclass correlation coefficients can be used to compute interrater reliability estimates.

Example.

Assumptions. Observations should be independent, and errors should be uncorrelated between items. Each pair of items should have a bivariate normal distribution. Scales should be additive, so that each item is linearly related to the total score.

Related procedures. If you want to explore the dimensionality of your scale items (to see if more than one construct is needed to account for the pattern of item scores), use Factor Analysis or Multidimensional Scaling.

Reliability Analysis Statistics

Figure 37-2 Reliability Analysis Statistics dialog box

You can select various statistics describing your scale and items. Statistics reported by default include the number of cases, the number of items, and reliability estimates as follows:
- Alpha models: Coefficient alpha. For dichotomous data, this is equivalent to the Kuder-Richardson 20 (KR20) coefficient.
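Coefficient alpha compares the sum of the item variances with the variance of the total scale score: α = k/(k−1) · (1 − Σ sᵢ² / s²_total). When items measure the same thing, the total varies much more than the items do individually, and α approaches 1. The sketch below is a minimal illustration with hypothetical data and a hypothetical function name. It uses sample (n − 1) variances throughout. Because the same scaling appears in the numerator and the denominator, the result matches KR20 for 0/1 items.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha. items holds one list of scores per item, each
    covering the same cases in the same order; variances use n - 1."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale score per case
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical 0/1 items for five cases; for such data alpha equals KR20.
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]
print(round(cronbach_alpha(items), 3))  # 0.75
```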

Descriptives for. Produces descriptive statistics for scales or items across cases. Available options are Item, Scale, and Scale if item deleted.
- Scale if item deleted. Displays summary statistics comparing each item to the scale composed of the other items. Statistics include scale mean and variance if the item were deleted from the scale, correlation between the item and the scale composed of other items, and Cronbach’s alpha if the item were deleted from the scale.

Summaries.
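The four "Scale if item deleted" statistics can each be computed by leaving one item out and summarizing what remains. This sketch does that for each item: the mean and variance of the scale without the item, the Pearson correlation between the item and the sum of the other items, and alpha without the item. The function names and data are hypothetical, and SPSS's own output formatting is not reproduced.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Coefficient alpha from one score list per item (sample variances)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(s) for s in items) / variance(totals))


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_deleted_stats(items):
    """For each item, return (scale mean, scale variance, corrected
    item-total correlation, alpha) computed with that item left out."""
    results = []
    for i, item in enumerate(items):
        rest = items[:i] + items[i + 1:]
        rest_totals = [sum(scores) for scores in zip(*rest)]  # scale without item i
        alpha = cronbach_alpha(rest) if len(rest) > 1 else None  # alpha needs 2+ items
        results.append((mean(rest_totals), variance(rest_totals),
                        pearson(item, rest_totals), alpha))
    return results


# Hypothetical 0/1 items for five cases.
items = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]
for number, row in enumerate(item_deleted_stats(items), start=1):
    print(number, [round(value, 3) for value in row])
```

An item whose deletion raises alpha, or whose corrected item-total correlation is low, may not measure the same construct as the rest of the scale.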

Hotelling’s T-square. Produces a multivariate test of the null hypothesis that all items on the scale have the same mean.

Tukey’s test of additivity. Produces a test of the assumption that there is no multiplicative interaction among the items.

Intraclass correlation coefficient. Produces measures of consistency or agreement of values within cases.
- Model. Select the model for calculating the intraclass correlation coefficient.

Chapter 38
Multidimensional Scaling

Multidimensional scaling attempts to find the structure in a set of distance measures between objects or cases. This is accomplished by assigning observations to specific locations in a conceptual space (usually two- or three-dimensional) such that the distances between points in the space match the given dissimilarities as closely as possible. In many cases, the dimensions of this conceptual space can be interpreted and used to further understand your data.

issue: differences in scaling may affect your solution. If your variables have large differences in scaling (for example, one variable is measured in dollars and the other is measured in years), you should consider standardizing them (this can be done automatically by the Multidimensional Scaling procedure).

Assumptions. The Multidimensional Scaling procedure is relatively free of distributional assumptions.

- If your data are distances, you must select at least four numeric variables for analysis, and you can click Shape to indicate the shape of the distance matrix.
- If you want SPSS to create the distances before analyzing them, you must select at least one numeric variable, and you can click Measure to specify the type of distance measure you want.

Multidimensional Scaling Create Measure

Figure 38-3 Multidimensional Scaling Create Measure from Data dialog box

Multidimensional scaling uses dissimilarity data to create a scaling solution. If your data are multivariate data (values of measured variables), you must create dissimilarity data in order to compute a multidimensional scaling solution. You can specify the details of creating dissimilarity measures from your data.

Measure.

Transform Values. In certain cases, such as when variables are measured on very different scales, you may want to standardize values before computing proximities (not applicable to binary data). Select a standardization method from the Standardize drop-down list (if no standardization is required, select None).
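To see why standardizing matters before computing proximities, compare distances computed on raw and standardized values. This sketch assumes z-score standardization (mean 0, sample standard deviation 1) as one typical choice, and Euclidean distance as the proximity measure. It does not describe which methods the Standardize list actually offers. The function names and data are hypothetical.

```python
import math
from statistics import mean, stdev


def z_scores(column):
    """Rescale one variable to mean 0 and sample standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def euclidean_distances(variables):
    """Case-by-case Euclidean distance matrix. variables holds one list of
    values per variable, each covering the same cases in the same order."""
    cases = list(zip(*variables))
    return [[math.dist(a, b) for b in cases] for a in cases]


# Hypothetical data: income in dollars and tenure in years for three cases.
income = [40000, 52000, 61000]
tenure = [2, 10, 5]
raw = euclidean_distances([income, tenure])
scaled = euclidean_distances([z_scores(income), z_scores(tenure)])
# In raw units, the dollar differences swamp the year differences; after
# standardizing, both variables contribute on a comparable scale.
print(round(raw[0][1], 1), round(scaled[0][1], 3))
```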

Scaling Model. Allows you to specify the assumptions by which the scaling is performed. Available alternatives are Euclidean distance or Individual differences Euclidean distance (also known as INDSCAL). For the Individual differences Euclidean distance model, you can select Allow negative subject weights, if appropriate for your data.

ALSCAL Command Additional Features

The SPSS command language also allows you to:
- Use three additional model types, known as ASCAL, AINDS, and GEMSCAL in the literature on multidimensional scaling.
- Carry out polynomial transformations on interval and ratio data.
- Analyze similarities (rather than distances) with ordinal data.
- Analyze nominal data.
- Save various coordinate and weight matrices into files and read them back in for analysis.

Chapter 39
Ratio Statistics

The Ratio Statistics procedure provides a comprehensive list of summary statistics for describing the ratio between two scale variables. You can sort the output by values of a grouping variable in ascending or descending order. The ratio statistics report can be suppressed in the output and the results saved to an external file.

Example.

Figure 39-1 Ratio Statistics dialog box

- Select a numerator variable.
- Select a denominator variable.

Optionally, you can:
- Select a grouping variable and specify the ordering of the groups in the results.
- Choose whether or not to display the results in the Output Viewer.
- Choose whether or not to save the results to an external file for later use, and specify the name of the file to which the results are saved.

Ratio Statistics

Figure 39-2 Statistics dialog box

Central Tendency. Measures of central tendency are statistics that describe the distribution of ratios.
- Median. The value such that the number of ratios less than this value and the number of ratios greater than this value are the same.
- Mean. The result of summing the ratios and dividing the result by the total number of ratios.
- Weighted mean.

- AAD. The average absolute deviation is the result of summing the absolute deviations of the ratios about the median and dividing the result by the total number of ratios.
- COD. The coefficient of dispersion is the result of expressing the average absolute deviation as a percentage of the median.
- PRD. The price-related differential, also known as the index of regressivity, is the result of dividing the mean by the weighted mean.
- Median centered COV.
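The ratio statistics defined above combine in a few lines of arithmetic. This sketch follows those definitions for the median, mean, AAD, COD, and PRD. The definition of the weighted mean is cut off in this section, so the sketch assumes the common definition: the mean of the numerator divided by the mean of the denominator (equivalently, the sum of numerators divided by the sum of denominators). The function name and data are hypothetical.

```python
from statistics import mean, median


def ratio_statistics(numerators, denominators):
    ratios = [n / d for n, d in zip(numerators, denominators)]
    med = median(ratios)
    avg = mean(ratios)
    # Assumed definition (cut off in the text): mean numerator / mean denominator.
    weighted = sum(numerators) / sum(denominators)
    aad = sum(abs(r - med) for r in ratios) / len(ratios)  # deviations about the median
    return {
        "median": med,
        "mean": avg,
        "weighted mean": weighted,
        "AAD": aad,
        "COD": 100 * aad / med,    # AAD as a percentage of the median
        "PRD": avg / weighted,     # mean divided by weighted mean
    }


# Hypothetical numerator and denominator values for three cases.
stats = ratio_statistics([90, 200, 330], [100, 200, 300])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Here the ratios are 0.9, 1.0, and 1.1. Because the largest ratio belongs to the case with the largest denominator, the weighted mean (620/600) exceeds the mean, and the PRD falls below 1.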

Chapter 40
Overview of the Chart Facility

High-resolution charts and plots are created by the procedures on the Graphs menu and by many of the procedures on the Analyze menu. This chapter provides an overview of the chart facility.

Creating and Modifying a Chart

Before you can create a chart, you need to have your data in the Data Editor. You can enter the data directly into the Data Editor, open a previously saved data file, or read a spreadsheet, tab-delimited data file, or database file.

Figure 40-1 Chart dialog box

The dialog box contains icons for various types of charts and a list of data structures. Click Define to open a chart definition dialog box such as the following one.

Figure 40-2 Chart definition dialog box

In this dialog box, you can select the variables appropriate for the chart and choose the options you want. For information about the various choices, click Help.

The chart is displayed in the Viewer.

Figure 40-3 Chart in Viewer

Modifying the Chart

To modify a chart, double-click anywhere on the chart that is displayed in the Viewer. This displays the chart in the Chart Editor.

Figure 40-4 Original chart in the Chart Editor

You can modify any part of the chart or change to another type of chart illustrating the same data. You add items or show or hide them using the menus in the Chart Editor.

To modify a chart item:
- Select the item that you want to modify.
- From the menus choose:
  Edit > Properties...

This opens the Properties window. The tabs that appear in the Properties window are specific to your selection.

Figure 40-5 Properties window

Some typical modifications include the following:
- Edit text in the chart.
- Change the color and fill pattern of the bars.
- Add text to the chart, such as a title or an annotation.
- Change the location of the bar origin line.
- Change the outer frame’s border from transparent to black.

Following is a modified chart.

Figure 40-6 Modified chart

Chart modifications are saved when you close the chart window, and the modified chart is displayed in the Viewer.

Chart Definition Options

When you are defining a chart, the specific chart definition dialog box usually contains the Titles and Options buttons and a Template group. These global options are available for most charts, regardless of type.

Figure 40-7 A chart definition dialog box

Click Titles to specify titles, subtitles, and footnotes. You can click Options to control various chart options, such as the treatment of missing values or the display of error bars. The specific options that are available depend on the chart type. Additionally, you can apply a template of previously selected attributes either when you are defining the chart or after the chart has been created.

Titles, Subtitles, and Footnotes

In any chart, you can define two title lines, one subtitle line, and two footnote lines as part of your original chart definition. To specify titles or footnotes while defining a chart, click Titles in the chart definition dialog box. This opens the Titles dialog box.

Figure 40-8 Titles dialog box

Each line can be up to 72 characters long. The number of characters that will actually fit in the chart depends on the font and font size.

or counts. The normal curve display option and bin options are available only for population pyramids showing the distribution of a scale variable. The plot shape display option is available only for dot plots.

Missing Values

If you selected summaries of separate variables for a categorical chart or if you are creating a scatterplot, you can choose one of the following alternatives for exclusion of cases having missing values:
- Exclude cases listwise.

Figure 40-10 Variable-by-variable exclusion of missing values

The charts were created from a version of the Employee data.sav file that was edited to have some system-missing (blank) values in the variables for current salary and job category. In some other cases, the value 0 was entered and defined as missing. For both charts, the option Display groups defined by missing values is selected, which adds the category Missing to the other job categories displayed.

“missing” category to the set of markers. If there are no missing values, the “missing” category is not displayed. If you select this option and want to suppress display after the chart is drawn, select the chart and then choose Properties from the Edit menu. Use the Categories tab to move the categories you want suppressed to the Excluded list. This option is not available for an overlay scatterplot or for single-series charts in which the data are summarized by separate variables.

Normal Curve and Bin Options

If you are creating a population pyramid and the Show Distribution over variable is a scale variable, you can choose to display a normal curve or change the way cases are binned in the chart:
- Display normal curve. Superimpose over each half of the population pyramid a normal curve with the same mean and standard deviation as the data.
- Anchor First Bin. Specify the starting value of the first bin.

To apply a template to a chart already in the Chart Editor, from the menus choose:
File > Apply Chart Template...

This opens a standard file selection dialog box. Select a file to use as a template. If you are creating a new chart, the filename you select is displayed in the Template group when you return to the chart definition dialog box. A template is used to borrow the format from one chart and apply it to the new chart you are generating.

Chapter 41
ROC Curves

This procedure is a useful way to evaluate the performance of classification schemes in which there is one variable with two categories by which subjects are classified.

Example. It is in a bank’s interest to correctly classify customers into those who will and will not default on their loans, so special methods are developed for making these decisions. ROC curves can be used to evaluate how well these methods perform.

Statistics.

Figure 41-1 ROC Curve output

Case Processing Summary
ACTUAL       Valid N (listwise)
Positive¹    74
Negative     76

Larger values of the test result variable(s) indicate stronger evidence for a positive actual state.
1. The positive actual state is 1.00.

Area Under the Curve
Test Result Variable(s): PROBS
                                               Asymptotic 95% Confidence Interval
Area    Std. Error¹    Asymptotic Sig.²        Lower Bound    Upper Bound
.877    .028           .000                    .823           .

1. Under the nonparametric assumption
2. Null hypothesis: true area = 0.5
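A common nonparametric estimate of the area under the ROC curve is the probability that a randomly chosen positive case has a higher test result than a randomly chosen negative case, with ties counted as one half. This equals the Mann-Whitney statistic divided by the number of positive-negative pairs. The sketch below follows the output's convention that larger test values indicate stronger evidence for the positive state. The function name and data are hypothetical, and the sketch does not compute the standard error or confidence interval.

```python
def roc_auc(scores, states, positive=1):
    """Nonparametric area under the ROC curve: the share of (positive,
    negative) case pairs in which the positive case scores higher, with
    ties counted as one half. Larger scores = stronger evidence of positive."""
    pos = [s for s, y in zip(scores, states) if y == positive]
    neg = [s for s, y in zip(scores, states) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and actual states (1 = positive).
print(round(roc_auc([0.9, 0.8, 0.4, 0.6, 0.3, 0.4], [1, 1, 1, 0, 0, 0]), 3))  # 0.833
```

An area of 0.5 corresponds to the null hypothesis in footnote 2 (no better than chance), and 1.0 corresponds to perfect separation. On this scale, the reported .877 indicates good discrimination.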

To Obtain an ROC Curve

- From the menus choose:
  Graphs > ROC Curve...

Figure 41-2 ROC Curve dialog box

- Select one or more test probability variables.
- Select one state variable.
- Identify the positive value for the state variable.

ROC Curve Options

Figure 41-3 ROC Curve Options dialog box

You can specify the following options for your ROC analysis:
- Classification. Allows you to specify whether the cutoff value should be included or excluded when making a positive classification. This currently has no effect on the output.
- Test Direction. Allows you to specify the direction of the scale in relation to the positive category.
- Parameters for Standard Error of Area.

Chapter 42
Utilities

This chapter describes the functions found on the Utilities menu and the ability to reorder target variable lists using the Windows system menus.

Variable Information

The Variables dialog box displays variable definition information for the currently selected variable, including:
- Data format
- Variable label
- User-missing values
- Value labels

Figure 42-1 Variables dialog box

Go To. Goes to the selected variable in the Data Editor window.

Paste. Pastes the selected variables into the designated syntax window at the cursor location.

To modify variable definitions, use the Variable view in the Data Editor.

To Obtain Variable Information

- From the menus choose:
  Utilities > Variables...
- Select the variable for which you want to display variable definition information.

Data File Comments

You can include descriptive comments with a data file. For SPSS-format data files, these comments are saved with the data file.

Variable Sets

You can restrict the variables that appear on dialog box source variable lists by defining and using variable sets. This is particularly useful for data files with a large number of variables. Small variable sets make it easier to find and select the variables for your analysis and can also enhance performance.

Variables in Set. Any combination of numeric, short string, and long string variables can be included in a set. The order of variables in the set has no effect on the display order of the variables on dialog box source lists. A variable can belong to multiple sets.

To Define Variable Sets

- From the menus choose:
  Utilities > Define Sets...
- Select the variables that you want to include in the set.
- Enter a name for the set (up to 12 characters).
- Click Add Set.

Sets in Use. Displays the sets used to produce the source variable lists in dialog boxes. Variables appear on the source lists in alphabetical or file order. The order of sets and the order of variables within a set have no effect on source list variable order. By default, two system-defined sets are in use:
- ALLVARIABLES. This set contains all variables in the data file, including new variables created during a session.
- NEWVARIABLES.

Figure 42-4 Windows system menu with target list reordering

Move Selection Up. Moves the selected variable(s) up one position on the target list.

Move Selection Down. Moves the selected variable(s) down one position on the target list.

You can move multiple variables simultaneously if they are contiguous (grouped together). You cannot move noncontiguous groups of variables.
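The reordering rule amounts to a small list operation. The following Python sketch is a hypothetical illustration (move_selection is not an SPSS function): it moves a contiguous block of selected positions one place up or down and rejects a noncontiguous selection. The text does not say what happens when the block is already at the top or bottom of the list; this sketch assumes nothing moves in that case.

```python
from typing import List


def move_selection(target: List[str], selected: List[int], step: int) -> List[str]:
    """Return a copy of target with the selected positions moved one place.

    step is -1 for Move Selection Up and +1 for Move Selection Down.
    A noncontiguous selection raises ValueError."""
    if step not in (-1, 1):
        raise ValueError("step must be -1 (up) or +1 (down)")
    idx = sorted(selected)
    result = list(target)
    if not idx:
        return result
    if idx != list(range(idx[0], idx[-1] + 1)):
        raise ValueError("noncontiguous variables cannot be moved")
    start, end = idx[0], idx[-1] + 1          # the block is result[start:end]
    if (step == -1 and start == 0) or (step == 1 and end == len(result)):
        return result                         # already at the edge (assumed no-op)
    block = result[start:end]
    if step == -1:
        # the item just above the block drops to just below it
        result[start - 1:end] = block + [result[start - 1]]
    else:
        # the item just below the block rises to just above it
        result[start:end + 1] = [result[end]] + block
    return result


print(move_selection(["a", "b", "c", "d"], [1, 2], -1))  # ['b', 'c', 'a', 'd']
```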

Chapter 43 Options

Options control a wide variety of settings, including:
- Session journal, which keeps a record of all commands run in every session.
- Display order for variables in dialog box source lists.
- Items displayed and hidden in new output results.
- TableLook for new pivot tables and ChartLook for new interactive charts.
- Custom currency formats.
- Autoscript files and autoscript functions to customize output.

To Change Options Settings

▶ From the menus choose:
  Edit > Options...

General Options

Figure 43-1 Options dialog box, General tab

Variable Lists. Controls the display of variables in dialog box list boxes. You can display variable names or variable labels. Names or labels can be displayed in alphabetical order or in file order, which is the order in which the variables actually occur in the data file (and are displayed in the Data Editor window). Display order affects only source variable lists.

Temporary directory. Controls the location of temporary files created during a session. In distributed mode (available with the server version), this setting does not affect the location of temporary data files. In distributed mode, the location of temporary data files is controlled by the environment variable SPSSTMPDIR, which can be set only on the computer running the server version of the software. If you need to change the location of the temporary directory, contact your system administrator.

Viewer Options

Viewer output display options affect only new output produced after you change the settings. Output already displayed in the Viewer is not affected by changes in these settings.

Figure 43-2 Options dialog box, Viewer tab

Initial Output State. Controls which items are automatically displayed or hidden each time you run a procedure and how items are initially aligned.

Title Font. Controls the font style, size, and color for new output titles.

Page Title Font. Controls the font style, size, and color for new page titles and for page titles generated by TITLE and SUBTITLE command syntax or created by New Page Title on the Insert menu.

Text Output Page Size. For text output, controls the page width (expressed in number of characters) and page length (expressed in number of lines). For some procedures, some statistics are displayed only in wide format.

Figure 43-3 Options dialog box, Draft Viewer tab

Display Output Items. Controls which items are automatically displayed each time you run a procedure. You can control the display of the following items: log, warnings, notes, titles, tabular output (pivot tables converted to text output), charts, and text output (space-separated output). You can also turn the display of commands in the log on or off.

value in the column. To limit the width of columns and wrap long labels, specify a number of characters for the column width.

Note: Tab-separated tabular output will not align properly in the Draft Viewer. This format is useful for copying and pasting results into word-processing applications, where you can use any font that you want (not only fixed-pitch fonts) and set the tabs to align output properly.

Text Output.

Figure 43-4 Options dialog box, Output Labels tab

Output label options affect only new output produced after you change the settings. Output already displayed in the Viewer is not affected by changes in these settings. These settings affect only pivot table output; text output is not affected.

Chart Options

Figure 43-5 Options dialog box, Charts tab

Chart Template. New charts can use either the settings selected here or the settings from a chart template file. Click Browse to select a chart template file. To create a chart template file, create a chart with the attributes that you want and save it as a template (choose Save Chart Template from the File menu).

Chart Aspect Ratio. The width-to-height ratio of the outer frame of new charts.

Style Cycle Preference. The initial assignment of colors and patterns for new charts.
- Cycle through colors, then patterns uses the default palette of colors and then changes the line style or the marker symbol or adds a fill pattern as necessary.
- Cycle through colors only differentiates chart elements by color alone and does not use patterns.
- Cycle through patterns only differentiates chart elements by line style, marker symbol, or fill pattern alone and does not use color.

Frame.

- Remove a selected category.
- Reset the sequence to the default sequence.
- Edit a color by selecting its well and then clicking Edit.

Data Element Lines

Specify the order in which styles should be used for the line data elements in your new chart. Line styles are used whenever your chart includes line data elements and you select a choice that includes patterns in the Style Cycle Preference group in the main Chart Options dialog box.

For example, if you create a scatterplot with two groups and you select Cycle through patterns only in the main Chart Options dialog box, the first two symbols in the Grouped Charts list are used as the markers on the new chart.

To change the order in which marker styles are used:
▶ Select Simple Charts and then select a marker symbol that is used for charts without categories.
▶ Select Grouped Charts to change the pattern cycle for charts with categories.
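The three Style Cycle Preference choices can be read as different ways of mapping a data element's position to a color and a pattern. The Python sketch below is a hypothetical model, not the program's algorithm; in particular, it assumes that "colors, then patterns" advances to the next pattern only after every color in the palette has been used once. The function name and the color and pattern names are placeholders. With patterns only, two groups take the first two entries of the pattern list, which matches the scatterplot example above.

```python
from typing import List, Optional, Tuple


def style_for(index: int, mode: str, colors: List[str],
              patterns: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """(color, pattern) for the index-th data element; None means 'not used'."""
    if mode == "colors_then_patterns":
        color = colors[index % len(colors)]
        # assumed: move to the next pattern only after all colors are used once
        pattern = patterns[(index // len(colors)) % len(patterns)]
        return color, pattern
    if mode == "colors_only":
        return colors[index % len(colors)], None
    if mode == "patterns_only":
        return None, patterns[index % len(patterns)]
    raise ValueError(f"unknown mode: {mode}")


colors = ["blue", "red"]
patterns = ["solid", "dashed"]
print([style_for(i, "colors_then_patterns", colors, patterns) for i in range(3)])
# [('blue', 'solid'), ('red', 'solid'), ('blue', 'dashed')]
```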

▶ Select Grouped Charts to change the pattern cycle for charts with categories.

To change a category’s fill pattern, select a category and then select a fill pattern for that category from the palette.

Optionally, you can:
- Insert a new category above the selected category.
- Move a selected category.
- Remove a selected category.
- Reset the sequence to the default sequence.

For interactive charts (Graphs menu, Interactive submenu), the following options are available:

ChartLook. Select a ChartLook from the list of files and click OK or Apply. By default, the list displays the ChartLooks saved in the Looks directory of the directory in which the program is installed. You can use one of the ChartLooks provided with the program, or you can create your own in the Interactive Graphics Editor (in an activated chart, choose ChartLooks from the Format menu).

Pivot Table Options

Pivot Table options set the default TableLook used for new pivot table output. TableLooks can control a variety of pivot table attributes, including the display and width of grid lines; font style, size, and color; and background colors.

Figure 43-7 Options dialog box, Pivot Tables tab

TableLook. Select a TableLook from the list of files and click OK or Apply.

Labels only. Adjusts column width to the width of the column label. This produces more compact tables, but data values wider than the label will not be displayed (asterisks indicate values too wide to be displayed).

Labels and data. Adjusts column width to whichever is larger, the column label or the largest data value. This produces wider tables, but it ensures that all values will be displayed.

Default Editing Mode.
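The two column-width settings described above, Labels only and Labels and data (SET TFIT = LABELS and SET TFIT = BOTH in command syntax), can be compared with character counts. This Python sketch is a simplification: it measures widths in characters rather than in the fonts a TableLook actually uses, right-aligns values by assumption, and the function names are hypothetical.

```python
from typing import List


def column_width(label: str, values: List[str], mode: str) -> int:
    """Width in characters for one pivot table column (simplified model)."""
    if mode == "labels":            # Labels only / SET TFIT = LABELS
        return len(label)
    if mode == "both":              # Labels and data / SET TFIT = BOTH
        return max([len(label)] + [len(v) for v in values])
    raise ValueError("mode must be 'labels' or 'both'")


def render_cell(value: str, width: int) -> str:
    """Right-align a value; a value that does not fit is shown as asterisks."""
    return value.rjust(width) if len(value) <= width else "*" * width


values = ["12", "1234.5"]
narrow = column_width("Mean", values, "labels")   # 4
wide = column_width("Mean", values, "both")       # 6
print([render_cell(v, narrow) for v in values])   # ['  12', '****']
print([render_cell(v, wide) for v in values])     # ['    12', '1234.5']
```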

Transformation and Merge Options. Each time the program executes a command, it reads the data file. Some data transformations (such as Compute and Recode) and file transformations (such as Add Variables and Add Cases) do not require a separate pass of the data, and execution of these commands can be delayed until the program reads the data to execute another command, such as a statistical procedure.
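The idea of deferring transformations until the next data pass can be sketched in Python. The class below is a hypothetical model, not SPSS code: compute queues a transformation instead of running it, and a procedure such as mean reads the data once, applying every queued transformation during that same pass.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


class LazyDataset:
    """Hypothetical model of delayed transformations; not an SPSS API."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.pending: List[Tuple[str, Callable[[Row], float]]] = []
        self.passes = 0  # how many times the data have been read

    def compute(self, name: str, func: Callable[[Row], float]) -> None:
        # Like Compute with delayed execution: queue it, do not read the data yet.
        self.pending.append((name, func))

    def _read_data(self) -> None:
        self.passes += 1
        for row in self.rows:
            for name, func in self.pending:  # queued transformations, in order
                row[name] = func(row)
        self.pending.clear()

    def mean(self, name: str) -> float:
        # A "procedure": it must read the data, which also runs pending work.
        self._read_data()
        return sum(row[name] for row in self.rows) / len(self.rows)


ds = LazyDataset([{"x": 1.0}, {"x": 3.0}])
ds.compute("y", lambda r: r["x"] * 2)
ds.compute("z", lambda r: r["y"] + 1)
print(ds.passes)     # 0: nothing has been read yet
print(ds.mean("z"))  # 5.0, computed in a single pass
print(ds.passes)     # 1
```

Two transformations and one procedure cost a single pass here; running each transformation immediately would instead read the data once per command.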

The five custom currency format names are CCA, CCB, CCC, CCD, and CCE. You cannot change the format names or add new ones. To modify a custom currency format, select the format name from the source list and make the changes that you want.

Figure 43-9 Options dialog box, Currency tab

Prefixes, suffixes, and decimal indicators defined for custom currency formats are for display purposes only. You cannot enter values in the Data Editor using custom currency characters.
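To make the display-only nature of these formats concrete, the Python sketch below keeps each value as an ordinary number and applies a prefix, suffix, and decimal indicator only when rendering it. The names CurrencyFormat, set_format, and display are hypothetical; the sketch does not model digit grouping or any settings for negative values, and placing the minus sign before the prefix is an assumption.

```python
from dataclasses import dataclass
from typing import Dict

CURRENCY_NAMES = ("CCA", "CCB", "CCC", "CCD", "CCE")  # fixed: no renaming or additions


@dataclass
class CurrencyFormat:
    prefix: str = ""
    suffix: str = ""
    decimal: str = "."  # decimal indicator: "." or ","


formats: Dict[str, CurrencyFormat] = {name: CurrencyFormat() for name in CURRENCY_NAMES}


def set_format(name: str, fmt: CurrencyFormat) -> None:
    if name not in CURRENCY_NAMES:
        raise KeyError(f"{name} is not one of {CURRENCY_NAMES}")
    if fmt.decimal not in (".", ","):
        raise ValueError("decimal indicator must be '.' or ','")
    formats[name] = fmt


def display(value: float, name: str, decimals: int = 2) -> str:
    """Display-only rendering; the underlying value stays a plain number."""
    fmt = formats[name]
    digits = f"{abs(value):.{decimals}f}"
    if fmt.decimal == ",":
        digits = digits.replace(".", ",")
    sign = "-" if value < 0 else ""  # sign placement is an assumption
    return f"{sign}{fmt.prefix}{digits}{fmt.suffix}"


set_format("CCA", CurrencyFormat(suffix=" EUR", decimal=","))
print(display(1234.5, "CCA"))  # 1234,50 EUR
```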

Script Options

Use the Scripts tab to specify your global procedures file and autoscript file, and select the autoscript subroutines that you want to use. You can use scripts to automate many functions, including customizing pivot tables.

Global Procedures. A global procedures file is a library of script subroutines and functions that can be called by script files, including autoscript files. Note: The global procedures file that comes with the program is selected by default.

To Specify Global Procedure File and Autoscript File

▶ Click the Scripts tab.
▶ Select Enable Autoscripting.
▶ Select the autoscript subroutines that you want to enable.

You can also specify a different autoscript file or global procedure file.

Chapter 44 Customizing Menus and Toolbars

Menu Editor

You can use the Menu Editor to customize your menus. With the Menu Editor you can:
- Add menu items that run customized scripts.
- Add menu items that run command syntax files.
- Add menu items that launch other applications and automatically send data to other applications.

You can send data to other applications in the following formats: SPSS, Excel 4.0, Lotus 1-2-3 release 3, SYLK, tab-delimited, and dBASE IV.

▶ Click Browse to select a file to attach to the menu item.

Figure 44-1 Menu Editor dialog box

You can also add entirely new menus and separators between menu items. Optionally, you can automatically send the contents of the Data Editor to another application when you select that application on the menus.

Customizing Toolbars

You can customize toolbars and create new toolbars. Toolbars can contain any of the available tools, including tools for all menu actions.

Figure 44-2 Show Toolbars dialog box

To Customize Toolbars

▶ From the menus choose:
  View > Toolbars...
▶ Select the toolbar you want to customize and click Customize, or click New Toolbar to create a new toolbar.
▶ For new toolbars, enter a name for the toolbar, select the windows in which you want the toolbar to appear, and click Customize.
▶ Select an item in the Categories list to display available tools in that category.

▶ Enter a descriptive label for the tool.
▶ Select the action you want for the tool (open a file, run a command syntax file, or run a script).
▶ Click Browse to select a file or application to associate with the tool.

New tools are displayed in the User-Defined category, which also contains user-defined menu items.

Toolbar Properties

Use Toolbar Properties to select the window types in which you want the selected toolbar to appear.

▶ For new toolbars, click New Toolbar.
▶ Select the window types in which you want the toolbar to appear. For new toolbars, also enter a toolbar name.

Customize Toolbar

Use the Customize Toolbar dialog box to customize existing toolbars and create new toolbars. Toolbars can contain any of the available tools, including tools for all menu actions. They can also contain custom tools that launch other applications, run command syntax files, or run script files.

Figure 44-5 Create New Tool dialog box

Toolbar Bitmap Editor

Use the Bitmap Editor to create custom icons for toolbar buttons. This is particularly useful for custom tools you create to run scripts, syntax, and other applications.

Figure 44-6 Bitmap Editor

To Edit Toolbar Bitmaps

▶ From the menus choose:
  View > Toolbars...
▶ Select the toolbar you want to customize and click Customize.
▶ Click the tool with the bitmap icon you want to edit on the example toolbar.
▶ Click Edit Tool.
▶ Use the toolbox and the color palette to modify the bitmap or create a new bitmap icon.

Chapter 45 Production Facility

The Production Facility provides the ability to run the program in an automated fashion. The program runs unattended and terminates after executing the last command, so you can perform other tasks while it runs. Production mode is useful if you often run the same set of time-consuming analyses, such as weekly reports.

The Production Facility uses command syntax files to tell the program what to do. A command syntax file is a simple text file containing command syntax.

Figure 45-1 Production Facility

Syntax Input Format. Controls the form of the syntax rules used for the job:

Interactive. Each command must end with a period. Periods can appear anywhere within the command, and commands can continue on multiple lines, but a period as the last nonblank character on a line is interpreted as the end of the command. Continuation lines and new commands can start anywhere on a new line.
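The interactive rule can be illustrated with a minimal Python sketch that assumes only what is stated above: a command ends where a period is the last nonblank character on a line, and periods elsewhere (such as in 1.5) do not end it. The function name is hypothetical. Blank lines are simply skipped, because their treatment is not described here; quoting and other syntax details are not modeled either.

```python
from typing import List


def split_interactive(syntax: str) -> List[str]:
    """Split command syntax into commands under the interactive rule."""
    commands: List[str] = []
    current: List[str] = []
    for line in syntax.splitlines():
        text = line.strip()
        if not text:
            continue  # blank lines are skipped in this sketch
        current.append(text)
        if text.endswith("."):  # period as last nonblank character: end of command
            commands.append(" ".join(current))
            current = []
    if current:  # trailing command with no terminating period
        commands.append(" ".join(current))
    return commands


job = """FREQUENCIES VARIABLES=age
  /STATISTICS=MEAN.
COMPUTE ratio = 1.5
  * income.
"""
for command in split_interactive(job):
    print(command)
```

The period in 1.5 is not at the end of its line, so the COMPUTE command continues onto the next line and ends only at the final period.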

Syntax Error Behavior. Controls the treatment of error conditions in the job:

Continue. Errors in the job do not automatically stop command processing. The commands in the production job files are treated as part of the normal command stream, and command processing continues in the normal fashion.

Stop. Command processing stops when the first error in a production job file is encountered.

Export Options

Export Options saves pivot tables and text output in HTML, text, Word/RTF, and Excel format, and it saves charts in a variety of common formats used by other applications.

Figure 45-2 Export Options dialog box

Export. This drop-down list specifies what you want to export.

Output Document. Exports any combination of pivot tables, text output, and charts. For HTML and text formats, charts are exported in the currently selected chart export format.

Export Format. For output documents, the available options are HTML, text, Word/RTF, and Excel; for HTML and text formats, charts are exported in the currently selected chart format. For Charts Only, select a chart export format from the drop-down list.

For output documents, pivot tables and text are exported in the following manner:

HTML file (*.htm). Pivot tables are exported as HTML tables. Text output is exported as preformatted HTML.

Text file (*.txt).

Text and Image Options

Text export options (for example, tab-separated or space-separated) and chart export options (for example, color settings, size, and resolution) are set in SPSS and cannot be changed in the Production Facility. Use Export on the File menu in SPSS to change text and chart export options.

Draft Viewer Export

The only Export option available for Draft Viewer output is to export the output in simple text format. Charts in Draft Viewer output cannot be exported.

Macro Symbol. The macro name used in the command syntax file to invoke the macro that prompts the user to enter information. The macro symbol name must begin with an @.

Prompt. The descriptive label that is displayed when the production job prompts you to enter information. For example, you could use the phrase “What data file do you want to use?” to identify a field that requires a data filename.

Default.

Production Macro Prompting

The Production Facility prompts you for values whenever you run a production job that contains defined macro symbols. You can replace or modify the default values that are displayed. Those values are then substituted for the macro symbols in all command syntax files associated with the production job.
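The prompting step can be approximated in Python: each @ symbol is replaced by the value the user enters, or by its default if the user enters nothing. This is plain text substitution under stated assumptions, not how the Production Facility implements its macros; the function names and the example symbols @datafile and @var are hypothetical, and treating a symbol as @ followed by letters, digits, or underscores is an assumption about valid names.

```python
import re
from typing import Dict


def check_symbol(name: str) -> None:
    if not re.fullmatch(r"@\w+", name):
        raise ValueError(f"macro symbol must begin with @: {name!r}")


def substitute(syntax: str, defaults: Dict[str, str], answers: Dict[str, str]) -> str:
    """Replace each @symbol with the user's answer, or the default if none."""
    values = {**defaults, **answers}  # answers override defaults
    for name in values:
        check_symbol(name)
    # Match whole @names so that @file never replaces part of @filename;
    # unknown symbols are left unchanged.
    return re.sub(r"@\w+", lambda m: values.get(m.group(0), m.group(0)), syntax)


syntax = "GET FILE=@datafile.\nFREQUENCIES VARIABLES=@var."
print(substitute(syntax, {"@datafile": "'week1.sav'", "@var": "age"},
                 {"@datafile": "'week2.sav'"}))
```

Applying the same value table to every syntax file in the job mirrors the statement above that the values are substituted in all associated command syntax files.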

Figure 45-6 Options dialog box

Changing Production Options

From the Production Facility menus choose:
  Edit > Options...

Format Control for Production Jobs

There are a number of settings in SPSS that can help ensure the best format for pivot tables created in production jobs:

TableLooks. By editing and saving TableLooks (Format menu in an activated pivot table), you can control many pivot table attributes. You can specify font sizes and styles, colors, and borders.

Output labels. Output label options (Edit menu, Options, Output Labels tab) control the display of variable and data value information in pivot tables. You can display variable names and/or defined variable labels, and actual data values and/or defined value labels. Descriptive variable and value labels often make it easier to interpret your results; however, long labels can be awkward in some tables.

Column width.

▶ Click the Pivot Tables tab.
▶ Select the TableLook from the list and click OK.

Setting Options for Production Jobs

▶ From the menus choose:
  Edit > Options...
▶ Select the options that you want.
▶ Click OK.

You can set the default TableLook, output label settings, and automatic column width adjustment with Options. Options settings are saved with the program.

SET TFIT = LABELS adjusts column width to the width of the column label. SET TFIT = BOTH adjusts column width to the width of the column label or the largest data value, whichever is wider.

Running Production Jobs from a Command Line

Command line switches enable you to schedule production jobs to run at certain times with scheduling utilities like the one available in Microsoft Plus!. You can run production jobs from a command line with the following switches:

-r.

For command line switches that require additional specifications, the switch must be followed by an equals sign followed immediately by the specification. If the specification contains spaces (such as a two-word server name), enclose the value in quotes or apostrophes, as in:

    -x="HAL 9000" -u="secret word"

Default server.
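The switch syntax described above (an equals sign, and quotes or apostrophes around a specification containing spaces) is easy to get wrong when a scheduler builds the command line. The helper below is a hypothetical Python sketch that formats one switch at a time and reproduces the example shown earlier. It does not say which switches exist or what they mean (only -r, -x, and -u appear in this excerpt), and it deliberately leaves out the program path to which the switches are passed.

```python
from typing import Optional


def format_switch(flag: str, value: Optional[str] = None) -> str:
    """Format one switch as flag=value, quoting a value that contains spaces."""
    if value is None:
        return flag
    if any(ch.isspace() for ch in value):
        if '"' not in value:
            value = f'"{value}"'
        elif "'" not in value:
            value = f"'{value}'"  # fall back to apostrophes
        else:
            raise ValueError("value contains both quote characters")
    return f"{flag}={value}"


print(" ".join([format_switch("-x", "HAL 9000"), format_switch("-u", "secret word")]))
# -x="HAL 9000" -u="secret word"
```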

Pivot tables are published as dynamic tables that can be manipulated over the Web to obtain different views of the data. Charts are published as JPEG or PNG graphic files. Text output is published as preformatted HTML. (By default, most Web browsers use a fixed-pitch font for preformatted text.)

Publish. Allows you to specify the output that you want to publish:

Output Document. Publishes the entire output document, including hidden or collapsed items.

Note: Publish to Web is available only for sites with SmartViewer Web Server installed and requires a plug-in to activate the publishing feature. Contact your system administrator or Webmaster for instructions on downloading the plug-in. If SmartViewer is unavailable at your site, use Export Output to save output in HTML format.

SmartViewer Web Server Login

Publishing to SmartViewer Web Server requires a valid SmartViewer Web Server user name (user ID) and password.

Chapter 46 SPSS Scripting Facility

The scripting facility allows you to automate tasks, including:
- Automatically customize output in the Viewer.
- Open and save data files.
- Display and manipulate dialog boxes.
- Run data transformations and statistical procedures using command syntax.
- Export charts as graphic files in a number of formats.

A number of scripts are included with the software, including autoscripts that run automatically every time a specific type of output is produced.

Figure 46-1 Run Script dialog box

▶ Select the Scripts folder.
▶ Select the script you want.

For more information, see “Customizing Menus and Toolbars” in Chapter 44 on p. 599.

Scripts Included with SPSS

The following scripts are included with the program:

Analyze held out cases. Repeats a Factor or Discriminant analysis using cases not selected in a previous analysis. A Notes table produced by a previous run of Factor or Discriminant must be selected before running the script.

Frequencies footnote. Inserts statistics displayed in a Frequencies Statistics table as footnotes in the corresponding frequency table for each variable. The Frequencies Statistics table must be selected before running the script.

Make totals bold. Applies the bold format and blue color to any row, column, or layer of data labeled Total in a pivot table. The table must be selected before running the script.

Means report.

Figure 46-2 Scripts tab of Options dialog box

Autoscripts are specific to a given procedure and output type. An autoscript that formats the ANOVA tables produced by One-Way ANOVA is not triggered by ANOVA tables produced by other statistical procedures (although you could use global procedures to create separate autoscripts for these other ANOVA tables that share much of the same code). However, you can have a separate autoscript for each type of output produced by the same procedure.

Figure 46-3 Modifying a script in the script window

If you prefer to create your own scripts, you can begin by choosing from a number of starter scripts.

To Edit a Script

▶ From the menus choose:
  File > Open > Script...

Figure 46-4 Opening a script file

▶ Select the Scripts folder.
▶ Under Files of Type, select SPSS Script (*.sbs).
▶ Select the script you want.

If you open more than one script, each opens in its own window.

Script Window

The script window is a fully featured programming environment that uses the Sax BASIC language and includes a dialog box editor, object browser, debugging features, and context-sensitive Help.

Figure 46-5 Script window

As you move the cursor, the name of the current procedure is displayed at the top of the window. Terms colored blue are reserved words in BASIC (for example, Sub, End Sub, and Dim). You can access context-sensitive Help on these terms by clicking them and pressing F1. Terms colored magenta are objects, properties, or methods.

Comments are displayed in green. Press F2 at any time to display the object browser, which displays objects, properties, and methods.

Script Editor Properties (Script Window)

Code elements in the script window are color-coded to make them easier to distinguish. By default, comments are green, Sax BASIC terms are blue, and names of valid objects, properties, and methods are magenta. You can specify different colors for these elements and change the size and font for all text.

Starter Scripts

When you create a new script, you can begin by choosing from a number of starter scripts.

Figure 46-7 Use Starter Script dialog box

Each starter script supplies code for one or more common procedures and is commented with hints on how to customize the script to your particular needs.

Delete by label. Deletes rows or columns in a pivot table based on the contents of the RowLabels or ColumnLabels.

In addition, you can use any of the other available scripts as starter scripts, although they may not be as easy to customize. Just open the script and save it with a different filename.

Creating a Script

▶ From the menus choose:
  New > Script
▶ Select a starter script if you want to begin with one.
▶ If you do not want to use a starter script, click Cancel.

Creating Autoscripts

You can create an autoscript by starting with the output object that you want to serve as the trigger.

By default, each autoscript you create is added to the current autoscript file (autscript.sbs) as a new procedure. The name of the procedure references the event that serves as the trigger. For example, if you create an autoscript triggered whenever Explore creates a Descriptives table, the name of the autoscript subroutine would be Explore_Table_Descriptives_Create.

Events that Trigger Autoscripts

The name of the autoscript procedure references the event that serves as the trigger. The following events can trigger autoscripts:

Creation of pivot table. The name of the procedure references both the table type and the procedure that created it, for example, Correlations_Table_Correlations_Create.

Figure 46-10 Autoscript procedure for Correlations table

Creation of title. Referenced to the statistical procedure that created it: Correlations_Title_Create.
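The naming pattern in these examples (procedure, item type, an optional table type, then Create, joined by underscores) can be captured in a short Python helper. It is a hypothetical illustration derived only from the three names shown in this chapter; the text does not say how names containing spaces are rendered, so the sketch rejects them rather than guess.

```python
from typing import Optional


def autoscript_name(procedure: str, item: str, table_type: Optional[str] = None) -> str:
    """Build an autoscript procedure name such as Explore_Table_Descriptives_Create."""
    parts = [procedure, item] + ([table_type] if table_type else []) + ["Create"]
    for part in parts:
        if not part.isidentifier():
            # Multiword names are not covered by the documented examples.
            raise ValueError(f"unsupported name part: {part!r}")
    return "_".join(parts)


print(autoscript_name("Explore", "Table", "Descriptives"))  # Explore_Table_Descriptives_Create
print(autoscript_name("Correlations", "Title"))             # Correlations_Title_Create
```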

You can also use a script to trigger an autoscript indirectly. For example, you could write a script that invokes the Correlations procedure, which in turn triggers the autoscript registered to the resulting Correlations table.

Autoscript File

All autoscripts are saved in a single file (unlike other scripts, each of which is saved in a separate file). Any new autoscripts you create are also added to this file.

How Scripts Work

Scripts work by manipulating objects using properties and methods. For example, pivot tables are a class of objects. With objects of this class, you can use the SelectTable method to select all of the elements in the table, and you can use the TextColor property to change the color of selected text. Each object class has specific properties and methods associated with it. The collection of all SPSS object classes (or types) is called the SPSS type library.

to get a pivot table object, you must first get the output document that contains the pivot table and then get the items in that output document. Each object that you get is stored in a variable. (Remember that all you are really storing in the variable is a reference to the object.) One of the first steps in creating a script is often to declare variables for the objects that you need.

methods associated with it. The collection of all SPSS object classes (or types) is referred to as the SPSS type library.

Table of Object Classes and Naming Conventions

The following variable names are used in the sample scripts included with the program and are recommended for all scripts. Notice that, with the exception of pivot tables, object classes have names beginning with ISpss.

Getting SPSS Automation Objects (Scripting)

To get an object means to create a reference to the object so that you can use properties and methods to do something. Each object reference that you get is stored in a variable. To get an object, first declare an object variable of the appropriate class, then set the variable to the specific object. For example, to get the designated output document:

    Dim objOutputDoc As ISpssOutputDoc
    Set objOutputDoc = objSpssApp.

Example: Getting the First Pivot Table

This script gets the first pivot table in the designated output document and activates it.

    Sub Main
        Dim objOutputDoc As ISpssOutputDoc 'declare object variables
        Dim objOutputItems As ISpssItems
        Dim objOutputItem As ISpssItem
        Dim objPivotTable As PivotTable
        Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc 'get reference to designated output doc
        Set objOutputItems = objOutputDoc.

Properties and Methods (Scripting)

Like real-world objects, OLE automation objects have features and uses. In programming terminology, the features are referred to as properties, and the uses are referred to as methods. Each object class has specific methods and properties that determine what you can do with that object.

or removing a selection:

    objPivotTable.ClearSelection

Some methods return another object. Such methods are extremely important for navigating the object hierarchy. For example, the GetDesignatedOutputDoc method returns the designated output document, allowing you to access the items in that output document:

    Set objOutputDoc = objSpssApp.GetDesignatedOutputDoc
    Set objItems = objOutputDoc.

Using the Object Browser

▶ From the script window menus choose:
  Debug > Object Browser...
▶ Select an object class from the Data Type list to display the methods and properties for that class.
▶ Select properties and methods for context-sensitive Help or to paste them into your script.

New Procedure (Scripting)

A procedure is a named sequence of statements that are executed as a unit. Organizing code in procedures makes it easier to manage and reuse pieces of code.

To Add a New Procedure in a Script

▶ From the menus choose:
  Script > New Procedure...
▶ Type a name for the procedure.
▶ Select Subroutine or Function.

Alternatively, you can create a new procedure by typing the statements that define the procedure directly in the script.

Global Procedures (Scripting)

If you have a procedure or function that you want to use in a number of different scripts, you can add it to the global script file.

Figure 46-15 Global script file

The default global script file is global.sbs. You can freely add procedures to this file. You can also specify a different global file on the Scripts tab in the Options dialog box (Edit menu), but only one file can be active as the global file at any given time. That means that if you create a new global file and specify it as the global file, the procedures and functions in global.sbs are no longer available.

Adding a Description to a Script

You can add a description to be displayed in the Run Script and Use Starter Script dialog boxes. Just add a comment on the first line of the script that starts with Begin Description, followed by the desired comment (one or more lines), followed by End Description. For example:

    'Begin Description
    'This script changes "Sig." to "p=" in the column labels of any pivot table.
    'Requirement: The Pivot Table that you want to change must be selected.
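Because a description block is just a run of comment lines, a tool that catalogs scripts could extract it with a few lines of code. The Python sketch below is hypothetical and is not how SPSS reads descriptions: it assumes that the first line reads exactly Begin Description after the comment apostrophe, that every following line is a comment, and that a comment line reading End Description closes the block.

```python
from typing import List, Optional


def script_description(source: str) -> Optional[str]:
    """Return the Begin/End Description text of a script, or None if absent."""
    lines = source.splitlines()
    if not lines:
        return None
    first = lines[0].strip()
    if not first.startswith("'") or first[1:].strip() != "Begin Description":
        return None
    description: List[str] = []
    for line in lines[1:]:
        text = line.strip()
        if not text.startswith("'"):
            return None  # block ended without End Description
        text = text[1:].strip()  # drop the comment apostrophe
        if text == "End Description":
            return "\n".join(description)
        description.append(text)
    return None


script = """'Begin Description
'This script changes "Sig." to "p=" in the column labels of any pivot table.
'End Description
Sub Main
End Sub
"""
print(script_description(script))
```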

The Editor initially displays a blank dialog box form. You can add controls, such as radio buttons and check boxes, by selecting the appropriate tool and dragging with the mouse. (Hold the mouse over each tool for a description.) You can also drag the sides and corners to resize the dialog box. After adding a control, right-click the control to set properties for that control.

Dialog monitor function.

Dialog Monitor Functions (Scripting)

A dialog monitor function defines the behavior of a dialog box for each of a number of specified cases. The function takes the following (generic) form:

    Function DialogFunc(strDlgItem As String, intAction As Integer, intSuppValue As Integer)
        Select Case intAction
        Case 1 ' dialog box initialization
            ... 'statements to execute when dialog box is initialized
        Case 2 ' value changing or button pressed
            ... 'statements...

Case 2. Executes when a button is pushed or when a value changes in a CheckBox, DropListBox, ListBox, or OptionGroup control. If a button is pushed, strDlgItem is the button, intSuppValue is meaningless, and you must set DialogFunc = True to prevent the dialog box from closing. If a value changes, strDlgItem is the item whose value has changed, and intSuppValue is the new value.

Case 3. Executes when a value changes in a TextBox or ComboBox control.

    TextBox 40,28,340,21,.txtFilename
    OKButton 470,7,100,21,.cmdOK
    CancelButton 470,35,100,21,.

To debug an autoscript, open the autoscript file in a script window, insert breakpoints in the procedure that you want to debug, and then run the statistical procedure that triggers the autoscript.

Step Into. Executes the current line. If the current line is a subroutine or function call, stops on the first line of that subroutine or function.

Step Over. Executes to the next line. If the current line is a subroutine or function call, executes the subroutine or function completely.
PAGE 676

Debugging Pane (Scripting)
When you step through code, the Immediate, Watch, Stack, and Loaded tabs are displayed.
Figure 46-18 Debugging pane displayed in script window
Immediate tab. Click the name of any variable and click the eyeglass icon to display the current value of the variable. You can also evaluate an expression, assign a variable, or call a subroutine. Type ?expr and press Enter to show the value of expr. Type var = expr and press Enter to change the value of var.
PAGE 677

Watch tab. To display a variable, function, or expression, click it and choose Add Watch from the Debug menu. Displayed values are updated each time execution pauses. You can edit the expression to the left of ->. Press Enter to update all the values immediately. Press Ctrl-Y to delete the line.
Stack tab. Displays the lines that called the current statement. The first line is the current statement, the second line is the one that called the first, and so on.
PAGE 678

Figure 46-19 Pasting command syntax into a script
When you open dialog boxes using the script window menus, the Paste button pastes all of the code needed to run commands from within a script.
Note: You must use the script window menus to open the dialog box; otherwise, commands will be pasted to a syntax window rather than the script window.
PAGE 679

► Click Paste.
Note: You must use the script window menus to open the dialog box; otherwise, commands will be pasted to a syntax window rather than the script window.
Running a Script from Command Syntax
You can use the SCRIPT command to run a script from within command syntax. Specify the name of the script you want to run, with the filename enclosed in quotes, as follows:
SCRIPT 'C:\PROGRAM FILES\SPSS\CLEAN NAVIGATOR.SBS'.
PAGE 680
PAGE 681

Chapter 47 Output Management System
The Output Management System (OMS) provides the ability to automatically write selected categories of output to different output files in different formats. Formats include:
SPSS data file format (.sav). Output that would be displayed in pivot tables in the Viewer can be written out in the form of an SPSS data file, making it possible to use output as input for subsequent commands.
XML. Tables, text output, and even many charts can be written out in XML format.
PAGE 682

Figure 47-1 Output Management System Control Panel
You can use the control panel to start and stop the routing of output to various destinations. Each OMS request remains active until it is explicitly ended or until the end of the session. A destination file specified on an OMS request is unavailable to other SPSS procedures and other applications until the OMS request is ended.
PAGE 683

The order of the output objects in any particular destination is the order in which they were created, which is determined by the order and operation of the procedures that generate the output. OMS cannot route charts or warnings objects created by interactive graphics procedures (Graphs menu, Interactive submenu) or maps created by the mapping procedures (Graphs menu, Maps submenu).
PAGE 684

Optionally, you can also:
Exclude the selected output from the Viewer. If you select Exclude from viewer, the output types in the OMS request will not be displayed in the Viewer window. If multiple active OMS requests include the same output types, the display of those output types in the Viewer is determined by the most recent OMS request that contains those output types. For more information, see “Excluding Output Display from the Viewer” on p. 671.
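The "most recent request wins" rule for Exclude from viewer can be sketched in Python. This is only an illustration of the rule, not SPSS code: OmsRequest and shown_in_viewer are hypothetical names, and each request is reduced to the set of output types it covers plus its Exclude from viewer setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OmsRequest:
    """Hypothetical stand-in for an active OMS request (not an SPSS API)."""
    output_types: frozenset    # e.g. frozenset({"Tables", "Charts"})
    exclude_from_viewer: bool  # True if "Exclude from viewer" was selected


def shown_in_viewer(output_type, active_requests):
    """Return True if objects of output_type are still displayed in the Viewer.

    active_requests is ordered from oldest to most recent. The most recent
    request that contains output_type decides; if no active request contains
    it, the output is displayed as usual.
    """
    for request in reversed(active_requests):
        if output_type in request.output_types:
            return not request.exclude_from_viewer
    return True


requests = [
    OmsRequest(frozenset({"Tables", "Charts"}), exclude_from_viewer=True),
    OmsRequest(frozenset({"Tables"}), exclude_from_viewer=False),
]
print(shown_in_viewer("Tables", requests))  # True: the newer request does not exclude Tables
print(shown_in_viewer("Charts", requests))  # False: only the older, excluding request covers Charts
print(shown_in_viewer("Texts", requests))   # True: no request covers Texts
```

Scanning the requests from newest to oldest and stopping at the first match is what makes the newest matching request decisive.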
PAGE 685

► Click End.
To end all active OMS requests:
► Click End All.
To delete a new request (one that has been added but is not yet active):
► Click any cell in the row for that request in the Requests list.
► Click Delete.
Note: Active OMS requests are not ended until you click OK.
Output Object Types
There are seven different types of output objects:
Charts. Charts (except “interactive” charts and maps). Chart objects are only included with XML and HTML destination formats.
PAGE 686

Figure 47-2 Output Object Types
PAGE 687

Command Identifiers and Table Subtypes
Command Identifiers
Command identifiers are available for all statistical and charting procedures and any other commands that produce blocks of output with their own identifiable heading in the outline pane of the Viewer. These identifiers are usually (but not always) the same as or similar to the procedure names on the menus and dialog box titles, which are usually (but not always) similar to the underlying SPSS command names.
PAGE 688

► From the pop-up context menu, select Copy OMS Command Identifier or Copy OMS Table Subtype.
► Paste the copied command identifier or table subtype name into any text editor (such as an SPSS syntax window).
Table Labels
As an alternative to table subtype names, you can select tables based on the text displayed in the outline pane of the Viewer.
PAGE 689

To specify labels to use to identify output tables:
► In the Output Management System Control Panel, select Tables in the Output Types list (you can also select other Output Types, but Tables must be one of the selected types) and select one or more commands.
► Click Table Labels.
Figure 47-3 Table Labels dialog box
► Enter the table label exactly as it appears in the outline pane of the Viewer window.
PAGE 690

OMS Options
You can use the OMS Options dialog box to:
Specify the output format.
Include or exclude chart and tree model diagram output and specify the graphic format.
Specify what table dimension elements should go in the row dimension.
For SPSS data file format, include a variable that identifies the sequential table number that is the source for each case.
To specify OMS options:
► Click Options in the Output Management System Control Panel.

PAGE 691

HTML. Output objects that would be pivot tables in the Viewer are converted to simple HTML tables. No TableLook attributes (font characteristics, border styles, colors, and so on) are supported. Text output objects are tagged <PRE> in the HTML. If you choose to include charts, they are exported as separate files in the selected graphics format and are embedded by reference (<IMG SRC='filename'>) in the HTML document.
SPSS Data File. This is a binary file format.

PAGE 692

Table Pivots
For pivot table output, you can specify the dimension element(s) that should appear in the columns. All other dimension elements appear in the rows. For SPSS data file format, table columns become variables, and rows become cases.
If you specify multiple dimension elements for the columns, they are nested in the columns in the order in which they are listed. For SPSS data file format, variable names are constructed from nested column elements.
PAGE 693

Figure 47-5 Row and column positional arguments
List of dimension names. As an alternative to positional arguments, you can use dimension element “names,” which are the text labels that appear in the table.
PAGE 694

► From the menus choose:
View
  Show All
and/or
► If the pivoting trays aren’t displayed, from the menus choose:
Pivot
  Pivoting Trays
► Hover over each icon in the pivoting trays for a ToolTip pop-up that displays the label.
PAGE 695

Logging
You can record OMS activity in a log in XML or text format. The log tracks all new OMS requests for the session; it does not include OMS requests that were already active before you requested a log. The current log file ends if you specify a new log file or if you deselect (uncheck) Log OMS activity.
To specify OMS logging:
► Click Logging in the Output Management System Control Panel.
PAGE 696

Routing Output to SPSS Data Files
An SPSS data file consists of variables in the columns and cases in the rows, and that’s essentially how pivot tables are converted to data files:
Columns in the table are variables in the data file. Valid variable names are constructed from the column labels.
Row labels in the table become variables with generic variable names (Var1, Var2, Var3, and so on) in the data file. The values of these variables are the row labels in the table.
PAGE 697

Figure 47-7 Single two-dimensional table
The first three variables identify the source table by command, subtype, and label. The two elements that defined the rows in the table (values of the variable Gender and statistical measures) are assigned the generic variable names Var1 and Var2. These are both string variables. The column labels from the table are used to create valid variable names.
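As a rough illustration of the conversion rules described above, the following Python sketch turns one pivot table into cases. PivotTable and table_to_cases are hypothetical names, the command, subtype, and label strings and the cell values are made up, and column names are derived by simply deleting spaces, which is only a simplified stand-in for the naming rules OMS actually applies.

```python
from dataclasses import dataclass


@dataclass
class PivotTable:
    """Hypothetical in-memory pivot table (not an SPSS API)."""
    command: str
    subtype: str
    label: str
    column_labels: list  # one label per column
    rows: list           # (tuple of row-element labels, list of cell values)


def table_to_cases(table):
    """Convert one pivot table into a list of cases (dicts), one per row."""
    cases = []
    for row_labels, cells in table.rows:
        # The first three variables identify the source table.
        case = {"Command_": table.command,
                "Subtype_": table.subtype,
                "Label_": table.label}
        # Row elements become string variables named Var1, Var2, ...
        for i, row_label in enumerate(row_labels, start=1):
            case[f"Var{i}"] = str(row_label)
        # Each column becomes a variable. Simplification: the name is the
        # column label with spaces removed.
        for column_label, value in zip(table.column_labels, cells):
            case[column_label.replace(" ", "")] = value
        cases.append(case)
    return cases


# Hypothetical identifiers and values, for illustration only.
table = PivotTable(
    command="Means", subtype="Report", label="Report",
    column_labels=["Current Salary", "Beginning Salary"],
    rows=[(("Male", "Mean"), [100.0, 50.0]),
          (("Male", "N"), [10.0, 10.0])],
)
for case in table_to_cases(table):
    print(case)
```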
PAGE 698

Example: Tables with Layers
In addition to rows and columns, a table can also contain a third dimension: the layer dimension.
Figure 47-8 Table with layers
In the table, the variable labeled Minority Classification defines the layers. In the data file, this creates two additional variables: one that identifies the layer element and one that identifies the categories of the layer element.
PAGE 699

Data Files Created from Multiple Tables
When multiple tables are routed to the same data file, each table is added to the data file in a fashion similar to merging data files by adding cases from one data file to another (Data menu, Merge Files, Add Cases). Each subsequent table will always add cases to the data file.
PAGE 700

Figure 47-9 Two tables with identical column labels
The second table contributes additional cases (rows) to the data file but no new variables because the column labels are exactly the same, so there are no large patches of missing data. Although the values for Command_ and Subtype_ are the same, the Label_ value identifies the source table for each group of cases because the two frequency tables have different titles.
PAGE 701

Figure 47-10 Two tables with different column labels
The first table has columns labeled Beginning Salary and Current Salary, which are not present in the second table, resulting in missing values for those variables for cases from the second table. Conversely, the second table has columns labeled Education level and Months since hire, which are not present in the first table, resulting in missing values for those variables for cases from the first table.
PAGE 702

Example: Data Files Not Created from Multiple Tables
If any tables do not have the same number of row elements as the other tables, no data file will be created. The number of rows doesn’t have to be the same; the number of row elements that become variables in the data file must be the same.
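The stacking behavior described in these examples can be sketched in Python under a few stated assumptions: each table has already been converted to a list of cases (dicts), None stands in for a missing value, and row-element variables are recognized by the generic Var1, Var2, ... naming used in this chapter. stack_tables and row_element_count are hypothetical names; this is not how SPSS implements OMS.

```python
import re

# Assumption: row-element variables are named Var1, Var2, ... A column
# variable that happened to match this pattern would be miscounted.
ROW_VAR = re.compile(r"Var\d+")


def row_element_count(case):
    return sum(1 for name in case if ROW_VAR.fullmatch(name))


def stack_tables(tables):
    """Append the cases of several tables, similar to Merge Files > Add Cases.

    tables is a list of tables; each table is a list of cases (dicts).
    Variables absent from a table are filled with None (a missing value).
    """
    counts = {row_element_count(case) for table in tables for case in table}
    if len(counts) > 1:
        # Mirrors the rule that no data file is created in this situation.
        raise ValueError("tables differ in the number of row elements")
    variables = []  # union of variable names, in order of first appearance
    for table in tables:
        for case in table:
            for name in case:
                if name not in variables:
                    variables.append(name)
    return [{name: case.get(name) for name in variables}
            for table in tables for case in table]


# Hypothetical tables with one row element each but different columns.
first = [{"Var1": "Male", "BeginningSalary": 1.0}]
second = [{"Var1": "Female", "Educationlevel": 2.0}]
for case in stack_tables([first, second]):
    print(case)
```

Cases from each table get None for the columns that only the other table has, which corresponds to the missing-value patches described for Figure 47-10.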
PAGE 703

Since both table types use the element name “Statistics” for the statistics dimension, we can put the statistics from the Frequencies Statistics table in the columns simply by specifying “Statistics” (in quotes) in the list of dimension names in the Table Pivots group in the Options dialog box.
PAGE 704

Figure 47-13 Combining different table types in a data file by pivoting dimension elements
Some of the variables will have missing values, since the table structures still aren’t exactly the same with statistics in the columns.
Variable Names in OMS-Generated Data Files
OMS constructs valid, unique variable names from column labels:
Row and layer elements are assigned generic variable names: the prefix Var followed by a sequential number.
PAGE 705

Characters that aren’t allowed in variable names (for example, spaces and parentheses) are removed. For example, “This (Column) Label” would become a variable named ThisColumnLabel.
If the label begins with a character that is allowed in variable names but not allowed as the first character (for example, a number), “@” is inserted as a prefix. For example, “2nd” would become a variable named @2nd.
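The two naming rules above can be sketched in Python. The character sets are simplified assumptions (ASCII letters, digits, and _ . @ # $ allowed in names; only letters and @ allowed as the first character), label_to_variable_name is a hypothetical name, and the sketch does not implement the uniqueness handling OMS also performs.

```python
import string

# Assumption: simplified character sets for SPSS variable names.
ALLOWED = set(string.ascii_letters + string.digits + "_.@#$")
ALLOWED_FIRST = set(string.ascii_letters + "@")


def label_to_variable_name(label):
    # Rule 1: drop characters that are not allowed anywhere in a name.
    name = "".join(ch for ch in label if ch in ALLOWED)
    # Rule 2: if the first remaining character is allowed in names but not
    # as the first character (for example, a digit), prefix "@".
    if name and name[0] not in ALLOWED_FIRST:
        name = "@" + name
    return name


print(label_to_variable_name("This (Column) Label"))  # ThisColumnLabel
print(label_to_variable_name("2nd"))                  # @2nd
```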
PAGE 706

OXML Table Structure
Output XML (OXML) is XML that conforms to the spss-output schema. For a detailed description of the schema, see SPSSOutputXML_schema.htm in the help\main folder of the SPSS installation folder. OMS command and subtype identifiers are used as values of the command and subType attributes in OXML. For example:
PAGE 707

The preceding example is a simplified representation of the structure that shows the descendant/ancestor relationships of these elements, but not necessarily the parent/child relationships, since there are typically intervening nested element levels. The following two figures show a simple frequency table and the complete output XML representation of that table.
Figure 47-15 Simple frequency table
PAGE 708

PAGE 709

As you may notice, a simple, small table produces a substantial amount of XML. That’s partly because the XML contains some information not readily apparent in the original table, some information that might not even be available in the original table, and a certain amount of redundancy. The table contents as they are (or would be) displayed in a pivot table in the Viewer are contained in text attributes. For example:
PAGE 710

The number attribute is the actual, unrounded numeric value, and the decimals attribute indicates the number of decimal positions displayed in the table. Since columns are nested within rows, the category element that identifies each column is repeated for each row. For example, since the statistics are displayed in the columns, the element appears three times in the XML: once for the male row, once for the female row, and once for the total row.
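The following Python sketch reads command, subType, text, number, and decimals values with the standard xml.etree.ElementTree module. The XML fragment is hypothetical and heavily simplified: its element names, nesting, and values are assumptions for illustration, and real OXML follows the spss-output schema (including a namespace and further intervening levels). Because iter() searches descendants at any depth, the sketch does not depend on the exact parent/child nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; not actual OMS output.
OXML = """
<outputTree>
  <command command="Frequencies">
    <pivotTable subType="Frequencies">
      <category text="Male">
        <category text="Percent">
          <cell text="54.4" number="54.43037974683544" decimals="1"/>
        </category>
      </category>
      <category text="Female">
        <category text="Percent">
          <cell text="45.6" number="45.56962025316456" decimals="1"/>
        </category>
      </category>
    </pivotTable>
  </command>
</outputTree>
"""


def read_cells(xml_text):
    """Return (command, subType, text, number, decimals) for every cell."""
    root = ET.fromstring(xml_text.strip())
    result = []
    # iter() matches descendants at any depth, so intervening nested
    # levels between these elements do not matter.
    for command in root.iter("command"):
        for table in command.iter("pivotTable"):
            for cell in table.iter("cell"):
                result.append((command.get("command"),
                               table.get("subType"),
                               cell.get("text"),
                               float(cell.get("number")),
                               int(cell.get("decimals"))))
    return result


for command, subtype, text, number, decimals in read_cells(OXML):
    # text is the displayed value; number is the unrounded value. In this
    # fragment, rounding number to `decimals` places reproduces text.
    print(command, subtype, text, number, f"{number:.{decimals}f}")
```

Note how the column category (Percent) is repeated under each row category, matching the redundancy the text describes.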
PAGE 711

► Click Paste Commands and/or Paste Subtypes.
The list of available subtypes is based on the currently selected command(s). If multiple commands are selected, the list of available subtypes is the union of all subtypes available for any of the selected commands. If no commands are selected, all subtypes are listed. The identifiers are pasted into the designated command syntax window at the current cursor location.
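The subtype-listing rule amounts to a set union. In this Python sketch, the mapping from command identifiers to table subtypes is hypothetical (made up for illustration, not SPSS's actual lists), and available_subtypes is a hypothetical name.

```python
# Hypothetical command identifiers and table subtypes, for illustration only.
SUBTYPES_BY_COMMAND = {
    "Frequencies": {"Statistics", "Frequencies"},
    "Descriptives": {"Descriptive Statistics"},
    "Crosstabs": {"Case Processing Summary", "Crosstabulation"},
}


def available_subtypes(selected_commands):
    """Subtypes listed for the given selection of commands."""
    if not selected_commands:
        # No command selected: all subtypes are listed.
        chosen = SUBTYPES_BY_COMMAND.values()
    else:
        # Otherwise: the union of the subtypes of every selected command.
        chosen = [SUBTYPES_BY_COMMAND[c] for c in selected_commands]
    return set().union(*chosen)


print(sorted(available_subtypes(["Frequencies", "Descriptives"])))
# ['Descriptive Statistics', 'Frequencies', 'Statistics']
print(len(available_subtypes([])))  # 5
```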
PAGE 712

Copying OMS Labels
Instead of identifiers, you can copy labels for use with the LABELS keyword. Labels can be used to differentiate between multiple graphs or multiple tables of the same type in which the outline text reflects some attribute of the particular output object, such as the variable names or labels. There are, however, a number of factors that can affect the label text:
If split file processing is on, split file group identification may be appended to the label.
PAGE 713

Appendix A Database Access Administrator
The Database Access Administrator is a utility designed to simplify large or confusing data sources for use with the Database Wizard. It allows users and administrators to customize their data source in the following ways:
Create aliases for database tables and fields.
Create variable names for fields.
Hide extraneous tables and fields.
The Database Access Administrator does not actually change your database.
PAGE 714

file holds information about all of your data sources for that level. For example, your marketing department will have one file, dba02.inf, that contains the aliasing information for all of the database views established for the marketing department. Each person in the marketing department will have a file, dba03.inf, that contains customized views of all of the databases that he or she uses.
PAGE 715

Appendix B Customizing HTML Documents
You can automatically add customized HTML code to documents exported in HTML format, including:
HTML document titles
Document type specification
Meta tags and script code (for example, JavaScript)
Text displayed before and after exported output
To Add Customized HTML Code to Exported Output Documents
► Open the file htmlfram.txt (located in the directory in which SPSS is installed) in a text editor.
PAGE 716

Content and Format of the Text File for Customized HTML
The HTML code that you want to add automatically to your HTML documents must be specified in a simple text file that contains six fields, each delimited by two open angle brackets on the preceding line (<<):
<<
Text or code that you want to insert at the top of the document before the <html> specification (for example, comments that include document type specifications)
<<
Text used as the document title (displayed in the title bar)
<<
PAGE 717

► In the left pane of the Registry Editor, choose: HKEY_CURRENT_USER Software SPSS SPSS for Windows 13.0 SPSSWIN
► In the right pane, double-click the string HTMLFormatFile.
► For Value data, enter the full path and name of the text file containing the custom HTML specifications (for example, c:\myfiles\htmlstuf.txt).
Sample Text File for Customized HTML
<<
<<
NVI, Inc.
PAGE 718

NVI Sales, Inc.
NVI Sales

Regional Data
[Exported output]
This page made possible by...
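To check that a custom file has the expected six fields, a short Python sketch can split the text on the "<<" delimiter lines. It assumes each field is preceded by a line containing only "<<", as described for the text file format; parse_html_frame is a hypothetical helper, not part of SPSS, and because the field list is cut off in this excerpt, the sketch only splits the fields and does not name them.

```python
from typing import List, Optional


def parse_html_frame(text):
    """Split htmlfram.txt-style text into its fields.

    Assumes each field is preceded by a line that contains only "<<";
    anything before the first "<<" line is ignored.
    """
    fields: List[str] = []
    current: Optional[List[str]] = None
    for line in text.splitlines():
        if line.strip() == "<<":
            if current is not None:
                fields.append("\n".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        fields.append("\n".join(current))
    return fields


# Hypothetical content: field 1 is a comment, field 2 the title,
# and the remaining four fields are empty.
sample = "<<\n<!-- comment -->\n<<\nNVI, Inc.\n<<\n<<\n<<\n<<\n"
fields = parse_html_frame(sample)
print(len(fields))  # 6
print(fields[1])    # NVI, Inc.
```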

PAGE 719

Index
Access (Microsoft), 24 active file, 59, 61 caching, 61 creating a temporary active file, 61 virtual active file, 59 active window, 6 ActiveX objects, 231 adding group labels, 266 adjusted R-square in Linear Regression, 420 aggregating data, 184 aggregate functions, 187 variable names and labels, 188 alignment, 86, 225, 289, 582 in cells, 289 in Data Editor, 86 output, 225, 582 alpha coefficient in Reliability Analysis, 537, 539 alpha factoring, 448 analysis of variance in Curve Estimation, 425 i
PAGE 720

BMP files, 234, 240, 241, 610 exporting charts, 234, 240, 241, 610 Bonferroni in GLM, 386 in One-Way ANOVA, 371 bookmarking pivot table views, 272 bookmarks, 272 borders, 255, 282, 284 displaying hidden borders, 284 Draft Viewer, 255 Box’s M test in Discriminant Analysis, 435 boxplots comparing factor levels, 324 comparing variables, 324 in Explore, 324 break points, 650 in scripts, 650 break variables in Aggregate Data, 184 Brown-Forsythe statistic in One-Way ANOVA, 374 build terms, 382 button
PAGE 721

missing values, 485 options, 485 statistics, 485 chi-square tests Fisher’s exact test, 330 for independence, 330 in Crosstabs, 330 likelihood-ratio, 330 linear-by-linear association, 330 one-sample test, 482 Pearson, 330 Yates’ correction for continuity, 330 classification in ROC Curve, 569 cluster analysis efficiency, 477 Hierarchical Cluster Analysis, 465 K-Means Cluster Analysis, 473 cluster frequencies in TwoStep Cluster Analysis, 463 clustering choosing a procedure, 453 Cochran’s Q in Tests
PAGE 722

in One-Way ANOVA, 374 in Paired-Samples T Test, 363 in ROC Curve, 572 saving in Linear Regression, 417 context-sensitive help, 268 finding label definitions in pivot tables, 268 contingency coefficient in Crosstabs, 330 contingency tables, 327 continuation text, 284 for pivot tables, 284 contrasts in GLM, 383, 384 in One-Way ANOVA, 370 control variables in Crosstabs, 329 convergence in Factor Analysis, 448, 450 in K-Means Cluster Analysis, 478 Cook’s distance in GLM, 389 in Linear Regression, 417
PAGE 723

parameter queries, 31, 34 prompt for value, 34 random sampling, 31 reading, 23, 24, 26 relationship properties, 29 saving queries, 38 selecting a data source, 24 selecting data fields, 26 specifying criteria, 31 SQL syntax, 38 table joins, 28, 29 verifying results, 38 Where clause, 31 data dictionary applying from another file, 107 Data Editor, 75, 77, 86, 88, 89, 90, 91, 92, 93, 94, 95, 96, 599 alignment, 86 changing data type, 94 column width, 86 data val
PAGE 724

defining variables, 77, 81, 83, 85, 86, 87, 100 applying a data dictionary, 107 copying and pasting attributes, 86, 87 data types, 81 missing values, 85 templates, 86, 87 value labels, 83, 100 variable labels, 83 deleted residuals in GLM, 389 in Linear Regression, 417 deleting multiple EXECUTES in syntax files, 304 deleting output, 224 dendrograms in Hierarchical Cluster Analysis, 471 dependent t test in Paired-Samples T Test, 361 Descriptives, 315 display order, 317 saving z scores,
PAGE 725

distance measures in Distances, 407 in Hierarchical Cluster Analysis, 469 Distances, 405 computing distances between cases, 405 computing distances between variables, 405 dissimilarity measures, 407 example, 405 similarity measures, 408 statistics, 405 transforming measures, 407, 408 transforming values, 407, 408 distributed mode, 63, 64, 65, 68, 69, 70, 72, 73, 614 available procedures, 72 data file access, 68, 70 Production Facility, 614 saving data files, 69 UNC paths, 73 division dividing
PAGE 726

reading variable names, 21 saving, 55 Excel format exporting output, 234, 237 excluding output from Viewer with OMS, 671 EXECUTE command pasted from dialog boxes, 304 expected count in Crosstabs, 334 Explore, 319 missing values, 325 options, 325 plots, 324 power transformations, 325 statistics, 323 exponential model in Curve Estimation, 429 exporting charts, 234, 240, 241, 242, 243, 244, 607, 610 automated production, 607 chart size, 240 exporting data, 599 adding menu it
PAGE 727

freefield format, 40 Frequencies, 307 charts, 312 display order, 312 formats, 312 statistics, 310 suppressing tables, 312 frequency tables in Explore, 323 in Frequencies, 307 Friedman test in Tests for Several Related Samples, 509 full factorial models in GLM, 381 function procedures, 643 functions, 132 missing value treatment, 133 Gabriel test in GLM, 386 in One-Way ANOVA, 371 Games-Howell test in GLM, 386 in One-Way ANOVA, 371 gamma in Crosstabs, 330 generalized least squares in Factor Analysi
PAGE 728

Helmert contrasts in GLM, 383, 384 Help button, 8 Help windows, 13 hiding, 223, 224, 273, 274, 275, 600 captions, 275 dimension labels, 274 footnotes, 274 procedure results, 224 results, 223 rows and columns, 273 titles, 275 toolbars, 600 hiding (excluding) output from the Viewer with OMS, 671 Hierarchical Cluster Analysis, 465 agglomeration schedules, 470 clustering cases, 465 clustering methods, 469 clustering variables, 465 cluster membership, 470, 471 dendrograms, 471 distance
PAGE 729

journal file, 580 JPEG files, 234, 240, 241, 610 exporting charts, 234, 240, 241, 610 justification, 225, 582 output, 225, 582 kappa in Crosstabs, 330 Kendall’s tau-b in Bivariate Correlations, 395 in Crosstabs, 330 Kendall’s tau-c, 330 in Crosstabs, 330 Kendall’s W in Tests for Several Related Samples, 509 K-Means Cluster Analysis, 473 cluster distances, 478 cluster membership, 478 convergence criteria, 478 efficiency, 477 examples, 473 iterations, 478 methods, 473 missing values, 479 overvie
PAGE 730

linear-by-linear association in Crosstabs, 330 linear model in Curve Estimation, 429 Linear Regression, 409 blocks, 409 exporting model information, 417 missing values, 422 plots, 416 residuals, 417 saving new variables, 417 selection variable, 415 statistics, 420 variable selection methods, 414, 422 weights, 409 line breaks variable and value labels, 84 listing cases, 337 loaded tab, 652 script window, 652 loading plots in Factor Analysis, 450 logarithmic model in Curve Estimation, 429 logging i
PAGE 731

in Frequencies, 310 in Ratio Statistics, 553 measures of distribution in Descriptives, 317 in Frequencies, 310 median statistic in Explore, 323 in Frequencies, 310 in Means, 346 in OLAP Cubes, 352 in Ratio Statistics, 553 in Summarize, 340 median test in Two-Independent-Samples Tests, 503 memory, 580 memory allocation in TwoStep Cluster Analysis, 459 menus, 6, 599 customizing, 599 merging data files dictionary information, 181 files with different cases, 177 files with different variables, 181 re
PAGE 732

multiple regression in Linear Regression, 409 Multiple Response command additional features, 519 multiple response analysis crosstabulation, 516 frequency tables, 513 Multiple Response Crosstabs, 516 Multiple Response Frequencies, 513 Multiple Response Crosstabs, 516 cell percentages, 518 defining value ranges, 518 matching variables across response sets, 518 missing values, 518 percentages based on cases, 518 percentages based on responses, 518 Multiple Response Frequencies, 513 missing values,
PAGE 733

One-Sample Kolmogorov-Smirnov Test, 492 command additional features, 494 missing values, 494 options, 494 statistics, 494 test distribution, 492 One-Sample T Test, 364 confidence intervals, 366 missing values, 366 options, 366 One-Way ANOVA, 367 contrasts, 370 factor variables, 367 missing values, 374 multiple comparisons, 371 options, 374 polynomial contrasts, 370 post hoc tests, 371 statistics, 374 online Help, 13 Statistics Coach, 12 opening files, 19, 20, 21, 22,
PAGE 734

page setup, 247, 249, 250 chart size, 250 headers and footers, 249 Paired-Samples T Test, 361 missing values, 363 options, 363 selecting paired variables, 361 pane splitter Data Editor, 95 parallel model in Reliability Analysis, 537, 539 parameter estimates in GLM Univariate, 391 Partial Correlations, 401 in Linear Regression, 420 missing values, 404 options, 404 statistics, 404 zero-order correlations, 404 partial plots in Linear Regression, 416 password protection, 252 Paste button, 8 pasting,
PAGE 735

layers, 268 manipulating, 263 moving rows and columns, 265 pasting as metafiles, 232 pasting as tables, 229, 232 pasting as text, 232 pasting into other applications, 229 pivoting, 263, 264 printing large tables, 295 printing layers, 245 properties, 278 resetting defaults, 267 rotating labels, 267 scaling to fit page, 279, 284 selecting rows and columns, 293 showing and hiding cells, 273 transposing rows and columns, 265 ungrouping rows or columns, 266 using icons, 264 PNG files, 234, 242 export
PAGE 736

properties, 278, 279, 641 OLE automation objects, 641 pivot tables, 278 table, 279 proportion estimates in Rank Cases, 145 Proximities in Hierarchical Cluster Analysis, 465 publishing output, 621 with Production Facility, 619 quadratic model in Curve Estimation, 429 quartiles in Frequencies, 310 quartimax rotation in Factor Analysis, 450 random number seed, 133 random sample, 31 database files, 31 random number seed, 133 selecting, 193 range statistic in Descriptives, 317 in Frequencies, 310
PAGE 737

repeated contrasts in GLM, 383, 384 replacing missing values linear interpolation, 173 linear trend, 173 mean of nearby points, 173 median of nearby points, 173 series mean, 173 reports column summary reports, 529 comparing columns, 532 composite totals, 532 dividing column values, 532 multiplying column values, 532 row summary reports, 521 total columns, 532 Report Summaries in Columns, 529 column format, 524 command additional features, 534 grand total, 534 missing values, 534 page control, 533
PAGE 738

R statistic in Linear Regression, 420 in Means, 346 running median function, 170 Runs Test, 489 command additional features, 491 cut point, 490 cut points, 489 missing values, 491 options, 491 statistics, 491 Ryan-Einot-Gabriel-Welsch multiple F in GLM, 386 in One-Way ANOVA, 371 Ryan-Einot-Gabriel-Welsch multiple range in GLM, 386 in One-Way ANOVA, 371 sampling random sample, 193 SAS files opening, 19 saving, 55 Savage scores, 145 SAV file format routing output to an SPSS data file, 666, 672 sav
PAGE 739

autoscript file, 597, 635 autoscripts, 625, 632, 635 creating, 627, 632 debugging, 650, 652 declaring variables, 637, 638 dialog boxes, 646, 648 global procedures file, 597, 644 overview, 623 running, 623 running with toolbar buttons, 603 script window, 628, 630 starter scripts, 631 using automation objects, 636, 638, 639, 642 with command syntax, 653, 655 script window, 628, 630, 642 Debug menu, 650 immediate tab, 652 loaded tab, 652 object browser, 642 properties, 630 stack tab, 652 watch
PAGE 740

split-file analysis, 189 split-half reliability in Reliability Analysis, 537, 539 splitting tables, 295 controlling table breaks, 295 spreadsheet files, 19, 21, 22, 58 opening, 22 reading ranges, 21 reading variable names, 21 writing variable names, 58 spread-versus-level plots in Explore, 324 in GLM Univariate, 391 SPSS basic steps, 11 SPSS data file format routing output to a data file, 666, 672 SPSSTMPDIR environment variable, 580 squared Euclidean distance in Distances, 407 S-stress in Multid
PAGE 741

subsets of cases random sample, 193 selecting, 190, 192, 194 subtitles in charts, 563 subtotals in column summary reports, 533 subtypes, 663 vs.
PAGE 742

exporting output as text, 234, 239, 610 in cells, 293 TIFF files, 242 exporting charts, 234, 240, 242, 610 time series analysis forecast, 430 predicting cases, 430 time series data creating new time series variables, 169 data transformations, 166 defining date variables, 167 replacing missing values, 171 transformation functions, 170 titles, 228 adding to Viewer, 228 in charts, 563 in OLAP Cubes, 356 tolerance in Linear Regression, 420 toolbars, 600, 602, 603, 604 creating, 600, 603 creat
PAGE 743

V in Crosstabs, 330 value labels, 83, 90, 95, 100, 585 applying to multiple variables, 106 copying, 106 in Data Editor, 95 in merged data files, 181 in outline pane, 585 in pivot tables, 585 inserting line breaks, 84 using for data entry, 90 values, 288 pivot table display format, 288 Van der Waerden estimates, 145 variable attributes, 86, 87 copying and pasting, 86, 87 variable declarations, 637, 638 in scripts, 637, 638 naming conventions, 638 variable importance plots in TwoStep Cluster Analy
PAGE 744

saving document, 251 space between output items, 250 Wald-Wolfowitz runs in Two-Independent-Samples Tests, 497 Waller-Duncan t test in GLM, 386 in One-Way ANOVA, 371 watch tab, 652 script window, 652 Web, 621 publishing output to, 621 weighted data, 218 and restructured data files, 218 weighted least squares in Linear Regression, 409 weighted mean in Ratio Statistics, 553 weighted predicted values in GLM, 389 weighting cases, 195 fractional weights in Crosstabs, 195 Welch statistic in One-Way