User Guide


Chapter 6

Identifying Duplicate Cases

“Duplicate” cases may occur in your data for many reasons, including:

• Data entry errors in which the same case is accidentally entered more than once.

• Multiple cases share a common primary ID value but have different secondary ID values, such as family members who all live in the same house.

• Multiple cases represent the same case but with different values for variables other than those that identify the case, such as multiple purchases made by the same person or company for different products or at different times.

Identify Duplicate Cases allows you to define duplicates in almost any way you want, by choosing which variables must match, and provides some control over the automatic determination of primary versus duplicate cases.
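Because duplicates are defined by the variables you choose to match on, the same data can contain different duplicate groups depending on that choice. The following Python sketch illustrates the idea outside the dialog; it is not how the procedure is implemented. The helper `duplicate_groups` and the household data are hypothetical, and each case is assumed to be a dictionary of variable names to values.

```python
from collections import defaultdict

# Hypothetical example data: each case maps variable names to values.
cases = [
    {"household": 101, "person": 1, "age": 42},
    {"household": 101, "person": 2, "age": 40},
    {"household": 102, "person": 1, "age": 35},
    {"household": 102, "person": 1, "age": 35},  # accidental re-entry
]

def duplicate_groups(cases, match_vars):
    """Hypothetical helper: return groups (lists of case positions) of two or
    more cases that share the same values on every matching variable."""
    groups = defaultdict(list)
    for i, case in enumerate(cases):
        key = tuple(case[v] for v in match_vars)
        groups[key].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]

# Matching on the primary ID alone treats family members as duplicates.
print(duplicate_groups(cases, ["household"]))            # [[0, 1], [2, 3]]
# Adding the secondary ID leaves only the re-entered case.
print(duplicate_groups(cases, ["household", "person"]))  # [[2, 3]]
```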

To identify and flag duplicate cases:

1. From the menus choose:

   Data > Identify Duplicate Cases...

2. Select one or more variables that identify matching cases.

3. Select one or more of the options in the Variables to Create group.
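The options in the Variables to Create group are not listed in this section. As an assumption for illustration only, the sketch below computes one plausible kind of created variable: an indicator that is 1 for the primary case in each group of matching cases and 0 for its duplicates. Unique cases count as primary. The function name `primary_indicator`, its `primary` parameter, and the example data are hypothetical. Cases are scanned in file order, and either the first or the last case in a group is treated as primary.

```python
def primary_indicator(cases, match_vars, primary="last"):
    """Hypothetical helper: 1 for each group's primary case, 0 for duplicates.

    Cases are compared on match_vars only and scanned in file order;
    `primary` chooses whether the first or the last case in a group is primary.
    """
    if primary not in ("first", "last"):
        raise ValueError("primary must be 'first' or 'last'")
    keys = [tuple(case[v] for v in match_vars) for case in cases]
    # Record the position of the first (or, by overwriting, the last) case per key.
    chosen = {}
    for i, key in enumerate(keys):
        if primary == "last" or key not in chosen:
            chosen[key] = i
    return [1 if chosen[key] == i else 0 for i, key in enumerate(keys)]

# Hypothetical example data: customer "A" appears twice.
cases = [
    {"id": "A", "amount": 10},
    {"id": "B", "amount": 5},
    {"id": "A", "amount": 7},
]
print(primary_indicator(cases, ["id"]))                   # [0, 1, 1]
print(primary_indicator(cases, ["id"], primary="first"))  # [1, 1, 0]
```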

Optionally, you can:

• Select one or more variables to sort cases within the groups defined by the selected matching-case variables. The sort order defined by these variables determines the “first” and “last” case in each group. If you do not select any sort variables, the original file order is used.

• Automatically filter out duplicate cases so that they are not included in reports, charts, or calculations of statistics.
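The two optional settings work together. The sort variables decide which case is “first” or “last” within each group, and the filter then keeps only the primary case. The following Python sketch models that behavior under stated assumptions. The function `keep_primary_cases`, its parameters, and the purchase data are hypothetical, and sorting is ascending only. The sketch returns a new list rather than changing the input, which mirrors excluding duplicates from analysis rather than deleting them.

```python
from collections import defaultdict

def keep_primary_cases(cases, match_vars, sort_vars=(), primary="last"):
    """Hypothetical sketch: return only each group's primary case, in file order.

    Within each group of matching cases, cases are ordered by sort_vars
    (ascending). Python's sort is stable, so tied cases, or every case when
    sort_vars is empty, keep their original file order.
    """
    if primary not in ("first", "last"):
        raise ValueError("primary must be 'first' or 'last'")
    groups = defaultdict(list)
    for i, case in enumerate(cases):
        groups[tuple(case[v] for v in match_vars)].append(i)
    keep = set()
    for idxs in groups.values():
        ordered = sorted(idxs, key=lambda j: tuple(cases[j][v] for v in sort_vars))
        keep.add(ordered[0] if primary == "first" else ordered[-1])
    # Filter: build a new list; the input list is left unchanged.
    return [case for i, case in enumerate(cases) if i in keep]

# Hypothetical example data: customer C1 made two purchases.
purchases = [
    {"customer": "C1", "date": "2024-03-01", "amount": 20},
    {"customer": "C2", "date": "2024-01-15", "amount": 35},
    {"customer": "C1", "date": "2024-01-10", "amount": 50},
]
# No sort variables: the last case in file order is primary.
print([c["amount"] for c in keep_primary_cases(purchases, ["customer"])])  # [35, 50]
# Sorted by date: the most recent purchase per customer is primary.
print([c["amount"] for c in keep_primary_cases(purchases, ["customer"], ["date"])])  # [20, 35]
```

Relying on a stable sort is the design choice that makes “otherwise, the original file order is used” hold automatically. Cases that tie on the sort variables never change their relative order.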