User Guide

Chapter 6
Identifying Duplicate Cases
“Duplicate” cases may occur in your data for many reasons, including:
Data entry errors in which the same case is accidentally entered more than once.
Multiple cases share a common primary ID value but have different secondary ID values, such as family members who all live in the same house.
Multiple cases represent the same case but with different values for variables other than those that identify the case, such as multiple purchases made by the same person or company for different products or at different times.
Identify Duplicate Cases allows you to define duplicate almost any way that you want and provides some control over the automatic determination of primary versus duplicate cases.
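To illustrate how the definition of a duplicate depends on which variables identify a match, here is a minimal Python sketch. It is not how Identify Duplicate Cases is implemented. The function name group_matching_cases, the variable names household_id, person_id, and product, and the sample data are all hypothetical.

```python
from collections import defaultdict

def group_matching_cases(cases, match_vars):
    """Group case positions by their values on the matching variables."""
    groups = defaultdict(list)
    for index, case in enumerate(cases):
        key = tuple(case[var] for var in match_vars)
        groups[key].append(index)
    return dict(groups)

# Hypothetical data: household_id plays the role of a primary ID,
# person_id the role of a secondary ID.
cases = [
    {"household_id": 101, "person_id": 1, "product": "desk"},
    {"household_id": 101, "person_id": 2, "product": "lamp"},
    {"household_id": 101, "person_id": 1, "product": "chair"},
    {"household_id": 102, "person_id": 1, "product": "desk"},
]

# Matching on the primary ID alone treats the whole household as one group.
by_household = group_matching_cases(cases, ["household_id"])

# Adding the secondary ID narrows each group to a single person.
by_person = group_matching_cases(cases, ["household_id", "person_id"])
```

With only household_id selected, the first three cases fall into one group. With both ID variables selected, only the two purchases by the same person are grouped together.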
To identify and flag duplicate cases:
► From the menus choose:
  Data
  Identify Duplicate Cases...
► Select one or more variables that identify matching cases.
► Select one or more of the options in the Variables to Create group.
Optionally, you can:
► Select one or more variables to sort cases within groups defined by the selected matching cases variables. The sort order defined by these variables determines the “first” and “last” case in each group. Otherwise, the original file order is used.
► Automatically filter duplicate cases so that they won’t be included in reports, charts, or calculation of statistics.
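The logic behind these steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's implementation: the function flag_duplicates, its parameters, the 1/0 flag coding, and the default of treating the last case in each group as primary are all hypothetical choices made for the example.

```python
from collections import defaultdict

def flag_duplicates(cases, match_vars, sort_vars=(), primary="last"):
    """Return one flag per case: 1 for the primary case in its group, 0 for a duplicate."""
    if primary not in ("first", "last"):
        raise ValueError("primary must be 'first' or 'last'")

    # Group case positions by their values on the matching variables.
    groups = defaultdict(list)
    for index, case in enumerate(cases):
        groups[tuple(case[v] for v in match_vars)].append(index)

    flags = [0] * len(cases)
    for indices in groups.values():
        # sorted() is stable, so ties (or no sort variables at all)
        # keep the original file order within the group.
        ordered = sorted(indices, key=lambda i: tuple(cases[i][v] for v in sort_vars))
        chosen = ordered[0] if primary == "first" else ordered[-1]
        flags[chosen] = 1
    return flags

# Hypothetical data: repeated purchases by the same customer.
cases = [
    {"customer": "A", "date": "2024-03-01"},
    {"customer": "B", "date": "2024-01-15"},
    {"customer": "A", "date": "2024-01-10"},
    {"customer": "A", "date": "2024-02-20"},
]

# Sorting within groups by date makes the most recent purchase the "last" case.
flags_sorted = flag_duplicates(cases, ["customer"], sort_vars=["date"])

# Without sort variables, the original file order decides which case is "last".
flags_file_order = flag_duplicates(cases, ["customer"])

# Filtering keeps only the primary cases for later reports or statistics.
filtered = [case for case, flag in zip(cases, flags_sorted) if flag == 1]
```

For customer A, sorting by date marks the 2024-03-01 purchase as primary, while relying on file order marks the 2024-02-20 purchase instead, because it appears last in the file.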