
SAS FAQ
How do I check that the same data input by two people are consistently entered?

When two people enter the same data (double data entry), the goal is to find out whether any discrepancies exist between the two datasets and, if so, where they are. We begin by reading in the two datasets, one entered by person1 and the other by person2.

data person1;
 input id name $ age ht wt income;
 datalines;
11 john    23 68 145 23000
12 charlie 25 72 178 45000
13 sally   21 64 135 12000
4  mike    34 70 156  5600
43 paul    30 73 189 15600
;
run;

data person2;
 input id name $ age ht wt income;
 datalines;
11 john    23.5 68 145 23000
12 charles   25 52 178 45000
13 sally     21 64  .  12000
4  michael   34 70 156  5600
43 Paul      30 73 189  5600
;
run;

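Next, we sort both datasets by the identifier variable, id, and then use proc compare to see whether any discrepancies exist between them. The novalues option suppresses the listing of individual value differences, so only the summary reports are printed.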

proc sort data = person1;
 by id;
run;

proc sort data = person2;
 by id;
run;

proc compare base = person1 compare = person2 novalues;
run;

The COMPARE Procedure
Comparison of WORK.PERSON1 with WORK.PERSON2
(Method=EXACT)

Data Set Summary
Dataset                Created          Modified  NVar    NObs
WORK.PERSON1  18JAN06:09:01:28  18JAN06:09:01:28     6       5
WORK.PERSON2  18JAN06:09:01:28  18JAN06:09:01:28     6       5

Variables Summary
Number of Variables in Common: 6.

Observation Summary
Observation      Base  Compare
First Obs           1        1
First Unequal       1        1
Last  Unequal       5        5
Last  Obs           5        5

Number of Observations in Common: 5.
Total Number of Observations Read from WORK.PERSON1: 5.
Total Number of Observations Read from WORK.PERSON2: 5.

Number of Observations with Some Compared Variables Unequal: 5.
Number of Observations with All Compared Variables Equal: 0.

Values Comparison Summary
Number of Variables Compared with All Observations Equal: 1.
Number of Variables Compared with Some Observations Unequal: 5.
Number of Variables with Missing Value Differences: 1.
Total Number of Values which Compare Unequal: 7.
Maximum Difference: 10000.


Variables with Unequal Values
Variable  Type  Len  Ndif   MaxDif  MissDif
name      CHAR    8     3                 0
age       NUM     8     1    0.500        0
ht        NUM     8     1   20.000        0
wt        NUM     8     1        0        1
income    NUM     8     1    10000        0

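The basic proc compare run shows that differences do exist: all five observations have at least one unequal value, and seven values differ in total. We now want to locate the discrepancies by id. The by statement groups the discrepancies by observation; without it, they would be grouped by variable. Grouping by observation makes it convenient to correct the errors case by case. The id statement matches observations on id and labels each row of the output with its id value, and the brief option replaces the default summary reports with a short summary.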

proc compare base = person1 compare = person2 brief;
 by id;
 id id;
run;

The COMPARE Procedure
Comparison of WORK.PERSON1 with WORK.PERSON2
(Method=EXACT)

id=4
NOTE: Values of the following 1 variables compare unequal: name
Value Comparison Results for Variables
_________________________________________________________
          ||  Base Value           Compare Value
      id  ||  name                  name
 _______  ||  ________              ________
          ||
       4  ||  mike                  michael
_________________________________________________________


id=11
NOTE: Values of the following 1 variables compare unequal: age
Value Comparison Results for Variables
_________________________________________________________
          ||       Base    Compare
      id  ||        age        age      Diff.     % Diff
 _______  ||  _________  _________  _________  _________
          ||
      11  ||    23.0000    23.5000     0.5000     2.1739
_________________________________________________________


id=12
NOTE: Values of the following 2 variables compare unequal: name ht
Value Comparison Results for Variables
_________________________________________________________
          ||  Base Value           Compare Value
      id  ||  name                  name
 _______  ||  ________              ________
          ||
      12  ||  charlie               charles
_________________________________________________________
_________________________________________________________
          ||       Base    Compare
      id  ||         ht         ht      Diff.     % Diff
 _______  ||  _________  _________  _________  _________
          ||
      12  ||    72.0000    52.0000   -20.0000   -27.7778
_________________________________________________________


id=13
NOTE: Values of the following 1 variables compare unequal: wt
Value Comparison Results for Variables
_________________________________________________________
          ||       Base    Compare
      id  ||         wt         wt      Diff.     % Diff
 _______  ||  _________  _________  _________  _________
          ||
      13  ||   135.0000          .          .          .
_________________________________________________________


id=43
NOTE: Values of the following 2 variables compare unequal: name income
Value Comparison Results for Variables
_________________________________________________________
          ||  Base Value           Compare Value
      id  ||  name                  name
 _______  ||  ________              ________
          ||
      43  ||  paul                  Paul
_________________________________________________________
________________________________________________________
          ||       Base    Compare
      id  ||     income     income      Diff.     % Diff
 _______  ||  _________  _________  _________  _________
          ||
      43  ||      15600       5600     -10000   -64.1026
_________________________________________________________
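
If the discrepancies need to be reviewed or corrected in a program rather than read from the listing, proc compare can also write them to a dataset. The following is a minimal sketch, assuming the sorted person1 and person2 datasets from above; the output dataset name diffs is our own choice for illustration.

proc compare base = person1 compare = person2
             out = diffs outnoequal outbase outcomp outdif noprint;
 id id;   /* match observations on id */
run;

/* list the rows that were written for observations with differences */
proc print data = diffs;
run;

Here outnoequal keeps only the observations in which at least one value differs, and outbase, outcomp and outdif request the base row, the compare row and a difference row for each of them. The _TYPE_ variable in diffs marks each row as BASE, COMPARE or DIF.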

Note from the last case, id = 43, that proc compare is case sensitive for character variables: paul and Paul are reported as unequal.
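
If differences in capitalization should not count as discrepancies, one option is to convert the character variable to a single case before comparing. The sketch below illustrates this under that assumption; the dataset names person1_uc and person2_uc are hypothetical, and both input datasets are assumed to be already sorted by id as above.

data person1_uc;
 set person1;
 name = upcase(name);   /* convert name to upper case */
run;

data person2_uc;
 set person2;
 name = upcase(name);   /* same conversion for the second entry */
run;

proc compare base = person1_uc compare = person2_uc brief;
 id id;
run;

With this change the name values for id = 43 should compare equal, while genuine differences such as mike versus michael are still reported.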


These pages are Copyrighted (c) by UCLA Academic Technology Services

