Suppose you want to make a new ID variable called newid that is unique for every observation but conceals the identity of the person each observation belongs to. One strategy is shown below, starting from this example data file.

DATA orig;
  INPUT id age;
CARDS;
1 3
2 32
3 13
4 16
5 4
6 9
7 43
8 29
9 43
10 47
11 13
12 6
13 43
14 48
15 34
16 13
17 47
18 6
19 34
20 42
21 47
22 49
23 28
24 25
25 39
;
RUN;
1. Here we make newid, the new random ID, and ranord, a random number that will be used to scramble the order of the data file.
data NEWIDS;
  do NOBS = 1 to 40 ;        /* we make up 40 observations in case of duplicates */
    newid = "     " ;        /* newid will be 5 characters wide */
    do i = 1 to 5;           /* create each character of newid, 1 - 5 */
      * make random number 0-35, to cover 0-9 and A-Z ;
      rannum = int(uniform(0)*36) ;
      * if it is 0-9, convert it into 0-9, which is byte(48) - byte(57) ;
      if (0 <= rannum <= 9) then ranch = byte(rannum + 48) ;
      * if it is 10-35, convert it into A-Z, which is byte(65) - byte(90) ;
      if (10 <= rannum <= 35) then ranch = byte(rannum + 55);
      * put each character into its position in "newid" ;
      substr(newid,i,1) = ranch ;
    end;
    * make ranord ;
    ranord = uniform(0) ;
    output ;
  end;
  * just keep "newid" and "ranord" ;
  keep newid ranord ;
run;
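
The same idea can also be sketched with a lookup string and a seeded random number stream. This is only an illustrative variant, not part of the original recipe: the data set name newids_alt, the variables chars and pos, and the seed value 20240101 are all assumptions made for the example. CALL STREAMINIT and RAND("UNIFORM") are used in place of uniform(0).

```sas
data newids_alt;                       /* hypothetical name, so NEWIDS is not overwritten */
  length newid $5;                     /* newid is 5 characters wide */
  call streaminit(20240101);           /* arbitrary seed (assumption) so the run can be repeated */
  chars = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";   /* the 36 allowed characters */
  do nobs = 1 to 40;                   /* make 40 candidates in case of duplicates */
    newid = " ";                       /* reset to blanks (padded to 5 characters) */
    do i = 1 to 5;
      /* pick a position 1-36 in the lookup string */
      pos = floor(rand("uniform") * 36) + 1;
      substr(newid, i, 1) = substr(chars, pos, 1);
    end;
    ranord = rand("uniform");          /* random sort key for scrambling */
    output;
  end;
  keep newid ranord;
run;
```

A fixed seed makes the IDs repeatable, which is convenient for checking the program, but anyone who has both the program and the seed could regenerate the same IDs, so it may not be appropriate when the IDs must stay confidential.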
2. Get rid of any duplicates in newids. NODUPKEY keeps only one observation for each value of newid.

PROC SORT DATA=newids NODUPKEY;
  BY newid ;
RUN;

3. Scramble the order of newids so that the order of the observations does not give away their identity.

PROC SORT DATA=newids ;
  BY ranord ;
RUN;

4. Now, merge orig with newids. There is no BY statement, so the observations are paired one to one in the order they appear. If id is missing, all of the orig observations have already been matched and this is a leftover newid without an orig observation, so we delete it. For orig2 we drop id and ranord so the observations are now anonymous; crossref keeps id and newid so they can be linked again later.

DATA orig2(DROP=id ranord) crossref(KEEP=id newid);
  MERGE orig newids ;
  IF (id = .) THEN DELETE ;
run;

Show new version of original data file with newid.

PROC PRINT DATA=orig2(obs=10);
RUN;

OBS    AGE    NEWID
  1      3    QMB02
  2     32    1QXCR
  3     13    VO5FC
  4     16    4C63M
  5      4    2QQR8
  6      9    VT4O5
  7     43    W9IFN
  8     29    BHPJW
  9     43    B0LJQ
 10     47    QN0CC

Show cross reference file, with id and newid.

PROC PRINT DATA=crossref(obs=10);
RUN;

OBS     ID    NEWID
  1      1    QMB02
  2      2    1QXCR
  3      3    VO5FC
  4      4    4C63M
  5      5    2QQR8
  6      6    VT4O5
  7      7    W9IFN
  8      8    BHPJW
  9      9    B0LJQ
 10     10    QN0CC
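
Because the merge pairs observations by position, it can be worth confirming that every observation in orig2 received a newid and that no newid is repeated. The query below is a minimal sketch of such a check and is not part of the original recipe; the column names n_rows, n_unique_ids and n_blank_ids are made up for the example.

```sas
/* sanity check on the anonymized file */
proc sql;
  select count(*)              as n_rows,        /* observations in orig2 */
         count(distinct newid) as n_unique_ids,  /* distinct non-blank newid values */
         sum(missing(newid))   as n_blank_ids    /* observations with no newid */
  from orig2;
quit;
```

If the steps worked as intended, n_rows and n_unique_ids should be equal and n_blank_ids should be 0. A nonzero n_blank_ids would suggest that fewer unique IDs were left than there are observations in orig, in which case more candidate IDs could be generated in step 1.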
These pages are Copyrighted (c) by UCLA Academic Technology Services