| libname | Set library |
| if and where | Conditional statement |
| keep | Keeps named variables |
| drop | Drops named variables |
| set | Reads in named file(s), append |
| proc sort | Sorts cases in a dataset |
| merge | Merges files |
libname in "c:\sas_data\"; proc print data=in.hs1 (obs=10); var write read science; run;proc print data="c:\sas_data\hs1" (obs=10); var write read science; run;
Suppose we wish to analyze just a subset of the hs1 data file. In fact, we are studying "good readers" and just want to focus on the students who had a reading score of 60 and higher. Here we show how to create subsets based on the criterion of high vs. low reading scores.
data highread lowread; set in.hs1; if read >=60 then output highread; if read < 60 then output lowread; run; title "high reading scores"; proc means data=highread n mean; var read; run; title "low reading scores"; proc means data=lowread n mean; var read; run;* using where statement; data highread; set in.hs0; where read >=60; run;
Further suppose that our data file had many variables, say 2000 variables, but we only care about just a handful of them, id, female, read and write. We can subset our data file to keep just those variables as shown below.
data hskept; set highread; keep id female read write; run;
Instead of wanting to keep just a handful of variables, it is possible that we want to get rid of just a handful of variables in our data file. Below we how to remove the variables ses and prog from the dataset.
data hsdropped; set highread; drop ses prog; run;
In this example we start with two datasets, one for males (called hsmale) and one for the females (called hsfemale). We need to combine these files together to be able to analyze them by gender, as shown below. In this example, we are adding cases, sometimes called "stacking" the data files. We do this by listing both data file names on the set statement in data step.
title; proc freq data=in.hsmale; tables female; run; proc freq data=in.hsfemale; tables female; run; data in.hsmaster; set in.hsmale in.hsfemale; run; proc ttest data=in.hsmaster; by female; var write; run;
Again, we have been given two files. However, in this case, we have a file that has the demographic information (called hsdem) and a file with the test scores (called hstest), and we wish to merge these files together. To merge files together, each file must first be sorted by the same variable and then saved. Both the sorting and the saving can be done with proc sort. Next, a data step with the merge and by statements is used to combine the datasets.
Before we begin, we should look at the data sets.
proc print data=in.hsdem ; run; proc print data=in.hstest ; run;
Next, we will sort the data sets by the variable that identifies in both datasets, in this case, the variable id.
proc sort data=in.hsdem out=dem; by id; run; proc sort data=in.hstest out=test; by id; run;
Now we can merge the files and look at the resulting data set.
data all; merge dem test; by id; run; proc print data=all; run;
We can see that the two data sets don't match completely. Let's do a better job this time to have more information on the status of merge. We are going to create a variable called merge in the merged data set to indicate where the observation is from. To this end, we make use of a temporary SAS variable "in".
data all; merge dem (in=fromdem) test (in=fromtest); by id; if fromdem = 1 & fromtest = 1 then merge = 3; if fromdem = 1 & fromtest = 0 then merge = 1; if fromdem = 0 & fromtest = 1 then merge = 2; run; proc freq data = all; tables merge; run; proc print data = all; run;
The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.