SAS Class Notes
Managing Data


1.0 SAS statements and procs in this unit

libname Set library
if and where Conditional statement
keep Keeps named variables
drop Drops named variables
set Reads in named file(s), append
proc sortSorts cases in a dataset
merge Merges files

2.0 Demonstration and explanation

Creating a library

Creating a library allows us to refer to a file in a specific directory (folder) without typing out the full file path. The command libname creates a shortcut that refers back to a specified directory. The two proc print commands below that show that you get the same results by either referring to the file name using the library name or the file path.
libname in "c:\sas_data\";

proc print data=in.hs1 (obs=10);
  var write read science;
run;
proc print data="c:\sas_data\hs1" (obs=10);
  var write read science;
run;

Selecting cases using if or where statement

Suppose we wish to analyze just a subset of the hs1 data file.  In fact, we are studying "good readers" and just want to focus on the students who had a reading score of 60 and higher. Here we show how to create subsets based on the criterion of high vs. low reading scores. 

data highread lowread;
  set in.hs1;
  if read >=60 then output highread;
  if read < 60 then output lowread;
run;

title "high reading scores";
proc means data=highread n mean;
  var read;
run;

title "low reading scores";
proc means data=lowread n mean;
  var read;
run;
* using where statement;
data highread;
  set in.hs0;
  where read >=60;
run;

Keeping variables

Further suppose that our data file had many variables, say 2000 variables, but we only care about just a handful of them, id, female, read and write. We can subset our data file to keep just those variables as shown below.

data hskept;
  set highread;
  keep id female read write;
run;

Dropping variables

Instead of wanting to keep just a handful of variables, it is possible that we want to get rid of just a handful of variables in our data file. Below we how to remove the variables ses and prog from the dataset.

data hsdropped;
  set highread;
  drop ses prog;
run;

Appending datasets

In this example we start with two datasets, one for males (called hsmale) and one for the females (called hsfemale). We need to combine these files together to be able to analyze them by gender, as shown below. In this example, we are adding cases, sometimes called "stacking" the data files. We do this by listing both data file names on the set statement in data step.

title;
proc freq data=in.hsmale;
  tables female;
run;

proc freq data=in.hsfemale;
  tables female;
run;

data in.hsmaster;
  set in.hsmale in.hsfemale;
run;

proc ttest data=in.hsmaster;
  by female;
  var write;
run;

Merging datasets

Again, we have been given two files. However, in this case, we have a file that has the demographic information (called hsdem) and a file with the test scores (called hstest), and we wish to merge these files together. To merge files together, each file must first be sorted by the same variable and then saved. Both the sorting and the saving can be done with proc sort.  Next, a data step with the merge and by statements is used to combine the datasets.

Before we begin, we should look at the data sets.

proc print data=in.hsdem ;
run;

proc print data=in.hstest ;
run;

Next, we will sort the data sets by the variable that identifies in both datasets, in this case, the variable id.

proc sort data=in.hsdem out=dem;
  by id;
run;

proc sort data=in.hstest out=test;
  by id;
run;

Now we can merge the files and look at the resulting data set.

data all;
  merge dem test;
  by id;
run;

proc print data=all;
run;

We can see that the two data sets don't match completely. Let's do a better job this time to have more information on the status of merge. We are going to create a variable called merge in the merged data set to indicate where the observation is from. To this end, we make use of a temporary SAS variable "in".

data all;
  merge dem (in=fromdem) test (in=fromtest);
  by id;
  if fromdem = 1 & fromtest = 1 then merge = 3;
  if fromdem = 1 & fromtest = 0 then merge = 1;
  if fromdem = 0 & fromtest = 1 then merge = 2;
run;
proc freq data = all;
  tables merge;
run;
proc print data = all;
run;

3.0 For more information

How to cite this page

Report an error on this page or leave a comment

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.