| Command | Purpose |
| --- | --- |
| libname | Assigns a library name that points to a directory |
| keep | Keeps named variables |
| drop | Drops named variables |
| set | Reads in named file(s). If more than one is named, files are combined (append) |
| proc sort | Sorts cases in a dataset |
| merge | Merges files |
Creating a library lets us refer to a file in a specific directory (folder) without typing out the full file path. The libname statement creates a shortcut that refers back to a specified directory. The two proc print steps below show that you get the same results whether you refer to the file by its library name or by its full file path.
libname mylib "c:\sas_data\";
proc print data=mylib.hs1 (obs=10);
  var write read science;
run;
proc print data="c:\sas_data\hs1" (obs=10);
  var write read science;
run;
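To confirm that the library was assigned as intended, one option is to list what it contains. The sketch below assumes the mylib library from the example above has already been assigned and points to a directory that holds SAS data files. The nods option suppresses the per-dataset detail so that only the directory listing is printed.

```sas
/* List the datasets stored in the mylib library, */
/* without printing the details of each dataset.  */
proc contents data=mylib._all_ nods;
run;
```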
Suppose we wish to analyze just a subset of the hs1 data file. Say we are studying "good readers" and want to focus on the students who had a reading score of 60 or higher. The following shows how to use the hs1 dataset to create and store a copy of the data that contains only the students with reading scores of 60 or higher.
data mylib.goodread;
  set mylib.hs1;
  where (read >= 60);
run;
proc means data=mylib.goodread;
  var read;
run;
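The same subset can also be requested with the where= data set option on the set statement instead of a separate where statement. This is only an alternative sketch: goodread2 is a hypothetical dataset name, and it is written to the temporary work library so the stored mylib.goodread file is left alone.

```sas
/* Alternative sketch: filter while reading, using the where= option. */
/* work.goodread2 is a hypothetical name used for illustration only.  */
data work.goodread2;
  set mylib.hs1 (where=(read >= 60));
run;

/* If the filter worked, the minimum of read is 60 or higher. */
proc means data=work.goodread2 n min max;
  var read;
run;
```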
Now suppose that our data file had many variables, say 2000, but we care about only a handful of them: id, female, read and write. We can subset our data file to keep just those variables as shown below.
data mylib.hskept;
  set mylib.goodread;
  keep id female read write;
run;
proc contents data=mylib.hskept;
run;
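With a very wide file, it can help to limit the variables as the file is read rather than when it is written. A minimal sketch, assuming mylib.goodread contains the four variables named above, uses the keep= data set option on the set statement. The name hskept2 is hypothetical.

```sas
/* Read in only the four variables of interest.                    */
/* work.hskept2 is a hypothetical name used for illustration only. */
data work.hskept2;
  set mylib.goodread (keep=id female read write);
run;

/* The variable list should show only id, female, read and write. */
proc contents data=work.hskept2;
run;
```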
Instead of keeping just a handful of variables, we may want to get rid of just a handful of variables in our data file. Below we show how to remove the variables ses and prog from the dataset.
data mylib.hsdropped;
  set mylib.goodread;
  drop ses prog;
run;
proc contents data=mylib.hsdropped;
run;
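The drop can also be written as a data set option on the output file. In this sketch (hsdropped2 is a hypothetical name), ses and prog stay available inside the data step, for example to compute new variables, but are not written to the new dataset.

```sas
/* drop= on the output dataset: ses and prog can still be used */
/* within the data step, but they are not written out.         */
/* work.hsdropped2 is a hypothetical name for illustration.    */
data work.hsdropped2 (drop=ses prog);
  set mylib.goodread;
run;

proc contents data=work.hsdropped2;
run;
```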
In this example we start with two datasets, one for males (called hsmale) and one for females (called hsfemale). We need to combine these files to be able to analyze them together, as shown below. In this example we are adding cases, sometimes called "stacking" the data files. We do this by listing both data file names on the set statement in a data step.
proc freq data=mylib.hsmale;
  tables female;
run;
proc freq data=mylib.hsfemale;
  tables female;
run;
data mylib.hsmaster;
  set mylib.hsmale mylib.hsfemale;
run;
proc freq data=mylib.hsmaster;
  tables female;
run;
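When stacking files, it is sometimes useful to record which file each case came from. The sketch below is one way to do that with the in= data set option. The names hsmaster2, frommale and source are hypothetical and introduced only for illustration.

```sas
/* frommale is a temporary 0/1 flag: 1 when the current case was */
/* read from hsmale. It is not saved, so we copy the information */
/* into a regular variable called source.                        */
data work.hsmaster2;
  length source $ 8;
  set mylib.hsmale (in=frommale) mylib.hsfemale;
  if frommale then source = "hsmale";
  else source = "hsfemale";
run;

/* Cross-tabulate the source file against female as a check. */
proc freq data=work.hsmaster2;
  tables source*female;
run;
```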
Again, we have been given two files. However, in this case, we have a file that has the demographic information (called hsdem) and a file with the test scores (called hstest), and we wish to merge these files together. To merge files together, each file must first be sorted by the same variable and then saved. Both the sorting and the saving can be done with proc sort. Next, a data step with the merge and by statements is used to combine the datasets.
Before we begin, we should look at the data sets.
proc print data=mylib.hsdem (obs=10);
run;
proc print data=mylib.hstest (obs=10);
run;
Next, we will sort the data sets by the variable that identifies each case in both datasets, in this case, the variable id.
proc sort data=mylib.hsdem out=dem;
  by id;
run;
proc sort data=mylib.hstest out=test;
  by id;
run;
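A one-to-one merge works as expected only if id identifies each case uniquely in each file. The source does not say whether that holds, so the following is an optional check rather than a required step. It uses the nodupkey and dupout= options of proc sort. The dataset names dem_unique and dem_dups are hypothetical.

```sas
/* Keep the first case for each id and send any later cases */
/* with a repeated id to work.dem_dups.                      */
proc sort data=mylib.hsdem out=work.dem_unique nodupkey dupout=work.dem_dups;
  by id;
run;

/* Any cases printed here have an id that appears more than once. */
proc print data=work.dem_dups;
run;
```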
Now we can merge the files and look at the resulting data set.
data mylib.all;
  merge dem test;
  by id;
run;
proc contents data="c:\sas_data\all";
run;
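After a merge, it is worth checking whether every id appeared in both files. A minimal sketch, assuming the sorted dem and test datasets created above, uses the in= option to separate matched from unmatched cases. The names matched, unmatched, indem and intest are hypothetical.

```sas
/* indem and intest are temporary 0/1 flags showing which file */
/* contributed to the current id.                              */
data work.matched work.unmatched;
  merge dem (in=indem) test (in=intest);
  by id;
  if indem and intest then output work.matched;
  else output work.unmatched;
run;

/* Cases listed here were found in only one of the two files. */
proc print data=work.unmatched;
run;
```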
These pages are Copyrighted (c) by UCLA Academic Technology Services