| proc contents | Contents of a SAS dataset |
| proc print | Displays the data |
| proc means | Descriptive statistics |
| proc univariate | More descriptive statistics |
| proc sort | Sort a dataset |
| proc freq | Frequency tables, frequency charts, and crosstabs |
| ods | Output delivery system, allows access to additional output |
| proc corr | Correlation matrix and scatterplots |
| proc sgplot | Used here to produce scatterplots |
Before we start our statistical exploration, we will look at the data using proc contents and proc print. Note that the variable prgtype is a string variable.
    options nocenter;

    proc contents position data='c:\sas_data\hs0';
    run;

    proc print data='c:\sas_data\hs0' (obs=20);
    run;

    * If we only want to print some variables, we can use the var statement;
    proc print data='c:\sas_data\hs0' (obs=20);
      var gender id race ses schtyp prgtype read;
    run;
Before we go any further, let's use a data step to make a copy of c:\sas_data\hs0; we will call that copy hs0. Now we can make changes to the temporary data set hs0, without making changes to the permanent data set c:\sas_data\hs0. If we decide that we want to do so later on, we can save hs0 as a permanent data set.
    data hs0;
      set 'c:\sas_data\hs0';
    run;
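The paragraph above mentions saving hs0 as a permanent data set later on. A minimal sketch of one way to do that follows; the output file name hs0_copy is a hypothetical choice made here for illustration, so that the original file is not overwritten.

```sas
* Write the temporary data set hs0 to a permanent file.
  The file name hs0_copy is a hypothetical name used only for illustration;
data 'c:\sas_data\hs0_copy';
  set hs0;
run;
```

Writing to a new file name rather than back to c:\sas_data\hs0 keeps the original data intact until we are sure the changes are what we want.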
One of the basic descriptive statistics commands in SAS is proc means. Below we get means for all of the numeric variables. Along with proc means, we also show the proc univariate output, which displays additional descriptive statistics.
    proc means data=hs0;
    run;

    proc univariate data=hs0;
      var read write;
    run;
With the var statement, we can specify which variables we want to analyze. Also, the n mean median std var options allow us to indicate which statistics we want computed.
    proc means data=hs0 n mean median std var;
      var read math science write;
    run;
We use the where statement below to look at just those students with a reading score of 60 or higher.
    proc means data=hs0 n mean median std var;
      where read>=60;
      var read math science write;
    run;
With the class statement, we get the descriptive statistics broken down by prgtype.
    proc means data=hs0 n mean median std var;
      class prgtype;
      var read math science write;
    run;
We can use proc univariate to get detailed descriptive statistics for write along with a histogram with a normal overlay.
    proc univariate data=hs0;
      var write;
      histogram / normal;
    run;
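We can use proc sgplot to get side-by-side boxplots for the variable write broken down by the levels of prgtype. We first sort the data by prgtype using proc sort. The vbox statement with category= does not itself need sorted input, but sorting is required by procedures that process groups with a by statement.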
    proc sort data=hs0;
      by prgtype;
    run;

    proc sgplot data=hs0;
      vbox write / category=prgtype;
    run;
Below we use proc freq to get a frequency table for ses. The second example uses proc freq to produce a bar chart and cumulative frequency graph in addition to the frequency table for ses.
    proc freq data=hs0;
      table ses;
    run;

    * freqplot requests the bar chart and cumfreqplot the cumulative frequency graph;
    ods graphics on;
    proc freq data=hs0;
      table ses / plots=(freqplot cumfreqplot);
    run;
    ods graphics off;
We can also use proc sgplot to generate a bar chart for categorical variables, as well as a histogram for a continuous variable.
    proc sgplot data=hs0;
      vbar ses;
    run;

    proc sgplot data=hs0;
      histogram read;
    run;
We use proc freq to get frequencies for write, and this illustrates why it can sometimes be undesirable to do frequencies for continuous variables.
    proc freq data=hs0;
      table write;
    run;
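The table above has one row for every distinct value of write, which is why it is hard to read. One common workaround, sketched here as an illustration rather than as part of the original walkthrough, is to group the values with a user-defined format before tabulating. The format name writegrp and the cutpoints are assumptions chosen for this example.

```sas
* Hypothetical format that groups write scores into bands.
  The name writegrp and the cutpoints are illustrative assumptions;
proc format;
  value writegrp
    low -< 40  = 'below 40'
    40  -< 50  = '40 to 49'
    50  -< 60  = '50 to 59'
    60  - high = '60 and above';
run;

* Frequencies of write by band instead of by distinct value;
proc freq data=hs0;
  table write;
  format write writegrp.;
run;
```

Because proc freq tabulates formatted values, the output has one row per band, and the data set itself is not changed.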
Here we use proc freq to get frequencies for gender, schtyp and prgtype, each table shown separately.
    proc freq data=hs0;
      table gender schtyp prgtype;
    run;
Below we show how to get a crosstab of prgtype by ses.
    proc freq data=hs0;
      table prgtype*ses;
    run;
proc corr is used to get correlations among variables. By default, proc corr uses pairwise deletion for missing observations. If you use the nomiss option, proc corr uses listwise deletion and omits all observations with missing data on any of the named variables.
    proc corr data=hs0;
      var write read science;
    run;

    proc corr data=hs0 nomiss;
      var write read science;
    run;
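Pairwise and listwise deletion give different results only when some of the named variables have missing values. As a quick check, offered here as a sketch rather than as part of the original walkthrough, proc means can report the number of nonmissing and missing values for each variable.

```sas
* Count nonmissing (n) and missing (nmiss) values for each variable;
proc means data=hs0 n nmiss;
  var write read science;
run;
```

If nmiss is zero for every variable, the two proc corr runs above use the same observations and produce the same correlations.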
In the first example below we use proc corr to generate a scatterplot matrix. In the second example below, we use proc corr to get a scatterplot with a confidence ellipse showing the relationship between the first two variables on the var statement (write and read) using the nvar=2 option.
    ods graphics on;
    proc corr data=hs0 nomiss plots=matrix;
      var write read science;
    run;
    ods graphics off;

    ods graphics on;
    proc corr data=hs0 nomiss plots=scatter(nvar=2);
      var write read science;
    run;
    ods graphics off;
proc sgplot can also be used to generate a scatterplot.
    proc sgplot data=hs0;
      scatter x=write y=read;
    run;
We can also modify the symbol with the markerchar option to use the id variable instead of dots. This is especially useful to identify outliers or other interesting observations.
    proc sgplot data=hs0;
      scatter x=write y=read / markerchar=id;
    run;
We can also create a scatterplot that uses different symbols depending on the gender of the subjects (using the group option). This can be used to check whether the relationship between write and read is linear for each gender group.
    proc sgplot data=hs0;
      scatter x=write y=read / group=gender;
    run;
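Judging linearity from the markers alone can be difficult. As a sketch that goes slightly beyond the original example, the reg and loess statements of proc sgplot can overlay a fitted line or a smooth curve for each gender group, using the same variables as the scatterplot above.

```sas
* Scatterplot with a separate fitted regression line for each gender;
proc sgplot data=hs0;
  reg x=write y=read / group=gender;
run;

* A loess smooth for each gender follows the data without assuming a straight line;
proc sgplot data=hs0;
  loess x=write y=read / group=gender;
run;
```

If the loess curves stay close to the straight regression lines, a linear relationship is a reasonable description for both groups.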
These pages are Copyrighted (c) by UCLA Academic Technology Services