### SAS Class Notes Exploring Data

#### 1.0 SAS statements and procs in this unit

| Statement or procedure | Description |
| --- | --- |
| proc contents | Contents of a SAS dataset |
| proc print | Displays the data |
| proc means | Descriptive statistics |
| proc univariate | More descriptive statistics |
| proc freq | Frequency tables, frequency charts, and crosstabs |
| ods | Output delivery system, creating output in various formats |
| proc corr | Correlation matrix and scatterplots |
| proc sgplot | Produces many types of plots |

#### 2.0 Demonstration and explanation

We will begin by submitting options nocenter so that the output is left justified. We use the libname statement to refer to a folder of SAS data files. We will continue to use the SAS dataset hs0 that was created in the previous unit.

Before we start our statistical exploration we will look at the data using proc contents and proc print.

options nocenter;
libname in 'c:\sas_data\';
proc contents position data=in.hs0;
run;
proc print data=in.hs0 (obs=20);
run;
* If we only want to print some variables, we can use the var statement;
proc print data=in.hs0 (obs=20);
var gender id race ses schtyp prgtype read;
run; 

Before we go any further, let's use a data step to make a temporary copy of hs0 and we will still call it hs0. Now we can make changes to the temporary data set hs0, without making changes to the permanent data set c:\sas_data\hs0. If we decide that we want to do so later on, we can save hs0 as a permanent data set.

data hs0;
set in.hs0;
run;
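
The last step mentioned above, saving the temporary data set as a permanent one, could look roughly like the sketch below. It assumes the libname in is still assigned, and hs0_copy is a hypothetical name chosen here so that the original in.hs0 is not overwritten.

```sas
/* Sketch: write the temporary (work) data set hs0 into the in library. */
/* hs0_copy is a hypothetical name, not one used elsewhere in these notes. */
data in.hs0_copy;
  set hs0;
run;
```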

One of the basic descriptive statistics commands in SAS is proc means. Below we get means for all of the variables. Along with proc means, we also show the proc univariate output, which displays additional descriptive statistics.

proc means data=hs0;
run;

proc univariate data=hs0;
run;

With the var statement, we can specify which variables we want to analyze (here read, write and math). Also, the n mean median std var options allow us to indicate which statistics we want computed.

proc means data=hs0 n mean median std var;
var read write math;
run;

We use the where statement below to look at just those students with a reading score of 60 or higher.

proc means data=hs0 n mean median std var;
where read >= 60;
run;
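
The same restriction can also be written as a where= data set option on the input data set instead of a separate statement. The sketch below is one way to do that. It should select the same students (reading score of 60 or higher) and is offered only as an alternative form.

```sas
/* Sketch: filter with the where= data set option rather than a where statement. */
proc means data=hs0 (where=(read >= 60)) n mean median std var;
run;
```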

With the class statement, we get the descriptive statistics broken down by prgtype and ses.

proc means data=hs0 n mean median std var;
class prgtype ses;
run;

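We can use proc univariate to get a histogram of write with a normal overlay. The noprint option suppresses the tables of descriptive statistics so that only the graph is produced; omit noprint to get the detailed descriptive statistics as well.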

proc univariate data=hs0 noprint;
var write;
histogram / normal;
run;

We can use proc sgplot to get side-by-side boxplots for the variable write broken down by the levels of prgtype.

proc sgplot data=hs0;
vbox write / category=prgtype;
run;

Below we use proc freq to get a frequency table for ses. The second example uses proc freq to produce a bar chart and cumulative frequency graph in addition to the frequency table for ses.

proc freq data=hs0;
table ses;
run;

ods graphics on;
proc freq data=hs0;
table ses / plots=(freqplot cumfreqplot);
run;
ods graphics off;
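
Besides switching graphics on and off, ods can send output to a file in another format. The sketch below routes a frequency table to an RTF document. The file name ses_freq.rtf is a hypothetical example, and rtf is just one of several destinations (pdf and html work the same way).

```sas
/* Sketch: open an RTF destination, run the procedure, then close the file. */
/* The path and file name are assumptions made for illustration. */
ods rtf file='c:\sas_data\ses_freq.rtf';
proc freq data=hs0;
  table ses;
run;
ods rtf close;
```

Nothing is written to the file until the destination is closed, so the ods rtf close statement is required.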

Here we use proc freq to get frequencies for gender, schtyp and prgtype, each table shown separately.

proc freq data=hs0;
table gender schtyp prgtype;
run;

Below we show how to get a crosstab of prgtype by ses.

proc freq data=hs0;
table prgtype*ses;
run;
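
By default each cell of the crosstab shows the frequency, overall percent, row percent, and column percent. If only the counts are of interest, the extra lines can be suppressed with table options, as in this sketch:

```sas
/* Sketch: crosstab showing cell counts only. */
/* norow, nocol and nopercent drop the row, column and overall percentages. */
proc freq data=hs0;
  table prgtype*ses / norow nocol nopercent;
run;
```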

proc corr is used to get correlations among variables.  By default, proc corr uses pairwise deletion for missing observations.  If you use the nomiss option, proc corr uses listwise deletion and omits all observations with missing data on any of the named variables.

proc corr data=hs0;
run;

proc corr data=hs0 nomiss;
run;
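
Because nomiss drops an observation that is missing on any analysis variable, the var statement affects not only which correlations are shown but also which observations are kept. The sketch below restricts the analysis to three score variables. The choice of read, write, and math is for illustration only.

```sas
/* Sketch: listwise deletion applied only across the three named variables. */
proc corr data=hs0 nomiss;
  var read write math;
run;
```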

In the example below we use proc corr to generate a scatterplot matrix. In the second example below, we use proc sgplot to get a scatterplot with a confidence ellipse showing the relationship between the two variables write and read.

ods graphics on;
proc corr data=hs0 nomiss plots=matrix;
run;
ods graphics off;

proc sgplot data = hs0;
scatter x = read  y = write;
ellipse x = read  y = write;
run;

We can also modify the symbol with the markerchar option to use the id variable instead of dots. This is especially useful to identify outliers or other interesting observations.

proc sgplot data=hs0;
scatter x = read  y = write / markerchar = id;
run;

We can also create a scatter plot where we have different symbols depending on the gender of the subjects (using the group option). This can be used to check if the relationship between write and math is linear for each gender group.

proc sgplot data=hs0;
scatter x = write  y = math / group = gender;
run;

Here are a couple more examples using proc sgplot.

proc sgplot data=hs0;
vbar ses /response = write stat=mean limits=both ;
run;

proc sgplot data=hs0;
run;