SAS Class Notes 2.0
Exploring Data


1.0 SAS statements and procs in this unit

proc contents      Contents of a SAS dataset
proc print         Displays the data
proc means         Descriptive statistics
proc univariate    More descriptive statistics
proc sort          Sort a dataset
proc boxplot       Boxplots
proc freq          Frequency tables and crosstabs
proc chart         ASCII histogram
proc corr          Correlation matrix
proc reg           OLS regression

2.0 Demonstration and Explanation

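/* Left-align output rather than centering it */
options nocenter;

/* Describe the dataset; position also lists the variables in dataset order */
proc contents position data='c:\sas\hs0';
run;

/* Print the first 20 observations */
proc print data='c:\sas\hs0' (obs=20);
run;

/* Print only the listed variables, for all observations */
proc print data='c:\sas\hs0';
  var gender id race ses schtyp prgtype read;
run;

/* Default descriptive statistics for all numeric variables */
proc means data='c:\sas\hs0';
run;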
With the var statement, we can specify which variables we want to analyze. The n mean median std var options on the proc means statement indicate which statistics we want computed (there, var requests the variance).

proc means data='c:\sas\hs0' n mean median std var;
  var read math science write;
run;

We use the where statement below to look at just those students with a reading score of 60 or higher.

proc means data='c:\sas\hs0' n mean median std var;
  where read>=60;
  var read math science write;
run;
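A where statement can also combine several conditions. The following is a minimal sketch, assuming the same hs0 dataset; the second condition on write and the cutoff of 60 are chosen only for illustration.

```sas
/* Sketch: restrict the analysis to students scoring 60 or higher */
/* on both reading and writing. The cutoffs are illustrative.     */
proc means data='c:\sas\hs0' n mean median std var;
  where read>=60 and write>=60;
  var read math science write;
run;
```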

With the class statement, we get the descriptive statistics broken down by prgtype.

proc means data='c:\sas\hs0' n mean median std var;
  class prgtype;
  var read math science write;
run;
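If the group statistics are needed for later steps, they can be written to a dataset rather than only printed. This is a sketch under stated assumptions: the dataset name work.prg_means and the *_mean variable names are hypothetical, and nway keeps only the rows for the individual prgtype groups (without it, an overall row is written as well).

```sas
/* Sketch: save the mean of each variable, by prgtype, to a temporary dataset. */
/* work.prg_means and the *_mean names are hypothetical.                       */
proc means data='c:\sas\hs0' nway noprint;
  class prgtype;
  var read math science write;
  output out=work.prg_means mean=read_mean math_mean science_mean write_mean;
run;

/* Inspect the saved statistics */
proc print data=work.prg_means;
run;
```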

We can use proc univariate to get detailed descriptive statistics for write along with a histogram with a normal overlay.

proc univariate plot data='c:\sas\hs0';
  var write;
  histogram / normal;
run;

We can use proc boxplot to get side-by-side boxplots for the variable write broken down by the levels of prgtype; however, this requires that we first sort the data using proc sort.

proc sort data='c:\sas\hs0';
  by prgtype;
run;

proc boxplot data='c:\sas\hs0';
  plot write*prgtype / boxstyle=schematic boxwidth=10;
run;
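Note that proc sort without an out= option replaces the original dataset with the sorted version. A minimal sketch of an alternative, assuming a temporary dataset with the hypothetical name work.hs0_sorted, leaves the file on disk unchanged:

```sas
/* Sketch: write the sorted copy to the work library instead of */
/* overwriting hs0. The name hs0_sorted is hypothetical.         */
proc sort data='c:\sas\hs0' out=work.hs0_sorted;
  by prgtype;
run;

proc boxplot data=work.hs0_sorted;
  plot write*prgtype / boxstyle=schematic boxwidth=10;
run;
```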

Below, proc freq is used to get a frequency table for ses, and proc chart shows a bar chart of this distribution.

proc freq data='c:\sas\hs0';
  table ses;
run;

proc chart data='c:\sas\hs0';
  vbar ses / discrete;
run;

Next we use proc freq to get frequencies for write. Because write takes many distinct values, the table gets one row per value, which illustrates why it can be undesirable to request frequencies for continuous variables.

proc freq data='c:\sas\hs0';
  table write;
run;
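One common way to make such a table readable is to group the values with a user-defined format before tabulating. This technique is not part of the original notes; the sketch below assumes a hypothetical format named scorefmt with arbitrary cut points.

```sas
/* Sketch: define score ranges (the cut points are illustrative). */
proc format;
  value scorefmt
    low-<40  = 'Below 40'
    40-<50   = '40 to 49'
    50-<60   = '50 to 59'
    60-high  = '60 and above';
run;

/* proc freq tabulates the formatted values, so each range is one row. */
proc freq data='c:\sas\hs0';
  table write;
  format write scorefmt.;
run;
```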

Here we use proc freq to get frequencies for gender, schtyp and prgtype, each table shown separately.

proc freq data='c:\sas\hs0';
  table gender schtyp prgtype;
run;

Below we show how to get a crosstab of prgtype by ses, and the next example shows how to include a chi square test and how to get the expected frequencies.

proc freq data='c:\sas\hs0';
  table prgtype*ses;
run;

proc freq data='c:\sas\hs0';
  table prgtype*ses / chisq expected;
run; 
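By default each cell of the crosstab shows the frequency, percent, row percent and column percent, so adding expected frequencies makes the cells crowded. As a sketch, the norow, nocol and nopercent options suppress the percentages so that only the observed and expected counts remain:

```sas
/* Sketch: show only observed and expected counts in each cell, */
/* together with the chi-square test.                           */
proc freq data='c:\sas\hs0';
  table prgtype*ses / chisq expected norow nocol nopercent;
run;
```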

proc corr is used to get correlations among variables.  By default, proc corr uses pairwise deletion for missing observations.  If you use the nomiss option, proc corr uses listwise deletion and omits all observations with missing data on any of the named variables.

proc corr data='c:\sas\hs0'; 
  var write read science;
run;

proc corr data='c:\sas\hs0' nomiss; 
  var write read science;
run;
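The difference between the two approaches is easiest to see on a small dataset with missing values. The sketch below uses made-up data in a hypothetical dataset work.corr_demo; it is not related to hs0.

```sas
/* Sketch: five made-up observations; y is missing in row 2, z in row 3. */
data work.corr_demo;
  input x y z;
  datalines;
1 2 3
2 . 5
3 6 .
4 8 9
5 9 11
;
run;

/* Pairwise deletion: each correlation uses every observation */
/* that has both variables of that pair.                      */
proc corr data=work.corr_demo;
  var x y z;
run;

/* Listwise deletion: only the complete observations are used. */
proc corr data=work.corr_demo nomiss;
  var x y z;
run;
```

With pairwise deletion, the x-y and x-z correlations are each based on four observations and the y-z correlation on three. With nomiss, all correlations are based on the same three complete observations.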

We conclude with proc reg showing a simple regression predicting write from read along with a scatterplot and regression line.

proc reg data='c:\sas\hs0';
  model write=read;
  plot write*read;
run;
quit;
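The same kind of plot can also be drawn with proc sgplot, which is a different procedure from the one used above. This is a sketch that assumes a SAS release in which proc sgplot is available (9.2 or later); its reg statement draws the scatterplot with a fitted regression line.

```sas
/* Sketch: scatterplot of write against read with a fitted line. */
/* Assumes proc sgplot is available in the installed release.    */
proc sgplot data='c:\sas\hs0';
  reg x=read y=write;
run;
```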

3.0 For More Information

These pages are Copyrighted (c) by UCLA Academic Technology Services