Statistical Computing Seminars
What's New in SAS 9.2

In this seminar, we introduce a couple of new procedures and some new features of existing procedures for statistical analysis in SAS 9.2. The selection is based mainly on our experience in our consulting service. If you are interested in knowing more, see the SAS documentation on what's new in SAS 9.2. The SAS program file and data files used for this seminar are available as a zipped download.

1. Setting up a learning environment within SAS

SAS comes with a great many sample programs for data steps and for all the procedures. SAS 9.2 also includes the entire online documentation within SAS. We will first show how to easily access the SAS sample programs, following the instructions on our page on Customizing SAS 9.2.


2. New procedures for statistical analysis

You have probably used proc glimmix in SAS 9.1.3 for analyzing multilevel data with non-normal outcome variables, such as count or dichotomous outcomes. In SAS 9.1.3, proc glimmix was an experimental procedure that required additional downloading and installation. In SAS 9.2 it is a production procedure. Moreover, it now offers maximum likelihood estimation with adaptive quadrature (method=quad) as well as estimation by Laplace approximation (method=laplace). Like most of the other statistical procedures, it also provides ODS graphics, such as diagnostic graphs. It can handle normal, binary, binomial, ordered and count outcome variables.

Here is an example dealing with a binary outcome variable. 

ods graphics on;
proc glimmix data = ats.thaieduc plots =(all) noclprint method=quad;
  class  sex schoolid;
  model repeat (event='1') = sex msesc sex*msesc 
                            / solution dist=binary 
                              oddsratio (at msesc = .5 unit msesc =.1);
  random intercept /subject = schoolid;
run;
ods graphics off;
The GLIMMIX Procedure

                  Model Information

Data Set                      ATS.THAIEDUC
Response Variable             REPEAT
Response Distribution         Binary
Link Function                 Logit
Variance Function             Default
Variance Matrix Blocked By    SCHOOLID
Estimation Technique          Maximum Likelihood
Likelihood Approximation      Gauss-Hermite Quadrature
Degrees of Freedom Method     Containment


Number of Observations Read        8582
Number of Observations Used        7516


           Response Profile

 Ordered                        Total
   Value    REPEAT          Frequency

       1    0                    6449
       2    1                    1067

The GLIMMIX procedure is modeling the probability that REPEAT='1'.

            Dimensions

G-side Cov. Parameters           1
Columns in X                     6
Columns in Z per Subject         1
Subjects (Blocks in V)         356
Max Obs per Subject             41


           Optimization Information

Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    5
Lower Boundaries              1
Upper Boundaries              0
Fixed Effects                 Not Profiled
Starting From                 GLM estimates
Quadrature Points             7

                                Iteration History

                                           Objective                         Max
Iteration    Restarts    Evaluations        Function          Change    Gradient

        0           0              4    5507.6473045       .            130.4493
        1           0              3    5482.1591394     25.48816512    24.41885
        2           0              3     5479.727173      2.43196632    10.25265
        3           0              3    5478.7888209      0.93835210    5.524192
        4           0              2    5478.7248344      0.06398651    0.968477
        5           0              3    5478.7227711      0.00206335    0.397583
        6           0              3    5478.7223653      0.00040580    0.012755
        7           0              3    5478.7223621      0.00000320    0.002078

         Convergence criterion (GCONV=1E-8) satisfied.

           Fit Statistics

-2 Log Likelihood            5478.72
AIC  (smaller is better)     5488.72
AICC (smaller is better)     5488.73
BIC  (smaller is better)     5508.10
CAIC (smaller is better)     5513.10
HQIC (smaller is better)     5496.43

     Fit Statistics for Conditional
              Distribution

-2 log L(REPEAT | r. effects)     4754.08
Pearson Chi-Square                5629.08
Pearson Chi-Square / DF              0.75

       Covariance Parameter Estimates

                                     Standard
Cov Parm     Subject     Estimate       Error

Intercept    SCHOOLID      1.7364      0.2143

                        Solutions for Fixed Effects

             pupil                 Standard
Effect       gender    Estimate       Error       DF    t Value    Pr > |t|

Intercept               -1.9866     0.09301      354     -21.36      <.0001
SEX          0          -0.5474     0.07603     7158      -7.20      <.0001
SEX          1                0           .        .        .         .
MSESC                   -0.3250      0.2328     7158      -1.40      0.1626
MSESC*SEX    0          -0.3045      0.1975     7158      -1.54      0.1232
MSESC*SEX    1                0           .        .        .         .

                              Odds Ratio Estimates

pupil               pupil                                       95% Confidence
gender     MSESC    gender    _MSESC    Estimate       DF           Limits

0            0.5    1            0.5       0.497     7158       0.386       0.640
0            0.6    0            0.5       0.939     7158       0.895       0.986
1            0.6    1            0.5       0.968     7158       0.925       1.013

        Type III Tests of Fixed Effects

              Num      Den
Effect         DF       DF    F Value    Pr > F

SEX             1     7158      51.84    <.0001
MSESC           1     7158       4.75    0.0294
MSESC*SEX       1     7158       2.38    0.1232
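
As a reading aid, the three rows of the Odds Ratio Estimates table can be reproduced by hand from the Solutions for Fixed Effects table. This is a worked sketch using the rounded estimates as printed, so the last digit may differ slightly. The symbols are our own shorthand, not SAS names: $\beta_{S}$ for SEX 0, $\beta_{M}$ for MSESC, and $\beta_{MS}$ for MSESC*SEX 0, with SEX 1 as the reference level.

```latex
\begin{align*}
\text{OR}(\text{sex } 0 \text{ vs. } 1 \mid \text{msesc} = 0.5)
  &= \exp(\beta_{S} + 0.5\,\beta_{MS})
   = \exp(-0.5474 + 0.5 \times (-0.3045))
   = \exp(-0.69965) \approx 0.497 \\
\text{OR}(\text{msesc } 0.6 \text{ vs. } 0.5 \mid \text{sex} = 0)
  &= \exp\bigl(0.1\,(\beta_{M} + \beta_{MS})\bigr)
   = \exp(0.1 \times (-0.6295))
   = \exp(-0.06295) \approx 0.939 \\
\text{OR}(\text{msesc } 0.6 \text{ vs. } 0.5 \mid \text{sex} = 1)
  &= \exp(0.1\,\beta_{M})
   = \exp(-0.0325) \approx 0.968
\end{align*}
```

The first row compares the two sexes at msesc = 0.5 (the at option), and the other two rows give the effect of a 0.1 increase in msesc (the unit option) within each sex.
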
The ODS graphics output shown here consists of three diagnostic panels: conditional residuals, conditional studentized residuals, and conditional Pearson residuals, each based on pseudo-data constructed from REPEAT. Each panel consists of a scatterplot of the residuals, a histogram with normal density, a Q-Q plot, and a box plot of the residuals.
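
For comparison, the same model can be fit with the Laplace approximation by changing only the method= option. The following is a minimal sketch that reuses the data set and variables from the example above. We have not run this variant for the seminar, so no output is shown.

```sas
/* Sketch: same model as the quadrature example, but estimated by   */
/* Laplace approximation instead of adaptive Gauss-Hermite           */
/* quadrature. Plots and odds ratios are omitted to keep it short.   */
proc glimmix data = ats.thaieduc noclprint method=laplace;
  class sex schoolid;
  model repeat (event='1') = sex msesc sex*msesc
                            / solution dist=binary;
  random intercept / subject = schoolid;
run;
```

Laplace approximation is generally faster than quadrature with several points, so comparing the two sets of estimates is one way to judge whether the cheaper method is adequate for a given model.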

Proc countreg is part of SAS/ETS, the module for econometrics and time series. It supports the following models for count data: Poisson regression, negative binomial regression, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. Proc genmod in the SAS/STAT module supports all of these except the ZINB model. See our data analysis example page on the zero-inflated Poisson regression model using SAS 9.2.
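
As an illustration of the syntax, here is a minimal sketch of a ZIP model in both procedures. The data set fish and the variables count, child, camper and persons are hypothetical placeholders, not part of the seminar files.

```sas
/* Hypothetical data set and variables: fish, with a count outcome  */
/* (count) and predictors child, camper and persons.                */

/* ZIP model in proc countreg (SAS/ETS). */
proc countreg data = fish;
  model count = child camper / dist=zip;   /* Poisson part            */
  zeromodel count ~ persons;               /* model for excess zeros  */
run;

/* A comparable ZIP model in proc genmod (SAS/STAT). */
proc genmod data = fish;
  model count = child camper / dist=zip;
  zeromodel persons / link=logit;
run;
```

Note that the zeromodel statement is written differently in the two procedures: proc countreg repeats the response variable followed by a tilde, while proc genmod lists only the predictors of the zero-inflation part.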


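Proc mcmc fits Bayesian models using Markov chain Monte Carlo (MCMC) simulation. It can also be used as a simulation tool. Here is an example from the SAS documentation for simulating a normal distribution. Because model general(0) contributes a log likelihood of zero, the draws of alpha come from its prior, normal(0, sd=1).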

data x;
  run;  
ods graphics on;
proc mcmc data=x outpost=simout seed=23 nmc=10000 maxtune=0
          nbi=0 statistics=(summary interval) diagnostics=none;
   parm alpha 0;
   prior alpha ~ normal(0, sd=1);
   model general(0);
run;
ods graphics off;
The MCMC Procedure

                               Posterior Summaries

                                      Standard               Percentiles
Parameter           N        Mean    Deviation         25%         50%         75%

alpha           10000     -0.0392       1.0194     -0.7198     -0.0403      0.6351


                       Posterior Intervals

Parameter    Alpha     Equal-Tail Interval        HPD Interval

alpha        0.050     -2.0746      1.9594     -2.2197      1.7869

Diagnostic Plots for
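
Since outpost=simout saves the draws of alpha to a data set, they can be examined like any other data. The sketch below assumes the proc mcmc step above has been run in the same session. The summary statistics should agree with the Posterior Summaries table, and the histogram should resemble a standard normal density.

```sas
/* Sketch: summarize the saved draws of alpha from outpost=simout. */
proc means data = simout n mean std p25 median p75;
  var alpha;
run;

/* Histogram of the draws with a fitted normal density overlaid. */
proc sgplot data = simout;
  histogram alpha;
  density alpha / type=normal;
run;
```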


3. New features in existing procedures




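In proc logistic, when an interaction term is present, the oddsratio statement calculates odds ratios for a variable at chosen values of the interacting variable (the at option), and with ODS graphics on they are also graphed, as shown in the example below.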

data hsb2;
  set ats.hsb2;
  hon=(write>60);
run;
ods graphics on;
proc logistic data = hsb2 descending;
   model hon = female math female*math;
   oddsratio female / at(math = 45 50 65);
run;
ods graphics off;
The LOGISTIC Procedure

              Analysis of Maximum Likelihood Estimates

                                 Standard          Wald
Parameter      DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept       1     -8.7458      2.1291       16.8729        <.0001
female          1     -2.8998      3.0942        0.8783        0.3487
math            1      0.1294      0.0359       12.9994        0.0003
female*math     1      0.0670      0.0535        1.5704        0.2101

         Wald Confidence Interval for Odds Ratios

Label                     Estimate    95% Confidence Limits

female at math=45            1.122       0.245        5.139
female at math=50            1.568       0.517        4.759
female at math=65            4.284       1.386       13.237

Plot of Odds Ratios with 95% Wald Confidence Limits
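
With the interaction in the model, the odds ratio for female depends on math: it is exp(b_female + b_female*math x math). The data step below is a small sketch that recomputes the three odds ratios from the rounded estimates printed in the table above. The data set name or_check and the variable names are our own, and because the inputs are rounded, the results may differ from the SAS output in the third decimal.

```sas
/* Sketch: recompute the odds ratios for female at math = 45, 50, 65 */
/* from the rounded parameter estimates shown in the output above.   */
data or_check;
  b_female      = -2.8998;   /* estimate for female      */
  b_female_math =  0.0670;   /* estimate for female*math */
  do math = 45, 50, 65;
    odds_ratio = exp(b_female + b_female_math * math);
    output;
  end;
run;

proc print data = or_check noobs;
  var math odds_ratio;
  format odds_ratio 8.3;
run;
```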

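When there is complete or quasi-complete separation of data points, the maximum likelihood estimate may not exist. SAS 9.2 provides Firth estimation (the firth option on the model statement) for dealing with this issue. In the example below, the ordinary fit produces a warning and very large standard errors, while the Firth fit gives finite, stable estimates.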

data test;
 input Y X freq;
datalines;
0 1	3
0 2	4
0 3	5
0 3	10
1 3	6
1 4	12
1 5	8
1 6	9
1 10 11
1 11 6
;
run;
proc logistic data = test descending;
  freq freq;
  model y = x;
run;
The LOGISTIC Procedure
WARNING: The validity of the model fit is questionable.

        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        64.9376        1         <.0001
Score                   26.0506        1         <.0001
Wald                     0.0859        1         0.7695

             Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1    -32.8245       108.9        0.0909        0.7630
X             1     10.6361     36.2903        0.0859        0.7695
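
The warning and the very large standard errors come from the structure of the data: Y is always 0 when X is below 3, always 1 when X is above 3, and takes both values only at X = 3, which is quasi-complete separation. A weighted cross-tabulation makes this visible. The following is a minimal sketch using the test data set created above.

```sas
/* Sketch: cross-tabulate the predictor by the outcome, weighting    */
/* each row by its frequency, to show where the outcomes overlap.    */
proc freq data = test;
  weight freq;
  tables x*y / norow nocol nopercent;
run;
```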

proc logistic data = test descending;
  freq freq;
  model y = x /firth;
run;
        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        57.0231        1         <.0001
Score                   25.3902        1         <.0001
Wald                     7.6435        1         0.0057

             Analysis of Maximum Likelihood Estimates

                               Standard          Wald
Parameter    DF    Estimate       Error    Chi-Square    Pr > ChiSq

Intercept     1    -13.0905      4.5755        8.1851        0.0042
X             1      4.0766      1.4745        7.6435        0.0057
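
For readers who want the idea behind the firth option: in its standard formulation (stated here as background, not taken from the SAS output), Firth's method maximizes a penalized log likelihood, where $\ell(\beta)$ is the usual log likelihood and $I(\beta)$ is the Fisher information matrix. The second line checks the Wald chi-square for X from the rounded estimate and standard error above.

```latex
\begin{align*}
\ell^{*}(\beta) &= \ell(\beta) + \tfrac{1}{2} \log \bigl| I(\beta) \bigr| \\
\text{Wald } \chi^{2}_{X} &= \left( \frac{4.0766}{1.4745} \right)^{2} \approx 7.64
\end{align*}
```

The penalty keeps the estimates finite under separation, which is why the Wald test for X is now consistent with the likelihood ratio and score tests.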

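Proc logistic in SAS 9.2 can also plot ROC curves and compare them. Each roc statement defines a curve based on a subset of the predictors in the model, and the roccontrast statement tests differences between the areas under the curves.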

ods graphics on;
proc logistic data=hsb2 plots=roc(id=prob);
   model hon = female math read;
   roc 'female' female;
   roc 'maths score' math;
   roc 'read' read;	
   roccontrast reference('female') / estimate e;
run;
ods graphics off;
                         ROC Association Statistics

              -------------- Mann-Whitney -------------
                         Standard         95% Wald        Somers' D
ROC Model         Area      Error    Confidence Limits       (Gini)      Gamma

Model           0.8569     0.0288     0.8005     0.9134      0.7139     0.7142
female          0.5716     0.0400     0.4932     0.6499      0.1431     0.2880
maths score     0.8325     0.0329     0.7681     0.8970      0.6651     0.6792
read            0.7979     0.0325     0.7343     0.8616      0.5959     0.6298

ROC Association Statistics

ROC Model        Tau-a

Model           0.2654
female          0.0532
maths score     0.2473
read            0.2216


              ROC Contrast Coefficients

ROC Model            Row1          Row2          Row3

Model                   1             0             0
female                 -1            -1            -1
maths score             0             1             0
read                    0             0             1

              ROC Contrast Test Results

Contrast                DF    Chi-Square    Pr > ChiSq

Reference = female       3      113.0593        <.0001


                ROC Contrast Rows Estimation and Testing Results

                                Standard       95% Wald                     Pr >
Contrast              Estimate     Error   Confidence Limits  Chi-Square   ChiSq

Model - female          0.2854    0.0439    0.1994    0.3714     42.3060  <.0001
maths score - female    0.2610    0.0532    0.1567    0.3652     24.0700  <.0001
read - female           0.2264    0.0543    0.1199    0.3329     17.3547  <.0001

ROC Curve for Model

ROC Curve for female

ROC Curve for maths score

ROC Curve for read

ROC Curves for Comparisons
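
A few relationships help in reading the ROC tables above. This is a worked sketch using the rounded values as printed, so results may differ in the last digit: Somers' D is a rescaling of the area under the curve (AUC), each contrast estimate is a difference of two areas, and each row's chi-square is the squared ratio of the estimate to its standard error.

```latex
\begin{align*}
\text{Somers' } D &= 2\,\text{AUC} - 1,
  & \text{Model: } 2(0.8569) - 1 &= 0.7138 \\
\text{Model} - \text{female} &= 0.8569 - 0.5716 = 0.2853,
  & \text{maths score} - \text{female} &= 0.8325 - 0.5716 = 0.2609 \\
\chi^{2}_{\text{Model} - \text{female}}
  &= \left( \frac{0.2854}{0.0439} \right)^{2} \approx 42.3
\end{align*}
```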


4. New graphics procedures for statistical graphics

proc sgplot data=ats.hsb2;
  dot ses / response=write stat=mean
            limitstat=stddev numstd=1;
run;

The SGPlot Procedure

proc sgplot data=ats.hsb2;
  scatter x=math y=write;
  ellipse x=math y=write;
  keylegend / location=inside position=bottomright;
run;
The SGPlot Procedure
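
Other plot statements in proc sgplot follow the same pattern as the two examples above. As one more sketch on the same data, the reg statement overlays a fitted regression line on the scatter of math and write. The choice of variables here is ours, for illustration only.

```sas
/* Sketch: scatter plot with a fitted regression line and            */
/* confidence limits for the mean predicted value.                   */
proc sgplot data = ats.hsb2;
  reg x=math y=write / clm;
run;
```
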
title;
filename odsout 'c:\sas\temp\test.htm';
goptions device = java ;
ods listing close;
ods html file=odsout style=styles.ocean;
proc gchart data=ats.hsb2;
  block prog /sumvar= write type=mean;
run;
ods html close;
ods listing;

