### Statistical Computing Seminars: What's New in SAS 9.2

In this seminar we introduce a few new procedures and some new features of existing procedures in SAS 9.2 for statistical analysis. The selection is based mainly on our experience from our consulting service. For more on what's new in SAS 9.2, see the SAS documentation on what's new in SAS 9.2. The SAS program file and data files used in this seminar are available as a zipped download.

#### 1. Setting up a learning environment within SAS

SAS ships with many sample programs for the data step and for all the procedures. SAS 9.2 also includes the entire online documentation within SAS. We first show how to get easy access to the SAS sample programs, following the instructions on our page on Customizing SAS 9.2.

#### 2. New procedures for statistical analysis

• PROC GLIMMIX

You may have used proc glimmix in SAS 9.1.3 to analyze multilevel data with non-normal outcome variables, such as counts or dichotomous outcomes. In SAS 9.1.3, proc glimmix was an experimental procedure that required a separate download and installation. In SAS 9.2 it is a production procedure. Moreover, it now offers maximum likelihood estimation with adaptive quadrature as well as the Laplace approximation. Like most other statistical procedures, it also provides ODS graphics, such as diagnostic plots. It can handle normal, binary, binomial, ordinal and count outcome variables.

Here is an example dealing with a binary outcome variable.

ods graphics on;
proc glimmix data = ats.thaieduc plots =(all) noclprint method=quad;
class  sex schoolid;
model repeat (event='1') = sex msesc sex*msesc
/ solution dist=binary
oddsratio (at msesc = .5 unit msesc =.1);
random intercept /subject = schoolid;
run;
ods graphics off;
The GLIMMIX Procedure

Model Information

Data Set                      ATS.THAIEDUC
Response Variable             REPEAT
Response Distribution         Binary
Link Function                 Logit
Variance Function             Default
Variance Matrix Blocked By    SCHOOLID
Estimation Technique          Maximum Likelihood
Likelihood Approximation      Gauss-Hermite Quadrature
Degrees of Freedom Method     Containment

Number of Observations Read        8582
Number of Observations Used        7516

Response Profile

Ordered Value    REPEAT    Total Frequency

1    0                    6449
2    1                    1067

The GLIMMIX procedure is modeling the probability that REPEAT='1'.

Dimensions

G-side Cov. Parameters           1
Columns in X                     6
Columns in Z per Subject         1
Subjects (Blocks in V)         356
Max Obs per Subject             41

Optimization Information

Optimization Technique        Dual Quasi-Newton
Parameters in Optimization    5
Lower Boundaries              1
Upper Boundaries              0
Fixed Effects                 Not Profiled
Starting From                 GLM estimates
Quadrature Points             7

Iteration History

Iteration    Restarts    Evaluations    Objective Function    Change    Max Gradient

0           0              4    5507.6473045       .            130.4493
1           0              3    5482.1591394     25.48816512    24.41885
2           0              3     5479.727173      2.43196632    10.25265
3           0              3    5478.7888209      0.93835210    5.524192
4           0              2    5478.7248344      0.06398651    0.968477
5           0              3    5478.7227711      0.00206335    0.397583
6           0              3    5478.7223653      0.00040580    0.012755
7           0              3    5478.7223621      0.00000320    0.002078

Convergence criterion (GCONV=1E-8) satisfied.

Fit Statistics

-2 Log Likelihood            5478.72
AIC  (smaller is better)     5488.72
AICC (smaller is better)     5488.73
BIC  (smaller is better)     5508.10
CAIC (smaller is better)     5513.10
HQIC (smaller is better)     5496.43

Fit Statistics for Conditional Distribution

-2 log L(REPEAT | r. effects)     4754.08
Pearson Chi-Square                5629.08
Pearson Chi-Square / DF              0.75

Covariance Parameter Estimates

Cov Parm     Subject     Estimate    Standard Error

Intercept    SCHOOLID      1.7364      0.2143

Solutions for Fixed Effects

Effect       pupil gender    Estimate    Standard Error    DF    t Value    Pr > |t|

Intercept               -1.9866     0.09301      354     -21.36      <.0001
SEX          0          -0.5474     0.07603     7158      -7.20      <.0001
SEX          1                0           .        .        .         .
MSESC                   -0.3250      0.2328     7158      -1.40      0.1626
MSESC*SEX    0          -0.3045      0.1975     7158      -1.54      0.1232
MSESC*SEX    1                0           .        .        .         .

Odds Ratio Estimates

pupil gender    MSESC    pupil gender    _MSESC    Estimate    DF    95% Confidence Limits

0            0.5    1            0.5       0.497     7158       0.386       0.640
0            0.6    0            0.5       0.939     7158       0.895       0.986
1            0.6    1            0.5       0.968     7158       0.925       1.013

Type III Tests of Fixed Effects

Effect    Num DF    Den DF    F Value    Pr > F

SEX             1     7158      51.84    <.0001
MSESC           1     7158       4.75    0.0294
MSESC*SEX       1     7158       2.38    0.1232
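
To see how the odds ratio table follows from the fixed effects, the estimates can be reproduced by hand in a data step. This is only a sketch: it copies the rounded coefficients from the Solutions for Fixed Effects table above, and the variable names are ours, not part of the proc glimmix output.

```sas
data _null_;
  /* rounded estimates copied from the Solutions for Fixed Effects table */
  b_sex0       = -0.5474;  /* SEX = 0 versus SEX = 1 */
  b_msesc      = -0.3250;  /* slope of MSESC when SEX = 1 */
  b_msesc_sex0 = -0.3045;  /* additional MSESC slope when SEX = 0 */

  /* SEX = 0 versus SEX = 1, holding MSESC at 0.5 */
  or_sex_at_half = exp(b_sex0 + b_msesc_sex0 * 0.5);

  /* a 0.1-unit increase in MSESC within each gender */
  or_msesc_sex0 = exp(0.1 * (b_msesc + b_msesc_sex0));
  or_msesc_sex1 = exp(0.1 * b_msesc);

  put or_sex_at_half= 6.3 or_msesc_sex0= 6.3 or_msesc_sex1= 6.3;
run;
```

The values written to the log should be close to 0.497, 0.939 and 0.968, the three estimates in the Odds Ratio Estimates table.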

• PROC COUNTREG and PROC GENMOD for count models

Proc countreg is part of SAS/ETS, the module for econometrics and time series. It supports the following models for count data: Poisson regression, negative binomial regression, the zero-inflated Poisson (ZIP) model and the zero-inflated negative binomial (ZINB) model. Proc genmod, in the SAS/STAT module, supports all of these except the ZINB model. We also have a data analysis example page on zero-inflated Poisson regression using SAS 9.2.
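
As a rough illustration of the ZIP model in both procedures, the sketch below simulates a small data set and fits the same model twice. The data set zipdemo, its variables (count, x, z) and the parameter values are hypothetical and are not part of the seminar files; proc countreg also requires a SAS/ETS license.

```sas
/* Hypothetical simulated data: names and parameter values are illustrative only */
data zipdemo;
  call streaminit(2009);
  do id = 1 to 500;
    x = rand('NORMAL');
    z = rand('NORMAL');
    /* probability of a structural zero follows a logistic model in z */
    p_zero = 1 / (1 + exp(-(-1 + z)));
    if rand('BERNOULLI', p_zero) then count = 0;
    else count = rand('POISSON', exp(0.5 + 0.3*x));
    output;
  end;
  drop p_zero;
run;

/* ZIP model in SAS/ETS: the zero-inflation part is given after the tilde */
proc countreg data=zipdemo;
  model count = x / dist=zip;
  zeromodel count ~ z;
run;

/* the same ZIP model in SAS/STAT */
proc genmod data=zipdemo;
  model count = x / dist=zip;
  zeromodel z;
run;
```

Both procedures should report similar estimates for the count part (intercept and x) and for the zero-inflation part (intercept and z).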

• PROC MCMC

Proc mcmc fits Bayesian models using Markov chain Monte Carlo (MCMC) simulation. It can also be used as a simulation tool. Here is an example from the SAS documentation that simulates draws from a standard normal distribution.

data x;
run;
ods graphics on;
proc mcmc data=x outpost=simout seed=23 nmc=10000 maxtune=0
nbi=0 statistics=(summary interval) diagnostics=none;
parm alpha 0;
prior alpha ~ normal(0, sd=1);
model general(0);
run;
ods graphics off;
The MCMC Procedure

Posterior Summaries

Parameter    N    Mean    Standard Deviation    25%    50%    75%

alpha           10000     -0.0392       1.0194     -0.7198     -0.0403      0.6351

Posterior Intervals

Parameter    Alpha     Equal-Tail Interval        HPD Interval

alpha        0.050     -2.0746      1.9594     -2.2197      1.7869
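
Because model general(0) adds a log-likelihood of zero, the posterior here is the same as the normal(0, sd=1) prior, so the simulated summaries can be compared with the exact standard normal quantiles. The short data step below is a sketch of that comparison; the variable names are ours.

```sas
data _null_;
  /* exact quantiles of the standard normal distribution */
  q025 = quantile('NORMAL', 0.025);
  q25  = quantile('NORMAL', 0.25);
  q50  = quantile('NORMAL', 0.50);
  q75  = quantile('NORMAL', 0.75);
  q975 = quantile('NORMAL', 0.975);
  put q025= 8.4 q25= 8.4 q50= 8.4 q75= 8.4 q975= 8.4;
run;
```

The exact quartiles are about -0.6745, 0 and 0.6745, and the exact 95% equal-tail limits are about -1.96 and 1.96. The simulated values in the output above (-0.7198, -0.0403, 0.6351 and -2.0746, 1.9594) are reasonably close to them.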

#### 3. New features in existing procedures

• PROC FREQ

*testing for specified proportions;
proc freq data=ats.hsb2;
tables ses / testp=(.33 .4 .27);
run;
The FREQ Procedure

ses    Frequency    Percent    Test Percent    Cumulative Frequency    Cumulative Percent
--------------------------------------------------------------------
1          47       23.50       33.00            47        23.50
2          95       47.50       40.00           142        71.00
3          58       29.00       27.00           200       100.00

Chi-Square Test
for Specified Proportions
-------------------------
Chi-Square         8.5785
DF                      2
Pr > ChiSq         0.0137

Sample Size = 200
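
The chi-square statistic above is the usual sum of (observed - expected)^2 / expected, with expected counts taken from the testp= proportions. The data step below is a sketch that redoes this arithmetic from the frequencies in the table; the array and variable names are ours.

```sas
data _null_;
  /* observed frequencies of ses and the testp= proportions */
  array observed[3] _temporary_ (47 95 58);
  array p0[3]       _temporary_ (0.33 0.40 0.27);
  n = 200;

  chisq = 0;
  do i = 1 to 3;
    expected = n * p0[i];                     /* 66, 80 and 54 */
    chisq = chisq + (observed[i] - expected)**2 / expected;
  end;

  df = 3 - 1;
  p_value = 1 - probchi(chisq, df);           /* upper-tail probability */
  put chisq= 8.4 df= p_value= 6.4;
run;
```

The log should show chisq=8.5785 and p_value=0.0137, matching the proc freq output.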

* distribution plot;
ods graphics on;
proc freq data = ats.hsb2;
tables ses*prog;
run;
ods graphics off;

*binomial proportion test and confidence interval;
proc freq data = ats.hsb2;
tables prog /binomial (level=2 p=.55 all);
run;
type of program

prog    Frequency    Percent    Cumulative Frequency    Cumulative Percent
---------------------------------------------------------
1          45       22.50            45        22.50
2         105       52.50           150        75.00
3          50       25.00           200       100.00

Binomial Proportion
for prog = 2
----------------------
Proportion      0.5250
ASE             0.0353

Type                     95% Confidence Limits

Wald                          0.4558    0.5942
Wilson                        0.4560    0.5931
Agresti-Coull                 0.4560    0.5931
Jeffreys                      0.4558    0.5934
Clopper-Pearson (Exact)       0.4534    0.5959

Test of H0: Proportion = 0.55

ASE under H0              0.0352
Z                        -0.7107
One-sided Pr <  Z         0.2386
Two-sided Pr > |Z|        0.4773

Sample Size = 200
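
The Wald limits and the test of H0 can be verified by hand from the counts for prog = 2 (105 of 200). The following data step is a sketch of that calculation; the variable names are ours.

```sas
data _null_;
  n  = 200;
  x  = 105;      /* number of students with prog = 2 */
  p0 = 0.55;     /* null proportion from the p= option */

  p_hat = x / n;
  ase   = sqrt(p_hat * (1 - p_hat) / n);      /* asymptotic standard error */

  /* 95% Wald confidence limits */
  z_crit = quantile('NORMAL', 0.975);
  wald_lower = p_hat - z_crit * ase;
  wald_upper = p_hat + z_crit * ase;

  /* z test of H0: proportion = p0, using the standard error under H0 */
  ase0 = sqrt(p0 * (1 - p0) / n);
  z    = (p_hat - p0) / ase0;
  p_one_sided = probnorm(z);                  /* Pr < Z */
  p_two_sided = 2 * probnorm(-abs(z));        /* Pr > |Z| */

  put p_hat= 6.4 ase= 6.4 wald_lower= 6.4 wald_upper= 6.4;
  put ase0= 6.4 z= 7.4 p_one_sided= 6.4 p_two_sided= 6.4;
run;
```

These values should agree with the Wald row and the test results printed by proc freq (0.4558, 0.5942, Z = -0.7107, one-sided p = 0.2386, two-sided p = 0.4773).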

• PROC REG

* robust standard error, collinearity and test of heteroscedasticity;
ods graphics on;
proc reg data = ats.hsb2 plots=diagnostics;
model write = female math read /collin  spec hccmethod=1 white;
run;
quit;
ods graphics off;
The REG Procedure
Model: MODEL1
Dependent Variable: write writing score

Number of Observations Read         200
Number of Observations Used         200

Analysis of Variance

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F

Model                     3     9405.34864     3135.11621      72.52    <.0001
Error                   196     8473.52636       43.23228
Corrected Total         199          17879

Root MSE              6.57513    R-Square     0.5261
Dependent Mean       52.77500    Adj R-Sq     0.5188
Coeff Var            12.45879

Parameter Estimates

Variable    Label    DF    Parameter Estimate    Standard Error    t Value    Pr > |t|

Intercept   Intercept        1      11.89566       2.86285      4.16     <.0001
female                       1       5.44337       0.93500      5.82     <.0001
math        math score       1       0.39748       0.06640      5.99     <.0001
read        reading score    1       0.32524       0.06073      5.36     <.0001

Parameter Estimates

Heteroscedasticity Consistent
Variable    Label    DF    Standard Error    t Value    Pr > |t|

Intercept   Intercept        1       2.58504       4.60      <.0001
female                       1       0.94931       5.73      <.0001
math        math score       1       0.06359       6.25      <.0001
read        reading score    1       0.05874       5.54      <.0001

HCC Approximation Method: HC1

Collinearity Diagnostics

Number    Eigenvalue    Condition Index

1        3.58262        1.00000
2        0.38760        3.04024
3        0.01873       13.83149
4        0.01105       18.00780

Collinearity Diagnostics

-----------------Proportion of Variation----------------
Number      Intercept         female           math           read

1        0.00199        0.02429        0.00129        0.00155
2        0.00333        0.94447        0.00305        0.00402
3        0.90676        0.03123        0.04497        0.33778
4        0.08791     0.00000813        0.95069        0.65665

Test of First and Second
Moment Specification

DF    Chi-Square    Pr > ChiSq

8         20.78        0.0078
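
The heteroscedasticity-consistent table keeps the ordinary least squares estimates and replaces only the standard errors, so its t values are the original estimates divided by the robust standard errors. The data step below is a sketch that redoes this division with the numbers printed above; the variable names are ours.

```sas
data _null_;
  /* estimates from the first table, robust standard errors from the second */
  t_intercept = 11.89566 / 2.58504;
  t_female    =  5.44337 / 0.94931;
  t_math      =  0.39748 / 0.06359;
  t_read      =  0.32524 / 0.05874;
  put t_intercept= 6.2 t_female= 6.2 t_math= 6.2 t_read= 6.2;
run;
```

The results should match the robust t values in the output: 4.60, 5.73, 6.25 and 5.54.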



• PROC GLM

The model below includes an interaction between a categorical variable and a continuous variable. For such a model, SAS 9.2 creates an analysis-of-covariance plot as soon as ODS graphics is turned on.

ods graphics on;
proc glm data = ats.hsb2;
class female ;
model write = female math female*math ;
run;
quit;
ods graphics off;

Proc glm in SAS 9.2 provides measures of effect size through the effectsize option. Note that this option is still experimental.

proc glm data = ats.hsb2;
class female prog;
model write = female prog female*prog /ss3 effectsize;
run;
quit;

Source    DF    Sum of Squares    Mean Square    F Value    Pr > F

Model                       5     4630.36091      926.07218     13.56   <.0001

Error                     194    13248.51409       68.29131

Corrected Total           199    17878.87500

R-Square     Coeff Var      Root MSE    write Mean

0.258985      15.65866      8.263856      52.77500

Overall Noncentrality

Min Var Unbiased Estimate    62.104
Low MSE Estimate             61.457
95% Confidence Limits        (33.709,102.7)

Proportion of Variation Accounted for

Eta-Square                   0.26
Omega-Square                 0.24
95% Confidence Limits        (0.14,0.34)

Source                     DF    Type III SS    Mean Square   F Value   Pr > F

female                      1    1261.853291    1261.853291     18.48   <.0001
prog                        2    3274.350821    1637.175410     23.97   <.0001
female*prog                 2     325.958189     162.979094      2.39   0.0946

Noncentrality Parameter

Source    Min Var Unbiased Estimate    Low MSE Estimate    95% Confidence Limits

female                       17.29           17.1        5.23     39.7
prog                         45.45           45.0       22.56     79.8
female*prog                   2.72            2.7        0.00     15.9

Total Variation Accounted For

Source    Semipartial Eta-Square    Semipartial Omega-Square    Conservative 95% Confidence Limits

female                      0.0706         0.0665     0.0173  0.1469
prog                        0.1831         0.1748     0.0911  0.2718
female*prog                 0.0182         0.0106     0.0000  0.0637

Partial Variation Accounted For

Source    Partial Eta-Square    Partial Omega-Square    95% Confidence Limits

female                      0.0870         0.0804     0.0255  0.1656
prog                        0.1982         0.1868     0.1014  0.2851
female*prog                 0.0240         0.0137     0.0000  0.0735
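
The eta-square values in these tables can be recovered from the sums of squares printed above: the semipartial eta-square divides an effect's Type III SS by the corrected total SS, while the partial eta-square divides it by the effect SS plus the error SS. The data step below is a sketch of that arithmetic; the array and variable names are ours.

```sas
data _null_;
  ss_error = 13248.51409;
  ss_total = 17878.87500;

  /* Type III sums of squares for female, prog and female*prog */
  array ss[3]       _temporary_ (1261.853291 3274.350821 325.958189);
  array label[3] $12 _temporary_ ('female' 'prog' 'female*prog');

  do i = 1 to 3;
    effect           = label[i];
    semipartial_eta2 = ss[i] / ss_total;
    partial_eta2     = ss[i] / (ss[i] + ss_error);
    put effect= semipartial_eta2= 6.4 partial_eta2= 6.4;
  end;
run;
```

The log should reproduce 0.0706, 0.1831 and 0.0182 for the semipartial eta-squares and 0.0870, 0.1982 and 0.0240 for the partial eta-squares.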

• PROC LOGISTIC

When an interaction term is present, the oddsratio statement computes and graphs odds ratios for one variable at chosen values of the variable it interacts with, as shown in the example below.

data hsb2;
set ats.hsb2;
hon=(write>60);
run;
ods graphics on;
proc logistic data = hsb2 descending;
model hon = female math female*math;
oddsratio female / at(math = 45 50 65);
run;
ods graphics off;
The LOGISTIC Procedure

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq

Intercept       1     -8.7458      2.1291       16.8729        <.0001
female          1     -2.8998      3.0942        0.8783        0.3487
math            1      0.1294      0.0359       12.9994        0.0003
female*math     1      0.0670      0.0535        1.5704        0.2101

Wald Confidence Interval for Odds Ratios

Label                     Estimate    95% Confidence Limits

female at math=45            1.122       0.245        5.139
female at math=50            1.568       0.517        4.759
female at math=65            4.284       1.386       13.237
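
With the interaction in the model, the log odds ratio for female at a given math score is the female coefficient plus the interaction coefficient times that score. The data step below is a sketch that applies this to the rounded estimates printed above; the variable names are ours.

```sas
data _null_;
  /* rounded estimates from the Analysis of Maximum Likelihood Estimates table */
  b_female      = -2.8998;
  b_female_math =  0.0670;

  do math = 45, 50, 65;
    odds_ratio = exp(b_female + b_female_math * math);
    put math= odds_ratio= 6.3;
  end;
run;
```

The results are roughly 1.12, 1.57 and 4.29. They agree with the table to about two decimals; the small differences come from using coefficients rounded to four decimals.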

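When the data points show complete or quasi-complete separation, the maximum likelihood estimate may not exist. SAS 9.2 provides Firth's penalized likelihood estimation, requested with the firth option, to deal with this problem. The first run below uses ordinary maximum likelihood and the second adds the firth option.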

data test;
input Y X freq;
datalines;
0 1 3
0 2 4
0 3 5
0 3 10
1 3 6
1 4 12
1 5 8
1 6 9
1 10 11
1 11 6
;
run;
proc logistic data = test descending;
freq freq;
model y = x;
run;
The LOGISTIC Procedure
WARNING: The validity of the model fit is questionable.

Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        64.9376        1         <.0001
Score                   26.0506        1         <.0001
Wald                     0.0859        1         0.7695

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq

Intercept     1    -32.8245       108.9        0.0909        0.7630
X             1     10.6361     36.2903        0.0859        0.7695

proc logistic data = test descending;
freq freq;
model y = x /firth;
run;
        Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square       DF     Pr > ChiSq

Likelihood Ratio        57.0231        1         <.0001
Score                   25.3902        1         <.0001
Wald                     7.6435        1         0.0057

Analysis of Maximum Likelihood Estimates

Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq

Intercept     1    -13.0905      4.5755        8.1851        0.0042
X             1      4.0766      1.4745        7.6435        0.0057

The roc and roccontrast statements plot ROC curves and compare the areas under them.

ods graphics on;
proc logistic data=hsb2 plots=roc(id=prob);
model hon = female math read;
roc 'female' female;
roc 'maths score' math;
roc 'read' read;
roccontrast reference('female') / estimate e;
run;
ods graphics off;
                         ROC Association Statistics

ROC Model    Mann-Whitney Area    Mann-Whitney Standard Error    Mann-Whitney 95% Wald Confidence Limits    Somers' D (Gini)    Gamma

Model           0.8569     0.0288     0.8005     0.9134      0.7139     0.7142
female          0.5716     0.0400     0.4932     0.6499      0.1431     0.2880
maths score     0.8325     0.0329     0.7681     0.8970      0.6651     0.6792
read            0.7979     0.0325     0.7343     0.8616      0.5959     0.6298

ROC Association Statistics

ROC Model        Tau-a

Model           0.2654
female          0.0532
maths score     0.2473
read            0.2216

ROC Contrast Coefficients

ROC Model            Row1          Row2          Row3

Model                   1             0             0
female                 -1            -1            -1
maths score             0             1             0
read                    0             0             1

ROC Contrast Test Results

Contrast                DF    Chi-Square    Pr > ChiSq

Reference = female       3      113.0593        <.0001

ROC Contrast Rows Estimation and Testing Results

Contrast    Estimate    Standard Error    95% Wald Confidence Limits    Chi-Square    Pr > ChiSq

Model - female          0.2854    0.0439    0.1994    0.3714     42.3060  <.0001
maths score - female    0.2610    0.0532    0.1567    0.3652     24.0700  <.0001
read - female           0.2264    0.0543    0.1199    0.3329     17.3547  <.0001

#### 4. New graphics procedures for statistical graphics

proc sgplot data=ats.hsb2;
dot ses / response=write stat=mean
limitstat=stddev numstd=1;
run;

proc sgplot data=ats.hsb2;
scatter x=math y=write;
ellipse x=math y=write;
keylegend / location=inside position=bottomright;
run;
title;
filename odsout 'c:\sas\temp\test.htm';
goptions device = java ;
ods listing close;
ods html file=odsout style=styles.ocean;
proc gchart data=ats.hsb2;
block prog /sumvar= write type=mean;
run;
ods html close;
ods listing;
