UCLA Academic Technology Services

SAS Textbook Examples
Applied Regression Analysis by John Fox
Chapter 11: Unusual and Influential Data

Section 11.1

Regression analysis from the middle of page 269, using the davis data file. We first create a dataset with a dummy variable, female, and an interaction regressor, measwt_f, equal to the product of measured weight and female. We then regress reported weight on measured weight, female, and the interaction.
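Written out, the model fitted below is the following (the Greek-letter coefficient names are our own labels, not part of the SAS output):

```latex
\text{reptwt}_i = \beta_0 + \beta_1\,\text{measwt}_i + \beta_2\,\text{female}_i
                + \beta_3\,\text{measwt\_f}_i + \varepsilon_i,
\qquad \text{measwt\_f}_i = \text{measwt}_i \times \text{female}_i
```

For men (female = 0) the fitted line is β0 + β1·measwt; for women the intercept becomes β0 + β2 and the slope β1 + β3.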
data davisIn; /* adds the female dummy and the measwt x female interaction */
  set davis;
  measwt_f = measwt * (1-male); /* interaction: measured weight for women, 0 for men */
  female=1-male;                /* dummy: 1 = female, 0 = male */
  drop male;
run;
proc reg data=davisIn;
  model reptwt=measwt female measwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          30655          10218     470.41    <.0001
Error                   179     3888.25423       21.72209
Corrected Total         182          34543


Root MSE              4.66070    R-Square     0.8874
Dependent Mean       65.62295    Adj R-Sq     0.8856
Coeff Var             7.10224

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        3.27719       0.41      0.6789
measwt        1        0.98982        0.04260      23.24      <.0001
female        1       39.96412        3.92932      10.17      <.0001
measwt_f      1       -0.72536        0.05598     -12.96      <.0001
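Substituting the estimates above gives the implied fitted line for each sex (our own arithmetic on the printed values):

```latex
\begin{aligned}
\text{Men:}\quad   \widehat{\text{reptwt}} &= 1.35864 + 0.98982\,\text{measwt}\\
\text{Women:}\quad \widehat{\text{reptwt}} &= (1.35864 + 39.96412) + (0.98982 - 0.72536)\,\text{measwt}
                                            \approx 41.323 + 0.264\,\text{measwt}
\end{aligned}
```

The corrected fit later in this section shows how much of the gap between the two slopes is due to case 12.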

Next we correct the error in case 12, whose measured weight and measured height were switched. We keep the uncorrected data as davisIn and call the corrected dataset davis_co.

data davis_co;
  set davisIn;
  if _n_=12 then do;   /* case 12: swap measured weight and height */
    temp=measht;
    measht=measwt;
    measwt=temp;
    measwt_f=temp;     /* case 12 is female, so measwt_f = measwt */
  end;
  drop temp;
run;
proc reg data=davis_co ;
  model reptwt=measwt female measwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          33642          11214    2228.78    <.0001
Error                   179      900.63897        5.03150
Corrected Total         182          34543


Root MSE              2.24310    R-Square     0.9739
Dependent Mean       65.62295    Adj R-Sq     0.9735
Coeff Var             3.41817

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        1.57725       0.86      0.3902
measwt        1        0.98982        0.02050      48.28      <.0001
female        1        1.98252        2.45028       0.81      0.4195
measwt_f      1       -0.05668        0.03845      -1.47      0.1422
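With the corrected data, the implied fitted lines become (our arithmetic on the printed estimates):

```latex
\begin{aligned}
\text{Men:}\quad   \widehat{\text{reptwt}} &= 1.35864 + 0.98982\,\text{measwt}\\
\text{Women:}\quad \widehat{\text{reptwt}} &= (1.35864 + 1.98252) + (0.98982 - 0.05668)\,\text{measwt}
                                            \approx 3.341 + 0.933\,\text{measwt}
\end{aligned}
```

The male intercept and slope are identical in the uncorrected and corrected fits: because the model gives each sex its own intercept and slope, changing a female case cannot move the male line.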

To obtain the regression equation on page 270, we create a dataset containing the female dummy and a new interaction regressor, reptwt_f, equal to reported weight times female. We then regress measured weight on reported weight, female, and reptwt_f, using both the corrected and the uncorrected data.

Corrected dataset
data davis_co1;
  set davis;
  reptwt_f=reptwt*(1-male);
  female=1-male;
  if subject=12 then do 
  temp=measwt;
  measwt=measht;
  measht=temp;
  end;
  drop male temp;
run;
proc reg data=davis_co1;
  model measwt=reptwt female reptwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          31722          10574    2082.30    <.0001
Error                   179      908.96214        5.07800
Corrected Total         182          32631

Root MSE              2.25344    R-Square     0.9721
Dependent Mean       65.62842    Adj R-Sq     0.9717
Coeff Var             3.43364

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.79428        1.57989       1.14      0.2576
reptwt        1        0.96892        0.02038      47.55      <.0001
female        1       -0.01678        2.47955      -0.01      0.9946
reptwt_f      1        0.00831        0.03917       0.21      0.8323
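For the corrected data, the estimates above imply these fitted lines for measured weight given reported weight (our arithmetic on the printed values):

```latex
\begin{aligned}
\text{Men:}\quad   \widehat{\text{measwt}} &= 1.79428 + 0.96892\,\text{reptwt}\\
\text{Women:}\quad \widehat{\text{measwt}} &= (1.79428 - 0.01678) + (0.96892 + 0.00831)\,\text{reptwt}
                                            = 1.77750 + 0.97723\,\text{reptwt}
\end{aligned}
```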

Now on the uncorrected dataset:

data davisIn1;
  set davis;
  reptwt_f=reptwt*(1-male);
  female=1-male;
  drop male;
run;
proc reg data=davisIn1;
  model measwt=reptwt female reptwt_f;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          29786     9928.79278     139.07    <.0001
Error                   179          12779       71.39350
Corrected Total         182          42566


Root MSE              8.44947    R-Square     0.6998
Dependent Mean       66.22404    Adj R-Sq     0.6947
Coeff Var            12.75891

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.79428        5.92394       0.30      0.7623
reptwt        1        0.96892        0.07641      12.68      <.0001
female        1        2.07421        9.29727       0.22      0.8237
reptwt_f      1       -0.00953        0.14685      -0.06      0.9484

To produce Figure 11.2 on page 270, we first create a dataset with two new variables, mreptwt and freptwt, holding reported weight for men and for women respectively. We then run PROC GLM on this dataset, save the predicted values for each group, and draw the plot with PROC GPLOT.

data davisPr;
  set davis;
  if male=1 then  
  mreptwt=reptwt;
  if male=0 then 
  freptwt=reptwt;
  output;
run; /*dataset created*/
proc glm data=davisPr;
  model mreptwt freptwt =measwt;
  output out=dvsOut p=pm pf; /* pm, pf: predicted values for men and women */
run;
quit;
symbol1 c=black i=none v='M' height=0.5;
symbol2 c=black i=join v=none height=1.5;
symbol3 c=blue  i=none v='F' height=0.5; 
symbol4 c=blue  i=join v=none height=1.5;
axis1  label=(r=0 a=90);
filename outfiles 'chp11Fig1.gif';
goptions gsfname=outfiles dev=gif373;
proc sort data=dvsOut;
by measwt;
run;
proc gplot data=dvsOut;
  plot mreptwt*measwt=1 pm*measwt=2 freptwt*measwt=3 pf*measwt=4
  /overlay vaxis=axis1;
  label measwt='Measured Weight, Kg.';
  label mreptwt='Reported Weight, Kg.';
run;
quit;

Section 11.2

Page 271, computing the hat values (the diagonal of the hat matrix, a measure of leverage) and showing the largest.
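For reference, the hat value of case i is the ith diagonal entry of the hat matrix H = X(X'X)⁻¹X'. With k regressors plus an intercept fitted to n cases, the hat values sum to k + 1, so their mean is (k + 1)/n. The numbers below use k = 3 and the n = 183 cases with complete data; the 2h̄ and 3h̄ cutoffs are a common rule of thumb, not a formal test:

```latex
h_i = \mathbf{x}_i'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_i,
\qquad \sum_{i=1}^{n} h_i = k+1,
\qquad \bar{h} = \frac{k+1}{n} = \frac{4}{183} \approx 0.0219,
\qquad 2\bar{h} \approx 0.044,\quad 3\bar{h} \approx 0.066
```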
proc reg data=davisIn ;
  model reptwt=measwt female measwt_f;
  output out=dvsLev p=pr h=lev; /* p= predicted values, h= hat values (leverage) */
run;
quit;
proc univariate data=dvsLev;
  var lev;
run;
proc print data=dvsLev;
  where lev ge 0.7;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square  F Value  Pr > F

Model                     3          30655          10218   470.41  <.0001
Error                   179     3888.25423       21.72209
Corrected Total         182          34543


Root MSE              4.66070    R-Square     0.8874
Dependent Mean       65.62295    Adj R-Sq     0.8856
Coeff Var             7.10224

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        3.27719       0.41      0.6789
measwt        1        0.98982        0.04260      23.24      <.0001
female        1       39.96412        3.92932      10.17      <.0001
measwt_f      1       -0.72536        0.05598     -12.96      <.0001

The UNIVARIATE Procedure
Variable:  lev  (Leverage)

                            Moments

N                         200    Sum Weights                200
Mean                0.0212232    Sum Observations     4.2446399
Std Deviation       0.0514178    Variance            0.00264379
Skewness           12.5535875    Kurtosis             168.15714
Uncorrected SS     0.61619913    Corrected SS        0.52611429
Coeff Variation    242.271682    Std Error Mean      0.00363579

              Basic Statistical Measures

    Location                    Variability

Mean     0.021223     Std Deviation            0.05142
Median   0.013143     Variance                 0.00264
Mode     0.010224     Range                    0.70428
                      Interquartile Range      0.00754

NOTE: The mode displayed is the smallest of 3 modes with a count of 8.

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  5.837304    Pr > |t|    <.0001
Sign           M       100    Pr >= |M|   <.0001
Signed Rank    S     10050    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile         Estimate

100% Max       0.71418565
99%            0.12002413
95%            0.04561218
90%            0.02856951
75% Q3         0.01856466
50% Median     0.01314321
25% Q1         0.01102743
10%            0.01014961
5%             0.00993016
1%             0.00990671
0% Min         0.00990671


The UNIVARIATE Procedure
Variable:  lev  (Leverage)

              Extreme Observations

-------Lowest-------        ------Highest------

      Value      Obs             Value      Obs

 0.00990671      188         0.0645111       30
 0.00990671      159         0.0687759       54
 0.00990671       28         0.0732077       97
 0.00990671        2         0.1668405       21
 0.00993016      193         0.7141856       12
Obs  subject sex measwt measht reptwt reptht measwt_f female    pr     lev
 12    12    F    166    57     56     163     166      1   85.2230 0.71419
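With k = 3 regressors and n = 183 cases used in the fit, the average hat value is (k + 1)/n = 4/183, and case 12's hat value is roughly 33 times that average:

```latex
\frac{h_{12}}{\bar h} = \frac{0.71419}{4/183} \approx 32.7
```

The univariate summary above counts 200 hat values summing to about 4.24 rather than 4 because PROC REG also reports hat values for the 17 cases whose response (reptwt) is missing but whose regressors are complete, even though those cases are not used in the fit (observation 198 in the OutputStatistics listing of Section 11.4 is one of them).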

With the error corrected:

proc reg data=davis_co;
  model measwt=reptwt female measwt_f;
  output out=dvs_coH p=pm h=lev;
run;
quit;
proc univariate data=dvs_coH;
  var lev;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: measwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          31777          10592    2220.97    <.0001
Error                   179      853.69481        4.76924
Corrected Total         182          32631


Root MSE              2.18386    R-Square     0.9738
Dependent Mean       65.62842    Adj R-Sq     0.9734
Coeff Var             3.32761

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        4.14385        1.50728       2.75      0.0066
reptwt        1        0.93823        0.01943      48.28      <.0001
female        1       -7.27862        2.32718      -3.13      0.0021
measwt_f      1        0.12450        0.03650       3.41      0.0008

The UNIVARIATE Procedure
Variable:  lev  (Leverage)

                            Moments

N                         183    Sum Weights                183
Mean               0.02185792    Sum Observations             4
Std Deviation       0.0193944    Variance            0.00037614
Skewness           4.83073039    Kurtosis            33.3623499
Uncorrected SS     0.15588965    Corrected SS        0.06845796
Coeff Variation     88.729365    Std Error Mean      0.00143368

              Basic Statistical Measures

    Location                    Variability

Mean     0.021858     Std Deviation            0.01939
Median   0.015604     Variance               0.0003761
Mode     0.015604     Range                    0.18047
                      Interquartile Range      0.00864

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  15.24608    Pr > |t|    <.0001
Sign           M      91.5    Pr >= |M|   <.0001
Signed Rank    S      8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile         Estimate

100% Max       0.19040469
99%            0.10076897
95%            0.04995763
90%            0.03947617
75% Q3         0.02102730
50% Median     0.01560388
25% Q1         0.01238808
10%            0.01075212
5%             0.01037395
1%             0.00993415
0% Min         0.00993415

The UNIVARIATE Procedure
Variable:  lev  (Leverage)

              Extreme Observations

-------Lowest-------        ------Highest------

      Value      Obs             Value      Obs

 0.00993415      160         0.0799200       29
 0.00993415       90         0.0846260      115
 0.00993415       12         0.0855655       54
 0.01008300      151         0.1007690       64
 0.01009729      108         0.1904047       21

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00

Section 11.3

Middle of page 274, computing studentized residuals and showing the largest in absolute value.
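The studentized residual divides each residual by an estimate of its standard deviation that leaves case i out of the error variance. In the notation below (ours, following the usual textbook definitions), E_i is the ordinary residual, S_E the root MSE, and S_E(-i) the root MSE with case i deleted. Under the model, E_i* follows a t distribution with n − k − 2 degrees of freedom, and a Bonferroni-adjusted outlier test multiplies the two-sided p-value by n:

```latex
E_i^{*} = \frac{E_i}{S_{E(-i)}\sqrt{1-h_i}},
\qquad
S_{E(-i)}^{2} = \frac{(n-k-1)\,S_E^{2} - E_i^{2}/(1-h_i)}{n-k-2},
\qquad
p_{\text{Bonf}} = n \times 2\,\Pr\!\left(t_{n-k-2} > |E_i^{*}|\right)
```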
proc reg data=davisIn;
  model reptwt = measwt female measwt_f;
  output out=dvsRs rstudent=rs; /*studentized residuals*/
run;
quit;
proc univariate data=dvsRs;
  var rs;
run;
proc print data=dvsRs;
  where rs < -24.3 and rs ne .; /* missing values sort below all numbers, so exclude them */
run;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                 Sum of           Mean
Source                DF        Squares         Square    F Value    Pr > F

Model                  3          30655          10218     470.41    <.0001
Error                179     3888.25423       21.72209
Corrected Total      182          34543

Root MSE              4.66070    R-Square     0.8874
Dependent Mean       65.62295    Adj R-Sq     0.8856
Coeff Var             7.10224

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        3.27719       0.41      0.6789
measwt        1        0.98982        0.04260      23.24      <.0001
female        1       39.96412        3.92932      10.17      <.0001
measwt_f      1       -0.72536        0.05598     -12.96      <.0001

The UNIVARIATE Procedure
Variable:  rs  (Studentized Residual without Current Obs)

                            Moments

N                         183    Sum Weights                183
Mean               -0.0961781    Sum Observations     -17.60059
Std Deviation      2.00831794    Variance            4.03334093
Skewness            -9.637042    Kurtosis            117.053571
Uncorrected SS      735.76084    Corrected SS        734.068049
Coeff Variation    -2088.1242    Std Error Mean      0.14845913

              Basic Statistical Measures

    Location                    Variability

Mean     -0.09618     Std Deviation            2.00832
Median   -0.02849     Variance                 4.03334
Mode     -0.18673     Range                   27.80109
                      Interquartile Range      0.95002

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  -0.64784    Pr > |t|    0.5179
Sign           M      -2.5    Pr >= |M|   0.7676
Signed Rank    S      -188    Pr >= |S|   0.7941

Quantiles (Definition 5)

Quantile         Estimate

100% Max        3.4966276
99%             3.0813780
95%             1.5666415
90%             1.0406940
75% Q3          0.4462518
50% Median     -0.0284926
25% Q1         -0.5037653
10%            -0.9816218
5%             -1.4664439
1%             -2.3493765
0% Min        -24.3044630

The UNIVARIATE Procedure
Variable:  rs  (Studentized Residual without Current Obs)

            Extreme Observations

------Lowest------        -----Highest-----

    Value      Obs           Value      Obs

-24.30446       12         1.89365      129
 -2.34938       29         2.39320       31
 -2.18985      155         2.90657       64
 -1.95943      130         3.08138       50
 -1.90865      153         3.49663      115

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00
Obs subject  sex  measwt  measht  reptwt  reptht  measwt_f  female   rs

 12    12     F     166     57      56      163      166       1   -24.3045
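As a check, the printed studentized residual for case 12 can be reproduced by hand from the outputs above: the residual is reptwt minus the predicted value pr printed in Section 11.2, and the deleted-case error variance removes E²/(1 − h) from the error SS (3888.254 on 179 df) and divides by n − k − 2 = 178:

```latex
\begin{aligned}
E_{12} &= 56 - 85.2230 = -29.2230\\
S_{E(-12)}^{2} &= \frac{3888.254 - (-29.2230)^{2}/(1-0.71419)}{178}
               \approx \frac{3888.254 - 2987.9}{178} \approx 5.058\\
E_{12}^{*} &= \frac{-29.2230}{\sqrt{5.058}\,\sqrt{1-0.71419}}
           \approx \frac{-29.223}{2.249 \times 0.5346} \approx -24.3
\end{aligned}
```

With 178 degrees of freedom, a t value this extreme remains overwhelmingly significant even after its p-value is multiplied by n = 183.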

Section 11.4

Middle of page 276, computing DFBETAS for measwt, female, and measwt_f. We add the INFLUENCE option to the MODEL statement and use ODS OUTPUT to save the OutputStatistics table, which contains all the DFBETAS, as a dataset. The index plot shows a single observation (case 12) with a large influence on the coefficients of female and of the interaction measwt_f (female*measwt).
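The DFBETA for coefficient j and case i is the change in that coefficient when case i is deleted; DFBETAS scales it by a deleted-case standard error. The INFLUENCE option reports the scaled version, in the columns DFB_measwt, DFB_female and DFB_measwt_f used below. In this notation (ours), B_j is the full-data estimate and B_j(-i) the estimate without case i:

```latex
d_{ij} = B_j - B_{j(-i)},
\qquad
d_{ij}^{*} = \frac{d_{ij}}{S_{E(-i)}\sqrt{\left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}}
```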
proc reg data=davisIn;
  model reptwt=measwt female measwt_f/influence ;
  ods output OutputStatistics=dvsOut;
run;
quit;
filename outfiles 'chp11dfbeta.gif';
goptions gsfname=outfiles dev=gif373;
symbol1 c=black i=none v=star h=0.5;
symbol2 c=blue  i=none v=dot h=0.5;
symbol3 c=green i=none v=circle h=0.5;
proc gplot data=dvsOut;
  plot (DFB_measwt)*observation=1
  (DFB_female)*observation=2
  (DFB_measwt_f)*observation=3/overlay;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          30655          10218     470.41    <.0001
Error                   179     3888.25423       21.72209
Corrected Total         182          34543


Root MSE              4.66070    R-Square     0.8874
Dependent Mean       65.62295    Adj R-Sq     0.8856
Coeff Var             7.10224

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        3.27719       0.41      0.6789
measwt        1        0.98982        0.04260      23.24      <.0001
female        1       39.96412        3.92932      10.17      <.0001
measwt_f      1       -0.72536        0.05598     -12.96      <.0001

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                                    Output Statistics
                         Hat Diag      Cov          ---------------DFBETAS--------------
 Obs  Residual  RStudent        H    Ratio   DFFITS Intercept   measwt   female measwt_f
   1   -0.5749   -0.1238   0.0123   1.0350  -0.0138   -0.0010  -0.0012   0.0008   0.0009
   2   -5.6614   -1.2225   0.0099   0.9989  -0.1223   -0.0000   0.0000  -0.0160   0.0019
   3   -1.3391   -0.2883   0.0116   1.0327  -0.0312   -0.0000   0.0000  -0.0099   0.0078
  :	:	:	  :	   :	    :	      :        :        :        :
 196   -3.6055   -0.7776   0.0125   1.0217  -0.0876   -0.0275   0.0141   0.0230  -0.0108
 197   -3.5139   -0.7593   0.0163   1.0263  -0.0978    0.0353  -0.0492  -0.0294   0.0374
 198         .         .   0.0143    .        .         .        .        .        .
 199    0.5574    0.1210   0.0286   1.0525   0.0208   -0.0134   0.0157   0.0112  -0.0120
 200    1.4454    0.3114   0.0130   1.0338   0.0357   -0.0031   0.0087   0.0026  -0.0066


Sum of Residuals                           0
Sum of Squared Residuals          3888.25423
Predicted Residual SS (PRESS)          13623

The following code uses SAS/INSIGHT to draw a scatterplot matrix of the DFBETAS from program code. Some users may find it easier to work from the SAS pull-down menus (Solutions, then Analysis, then Interactive Data Analysis).

proc insight data=dvsOut;
  scatter DFB_measwt DFB_female DFB_measwt_f observation*
          DFB_measwt DFB_female DFB_measwt_f observation;
run;
quit;

Bottom part of page 277, computing and showing Cook's D, DFFITS, DFBETAS.
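Both Cook's D and DFFITS combine a case's discrepancy (its residual) with its leverage. In the notation used here (ours), E_i' is the standardized residual computed with the full-data S_E, E_i* is the studentized residual (RSTUDENT in SAS), and k is the number of regressors:

```latex
E_i' = \frac{E_i}{S_E\sqrt{1-h_i}},
\qquad
D_i = \frac{E_i'^{\,2}}{k+1}\times\frac{h_i}{1-h_i},
\qquad
\text{DFFITS}_i = E_i^{*}\sqrt{\frac{h_i}{1-h_i}}
```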

Compute Cook's D with the COOKD= option. DFFITS and the DFBETAS are already in the dataset dvsOut created above:
proc reg data=davisIn;
  model reptwt=measwt female measwt_f;
  output out=dvsSum cookd=ck;
run;
quit;
proc univariate data=dvsSum;
  var ck;
run;
proc univariate data=dvsOut;
  var  dffits DFB_measwt DFB_female DFB_measwt_f;
run;
proc print data=dvsSum;
  where subject=12;
  var ck;
run;
proc print data=dvsOut;
  where observation=12;
  var dffits DFB_measwt DFB_female DFB_measwt_f;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: reptwt

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     3          30655          10218     470.41    <.0001
Error                   179     3888.25423       21.72209
Corrected Total         182          34543


Root MSE              4.66070    R-Square     0.8874
Dependent Mean       65.62295    Adj R-Sq     0.8856
Coeff Var             7.10224

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1        1.35864        3.27719       0.41      0.6789
measwt        1        0.98982        0.04260      23.24      <.0001
female        1       39.96412        3.92932      10.17      <.0001
measwt_f      1       -0.72536        0.05598     -12.96      <.0001

The UNIVARIATE Procedure
Variable:  ck  (Cook's D Influence Statistic)

                            Moments

N                         183    Sum Weights                183
Mean               0.47387729    Sum Observations    86.7195445
Std Deviation      6.35162098    Variance             40.343089
Skewness           13.5276811    Kurtosis            182.998763
Uncorrected SS     7383.53663    Corrected SS        7342.44221
Coeff Variation    1340.35141    Std Error Mean      0.46952533

              Basic Statistical Measures

    Location                    Variability

Mean     0.473877     Std Deviation            6.35162
Median   0.000796     Variance                40.34309
Mode     0.000094     Range                   85.92734
                      Interquartile Range      0.00309

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  1.009269    Pr > |t|    0.3142
Sign           M      91.5    Pr >= |M|   <.0001
Signed Rank    S      8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile          Estimate

100% Max       8.59273E+01
99%            8.56294E-02
95%            1.99879E-02
90%            9.60576E-03
75% Q3         3.21741E-03
50% Median     7.96056E-04
25% Q1         1.24424E-04
10%            2.60263E-05
5%             2.28352E-05
1%             2.10827E-06
0% Min         2.10827E-06

The UNIVARIATE Procedure
Variable:  ck  (Cook's D Influence Statistic)

               Extreme Observations

--------Lowest-------        -------Highest------

       Value      Obs              Value      Obs

 2.10827E-06      186          0.0624604       50
 2.10827E-06       92          0.0651360       21
 2.10827E-06       85          0.0701759       64
 1.57302E-05      143          0.0856294      115
 1.85113E-05      160         85.9273459       12

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00

The UNIVARIATE Procedure
Variable:  DFFITS

                            Moments

N                         183    Sum Weights                183
Mean               -0.2012365    Sum Observations    -36.826272
Std Deviation      2.84379473    Variance            8.08716846
Skewness           -13.482793    Kurtosis            182.187495
Uncorrected SS     1479.27545    Corrected SS        1471.86466
Coeff Variation    -1413.1608    Std Error Mean      0.21021936

              Basic Statistical Measures

    Location                    Variability

Mean     -0.20124     Std Deviation            2.84379
Median   -0.00290     Variance                 8.08717
Mode     -0.01930     Range                   39.02263
                      Interquartile Range      0.11521

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  -0.95727    Pr > |t|    0.3397
Sign           M      -2.5    Pr >= |M|   0.7676
Signed Rank    S      -149    Pr >= |S|   0.8362

Quantiles (Definition 5)

Quantile          Estimate

100% Max        0.60332360
99%             0.54072498
95%             0.21210692
90%             0.13627407
75% Q3          0.05600517
50% Median     -0.00289586
25% Q1         -0.05920213
10%            -0.11742211
5%             -0.19584990
1%             -0.43084786
0% Min        -38.41931120

The UNIVARIATE Procedure
Variable:  DFFITS

             Extreme Observations

-------Lowest------        ------Highest-----

     Value      Obs            Value      Obs

-38.419311       12         0.351835       17
 -0.430848       29         0.510867       21
 -0.296132      130         0.511565       50
 -0.282346      155         0.540725       64
 -0.260521      128         0.603324      115

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00

The UNIVARIATE Procedure
Variable:  DFB_measwt  (measwt DFBETAS)

                            Moments

N                         183    Sum Weights                183
Mean               0.00039965    Sum Observations    0.07313633
Std Deviation      0.05401908    Variance            0.00291806
Skewness           5.26547487    Kurtosis            43.7449484
Uncorrected SS     0.53111634    Corrected SS        0.53108711
Coeff Variation    13516.5265    Std Error Mean      0.00399321

              Basic Statistical Measures

    Location                    Variability

Mean     0.000400     Std Deviation            0.05402
Median   0.000000     Variance                 0.00292
Mode     0.000000     Range                    0.63678
                      Interquartile Range    0.0000766

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  0.100083    Pr > |t|    0.9204
Sign           M      19.5    Pr >= |M|   0.0048
Signed Rank    S       715    Pr >= |S|   0.3204

Quantiles (Definition 5)

Quantile          Estimate

100% Max       4.91842E-01
99%            2.80931E-01
95%            3.14289E-02
90%            1.41305E-02
75% Q3         6.86667E-16
50% Median     3.14532E-17
25% Q1        -7.65658E-05
10%           -4.27876E-02
5%            -5.94060E-02
1%            -1.31842E-01
0% Min        -1.44941E-01

The UNIVARIATE Procedure
Variable:  DFB_measwt  (measwt DFBETAS)

             Extreme Observations

-------Lowest------        ------Highest------

     Value      Obs             Value      Obs

-0.1449406      156         0.0881442      111
-0.1318417       97         0.1096412      191
-0.0978029      118         0.2565254       54
-0.0921938       87         0.2809306       17
-0.0904702      192         0.4918421       21

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00

The UNIVARIATE Procedure
Variable:  DFB_female  (female DFBETAS)

                            Moments

N                         183    Sum Weights                183
Mean               0.09419919    Sum Observations    17.2384525
Std Deviation      1.48275408    Variance            2.19855965
Skewness           13.4966624    Kurtosis            182.435032
Uncorrected SS     401.761705    Corrected SS        400.137857
Coeff Variation    1574.06238    Std Error Mean      0.10960834

              Basic Statistical Measures

    Location                    Variability

Mean      0.09420     Std Deviation            1.48275
Median   -0.00501     Variance                 2.19856
Mode     -0.00481     Range                   20.24974
                      Interquartile Range      0.03337

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  0.859416    Pr > |t|    0.3912
Sign           M     -32.5    Pr >= |M|   <.0001
Signed Rank    S     -3776    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile          Estimate

100% Max       20.02775272
99%             0.38703149
95%             0.03421111
90%             0.01663061
75% Q3          0.00396153
50% Median     -0.00500628
25% Q1         -0.02941105
10%            -0.06352601
5%             -0.11406227
1%             -0.22172649
0% Min         -0.22199219

The UNIVARIATE Procedure
Variable:  DFB_female  (female DFBETAS)

             Extreme Observations

------Lowest------        -------Highest------

    Value      Obs              Value      Obs

-0.221992      115          0.0758795      191
-0.221726       29          0.1956965       54
-0.209797       64          0.2036534       17
-0.182301       50          0.3870315       21
-0.142343      130         20.0277527       12

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00

The UNIVARIATE Procedure
Variable:  DFB_measwt_f  (measwt_f DFBETAS)

                            Moments

N                         183    Sum Weights                183
Mean               -0.1163372    Sum Observations     -21.28971
Std Deviation      1.83226898    Variance            3.35720963
Skewness           -13.503022    Kurtosis            182.551683
Uncorrected SS     613.488939    Corrected SS        611.012153
Coeff Variation    -1574.9638    Std Error Mean      0.13544522

              Basic Statistical Measures

    Location                    Variability

Mean     -0.11634     Std Deviation            1.83227
Median    0.00546     Variance                 3.35721
Mode      0.00314     Range                   25.06990
                      Interquartile Range      0.03505

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t  -0.85892    Pr > |t|    0.3915
Sign           M      40.5    Pr >= |M|   <.0001
Signed Rank    S      4418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile          Estimate

100% Max        0.31740249
99%             0.29435380
95%             0.11029567
90%             0.07015686
75% Q3          0.03286291
50% Median      0.00546108
25% Q1         -0.00218745
10%            -0.01330578
5%             -0.02563021
1%             -0.37427787
0% Min        -24.75250165

The UNIVARIATE Procedure
Variable:  DFB_measwt_f  (measwt_f DFBETAS)

             Extreme Observations

-------Lowest-------        ------Highest-----

      Value      Obs            Value      Obs

-24.7525017       12         0.155113       31
 -0.3742779       21         0.233150       29
 -0.2137802       17         0.263616       50
 -0.1952085       54         0.294354       64
 -0.0834338      191         0.317402      115

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00
Obs       ck(Cook's D)

 12       85.9273
                       DFB_        DFB_      DFB_
Obs      DFFITS      measwt      female    measwt_f

 12    -38.4193     -0.0000     20.0278    -24.7525
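Plugging case 12's values (residual −29.2230, hat value 0.71419, root MSE 4.66070, studentized residual −24.3045) into the usual definitions of Cook's D and DFFITS reproduces the printed statistics. The last line compares them with commonly cited cutoffs for n = 183 and k = 3; these are rules of thumb, not formal tests:

```latex
\begin{aligned}
D_{12} &= \frac{1}{4}\left(\frac{-29.2230}{4.66070\sqrt{0.28581}}\right)^{2}\frac{0.71419}{0.28581}
        \approx \frac{(-11.728)^{2}}{4}\times 2.4988 \approx 85.9\\
\text{DFFITS}_{12} &= -24.3045\sqrt{\frac{0.71419}{0.28581}} \approx -24.3045 \times 1.5808 \approx -38.42\\[4pt]
\text{cutoffs:}\quad & D_i > \frac{4}{n-k-1} = \frac{4}{179} \approx 0.022,\quad
|\text{DFFITS}_i| > 2\sqrt{\frac{k+1}{n-k-1}} \approx 0.30,\quad
|\text{DFBETAS}_{ij}| > \frac{2}{\sqrt{n}} \approx 0.15
\end{aligned}
```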

Top of page 279, computing COVRATIO. The dataset dvsOut, created with the INFLUENCE option above, already contains it, so we summarize it with PROC UNIVARIATE.
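COVRATIO compares the generalized variance (the determinant of the estimated coefficient covariance matrix) without case i to that with all cases; values well below 1 indicate that the case reduces the precision of the estimates. It can be expressed through the hat value h_i and the studentized residual E_i* (notation ours, k = number of regressors). A rough flag sometimes used is |COVRATIO − 1| > 3(k + 1)/n, about 0.066 here:

```latex
\text{COVRATIO}_i = \frac{\det\!\left[S_{E(-i)}^{2}\,(\mathbf{X}_{(-i)}'\mathbf{X}_{(-i)})^{-1}\right]}
                         {\det\!\left[S_E^{2}\,(\mathbf{X}'\mathbf{X})^{-1}\right]}
                  = \frac{1}{1-h_i}\left(\frac{n-k-1}{n-k-2+E_i^{*2}}\right)^{k+1}
```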

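proc univariate data=dvsOut;
  var covratio;
run;
/* print the observation(s) with COVRATIO near 0; the condition covratio ge 0
   excludes missing values, which SAS treats as smaller than any number */
proc print data=dvsOut;
  where covratio<0.02 and covratio ge 0;
run;
quit;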

The UNIVARIATE Procedure
Variable:  CovRatio  (Cov Ratio)

                            Moments

N                         183    Sum Weights                183
Mean               1.01845274    Sum Observations    186.376851
Std Deviation      0.08330185    Variance             0.0069392
Skewness           -9.9649448    Kurtosis            119.302488
Uncorrected SS     191.078948    Corrected SS        1.26293396
Coeff Variation    8.17925499    Std Error Mean      0.00615785

              Basic Statistical Measures

    Location                    Variability

Mean     1.018453     Std Deviation            0.08330
Median   1.030849     Variance                 0.00694
Mode     1.032772     Range                    1.18186
                      Interquartile Range      0.01719

           Tests for Location: Mu0=0

Test           -Statistic-    -----p Value------

Student's t    t   165.391    Pr > |t|    <.0001
Sign           M      91.5    Pr >= |M|   <.0001
Signed Rank    S      8418    Pr >= |S|   <.0001

Quantiles (Definition 5)

Quantile        Estimate

100% Max       1.1921500
99%            1.0969195
95%            1.0674884
90%            1.0458427
75% Q3         1.0370553
50% Median     1.0308486
25% Q1         1.0198622
10%            0.9867116
5%             0.9600152
1%             0.8073648
0% Min         0.0102869

The UNIVARIATE Procedure
Variable:  CovRatio  (Cov Ratio)

            Extreme Observations

-------Lowest------        -----Highest-----

     Value      Obs           Value      Obs

 0.0102869       12         1.07503       65
 0.8073648      115         1.07525       82
 0.8536158       50         1.09106       30
 0.8789338       64         1.09692       97
 0.9190747       31         1.19215       21

               Missing Values

                       -----Percent Of-----
Missing                             Missing
  Value       Count     All Obs         Obs

      .          17        8.50      100.00
                                      Hat
Obs       Residual     RStudent    Diagonal
 12       -29.2230     -24.3045      0.7142
                                 
Obs    CovRatio      DFFITS   
 12      0.0103    -38.4193
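The COVRATIO of 0.0103 for observation 12 can be reproduced from the usual formula. The inputs are:

- n = 183 observations and k = 3 regressors besides the constant;
- the hat value 0.7142 and studentized residual -24.3045 from the listing above.

COVRATIO compares the generalized variance of the coefficients with observation i deleted to the generalized variance with it included. A value far below 1 therefore says that including observation 12 sharply reduces the precision of estimation.

```latex
\mathrm{COVRATIO}_i = \frac{1}{(1-h_i)\left(\dfrac{n-k-2+E^{*2}_i}{n-k-1}\right)^{k+1}}

\mathrm{COVRATIO}_{12} = \frac{1}{0.2858\left(\dfrac{178+590.71}{179}\right)^{4}}
                       \approx \frac{1}{0.2858\times 340.1} \approx 0.0103
```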

Section 11.6

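Bottom of page 283 and Figure 11.5 on page 284: partial-regression plots using the duncan data file. Following the second footnote on page 283, we first construct a partial-regression plot for the intercept. We add a constant regressor Int and regress both prestige and Int on income and educ with no intercept. We then plot the residuals for prestige against those for Int.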
data duncan1; /* add a constant regressor Int */
  set duncan;
  Int=1;
run;
/* regress prestige and Int on income and educ without an intercept;
   ry and rx hold the residuals for prestige and for Int, respectively */
proc reg data=duncan1 noprint;
  model prestige Int= income educ / noint;
  output out=temp r=ry rx;
run;
quit;
filename outfiles 'chp11pInt.gif';
goptions gsfname=outfiles dev=gif373;
proc gplot data=temp;
  plot ry*rx /hminor=0 vminor=0;
  label ry='Prestige'
  rx='Intercept';
run;
quit;
The following programs produce the two panels of Figure 11.5 on page 284: the partial-regression plots for income and for education.
filename outfiles 'chp11pInc.gif';
goptions gsfname=outfiles dev=gif373;
/* partial-regression plot for income: prst and inc are the residuals
   of prestige and of income, each regressed on educ */
proc reg data=duncan;
  model prestige income=educ;
  output out=dnEd  r=prst inc;
run;
proc reg data=dnEd;
  model prst=inc;
  plot prst*inc /haxis=(-50 to 75 by 25) vaxis=(-50 to 100 by 50) nomodel nostat;
  label prst='Prestige';
  label inc='Income';
run;
quit;
filename outfiles 'chp11pEd.gif';
goptions gsfname=outfiles dev=gif373;
/* partial-regression plot for education: prst and ed are the residuals
   of prestige and of educ, each regressed on income */
proc reg data=duncan;
  model prestige educ=income;
  output out=dcInc r=prst ed;
run;
proc reg data=dcInc;
  model prst=ed;
  plot prst*ed / haxis=(-75 to 50 by 25) vaxis=(-50 to 100 by 50) nomodel nostat;
  label prst='Prestige';
  label ed='Education';
run;
quit;
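All three partial-regression plots rely on the same construction, the Frisch-Waugh result. It is written here in our own notation, with X_1 the regressor of interest and X_2, ..., X_k the others.

For the intercept plot, the constant is itself the regressor of interest. The list of other regressors therefore omits "1": the regressions are fit with noint, and the constant regressor Int plays the role of X_1.

```latex
Y^{(1)}_i = \text{residual from regressing } Y \text{ on } 1, X_2, \ldots, X_k

X^{(1)}_i = \text{residual from regressing } X_1 \text{ on } 1, X_2, \ldots, X_k

B_1 = \frac{\sum_i X^{(1)}_i\,Y^{(1)}_i}{\sum_i \bigl(X^{(1)}_i\bigr)^2}
```

Here B_1 is the coefficient of X_1 in the full regression. The residuals around this line equal the residuals of the full regression.

In the income program, prst and inc play the roles of Y^(1) and X^(1). The slope fitted by `model prst=inc` should therefore match the income coefficient from `model prestige=income educ`. Because both residual series have mean zero when the constant is among the other regressors, the fitted intercept is zero up to rounding.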

Figure 11.6: bubble plot of studentized residuals against hat values, with bubble size reflecting Cook's D.

filename outfiles 'chp11bbl.gif';
goptions gsfname=outfiles dev=gif373;
proc reg data=duncan;
  model prestige=income educ;
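  /* rstudent= gives studentized (deleted) residuals as defined in the text;
     student= would give standardized residuals instead */
  output out=dncnOut cookd=ck h=lev rstudent=rs;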
run;
quit;
axis1 order=(0 to 0.3 by 0.05);
axis2 order=(-2.5 to 5 by 2.5) label=(r=0 a=90);
proc gplot data=dncnOut;
  bubble rs*lev=ck /haxis=axis1 vaxis=axis2  bsize=10 hminor=0 vminor=0;
  label rs='Studentized Residuals';
  label lev='Hat-Value';
run;
quit;
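When reading the horizontal axis of this plot, recall some facts about hat values in a regression with k regressors plus a constant:

- they lie between 1/n and 1;
- they sum to k + 1, so their average is (k + 1)/n.

A common rule of thumb flags hat values above two or three times this average. For Duncan's regression k = 2. Assuming the data contain n = 45 occupations (a count not printed on this page), the reference values work out as follows.

```latex
\bar{h} = \frac{k+1}{n} = \frac{3}{45} \approx 0.067, \qquad
2\bar{h} \approx 0.133, \qquad
3\bar{h} = 0.2
```

Points that lie both beyond these values and far from zero on the vertical axis combine high leverage with a large residual. Those are the points that should carry the largest bubbles.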

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.