UCLA Academic Technology Services

SAS Textbook Examples
Regression Analysis by Example by Chatterjee, Hadi and Price
Chapter 10: Biased Estimation of Regression Coefficients

Inputting the French Economy data, p. 233.
data p233;
  input YEAR IMPORT DOPROD STOCK CONSUM;
cards;
49 15.9 149.3 4.2 108.1
50 16.4 161.2 4.1 114.8
51 19 171.5 3.1 123.2
52 19.1 175.5 3.1 126.9
53 18.8 180.8 1.1 132.1
54 20.4 190.7 2.2 137.7
55 22.7 202.1 2.1 146
56 26.5 212.4 5.6 154.1
57 28.1 226.1 5 162.3
58 27.6 231.9 5.1 164.3
59 26.3 239 0.7 167.6
60 31.1 258 5.6 176.8
61 33.3 269.8 3.9 186.6
62 37 288.4 3.1 199.7
63 43.3 304.5 4.6 213.9
64 49 323.4 7 223.8
65 50.3 336.8 1.2 232
66 56.6 353.9 4.5 242.9
;
run;

Creating the standardized variables for the subset of the dataset where year <= 59, p. 264.
data temp;
  set p233;
  if year LE 59;
run;
proc sql;
  create table temp1 as
  select *, (import - mean(import))/std(import) as importstd,
            (doprod - mean(doprod))/std(doprod) as doprodstd,
            (consum - mean(consum))/std(consum) as consumstd,
            (stock - mean(stock))/std(stock) as stockstd
  from temp;
quit;
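
The same standardization can also be sketched with proc standard, which rescales the listed variables to mean 0 and standard deviation 1. Unlike the proc sql step above, it overwrites the variables in place instead of adding new ones, so the model would then use the original variable names. The dataset name temp1b is an assumption made here for illustration; it is not used elsewhere on this page.

```sas
/* Alternative sketch: standardize in place with proc standard.   */
/* temp1b is a hypothetical dataset name.                         */
proc standard data = temp mean = 0 std = 1 out = temp1b;
  var import doprod stock consum;
run;

/* The standardized values replace the originals, so the model    */
/* refers to the original variable names.                         */
proc reg data = temp1b;
  model import = doprod stock consum / noint;
run;
quit;
```

Both approaches divide by the sample standard deviation, so the estimates should match those obtained from temp1.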

Table 10.1, p. 265.
proc reg data = temp1;
  model importstd = doprodstd  stockstd consumstd/ noint;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: importstd
NOTE: No intercept in model. R-Square is redefined.
                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3        9.91897        3.30632     326.41    <.0001
Error                     8        0.08103        0.01013
Uncorrected Total        11       10.00000

Root MSE              0.10064    R-Square     0.9919
Dependent Mean    7.06506E-17    Adj R-Sq     0.9889
Coeff Var         1.424539E17
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
doprodstd     1       -0.33934        0.43405      -0.78      0.4568
stockstd      1        0.21305        0.03213       6.63      0.0002
consumstd     1        1.30268        0.43418       3.00      0.0171

Generating the principal components for the predictor variables, p. 265.
ods listing close;
proc princomp data = p233 out = temp;
 where year <= 59;
 var doprod stock consum;
 ods output EigenVectors=eig;
run;
ods listing;
proc print data = eig;
run;
Obs    Variable       Prin1       Prin2       Prin3

 1      DOPROD     0.706330    -.035689    0.706982
 2      STOCK      0.043501    0.999029    0.006971
 3      CONSUM     0.706544    -.025830    -.707197
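
The eigenvalues of the correlation matrix can be captured in the same way as the eigenvectors. This is a minimal sketch, assuming the ODS table name Eigenvalues and a hypothetical output dataset eigval; a very small last eigenvalue is a sign of near-collinearity among the predictors.

```sas
ods listing close;
proc princomp data = p233;
  where year <= 59;
  var doprod stock consum;
  ods output Eigenvalues = eigval;  /* eigval is a hypothetical name */
run;
ods listing;
proc print data = eigval;
run;
```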

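Standardizing the dependent variable and multiplying the third principal component by -1 so that the results match the book. The sign of a principal component is arbitrary, so this changes only the sign of the corresponding coefficient, not the fit.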
proc sql;  
 create table tempstd as
 select *, (import - mean(import))/std(import) as zimport, -1*prin3 as nprin3
 from temp;
quit;
Table 10.2, p. 265.
proc reg data = tempstd;
  model zimport = prin1 prin2 nprin3/noint;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: zimport
NOTE: No intercept in model. R-Square is redefined.
                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3        9.91897        3.30632     326.41    <.0001
Error                     8        0.08103        0.01013
Uncorrected Total        11       10.00000

Root MSE              0.10064    R-Square     0.9919
Dependent Mean    7.06506E-17    Adj R-Sq     0.9889
Coeff Var         1.424539E17
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Prin1         1        0.68998        0.02251      30.65      <.0001
Prin2         1        0.19130        0.03186       6.01      0.0003
nprin3        1        1.15968        0.61354       1.89      0.0954
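
The component regression above and the standardized regression of Table 10.1 describe the same fit, so the standardized coefficients can be recovered by multiplying the eigenvector matrix by the component estimates. The data step below is a sketch of that arithmetic using only numbers copied from the output above; the dataset name pc_check and the variable names are hypothetical.

```sas
/* Sketch: recover the Table 10.1 coefficients from Table 10.2.   */
/* All numbers are copied from the output above.                  */
data pc_check;
  /* estimates for Prin1, Prin2 and nprin3 */
  a1 = 0.68998;  a2 = 0.19130;  a3 = 1.15968;
  /* nprin3 = -prin3, so the Prin3 eigenvector column enters      */
  /* with a minus sign                                            */
  b_doprod = 0.706330*a1 + (-0.035689)*a2 - ( 0.706982)*a3;
  b_stock  = 0.043501*a1 + ( 0.999029)*a2 - ( 0.006971)*a3;
  b_consum = 0.706544*a1 + (-0.025830)*a2 - (-0.707197)*a3;
run;
proc print data = pc_check;
  var b_doprod b_stock b_consum;
run;
```

The three values should agree with the doprodstd, stockstd and consumstd estimates in Table 10.1 (about -0.339, 0.213 and 1.303) up to rounding.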

Inputting the data on p. 270.
data p270;
  input u c1 c2 c3 c4;
cards;
   .955      1.467      1.903       -.53      .0389
  -.746      2.136       .238       -.29       -.03
 -2.323      -1.13       .184       -.01      -.094
   -.82        .66      1.577       .179      -.033
   .471      -.359       .484       -.74       .019
  -.299      -.967        .17       .086      -.012
    .21      -.931     -2.135      -.173       .008
   .558      2.232      -.692        .46       .023
 -1.119       .352     -1.432      -.032      -.045
   .496     -1.663      1.828       .851        .02
   .781      1.641     -1.295       .494       .031
   .918     -1.693      -.392       -.02       .037
   .918     -1.746      -.438      -.275       .037
;
run;

Tables 10.5-10.6, p. 270.
proc reg data = p270;
  model u = C1-C4/ noint;
  model u = C1-C3/ noint;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: u
NOTE: No intercept in model. R-Square is redefined.
                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     4       11.99718        2.99930    71530.6    <.0001
Error                     9     0.00037737     0.00004193
Uncorrected Total        13       11.99756

Root MSE              0.00648    R-Square     1.0000
Dependent Mean    5.12411E-17    Adj R-Sq     1.0000
Coeff Var         1.263705E16
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
c1            1       -0.00180        0.00125      -1.44      0.1837
c2            1       -0.00255        0.00149      -1.71      0.1206
c3            1        0.00165        0.00433       0.38      0.7122
c4            1       24.76590        0.04630     534.90      <.0001

The REG Procedure
Model: MODEL2
Dependent Variable: u
NOTE: No intercept in model. R-Square is redefined.
                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3     0.00005463     0.00001821       0.00    1.0000
Error                    10       11.99751        1.19975
Uncorrected Total        13       11.99756

Root MSE              1.09533    R-Square     0.0000
Dependent Mean    5.12411E-17    Adj R-Sq    -0.3000
Coeff Var         2.137605E18
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
c1            1       -0.00122        0.21144      -0.01      0.9955
c2            1    -0.00012426        0.25186      -0.00      0.9996
c3            1        0.00252        0.73202       0.00      0.9973

Fig. 10.1, p. 271.
symbol v=dot h=.8 c=blue;
proc gplot data = p270;
  plot u*C1 u*C2 u*C3 u*C4;
run;
quit;

Creating the data to be used in Fig. 10.2, p. 273, and Tables 10.7-10.8, p. 274 and p. 277.
proc reg data = p233 outest = temp outstb noprint;
  where year <= 59;
  model import = doprod stock consum/
         ridge = (0.00 0.001 to .009 by .002 0.010 to 0.03 by 0.002 0.03 to 0.09 by 0.01 0.1 to 1.0 by 0.1)
         outvif ;
run;
quit;

Fig. 10.2, p. 273.
Note: Normally SAS can produce this graph directly if a plot statement with the ridgeplot option is added to the proc reg that performs the ridge regression. For this dataset, however, we could not get proc reg to reproduce the graph in the book, so we use proc gplot instead.
symbol1 i=join v=circle h =.8 c=blue;
symbol2 i=join v=circle h =.8 c=red;
symbol3 i=join v=circle h =.8 c=green;
legend1 label=none position=(top center inside)
        mode=share;
axis1 label=(angle=90 'Ridge coefficients'); 
proc gplot data = temp;
  where _type_ = 'RIDGESTB';
  plot (doprod stock consum)*_ridge_/ overlay legend=legend1 vaxis=axis1 vref=0;
run;
quit;
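
For reference, the proc reg approach mentioned in the note would look roughly like the sketch below. The ridge grid is shortened for illustration, temp2 is a hypothetical dataset name, and the nomodel and nostat plot options are assumptions added only to keep the plot uncluttered. As stated in the note, this route did not reproduce the book's figure for this dataset.

```sas
/* Sketch only: ridge trace drawn by proc reg itself.             */
/* temp2 is a hypothetical dataset name; the grid is shortened.   */
proc reg data = p233 outest = temp2 outstb;
  where year <= 59;
  model import = doprod stock consum / ridge = 0 to 0.1 by 0.01;
  plot / ridgeplot nomodel nostat;
run;
quit;
```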

Table 10.7, p. 274.
proc print data = temp;
 where _type_ = 'RIDGESTB';
 var _ridge_ doprod stock consum;
run;
Obs    _RIDGE_     DOPROD      STOCK      CONSUM

  4     0.000     -0.33934    0.21305    1.30268
  7     0.001     -0.11745    0.21503    1.08024
 10     0.003      0.09215    0.21669    0.86963
 13     0.005      0.19249    0.21728    0.76831
 16     0.007      0.25122    0.21745    0.70862
 19     0.009      0.28970    0.21743    0.66919
 22     0.010      0.30433    0.21737    0.65408
 25     0.012      0.32753    0.21720    0.62993
 28     0.014      0.34505    0.21698    0.61146
 31     0.016      0.35873    0.21671    0.59684
 34     0.018      0.36967    0.21643    0.58496
 37     0.020      0.37861    0.21612    0.57509
 40     0.022      0.38602    0.21580    0.56674
 43     0.024      0.39225    0.21547    0.55958
 46     0.026      0.39755    0.21514    0.55335
 49     0.028      0.40210    0.21479    0.54787
 52     0.030      0.40604    0.21445    0.54300
 55     0.030      0.40604    0.21445    0.54300
 58     0.040      0.41955    0.21267    0.52488
 61     0.050      0.42709    0.21087    0.51279
 64     0.060      0.43152    0.20907    0.50384
 67     0.070      0.43414    0.20729    0.49675
 70     0.080      0.43560    0.20553    0.49086
 73     0.090      0.43630    0.20380    0.48578
 76     0.100      0.43645    0.20209    0.48128
 79     0.200      0.42646    0.18639    0.44994
 82     0.300      0.41123    0.17298    0.42738
 85     0.400      0.39575    0.16140    0.40818
 88     0.500      0.38091    0.15130    0.39107
 91     0.600      0.36693    0.14242    0.37554
 94     0.700      0.35381    0.13454    0.36131
 97     0.800      0.34153    0.12750    0.34818
100     0.900      0.33003    0.12117    0.33601
103     1.000      0.31925    0.11546    0.32469

Table 10.8, p. 277.
proc print data = temp;
 where _type_ = 'RIDGEVIF';
 var _ridge_ doprod stock consum;
run;
Obs    _RIDGE_     DOPROD     STOCK      CONSUM

  2     0.000     185.997    1.01891    186.110
  5     0.001      98.981    1.00845     99.041
  8     0.003      41.779    0.99890     41.804
 11     0.005      22.988    0.99311     23.001
 14     0.007      14.570    0.98836     14.579
 17     0.009      10.089    0.98401     10.095
 20     0.010       8.599    0.98192      8.604
 23     0.012       6.480    0.97783      6.483
 26     0.014       5.075    0.97384      5.078
 29     0.016       4.097    0.96991      4.099
 32     0.018       3.388    0.96603      3.389
 35     0.020       2.858    0.96219      2.859
 38     0.022       2.452    0.95838      2.452
 41     0.024       2.133    0.95461      2.134
 44     0.026       1.878    0.95086      1.879
 47     0.028       1.672    0.94714      1.672
 50     0.030       1.502    0.94345      1.502
 53     0.030       1.502    0.94345      1.502
 56     0.040       0.979    0.92532      0.979
 59     0.050       0.723    0.90773      0.723
 62     0.060       0.579    0.89065      0.578
 65     0.070       0.489    0.87405      0.488
 68     0.080       0.429    0.85792      0.428
 71     0.090       0.386    0.84222      0.386
 74     0.100       0.355    0.82696      0.355
 77     0.200       0.240    0.69474      0.240
 80     0.300       0.204    0.59187      0.204
 83     0.400       0.182    0.51027      0.182
 86     0.500       0.166    0.44446      0.165
 89     0.600       0.152    0.39061      0.152
 92     0.700       0.140    0.34598      0.140
 95     0.800       0.130    0.30859      0.130
 98     0.900       0.121    0.27695      0.121
101     1.000       0.113    0.24994      0.112

Table 10.9, p. 277.
Note: For each ridge value, the row with intercept = 0 contains the standardized coefficients; the other row contains the coefficients in the original units.
proc reg data = p233 outest = temp outstb noprint;
  where year <= 59;
  model import = doprod stock consum/  ridge =  (0.00, 0.04) ; 
run;
quit;
proc print data = temp;
 where _ridge_ ~=.;
 by _ridge_;
 var _ridge_  intercept doprod stock consum;
run;
Ridge regression control value=0
Obs    _RIDGE_    Intercept     DOPROD      STOCK      CONSUM

 2        0        -10.1280    -0.05140    0.58695    0.28685
 3        0          0.0000    -0.33934    0.21305    1.30268

Ridge regression control value=0.04
Obs    _RIDGE_    Intercept     DOPROD     STOCK      CONSUM

 4       0.04      -8.55832    0.06354    0.58591    0.11558
 5       0.04       0.00000    0.41955    0.21267    0.52488
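
The two kinds of rows can also be told apart by printing the _type_ variable instead of relying on the zero intercept. This is a sketch that assumes the unstandardized ridge rows carry _type_ = 'RIDGE'; the value 'RIDGESTB' for the standardized rows is the one already used above.

```sas
/* Sketch: label each row by its type.                            */
/* 'RIDGE' for the unstandardized rows is an assumption.          */
proc print data = temp;
  where _type_ in ('RIDGE', 'RIDGESTB');
  var _type_ _ridge_ intercept doprod stock consum;
run;
```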

These pages are Copyrighted (c) by UCLA Academic Technology Services

