UCLA Academic Technology Services

SAS Textbook Examples
Regression Analysis by Example by Chatterjee, Hadi and Price
Chapter 7: Weighted Least Squares

Education Expenditure data, table 7.3, p. 189.
data p189;
  length state $ 2 ;
  input State Y X1 X2 X3 Region ;
  label x1 = 'Income'
        x2 = 'Residents under 18'
        x3 = 'Residents in Urban Areas'
         y = 'Expenditure';
cards;
ME 235 3944 325 508 1
NH 231 4578 323 564 1
VT 270 4011 328 322 1
MA 261 5233 305 846 1
RI 300 4780 303 871 1
CT 317 5889 307 774 1
NY 387 5663 301 856 1
NJ 285 5759 310 889 1
PA 300 4894 300 715 1
OH 221 5012 324 753 2
IN 264 4908 329 649 2
IL 308 5753 320 830 2
MI 379 5439 337 738 2
WI 342 4634 328 659 2
MN 378 4921 330 664 2
IA 232 4869 318 572 2
MO 231 4672 309 701 2
ND 246 4782 333 443 2
SD 230 4296 330 446 2
NB 268 4827 318 615 2
KS 337 5057 304 661 2
DE 344 5540 328 722 3
MD 330 5331 323 766 3
VA 261 4715 317 631 3
WV 214 3828 310 390 3
NC 245 4120 321 450 3
SC 233 3817 342 476 3
GA 250 4243 339 603 3
FL 243 4647 287 805 3
KY 216 3967 325 523 3
TN 212 3946 315 588 3
AL 208 3724 332 584 3
MS 215 3448 358 445 3
AR 221 3680 320 500 3
LA 244 3825 355 661 3
OK 234 4189 306 680 3
TX 269 4336 335 797 3
MT 302 4418 335 534 4
ID 268 4323 344 541 4
WY 323 4813 331 605 4
CO 304 5046 324 785 4
NM 317 3764 366 698 4
AZ 332 4504 340 796 4
UT 315 4005 378 804 4
NV 291 5560 330 809 4
WA 312 4989 313 726 4
OR 316 4697 305 671 4
CA 332 5438 307 909 4
AK 546 5613 386 484 4
HI 311 5309 333 831 4
;
run;
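Before fitting anything, it is worth checking that the data step read all 50 states and assigned them to the intended regions. This is a minimal sketch that assumes the p189 data set created above; the output data set name region_counts is hypothetical.

```sas
/* Count states per region as a check on the data step above.        */
/* region_counts is a hypothetical name for the OUT= data set, which */
/* holds the variables region, COUNT and PERCENT.                    */
proc freq data = p189;
  tables region / out = region_counts;
run;
```

From the listing, regions 1 to 4 should contain 9, 12, 16 and 13 states respectively. Region 4 includes Alaska, which is dropped later on.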

Table 7.4, fig. 7.3-7.7, p. 191-192.
goptions reset = all;
 
symbol v=dot h=.8 c=blue;
proc reg data = p189;
  var region;
  model y = x1 x2 x3;
  plot student.*p. student.*region student.*x1 student.*x2 student.*x3;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: Y Expenditure

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3         109020          36340      22.19    <.0001
Error                    46          75348     1637.99091
Corrected Total          49         184368

Root MSE             40.47210    R-Square     0.5913
Dependent Mean      284.60000    Adj R-Sq     0.5647
Coeff Var            14.22070
                                      Parameter Estimates

                                                 Parameter       Standard
Variable     Label                       DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept                    1     -556.56804      123.19525      -4.52      <.0001
X1           Income                       1        0.07239        0.01160       6.24      <.0001
X2           Residents under 18           1        1.55205        0.31467       4.93      <.0001
X3           Residents in Urban Areas     1       -0.00427        0.05139      -0.08      0.9342
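Before Alaska is dropped in the next step, it can help to see which states stand out in the full-data fit. This is a minimal sketch that assumes the p189 data set above. The data set name diag and the variables sres, rstud, lev, cd and abs_sres are hypothetical names. The code saves studentized residuals, leverages and Cook's distances, then lists the five states with the largest absolute studentized residuals.

```sas
/* Refit the full-data OLS model and keep the regression diagnostics */
proc reg data = p189 noprint;
  model y = x1 x2 x3;
  output out = diag student = sres rstudent = rstud h = lev cookd = cd;
run;
quit;

/* Rank states by the size of their studentized residual */
data diag;
  set diag;
  abs_sres = abs(sres);
run;

proc sort data = diag;
  by descending abs_sres;
run;

/* Show the five most extreme states with their leverage and Cook's D */
proc print data = diag (obs = 5);
  var state y sres rstud lev cd;
run;
```

Leverage and Cook's distance are listed together with the residuals because a state with unusual predictor values can pull the fit toward itself without having the largest raw residual.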

Table 7.5 and fig. 7.8-7.9, p. 193-194. The where statement omits Alaska from the analysis.
symbol v=dot h=.8 c=blue;
proc reg data = p189;
  where State ~= 'AK';
  var region;
  model y = x1 x2 x3;
  output out=outr r=resid ;
  plot student.*p. student.*region;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: Y Expenditure

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3          56944          18981      14.80    <.0001
Error                    45          57700     1282.21687
Corrected Total          48         114644

Root MSE             35.80806    R-Square     0.4967
Dependent Mean      279.26531    Adj R-Sq     0.4631
Coeff Var            12.82224
                                      Parameter Estimates

                                                 Parameter       Standard
Variable     Label                       DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept                    1     -277.57731      132.42286      -2.10      0.0417
X1           Income                       1        0.04829        0.01215       3.98      0.0003
X2           Residents under 18           1        0.88693        0.33114       2.68      0.0103
X3           Residents in Urban Areas     1        0.06679        0.04934       1.35      0.1826

The plot of standardized residuals against region shows that the residual variance differs from region to region: the points for each region cover a different range. To correct this, we compute a weight for each region from the variance of its residuals and rerun the regression on the weighted data. The residuals from the weighted model should then have approximately the same variance in every region.
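The judgment based on the plot can be supported by a formal test. The following sketch is not part of the book's analysis. It uses Levene's test in PROC GLM to compare the spread of the residuals across regions and assumes the outr data set written by the OUTPUT statement in the no-Alaska fit above. OLS residuals are not independent, so the p-value should be treated as approximate.

```sas
/* Levene's test of equal residual variance across the four regions */
/* Input: OUTR from the no-Alaska PROC REG run (resid = residual)   */
proc glm data = outr;
  class region;
  model resid = region;
  means region / hovtest = levene;  /* default Levene type: squared deviations */
run;
quit;
```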

Generating the weights to be used in the WLS, p. 190.

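proc sql;
  /* ratio = mean squared residual within each region (sigma2 in Table 7.6) */
  create table outresid as
  select *, mean(resid*resid) as ratio
  from outr
  group by region;
quit;
proc sql;
  /* c = 1/cj, where cj = sqrt(ratio / mean of ratio over all 49 rows) */
  create table outresid1 as
  select *, 1/sqrt(ratio/mean(ratio)) as c
  from outresid;
quit;
proc sql;
  /* multiply the response and every predictor by c */
  create table weighted as
  select *, y*c as ty,
         x1*c as tx1, c*x2 as tx2, c*x3 as tx3
  from outresid1;
quit;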
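The three PROC SQL steps can be summarized in formulas. This is a sketch of what the code computes, written in notation chosen here: e_ij is the residual of state i in region j, and n_j is the number of states in region j. The book's own notation, and possibly its choice of divisor for the regional variance, may differ.

```latex
\begin{aligned}
\hat{\sigma}_j^2 &= \frac{1}{n_j}\sum_{i=1}^{n_j} e_{ij}^2
  && \text{(ratio)}\\
\bar{\sigma}^2 &= \frac{1}{n}\sum_{j=1}^{4} n_j\,\hat{\sigma}_j^2
  = \frac{1}{n}\sum_{j=1}^{4}\sum_{i=1}^{n_j} e_{ij}^2
  && \text{(mean(ratio) over all } n = 49 \text{ rows)}\\
c_j &= \frac{\hat{\sigma}_j}{\bar{\sigma}}
  && \text{(cj; the SAS variable c equals } 1/c_j)\\
\frac{y_{ij}}{c_j} &= \beta_0\,\frac{1}{c_j} + \beta_1\,\frac{x_{1ij}}{c_j}
  + \beta_2\,\frac{x_{2ij}}{c_j} + \beta_3\,\frac{x_{3ij}}{c_j} + \frac{\varepsilon_{ij}}{c_j}
\end{aligned}
```

If the estimate of Var(ε_ij) in region j is the regional variance, the transformed errors have variance of about that regional variance divided by c_j², which equals the pooled variance in every region. The intercept β0 is now multiplied by 1/c_j, which is the variable c. For that reason c enters the model as a regressor and the constant term is suppressed with NOINT. Ordinary least squares on the transformed data minimizes the sum of (y_ij − x_ij'β)²/c_j², which is weighted least squares with weights 1/c_j².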
Table 7.6, p. 195.
proc sql;
 select distinct region, count(region) as n, ratio as sigma2, 1/c as cj
 from weighted
 group by region;
quit;
  Region         n    sigma2        cj  
----------------------------------------
       1         9  1451.113  1.110099
       2        12  2436.978  1.438589
       3        16  249.4333  0.460244
       4        12  950.4233    0.8984
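Table 7.6 can be checked by hand. The pooled variance is the n-weighted average of the sigma2 column, and each cj is the square root of sigma2 divided by that average:

```latex
\begin{aligned}
\bar{\sigma}^2 &= \frac{9(1451.113) + 12(2436.978) + 16(249.4333) + 12(950.4233)}{49}
  = \frac{57699.77}{49} \approx 1177.55\\
c_1 &= \sqrt{1451.113/1177.55} \approx 1.1101, \qquad
c_3 = \sqrt{249.4333/1177.55} \approx 0.4602
\end{aligned}
```

The numerator, about 57700, equals the error sum of squares of the no-Alaska OLS fit in Table 7.5. This is expected, because mean(ratio) is simply the average squared residual over all 49 states.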

The WLS part of table 7.7, p. 195 and fig. 7.8-7.9, p. 196.
symbol v=dot h=.8 c=blue;
proc reg data = weighted;
  var region;
  model ty = c tx1 tx2 tx3/ noint;  /* c (= 1/cj) takes the place of the intercept */
  plot student.*p. student.*region;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: ty
NOTE: No intercept in model. R-Square is redefined.
                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     4        7186453        1796613    1570.69    <.0001
Error                    45          51473     1143.83478
Uncorrected Total        49        7237926

Root MSE             33.82063    R-Square     0.9929
Dependent Mean      355.52063    Adj R-Sq     0.9923
Coeff Var             9.51299
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
c             1     -315.51704       78.17764      -4.04      0.0002
tx1           1        0.06227        0.00787       7.92      <.0001
tx2           1        0.87426        0.20026       4.37      <.0001
tx3           1        0.02936        0.03425       0.86      0.3958
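The transformation by hand is equivalent to weighted least squares with weight c² = 1/cj² on each observation. This follows because ordinary least squares on (c·y, c, c·x1, c·x2, c·x3) minimizes the sum of c²(y − xβ)². The following sketch uses the WEIGHT statement of PROC REG and assumes the weighted data set created above. The data set weighted2 and the variable w are hypothetical names.

```sas
/* WLS through the WEIGHT statement instead of transforming the variables */
data weighted2;
  set weighted;
  w = c**2;          /* weight = c squared = 1/cj**2 */
run;

proc reg data = weighted2;
  model y = x1 x2 x3;  /* ordinary intercept; no NOINT needed */
  weight w;
run;
quit;
```

The parameter estimates and standard errors should match the table above: the intercept should equal the coefficient of c, and X1 should be 0.06227. The R-square and F statistic will differ. This model keeps an ordinary intercept, tests three slopes rather than four coefficients, and reports a corrected rather than an uncorrected total.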


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California