UCLA Academic Technology Services

Stata Analysis Tools
Weighted Least Squares Regression

Weighted least squares (WLS) provides one method for dealing with heteroscedasticity: observations whose errors have larger variance receive less weight in the fit. The wls0 command can be used to compute various WLS solutions. It is not part of official Stata; you can download it over the internet by typing findit wls0 (see How can I use the findit command to search for programs and get additional help? for more information about using findit).
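For reference, here is the standard textbook form of the WLS estimator (general background, not taken from the wls0 documentation). With design matrix X, outcome vector y, and error variances Var(ε_i) = σ_i², each observation gets a weight w_i proportional to 1/σ_i²:

```latex
\hat{\beta}_{\mathrm{WLS}} = (X^\top W X)^{-1} X^\top W y,
\qquad W = \operatorname{diag}(w_1, \dots, w_n),
\qquad w_i \propto \frac{1}{\sigma_i^2}
```

Equivalently, WLS is OLS run on data whose rows have been multiplied by the square root of w_i. Multiplying every w_i by the same positive constant leaves the estimate unchanged, so only the relative weights matter. In practice the σ_i² are unknown and have to be estimated, typically from an initial OLS fit; the wls0 weight types used on this page appear to be different ways of making that estimate.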
Let's use an example dataset that exhibits heteroscedasticity, hetdata.
use http://www.ats.ucla.edu/stat/stata/ado/analysis/hetdata

regress exp age ownrent income incomesq


      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.39
       Model |  1749357.01     4  437339.252           Prob > F      =  0.0008
    Residual |  5432562.03    67  81083.0153           R-squared     =  0.2436
-------------+------------------------------           Adj R-squared =  0.1984
       Total |  7181919.03    71  101153.789           Root MSE      =  284.75

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.081814   5.514717    -0.56   0.578    -14.08923    7.925606
     ownrent |   27.94091   82.92232     0.34   0.737    -137.5727    193.4546
      income |    234.347   80.36595     2.92   0.005     73.93593    394.7581
    incomesq |  -14.99684   7.469337    -2.01   0.049     -29.9057   -.0879859
       _cons |  -237.1465   199.3517    -1.19   0.238    -635.0541    160.7611
------------------------------------------------------------------------------

rvpplot income, ylab yline(0)
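rvpplot graphs the residuals from the most recent regress against the predictor you name. If you want to customize that plot, a minimal sketch of building it by hand is below; it assumes hetdata is already in memory (loaded with the use command above), and the variable name ehat is a hypothetical choice assumed not to be in use. Run it as a do-file so the // comments are accepted.

```stata
* Rebuild the residual-versus-income plot by hand (ehat is a hypothetical name)
quietly regress exp age ownrent income incomesq   // same OLS model as above
predict double ehat if e(sample), residuals        // OLS residuals
scatter ehat income, yline(0)                      // residuals vs. income, line at 0
```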

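The residual versus income plot shows clear evidence of heteroscedasticity: the spread of the residuals is not constant across income levels. Let's try a WLS with the weighting based on income. The WLS type abse uses the absolute values of the residuals, and the noconst option means no constant is used in modeling them (the regression of exp itself still has a constant, as the _cons row of the output shows).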
wls0 exp age ownrent income incomesq, wvar(income) type(abse) noconst graph

WLS regression -  type: proportional to abs(e)

(sum of wgt is   5.7161e-01)

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    5.73
       Model |  1266234.75     4  316558.686           Prob > F      =  0.0005
    Residual |  3703808.10    67  55280.7179           R-squared     =  0.2548
-------------+------------------------------           Adj R-squared =  0.2103
       Total |  4970042.85    71  70000.6035           Root MSE      =  235.12

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -2.935011   4.603331    -0.64   0.526     -12.1233    6.253276
     ownrent |   50.49364   69.87914     0.72   0.472     -88.9857     189.973
      income |   202.1694   76.78152     2.63   0.010     48.91285     355.426
    incomesq |  -12.11364    8.27314    -1.46   0.148    -28.62689     4.39962
       _cons |  -181.8706   165.5191    -1.10   0.276    -512.2481    148.5068
------------------------------------------------------------------------------
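To make the abse/noconst idea concrete, here is a rough by-hand sketch of the wls0 command above. It rests on an ASSUMPTION about wls0's internals that this page does not confirm: that it regresses the absolute OLS residuals on the wvar() variable(s) and weights each observation by one over the squared fitted value. The variable names e_ols, abs_e, sd_hat and w_abse are hypothetical and assumed unused; compare the result with the wls0 output above before relying on it.

```stata
* Hypothetical by-hand version of:
*   wls0 exp age ownrent income incomesq, wvar(income) type(abse) noconst
* ASSUMPTION: |e| is regressed on wvar() and weights are 1/(fitted |e|)^2
quietly regress exp age ownrent income incomesq       // step 1: OLS fit
predict double e_ols if e(sample), residuals
generate double abs_e = abs(e_ols)                     // size of each residual
quietly regress abs_e income, noconstant               // step 2: model |e|, no constant
predict double sd_hat if e(sample), xb                 // fitted residual spread
generate double w_abse = 1/sd_hat^2                    // inverse-variance weights
regress exp age ownrent income incomesq [aweight=w_abse]   // step 3: WLS fit
```

Because sd_hat here is just a constant times income, these weights are proportional to 1/income^2, and rescaling all weights by one constant does not change the WLS coefficients. For a version with both income and incomesq, step 2 would become regress abs_e income incomesq, noconstant.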

The residual plot from this WLS fit is better. We can try other possibilities, such as weighting based on both income and income squared.
wls0 exp age ownrent income incomesq, wvar(income incomesq) type(abse) noconst graph

WLS regression -  type: proportional to abs(e)

(sum of wgt is   4.3021e-01)

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =    6.37
       Model |  1626419.82     4  406604.954           Prob > F      =  0.0002
    Residual |  4277725.74    67  63846.6528           R-squared     =  0.2755
-------------+------------------------------           Adj R-squared =  0.2322
       Total |  5904145.55    71  83156.9796           Root MSE      =  252.68

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -3.038906   4.953024    -0.61   0.542    -12.92518    6.847371
     ownrent |   41.89772   75.32687     0.56   0.580    -108.4553    192.2508
      income |   214.7859   70.17436     3.06   0.003     74.71732    354.8545
    incomesq |  -13.41379   6.353738    -2.11   0.038    -26.09591   -.7316791
       _cons |  -199.6993   170.1115    -1.17   0.245    -539.2433    139.8448
------------------------------------------------------------------------------

Finally, let's try one more variation. This time the weighting is based on the log of the squared residuals (type loge2), still using income and incomesq, and without the noconst option.
wls0 exp age ownrent income incomesq, wvar(income incomesq) type(loge2) graph

WLS regression -  type: proportional to log(e^2) 

(sum of wgt is   2.8166e-02)

      Source |       SS       df       MS              Number of obs =      72
-------------+------------------------------           F(  4,    67) =   69.69
       Model |  2872576.02     4  718144.005           Prob > F      =  0.0000
    Residual |  690414.759    67  10304.6979           R-squared     =  0.8062
-------------+------------------------------           Adj R-squared =  0.7947
       Total |  3562990.78    71  50182.9687           Root MSE      =  101.51

------------------------------------------------------------------------------
         exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -1.233683   2.551197    -0.48   0.630    -6.325894    3.858527
     ownrent |   50.94976   52.81429     0.96   0.338      -54.468    156.3675
      income |   145.3045    46.3627     3.13   0.003     52.76413    237.8448
    incomesq |   -7.93828   3.736716    -2.12   0.037     -15.3968   -.4797648
       _cons |  -117.8675   101.3862    -1.16   0.249    -320.2352    84.50027
------------------------------------------------------------------------------
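A similar by-hand sketch for the loge2 type follows. Again this is an ASSUMPTION about what wls0 does, not something this page documents: log(e^2) is regressed on the wvar() variables with a constant, and each observation is weighted by one over the exponentiated fitted value (an estimate of its error variance). The variable names are hypothetical and assumed unused.

```stata
* Hypothetical by-hand version of:
*   wls0 exp age ownrent income incomesq, wvar(income incomesq) type(loge2)
* ASSUMPTION: log(e^2) is regressed on wvar() and weights are 1/exp(fitted)
quietly regress exp age ownrent income incomesq       // step 1: OLS fit
predict double e_ols2 if e(sample), residuals
generate double loge2 = ln(e_ols2^2)                   // log of squared residuals
quietly regress loge2 income incomesq                  // step 2: constant kept
predict double lv_hat if e(sample), xb                 // fitted log variance
generate double w_loge2 = 1/exp(lv_hat)                // inverse of fitted variance
regress exp age ownrent income incomesq [aweight=w_loge2]  // step 3: WLS fit
```

Any constant shift in lv_hat multiplies every weight by the same factor, since exp(a + c) = exp(c)exp(a), so it does not affect the coefficients.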

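In addition to the weight types abse and loge2, wls0 offers types based on the squared residuals (e2) and the squared fitted values (xb2).
Finding the best WLS solution requires detailed knowledge of your data and trying different combinations of variables and types of weighting. Keep in mind that the R-squared reported for a weighted fit is computed from the weighted data, so the 0.8062 from the loge2 model cannot be compared directly with the 0.2436 from the OLS regression.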
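One way to organize that trial and error is to loop over weight types for the same model. The sketch below assumes wls0 is installed and accepts type(e2) with the same wvar() list as the examples on this page; xb2 is left out because how it uses wvar() is not covered here. The loop variable t is a hypothetical name.

```stata
* Fit the same model with several wls0 weight types for comparison
foreach t in abse e2 loge2 {
    display as text _newline "=== type(`t') ==="
    wls0 exp age ownrent income incomesq, wvar(income incomesq) type(`t')
}
```

Comparing the coefficients, standard errors and residual plots across the runs is more informative than comparing their R-squared values.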

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.