UCLA Academic Technology Services

Stata FAQ
How can I generate fungible regression weights?

The goal in ordinary least squares (OLS) regression is to find the set of regression weights that minimizes the residual sum of squares (RSS). There is one, and only one, set of regression weights that minimizes the RSS, and the weights that minimize the RSS also maximize the squared multiple correlation (R2). Instead of finding the weights that maximize R2, we compute weights that yield R2 - .005, a value very close to the maximum. According to Waller (2008), when there are three or more predictor variables there are infinitely many sets of weights that yield this reduced R2. All of these sets are interchangeable, that is, fungible, in the sense that they all produce the same reduced R2. The program regfungible computes sets of such weights for any reduction in R2 you specify.
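The setup can be stated compactly as follows. The notation is our own, and this is a sketch rather than Waller's full derivation. Write R^2_b for the OLS R2 (RSQb below), theta for the requested reduction, and R^2_a for the reduced R2 (RSQa). For standardized weights a, R_xx is the predictor correlation matrix and r_xy is the vector of predictor-criterion correlations. The R2 of a weighted composite is the squared correlation of y with that composite. The third line is the relation between fungible and OLS predicted values described by Waller (2008).

```latex
\begin{aligned}
R^2_a &= R^2_b - \theta \\
R^2(\mathbf{a}) &= \frac{(\mathbf{a}'\mathbf{r}_{xy})^2}{\mathbf{a}'\mathbf{R}_{xx}\mathbf{a}} = R^2_a
  \quad \text{for every fungible weight vector } \mathbf{a} \\
r_{\hat{y}_a \hat{y}_b} &= \sqrt{R^2_a / R^2_b} = \sqrt{1 - \theta / R^2_b}
\end{aligned}
```

The example below uses RSQb = .4672548 and theta = .005. With those values the last line gives sqrt(.4622548/.4672548), which is approximately .9946352. This is the r_yhata_yhatb value that regfungible prints.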

We will demonstrate regfungible using the hsbdemo dataset. We begin by loading the data and then running a regression model with three predictors.

use http://www.ats.ucla.edu/stat/data/hsbdemo, clear

regress write read math science

      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(  3,   196) =   57.30
       Model |  8353.98999     3  2784.66333           Prob > F      =  0.0000
    Residual |  9524.88501   196  48.5963521           R-squared     =  0.4673
-------------+------------------------------           Adj R-squared =  0.4591
       Total |   17878.875   199   89.843593           Root MSE      =  6.9711

------------------------------------------------------------------------------
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |   .2356606   .0691053     3.41   0.001     .0993751    .3719461
        math |   .3194791   .0756752     4.22   0.000     .1702369    .4687213
     science |   .2016571   .0690962     2.92   0.004     .0653896    .3379246
       _cons |   13.19155   3.068867     4.30   0.000     7.139308    19.24378
------------------------------------------------------------------------------  
The R2 for this model is .4673. We want sets of standardized regression weights that yield an R2 that is .005 smaller. We call the original R2 RSQb, the reduced R2 RSQa, and the difference between them theta. Thus,
 theta = RSQb - RSQa = .005, so RSQa = RSQb - theta = .4673 - .005 = .4623
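The same quantities can be computed from the results that regress stores. This is a minimal sketch that assumes the hsbdemo data are still in memory. The scalar names rsqb, theta and rsqa are our own. The last line uses the relation r_yhata_yhatb = sqrt(RSQa/RSQb) to reproduce the correlation that regfungible reports in its header.

```stata
* Compute RSQb, RSQa and the implied correlation between the fungible
* and OLS predicted values (scalar names are our own, not regfungible's)
quietly regress write read math science
scalar rsqb  = e(r2)           // original R2 (RSQb)
scalar theta = .005            // requested reduction in R2
scalar rsqa  = rsqb - theta    // reduced R2 (RSQa)
display "RSQb = " %9.7f rsqb "   RSQa = " %9.7f rsqa
display "r_yhata_yhatb = " %9.7f sqrt(rsqa/rsqb)
```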
Here is the regfungible command for generating 200 sets of weights.
regfungible, sets(200) theta(.005)

 OLS fungible regression weights analysis 
 
 Original R2: RSQb =  .4672548 
 Reduced R2:  RSQa =  .4622548 
 theta = RSQb-RSQa =      .005 
 r_yhata_yhatb     =  .9946352 


Generating Alternate weights ...
 

 Standardized OLS regression weights 
                 1             2             3
    +-------------------------------------------+
  1 |  .2549128629   .3157668631   .2106416581  |
    +-------------------------------------------+

 Maximum fungible regression weights for each variable 
                 1             2             3
    +-------------------------------------------+
  1 |  .3495330891   .2560737801   .1675002152  |
  2 |   .197326296   .4079093616   .1623860478  |
  3 |  .2075297825   .2678570952    .303304878  |
    +-------------------------------------------+

 Minimum fungible regression weights for each variable 
                 1             2             3
    +-------------------------------------------+
  1 |  .1548256496   .3681889758   .2498423376  |
  2 |  .3069399276   .2168678744    .254496415  |
  3 |  .2995772796   .3542006637   .1135491744  |
    +-------------------------------------------+

Summary of fungible regresson weights

   stats |       v_1       v_2       v_3
---------+------------------------------
       N |       200       200       200
    mean |  .2519706   .311021  .2100915
      p5 |  .1566942  .2185254  .1142415
     p25 |  .1886851  .2372786  .1403297
     p50 |  .2527676  .3169925  .2147582
     p75 |  .3173331  .3821019  .2787131
     p95 |  .3468746  .4054128  .3011541
----------------------------------------
The output above shows the standardized regression weights from the original model (.2549128629, .3157668631, .2106416581). It also summarizes the fungible weights, which regfungible adds to the dataset as new variables. By default these variables are named v_1 through v_3, one per predictor in the order read, math, science. You can change the prefix of these variable names with the prefix() option.
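The standardized weights in the first matrix are the usual beta coefficients of the OLS model. A quick check follows. It is a sketch that assumes the hsbdemo data are in memory, and sd_y is our own scalar name. The weights can be obtained with the beta option of regress, or by rescaling each unstandardized coefficient by sd(x)/sd(y).

```stata
* Standardized (beta) weights for the original OLS model
regress write read math science, beta

* The same weights by hand: beta_j = b_j * sd(x_j) / sd(write)
quietly summarize write
scalar sd_y = r(sd)
foreach x in read math science {
    quietly summarize `x'
    display "`x'" _col(12) %12.10f _b[`x']*r(sd)/scalar(sd_y)
}
```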

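The summary table at the bottom of the output reports, for each fungible weight, the number of sets (N), the mean, and the 5th, 25th, 50th, 75th and 95th percentiles across the 200 sets. It is often more interesting to look at the maximum and minimum matrices. Row k of the maximum matrix is the set in which the weight for predictor k is largest (row 1 for read, row 2 for math, row 3 for science), and the minimum matrix is organized the same way. For example, the maximum value of v_1 is .3495330891. It comes with weights .2560737801 and .1675002152 for v_2 and v_3, respectively. In this set read receives a larger weight than math, which reverses the OLS solution (.2549 for read, .3158 for math). Likewise, in the set where v_2 is largest (.4079093616), the associated weights for v_1 and v_3 are .197326296 and .1623860478. Sets of weights that fit the data almost equally well can therefore rank the importance of the predictors quite differently.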
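The rows of the maximum matrix can also be recovered from the new variables. This is a minimal sketch that assumes v_1 through v_3 were created by the regfungible run above. For each predictor it lists the set in which that predictor's weight is largest.

```stata
* For each predictor, list the fungible set in which its weight is largest
forvalues j = 1/3 {
    quietly summarize v_`j'
    display as text _n "Set with the largest v_`j' (max = " %12.10f r(max) ")"
    list v_1 v_2 v_3 if v_`j' == r(max), noobs
}
```

Replacing r(max) with r(min) in both places lists the rows of the minimum matrix instead.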

Next we demonstrate that these weights yield an R2 equal to RSQa. We pick one set of weights arbitrarily, say the set stored in observation 155. Note that the weights will differ from run to run unless you use the seed option.

/* generate standardized predictors */
egen zr = std(read)
egen zm = std(math)
egen zs = std(science)

/* get fungible weights for observation 155 */
list v_1 v_2 v_3 in 155

     +--------------------------------+
     |      v_1        v_2        v_3 |
     |--------------------------------|
155. | .1799844   .3003776   .2969216 |
     +--------------------------------+

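/* generate predicted value, yhata, from the weights stored in observation 155 */
/* (subscripting v_1-v_3 works for any run, unlike typing the listed values)   */
generate yhata = v_1[155]*zr + v_2[155]*zm + v_3[155]*zs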

/* correlate observed and predicted */
corr write yhata

(obs=200)

             |    write    yhata
-------------+------------------
       write |   1.0000
       yhata |   0.6799   1.0000

display r(rho)^2

.46225479
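Observation 155 is only one of the 200 sets, and a short loop can repeat the check for every set. This sketch assumes that the standardized predictors zr, zm and zs created above and the variables v_1 through v_3 from the first regfungible run (sets(200), theta(.005)) are in memory. The names yh, rsqa and maxdev are our own.

```stata
* Check that every fungible set reproduces RSQa, not just observation 155
quietly regress write read math science
scalar rsqa = e(r2) - .005             // reduced R2 (RSQa) for theta = .005
scalar maxdev = 0                      // largest |R2 - RSQa| seen so far
forvalues i = 1/200 {
    capture drop yh
    quietly generate double yh = v_1[`i']*zr + v_2[`i']*zm + v_3[`i']*zs
    quietly correlate write yh
    scalar maxdev = max(scalar(maxdev), abs(r(rho)^2 - scalar(rsqa)))
}
drop yh
display "largest |R2 - RSQa| over the 200 sets: " maxdev
```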
Next, we graph the results of regfungible, beginning with a box plot of the fungible weights for each predictor. Note the considerable variation within each weight as well as the considerable overlap among the three. (scheme(lean1) is a user-written graph scheme; omit the option if it is not installed.)
graph box v_*, scheme(lean1)

Let's look at the scatter plots of the fungible weights generated by the program for each pair of variates. We will use the graph matrix command for this.
graph matrix v_*, scheme(lean1)

We follow the scatterplot matrix with a look at the univariate kernel density distribution of each fungible weight.
/* one kernel density graph per fungible weight; replace allows the loop to be rerun */
forvalues i=1/3 {
  kdensity v_`i', name(v_`i', replace) scheme(lean1)
}



We finish by generating weights for two additional values of theta (.01 and .02). We then plot the weights for the first two predictors from all three runs on the same axes, with read on the vertical axis and math on the horizontal axis. We also add a "+" marker at the OLS standardized regression weights for read and math.
regfungible, sets(200) theta(.01) prefix(w_)
regfungible, sets(200) theta(.02) prefix(x_)

twoway (scatter v_1 v_2)(scatter w_1 w_2, msym(oh))(scatter x_1 x_2, msym(oh)), ///
       text(.2549128629 .3157668631 "+", place(c)) legend(off) scheme(lean1)
       
We end up with something that looks like a model of the solar system: one "orbit" of weights for each value of theta, surrounding the OLS weights (the + marker). The smaller theta is, the tighter the orbit, so as theta gets smaller the fungible weights converge on the least squares regression weights. The gaps in the orbits would fill in if we generated more sets of weights.
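The shrinking orbits follow from the relation r_yhata_yhatb = sqrt(1 - theta/RSQb), which approaches 1 as theta approaches 0. The sketch below prints this correlation for each theta and summarizes the spread of the read and math weights. It assumes that the v_, w_ and x_ variables from the three regfungible runs above are in memory.

```stata
* Implied correlation between fungible and OLS predicted values for each theta
quietly regress write read math science
foreach t in .005 .01 .02 {
    display "theta = " %5.3f `t' "   r_yhata_yhatb = " %9.7f sqrt(1 - `t'/e(r2))
}

* Spread of the read (v_1, w_1, x_1) and math (v_2, w_2, x_2) weights by theta
tabstat v_1 w_1 x_1 v_2 w_2 x_2, statistics(min max sd) columns(statistics)
```

A larger theta gives a lower correlation with the OLS predictions, and so leaves room for weights farther from the OLS solution.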

Reference

Waller, N. G. (2008). Fungible weights in multiple regression. Psychometrika, 73, 691-703.


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.