UCLA Academic Technology Services

Stata FAQ
How can I check for collinearity in survey regression?

Collinearity is a property of the predictor variables alone, so it can be checked using a series of survey regressions (svy: regress) that involve only the predictors. We will illustrate this using the hsb2 dataset, pretending that the variable socst is the sampling weight (pweight) and that the sample is stratified on ses. Let's say that read and write are two of the predictors. We will create a third predictor, rw, that is highly collinear with the other two by multiplying read and write together.

Note: In OLS regression, the VIF and tolerance are obtained by running estat vif after the regress command.
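For comparison, the OLS route mentioned in the note might look like the sketch below. The outcome variable math is an assumption made only for illustration (this page does not name an outcome), and the regression ignores the pretend weights and strata.

```stata
* Sketch only: unweighted OLS with a hypothetical outcome (math)
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
generate rw = read*write

* Fit the OLS model with the three predictors discussed on this page
regress math read write rw

* Report the VIF and 1/VIF (tolerance) for each predictor
estat vif
```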

use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear

generate rw = read*write

svyset [pw=socst], strata(ses)

      pweight: socst
          VCE: linearized
     Strata 1: ses
         SU 1: 
        FPC 1: 

svy: regress read write rw
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs      =       200
Number of PSUs     =       200                  Population size    =     10481
                                                Design df          =       197
                                                F(   2,    196)    =   2732.78
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9789

------------------------------------------------------------------------------
             |             Linearized
        read |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   -.850208   .0265923   -31.97   0.000    -.9026501   -.7977658
          rw |   .0174374   .0002659    65.57   0.000     .0169129    .0179618
       _cons |    48.0699   .9949204    48.32   0.000     46.10784    50.03196
------------------------------------------------------------------------------

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

tolerance = .02105442 VIF = 47.495965

svy: regress write read rw
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs      =       200
Number of PSUs     =       200                  Population size    =     10481
                                                Design df          =       197
                                                F(   2,    196)    =   1795.43
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9677

------------------------------------------------------------------------------
             |             Linearized
       write |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        read |  -1.026298   .0301498   -34.04   0.000    -1.085756   -.9668401
          rw |   .0189835    .000346    54.87   0.000     .0183012    .0196657
       _cons |   53.00732   .9279013    57.13   0.000     51.17742    54.83721
------------------------------------------------------------------------------

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

tolerance = .03233581 VIF = 30.925463

svy: regress rw write read
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =         3                  Number of obs      =       200
Number of PSUs     =       200                  Population size    =     10481
                                                Design df          =       197
                                                F(   2,    196)    =   5429.51
                                                Prob > F           =    0.0000
                                                R-squared          =    0.9917

------------------------------------------------------------------------------
             |             Linearized
          rw |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       write |   50.02573   .9657679    51.80   0.000     48.12116     51.9303
        read |   55.46852   .8909744    62.26   0.000     53.71145    57.22559
       _cons |    -2724.5   55.57823   -49.02   0.000    -2834.104   -2614.895
------------------------------------------------------------------------------

display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))

tolerance = .00831673 VIF = 120.23951

Note that we used each of the predictor variables, in turn, as the response variable for a survey regression on the remaining predictors. Tolerance is defined as 1 - R-squared and VIF as 1/tolerance. VIF values greater than 10 may warrant further examination. In this example, all of the VIFs were problematic, but the variable rw stands out with a VIF of 120.24. Because the check involves only the predictors, the same approach can be used when the model of interest is a survey logit (svy: logit) or any of the other survey estimation procedures.
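The three regressions above can also be run in a loop. The following is a minimal sketch, assuming the same hsb2 setup used on this page; the local macro names predictors and others are illustrative choices, not part of the original example.

```stata
* Sketch: loop over the predictors, regressing each on the others
use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
generate rw = read*write
svyset [pw=socst], strata(ses)

local predictors read write rw
foreach v of local predictors {
    * All predictors except the current one
    local others : list predictors - v
    * Survey regression of this predictor on the remaining predictors
    quietly svy: regress `v' `others'
    * Tolerance = 1 - R-squared; VIF = 1/tolerance
    display "`v'" _col(12) "tolerance = " %9.6f 1-e(r2) "   VIF = " %9.4f 1/(1-e(r2))
}
```

To check a different set of predictors, change the variable list stored in predictors; the rest of the loop stays the same.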

These pages are Copyrighted (c) by UCLA Academic Technology Services