UCLA Academic Technology Services
Regression with Stata
Chapter 2
Self Assessment

1. The following data set consists of the measured weight, measured height, reported weight and reported height of some 200 people. You can get it from within Stata by typing use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis We built a model to predict measured weight from reported weight, reported height and measured height, and then ran lvr2plot after the regression; here is what we have. Explain what you see in the graph, and use other Stata commands to identify the problematic observation(s). What do you think the problem is, and what is your solution?

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis 
regress  measwt measht reptwt reptht 
  Source |       SS       df       MS                  Number of obs =     181
---------+------------------------------               F(  3,   177) = 1640.88
   Model |  40891.9594     3  13630.6531               Prob > F      =  0.0000
Residual |   1470.3279   177  8.30693727               R-squared     =  0.9653
---------+------------------------------               Adj R-squared =  0.9647
   Total |  42362.2873   180  235.346041               Root MSE      =  2.8822

------------------------------------------------------------------------------
  measwt |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  measht |  -.9607757   .0260189    -36.926   0.000      -1.012123   -.9094285
  reptwt |    1.01917   .0240778     42.328   0.000        .971654    1.066687
  reptht |   .8184156   .0419658     19.502   0.000       .7355979    .9012334
   _cons |    24.8138   4.888302      5.076   0.000       15.16695    34.46065
------------------------------------------------------------------------------
lvr2plot

2. Using the data from the last exercise, what measure would you use to find out how much a single observation changes the coefficient of a predictor? For example, show how much the coefficient of the predictor reptht would change if we omitted observation 12 from the regression analysis. What other measures would you use to assess the influence of an observation on the regression? What are their cut-off values?
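As a starting point only, the commands below sketch one way to explore this question. They assume the variable names shown in the regression output above; the new variable names (dfb_reptht, lev, cook, dfit, rstu) are hypothetical, and "observation 12" is taken to mean row 12 of the data set as loaded.

```stata
* Refit the model from exercise 1
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/davis, clear
regress measwt measht reptwt reptht

* DFBETA for reptht: the scaled change in the coefficient
* when each observation is left out in turn
predict dfb_reptht, dfbeta(reptht)
list dfb_reptht in 12

* Other influence diagnostics to compare against their cut-offs
predict lev, leverage
predict cook, cooksd
predict dfit, dfits
predict rstu, rstudent
list lev cook dfit rstu in 12

* Direct check: refit without row 12 and compare the coefficient of reptht
regress measwt measht reptwt reptht if _n != 12
```

The predict commands come before the refit because predict always refers to the most recent regression.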

3. The following data file is called bbwt.dta and it is from Weisberg's Applied Regression Analysis. You can obtain it from within Stata by typing use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt It consists of the body weights and brain weights of some 60 animals. We want to predict brain weight from body weight, that is, to fit a simple linear regression of brain weight on body weight. Show what you have to do to verify the linearity assumption. If you think the model violates the linearity assumption, show some possible remedies that you would consider.

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt, clear
regress brainwt bodywt
  Source |       SS       df       MS                  Number of obs =      62
---------+------------------------------               F(  1,    60) =  411.12
   Model |  46067326.8     1  46067326.8               Prob > F      =  0.0000
Residual |  6723217.18    60   112053.62               R-squared     =  0.8726
---------+------------------------------               Adj R-squared =  0.8705
   Total |  52790543.9    61  865418.753               Root MSE      =  334.74

------------------------------------------------------------------------------
 brainwt |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
  bodywt |   .9664599   .0476651     20.276   0.000       .8711155    1.061804
   _cons |   91.00865   43.55574      2.089   0.041       3.884201    178.1331
------------------------------------------------------------------------------
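One possible way to approach the linearity check is sketched below. The log transformation is only one candidate remedy, chosen here as an assumption rather than taken from the exercise, and the names lbrain and lbody are hypothetical.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/bbwt, clear
regress brainwt bodywt

* Residuals against fitted values: look for curvature or fanning
rvfplot, yline(0)

* Raw data with a straight-line fit and a lowess smoother overlaid
twoway (scatter brainwt bodywt) (lfit brainwt bodywt) (lowess brainwt bodywt)

* Candidate remedy (an assumption): log-transform both variables and refit
generate lbrain = ln(brainwt)
generate lbody = ln(bodywt)
regress lbrain lbody
rvfplot, yline(0)
```

If the lowess curve departs clearly from the fitted line in the first plot but not after the transformation, that would suggest the transformed model is closer to linear.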

4. We did a regression analysis using the data file elemapi2 in chapter 2. Continuing that analysis, we ran an avplot here. Explain what an avplot is and what type of information you would get from the plot. If the variable full were added to the model, would it be a significant predictor?

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/elemapi2, clear
regress api00 meals ell emer
  Source |       SS       df       MS                  Number of obs =     400
---------+------------------------------               F(  3,   396) =  673.00
   Model |  6749782.75     3  2249927.58               Prob > F      =  0.0000
Residual |  1323889.25   396  3343.15467               R-squared     =  0.8360
---------+------------------------------               Adj R-squared =  0.8348
   Total |  8073672.00   399  20234.7669               Root MSE      =   57.82

------------------------------------------------------------------------------
   api00 |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   meals |  -3.159189   .1497371    -21.098   0.000      -3.453568   -2.864809
     ell |  -.9098732   .1846442     -4.928   0.000      -1.272878   -.5468678
    emer |  -1.573496    .293112     -5.368   0.000      -2.149746   -.9972456
   _cons |   886.7033    6.25976    141.651   0.000       874.3967    899.0098
------------------------------------------------------------------------------
avplot full, mlabel(snum)
 

5. The data set wage.dta is from a national sample of 6000 households with a male head earning less than $15,000 annually in 1966. You can get this data file by typing use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage from within Stata. The data were classified into 39 demographic groups for analysis. We tried to predict the average hours worked from the average age of respondents and the average yearly non-earned income.

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage, clear
regress HRS AGE NEIN
  Source |       SS       df       MS                  Number of obs =      39
---------+------------------------------               F(  2,    36) =   39.72
   Model |  107205.109     2  53602.5543               Prob > F      =  0.0000
Residual |  48578.1222    36  1349.39228               R-squared     =  0.6882
---------+------------------------------               Adj R-squared =  0.6708
   Total |  155783.231    38   4099.5587               Root MSE      =  36.734

------------------------------------------------------------------------------
     HRS |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     AGE |  -8.281632   1.603736     -5.164   0.000      -11.53416   -5.029104
    NEIN |   .4289202   .0484882      8.846   0.000       .3305816    .5272588
   _cons |    2321.03   57.55038     40.330   0.000       2204.312    2437.748
------------------------------------------------------------------------------

Both predictors are significant. Now if we add ASSET to our list of predictors, neither NEIN nor ASSET is significant.

regress HRS AGE NEIN ASSET
  Source |       SS       df       MS                  Number of obs =      39
---------+------------------------------               F(  3,    35) =   25.83
   Model |   107317.64     3  35772.5467               Prob > F      =  0.0000
Residual |  48465.5908    35  1384.73117               R-squared     =  0.6889
---------+------------------------------               Adj R-squared =  0.6622
   Total |  155783.231    38   4099.5587               Root MSE      =  37.212

------------------------------------------------------------------------------
     HRS |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     AGE |  -8.007181    1.88844     -4.240   0.000      -11.84092   -4.173443
    NEIN |   .3338277    .337171      0.990   0.329      -.3506658    1.018321
   ASSET |   .0044232    .015516      0.285   0.777       -.027076    .0359223
   _cons |   2314.054   63.22636     36.600   0.000       2185.698    2442.411
------------------------------------------------------------------------------

Can you explain why?
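A few commands that may help in investigating this are sketched below, using the variable names from the output above. Note that estat vif is the syntax in recent versions of Stata; older versions use vif.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/wage, clear

* Pairwise correlations among the predictors
correlate AGE NEIN ASSET

* Refit the three-predictor model and inspect variance inflation factors
regress HRS AGE NEIN ASSET
estat vif

* Joint test of the two predictors that are individually non-significant
test NEIN ASSET
```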

6. Continue to use the previous data set. This time we want to predict the average hourly wage from the average percent of white respondents. Carry out the regression analysis and list the Stata commands that you can use to check for heteroscedasticity. Explain the result of your test(s).

Now we want to build another model to predict the average percent of white respondents from the average hours worked. Repeat the analysis you performed on the previous regression model. Explain your results.

7. We have a data set that consists of volume, diameter and height of some objects. Someone did a regression of volume on diameter and height.

use http://www.ats.ucla.edu/stat/stata/webbooks/reg/tree, clear
regress vol dia height
  Source |       SS       df       MS                  Number of obs =      31
---------+------------------------------               F(  2,    28) =  254.97
   Model |  7684.16254     2  3842.08127               Prob > F      =  0.0000
Residual |  421.921306    28  15.0686181               R-squared     =  0.9480
---------+------------------------------               Adj R-squared =  0.9442
   Total |  8106.08385    30  270.202795               Root MSE      =  3.8818

------------------------------------------------------------------------------
     vol |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     dia |   4.708161   .2642646     17.816   0.000       4.166839    5.249482
  height |   .3392513   .1301512      2.607   0.014       .0726487    .6058538
   _cons |  -57.98766   8.638225     -6.713   0.000      -75.68226   -40.29306
------------------------------------------------------------------------------

Explain what tests you can use to detect model specification errors and, if there are any, how you would correct them.
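As a hedged sketch, two commonly used specification checks can be run as shown below. The squared diameter term is only an illustrative guess at a remedy, not the exercise's stated solution, and the name dia2 is hypothetical. In older versions of Stata, estat ovtest is written ovtest.

```stata
use http://www.ats.ucla.edu/stat/stata/webbooks/reg/tree, clear
regress vol dia height

* Link test: a significant _hatsq suggests a specification problem
linktest

* Ramsey RESET test for omitted variables
estat ovtest

* Illustrative remedy (an assumption): add a squared diameter term and recheck
generate dia2 = dia^2
regress vol dia dia2 height
linktest
estat ovtest
```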


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.