### Stata Textbook Examples: Regression with Graphics by Lawrence Hamilton, Chapter 8: Principal Components and Factor Analysis

Table 8.1, page 253.
use http://www.ats.ucla.edu/stat/stata/examples/rwg/basins, clear
(Hicks et al. (1990))

gen logro=log10(runoff)
gen logpre=log10(precip)
gen logglac=log10(glacier+1)
gen logarea=log10(area)
egen zlogro=std(logro)
egen zlogpre=std(logpre)
egen zlogglac=std(logglac)
egen zlogarea=std(logarea)
list basin zlogro zlogpre zlogglac zlogarea

basin     zlogro    zlogpre   zlogglac   zlogarea
1.               Ivory    1.30585   1.253279   1.974755   -1.74394
2.               Cropp   1.321216   1.467827   .1666327  -1.612833
3. Upper Waitangitoana   .8610425   1.030452  -.6078176  -.2809038
4.            Hokitika   1.093791   1.374215  -.1191939   .3364316
5.               Haast   .7470333   .8726353  -.1191939   .7685862
6. Little Hopwood Burn  -.5045639  -.6635709  -.6078176  -.8944885
7.            Shotover  -.7667871  -1.033303  -.1191939   .7948009
8.               Arrow  -1.598691  -1.542749  -.6078176   .1047718
9.         Manuherikia  -2.604667  -1.925677  -.6078176    .356689
10.             Karamea   .1025019  -.0490354  -.6078176   .8208289
11.            Buller A   -.373943  -.4820166  -.6078176   1.376861
12.            Buller B  -.1806241  -.3731881  -.6078176   1.511363
13.         Inangahua A  -.3277019  -.4265141  -.6078176    .170581
14.         Inangahua B  -.0883481  -.1786223  -.6078176   .7605425
15.                Grey  -.0657941  -.1786223  -.1191939   .5805332
16.      Butchers Creek  -.0807653  -.2247163  -.6078176   -1.48221
17.             Cleddau   .8228721    .973395  -.1191939   .0032735
18.              Hooker    .812043   .8726353   2.135663  -.1627341
19.     Tsidjiore Nouve  -.4744647  -.7664243   2.397095  -1.408153  
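The transformation and standardization steps behind Table 8.1 can be sketched outside Stata. The Python snippet below is a minimal illustration, assuming made-up runoff and glacier values (they are not taken from the basins dataset). It mirrors `gen ... = log10(...)` followed by `egen ... = std(...)`, which divides by the sample standard deviation.

```python
import math
import statistics

# Illustrative values only; NOT taken from the basins dataset.
runoff = [120.0, 450.0, 980.0, 2300.0, 5100.0]
glacier = [0.0, 0.0, 3.5, 12.0, 40.0]

# Same idea as: gen logro = log10(runoff)
logro = [math.log10(x) for x in runoff]

# Same idea as: gen logglac = log10(glacier+1)
# Adding 1 keeps zero values defined, because log10(0 + 1) = 0.
logglac = [math.log10(g + 1) for g in glacier]


def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Same idea as: egen zlogro = std(logro)
zlogro = standardize(logro)
zlogglac = standardize(logglac)

print([round(z, 4) for z in zlogro])
print([round(z, 4) for z in zlogglac])
```

After standardization each variable has mean 0 and standard deviation 1. Tied raw values share one z-score, which is consistent with the repeated zlogglac values in the listing.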
Figure 8.2, page 254.
graph twoway scatter zlogro zlogpre, xlabel(0) ylabel(0) xline(0) yline(0)
Table 8.2, page 254.
corr zlogro zlogpre zlogglac zlogarea
(obs=19)
|   zlogro  zlogpre zlogglac zlogarea
-------------+------------------------------------
zlogro |   1.0000
zlogpre |   0.9738   1.0000
zlogglac |   0.3385   0.3025   1.0000
zlogarea |  -0.2872  -0.2829  -0.5121   1.0000
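As a cross-check on Table 8.2, the correlation between zlogro and zlogpre can be recomputed from the values listed in Table 8.1. The sketch below copies those two columns by hand and applies the usual Pearson formula. The function name `pearson` is introduced only for this illustration.

```python
import math

# Columns copied from the Table 8.1 listing above.
zlogro = [1.30585, 1.321216, .8610425, 1.093791, .7470333, -.5045639,
          -.7667871, -1.598691, -2.604667, .1025019, -.373943, -.1806241,
          -.3277019, -.0883481, -.0657941, -.0807653, .8228721, .812043,
          -.4744647]
zlogpre = [1.253279, 1.467827, 1.030452, 1.374215, .8726353, -.6635709,
           -1.033303, -1.542749, -1.925677, -.0490354, -.4820166, -.3731881,
           -.4265141, -.1786223, -.1786223, -.2247163, .973395, .8726353,
           -.7664243]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson(zlogro, zlogpre)
print(round(r, 4))
```

The result should agree with the 0.9738 entry in the correlation matrix, up to rounding in the listed values.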
Figure 8.3, page 254.
graph matrix zlogro zlogpre zlogglac zlogarea,  half
Table 8.3, page 255.
factor zlogro zlogpre zlogglac zlogarea, pcf
(obs=19)
(principal component factors; 2 factors retained)
Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
1        2.39152         1.29575      0.5979         0.5979
2        1.09578         0.60839      0.2739         0.8718
3        0.48739         0.46207      0.1218         0.9937
4        0.02532               .      0.0063         1.0000

Variable |      1          2    Uniqueness
-------------+--------------------------------
zlogro |   0.90586    0.40821    0.01278
zlogpre |   0.89510    0.43040    0.01356
zlogglac |   0.63697   -0.58522    0.25178
zlogarea |  -0.60333    0.63357    0.23458
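The Proportion and Cumulative columns of Table 8.3 follow directly from the eigenvalues. With principal component factoring of a correlation matrix, the eigenvalues sum to the number of variables, so each proportion is the eigenvalue divided by 4. The sketch below reproduces those columns. It also counts eigenvalues above 1, a conventional retention rule that appears consistent with the "2 factors retained" line (treating it as the rule applied here is an assumption).

```python
# Eigenvalues copied from Table 8.3.
eigenvalues = [2.39152, 1.09578, 0.48739, 0.02532]
n_vars = len(eigenvalues)

# Proportion of total variance explained by each factor.
proportions = [ev / n_vars for ev in eigenvalues]

# Running total of the proportions.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Eigenvalue-greater-than-one rule (assumed to be the retention rule here).
n_retained = sum(1 for ev in eigenvalues if ev > 1)

for ev, p, c in zip(eigenvalues, proportions, cumulative):
    print(f"{ev:8.5f}  {p:6.4f}  {c:6.4f}")
print("factors retained:", n_retained)
```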
Figure 8.4, page 258.
greigen, yline(1)
Table 8.4, page 259.
rotate

(varimax rotation)
Variable |      1          2    Uniqueness
-------------+--------------------------------
zlogro |   0.16443    0.97989    0.01278
zlogpre |   0.14001    0.98328    0.01356
zlogglac |   0.84060    0.20399    0.25178
zlogarea |  -0.86208   -0.14915    0.23458
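An orthogonal rotation such as varimax redistributes variance between the factors without changing how much of each variable the factors explain. This is why the Uniqueness column is identical in Tables 8.3 and 8.4. The sketch below checks this with the printed loadings: for each variable, the sum of squared loadings (the communality) should match before and after rotation, and 1 minus that sum should match the uniqueness. The helper name `communality` is introduced only for this illustration.

```python
# Loadings copied from Table 8.3 (unrotated) and Table 8.4 (varimax).
unrotated = {
    "zlogro":   (0.90586, 0.40821),
    "zlogpre":  (0.89510, 0.43040),
    "zlogglac": (0.63697, -0.58522),
    "zlogarea": (-0.60333, 0.63357),
}
rotated = {
    "zlogro":   (0.16443, 0.97989),
    "zlogpre":  (0.14001, 0.98328),
    "zlogglac": (0.84060, 0.20399),
    "zlogarea": (-0.86208, -0.14915),
}


def communality(loadings):
    """Sum of squared loadings across the retained factors."""
    return sum(value * value for value in loadings)


for name in unrotated:
    before = communality(unrotated[name])
    after = communality(rotated[name])
    # Uniqueness is the share of variance not explained by the factors.
    print(f"{name:9s} uniqueness before={1 - before:.4f} after={1 - after:.4f}")
```

The two uniqueness columns should agree to about four decimals, with small differences attributable to rounding in the printed loadings.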
Table 8.5, page 262.  Obliquely rotated loadings for mountain basin factors (compare with Tables 8.3 and 8.4).
rotate, promax

(promax rotation)
Variable |      1          2    Uniqueness
-------------+--------------------------------
zlogro |   0.01768    0.98744    0.01278
zlogpre |  -0.00854    0.99607    0.01356
zlogglac |   0.85152    0.03752    0.25178
zlogarea |  -0.88279    0.02413    0.23458
Table 8.6, page 264.
score

(based on rotated factors)

Scoring Coefficients
Variable |      1          2
-------------+---------------------
zlogro |   0.00522    0.50140
zlogpre |  -0.01226    0.50595
zlogglac |   0.56570    0.01343
zlogarea |  -0.58689    0.01810
Table 8.7, page 265.
score f1 f2

(based on rotated factors)
Scoring Coefficients
Variable |      1          2
-------------+---------------------
zlogro |   0.00522    0.50140
zlogpre |  -0.01226    0.50595
zlogglac |   0.56570    0.01343
zlogarea |  -0.58689    0.01810
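A factor score is a weighted sum of the standardized variables, with the scoring coefficients as weights. As a worked example, the sketch below computes both scores for the first basin (Ivory) from its Table 8.1 values and the coefficients printed above. It assumes the new variables f1 and f2 are formed exactly this way from the already standardized inputs, so the results are approximate rather than authoritative.

```python
# Scoring coefficients copied from Table 8.7: (factor 1, factor 2).
coefficients = {
    "zlogro":   (0.00522, 0.50140),
    "zlogpre":  (-0.01226, 0.50595),
    "zlogglac": (0.56570, 0.01343),
    "zlogarea": (-0.58689, 0.01810),
}

# Standardized values for the Ivory basin, copied from Table 8.1.
ivory = {
    "zlogro": 1.30585,
    "zlogpre": 1.253279,
    "zlogglac": 1.974755,
    "zlogarea": -1.74394,
}


def factor_score(row, coefs, factor_index):
    """Weighted sum of standardized variables for one factor."""
    return sum(coefs[name][factor_index] * row[name] for name in coefs)


f1_ivory = factor_score(ivory, coefficients, 0)
f2_ivory = factor_score(ivory, coefficients, 1)
print(round(f1_ivory, 3), round(f2_ivory, 3))
```

Factor 1 is weighted mainly by zlogglac and zlogarea, and factor 2 by zlogro and zlogpre. A small, heavily glaciated, wet basin such as Ivory therefore scores high on both.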
Table 8.8, page 266.
gen logsed=log10(yield)
regress logsed zlogro zlogpre zlogglac zlogarea

Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  4,    14) =   13.77
Model |   10.265554     4   2.5663885           Prob > F      =  0.0001
Residual |  2.60862507    14  .186330362           R-squared     =  0.7974
Total |  12.8741791    18  .715232171           Root MSE      =  .43166

------------------------------------------------------------------------------
logsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
zlogro |  -.1151779   .4575377    -0.25   0.805    -1.096499    .8661428
zlogpre |   .7751374   .4527634     1.71   0.109    -.1959435    1.746218
zlogglac |   .1315507   .1231924     1.07   0.304    -.1326706    .3957721
zlogarea |  -.1008783   .1200595    -0.84   0.415    -.3583802    .1566236
_cons |   3.200158   .0990296    32.32   0.000      2.98776    3.412555
------------------------------------------------------------------------------

regress logsed f1 f2

Source |       SS       df       MS              Number of obs =      19
-------------+------------------------------           F(  2,    16) =   28.91
Model |  10.0835282     2  5.04176408           Prob > F      =  0.0000
Residual |  2.79065092    16  .174415683           R-squared     =  0.7832
Total |  12.8741791    18  .715232171           Root MSE      =  .41763

------------------------------------------------------------------------------
logsed |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
f1 |    .192415   .1046676     1.84   0.085    -.0294705    .4143005
f2 |   .6608587   .1046676     6.31   0.000     .4389732    .8827442
_cons |   3.200158   .0958111    33.40   0.000     2.997047    3.403268
------------------------------------------------------------------------------
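The summary statistics in the two regression headers can be recomputed from the sums of squares and degrees of freedom. This makes the comparison between the four-variable model and the two-factor model easier to follow. The sketch below uses only numbers printed in Table 8.8. The function name `summarize_fit` is introduced only for this illustration.

```python
import math


def summarize_fit(model_ss, resid_ss, model_df, resid_df):
    """Return (R-squared, F statistic, root MSE) from an ANOVA table."""
    total_ss = model_ss + resid_ss
    r_squared = model_ss / total_ss
    resid_ms = resid_ss / resid_df
    f_stat = (model_ss / model_df) / resid_ms
    root_mse = math.sqrt(resid_ms)
    return r_squared, f_stat, root_mse


# regress logsed zlogro zlogpre zlogglac zlogarea
four_vars = summarize_fit(10.265554, 2.60862507, 4, 14)

# regress logsed f1 f2
two_factors = summarize_fit(10.0835282, 2.79065092, 2, 16)

for label, (r2, f, rmse) in [("four variables", four_vars),
                             ("two factors", two_factors)]:
    print(f"{label:15s} R2={r2:.4f}  F={f:.2f}  RootMSE={rmse:.5f}")
```

The two-factor model gives up little R-squared (0.7832 versus 0.7974) while using two fewer degrees of freedom, which is why its F statistic is larger and its root MSE smaller.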
Figure 8.9, page 268.
graph twoway scatter f2 f1, ///
mlabel(location) msymbol(i) xlabel(-1(1)2) ylabel(-2(1)2) yline(0) xline(0)
Table 8.9, page 268.
use http://www.ats.ucla.edu/stat/stata/examples/rwg/planets, clear
(Beatty et al. (1981))

list planet dsun radius masskg density moons rings, nodis

planet       dsun     radius     masskg    density     moons     rings
1.  Mercury       57.9       2439   3.30e+23       5.42         0      none
2.    Venus      108.2       6050   4.87e+24       5.25         0      none
3.    Earth      149.6       6378   5.98e+24       5.52         1      none
4.     Mars      227.9       3398   6.42e+23       3.94         2      none
5.  Jupiter      778.3      71900   1.90e+27      1.314        16     rings
6.   Saturn       1427      60000   5.69e+26        .69        17     rings
7.   Uranus     2869.6      26145   8.66e+25       1.19        15     rings
8.  Neptune     4496.6      24750   1.03e+26       1.66         8     rings
9.    Pluto       5900       1550   1.10e+22        1.2         1      none  
Figure 8.10, page 269.
gen logdsun=log(dsun)
gen lograd=log(radius)
gen logmass=log(masskg)
gen logden=log(density)
gen logmoon=log(moons+1)
graph matrix logdsun lograd logmass logden logmoon rings, half 
Figure 8.11, page 270.
factor logdsun lograd logmass logden logmoon rings, pcf factor(2)
(obs=9)
(principal component factors; 2 factors retained)
Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
1        4.62365         3.45469      0.7706         0.7706
2        1.16896         1.05664      0.1948         0.9654
3        0.11232         0.05395      0.0187         0.9842
4        0.05837         0.02174      0.0097         0.9939
5        0.03663         0.03657      0.0061         1.0000
6        0.00006               .      0.0000         1.0000

Variable |      1          2    Uniqueness
-------------+--------------------------------
logdsun |   0.67105   -0.71093    0.04427
logmass |   0.83377    0.54463    0.00821
logden |  -0.84511    0.47053    0.06439
logmoon |   0.97647    0.00028    0.04651
rings |   0.97917    0.07720    0.03526

greigen, yline(1) xlabel(1(1)6) ylabel(0(1)4)
Table 8.10, page 270.  We have skipped this for now.
Figure 8.12, page 271.  We have skipped this for now.
Table 8.12, page 274.
use http://www.ats.ucla.edu/stat/stata/examples/rwg/tulsa, clear
(Blocker & Eckberg (1989))

corr deepwell chandler tornados floods airpol rivpol
(obs=199)
| deepwell chandler tornados   floods   airpol   rivpol
-------------+------------------------------------------------------
deepwell |   1.0000
chandler |   0.4726   1.0000
floods |   0.0928   0.0480   0.4052   1.0000
airpol |   0.2805   0.1661   0.1524   0.0712   1.0000
rivpol |   0.3365   0.2587   0.1007   0.1511   0.3861   1.0000
Figure 8.13, page 274.
graph matrix deepwell chandler tornados floods airpol rivpol, half jitter(5)
NOTE: This graph looks slightly different from the one in the book because of the jittering. Jittering adds a small random number to each plotted value, so the points shift slightly every time the graph is drawn.
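The effect of jittering can be illustrated with a short sketch. The version below is a simplification that adds uniform noise of a fixed width to each value. It is not Stata's actual jitter algorithm, and the response values, the width of 0.1, and the function name `jitter` are all made up for illustration.

```python
import random


def jitter(values, width, rng):
    """Add uniform noise in [-width, width] to each value (simplified)."""
    return [v + rng.uniform(-width, width) for v in values]


# Made-up discrete survey responses; many points would overplot exactly.
responses = [1, 1, 2, 3, 3, 3, 4]

# Two draws with different seeds stand in for drawing the graph twice.
first_draw = jitter(responses, 0.1, random.Random(1))
second_draw = jitter(responses, 0.1, random.Random(2))

print([round(v, 3) for v in first_draw])
print([round(v, 3) for v in second_draw])
```

The tied responses separate into visibly distinct points, and the two draws differ from each other. The same thing happens between one rendering of the jittered graph and the next.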
Figure 8.14, page 275.
factor deepwell chandler tornados floods airpol rivpol
(obs=199)
(principal factors; 3 factors retained)
Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
1        1.35194         0.88877      0.9929         0.9929
2        0.46317         0.29830      0.3402         1.3331
3        0.16487         0.28236      0.1211         1.4542
4       -0.11749         0.07032     -0.0863         1.3679
5       -0.18781         0.12528     -0.1379         1.2299
6       -0.31309               .     -0.2299         1.0000

Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
deepwell |   0.59374   -0.20135   -0.11690    0.59327
chandler |   0.53406   -0.13881   -0.23460    0.64047
tornados |   0.36576    0.42706   -0.05948    0.68031
floods |   0.29282    0.45047    0.02034    0.71092
airpol |   0.46081   -0.08334    0.23495    0.72551
rivpol |   0.53134   -0.10543    0.19240    0.66954

greigen, yline(0 1) xlabel(1(1)6) ylabel(0 1) ytick(-.2 0 .2 .4 .6 .8 1 1.2 1.4)
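The principal factors output above has negative eigenvalues and cumulative proportions greater than 1. Principal factoring analyzes a correlation matrix whose diagonal has been replaced by communality estimates, and that reduced matrix need not be positive semidefinite. The proportions are each eigenvalue divided by the sum of all six eigenvalues, negative ones included. The sketch below reproduces the printed columns under that reading.

```python
# Eigenvalues copied from the principal factors output above.
eigenvalues = [1.35194, 0.46317, 0.16487, -0.11749, -0.18781, -0.31309]

# The denominator is the sum of ALL eigenvalues, including negative ones.
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]

cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

print("sum of eigenvalues:", round(total, 5))
for ev, p, c in zip(eigenvalues, proportions, cumulative):
    print(f"{ev:9.5f}  {p:7.4f}  {c:7.4f}")
```

The cumulative proportion rises above 1 over the positive eigenvalues and falls back to exactly 1 once the negative ones are added.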
Table 8.13, page 276.
factor deepwell chandler tornados floods airpol rivpol
(obs=199)
(principal factors; 3 factors retained)
Factor     Eigenvalue     Difference    Proportion    Cumulative
------------------------------------------------------------------
1        1.35194         0.88877      0.9929         0.9929
2        0.46317         0.29830      0.3402         1.3331
3        0.16487         0.28236      0.1211         1.4542
4       -0.11749         0.07032     -0.0863         1.3679
5       -0.18781         0.12528     -0.1379         1.2299
6       -0.31309               .     -0.2299         1.0000

Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
deepwell |   0.59374   -0.20135   -0.11690    0.59327
chandler |   0.53406   -0.13881   -0.23460    0.64047
tornados |   0.36576    0.42706   -0.05948    0.68031
floods |   0.29282    0.45047    0.02034    0.71092
airpol |   0.46081   -0.08334    0.23495    0.72551
rivpol |   0.53134   -0.10543    0.19240    0.66954

rotate, promax
(promax rotation)
Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
deepwell |   0.54661   -0.02946    0.14547    0.59327
chandler |   0.61054    0.03497   -0.03720    0.64047
tornados |   0.06995    0.54955   -0.02505    0.68031
floods |  -0.06653    0.54282    0.03733    0.71092
airpol |   0.04270    0.00781    0.49339    0.72551
rivpol |   0.13743    0.01013    0.47524    0.66954

score f1 f2 f3
(based on rotated factors)
Scoring Coefficients
Variable |      1          2          3
-------------+--------------------------------
deepwell |   0.36499    0.04240    0.21836
chandler |   0.34685    0.06944    0.10044
floods |   0.01167    0.35768    0.06601
airpol |   0.11423    0.06009    0.30260
rivpol |   0.17054    0.07305    0.33196

corr f1 f2 f3
(obs=199)
|       f1       f2       f3
-------------+---------------------------
f1 |   1.0000
f2 |   0.4957   1.0000
f3 |   0.8732   0.5503   1.0000
Figure 8.16, page 277.
histogram f3 if sex==0, ///
fraction bin(8) start(-2) xlabel(-2(1)1) ylabel(0 .1 .2) xline(-.166)
histogram f3 if sex==1, ///
fraction bin(8) start(-2) xlabel(-2(1)1) ylabel(0 .1 .2) xline(.128)
Table 8.15, page 279.
factor taxbabes manykids lessenvt toocons pollburd privown shutdown punish preserve, ml factor(3)
(obs=241)
Iteration 0:   log likelihood = -25.277324
Iteration 1:   log likelihood =  -10.89083
Iteration 2:   log likelihood = -10.463314
Iteration 3:   log likelihood = -10.376172
Iteration 4:   log likelihood = -10.356368
Iteration 5:   log likelihood =  -10.35112
Iteration 6:   log likelihood = -10.349526
Iteration 7:   log likelihood = -10.349001
Iteration 8:   log likelihood = -10.348822
Iteration 9:   log likelihood = -10.348759
Iteration 10:  log likelihood = -10.348737
Iteration 11:  log likelihood =  -10.34873
Iteration 12:  log likelihood = -10.348727
Iteration 13:  log likelihood = -10.348726
Iteration 14:  log likelihood = -10.348726
Iteration 15:  log likelihood = -10.348726
Iteration 16:  log likelihood = -10.348726
Iteration 17:  log likelihood = -10.348726
Iteration 18:  log likelihood = -10.348726
Iteration 19:  log likelihood = -10.348726

(maximum likelihood factors; 3 factors retained)
Factor     Variance       Difference    Proportion    Cumulative
------------------------------------------------------------------
1        1.09150        -0.42946      0.3398         0.3398
2        1.52096         0.92086      0.4734         0.8132
3        0.60010               .      0.1868         1.0000

Test:  3 vs. no   factors.  Chi2(  27) =  223.59, Prob > chi2 =  0.0000
Test:  3 vs. more factors.  Chi2(  12) =   20.20, Prob > chi2 =  0.0635

Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
taxbabes |   0.94460    0.00065   -0.01496    0.10743
manykids |  -0.33953    0.05576    0.15998    0.85602
lessenvt |   0.13347    0.51309    0.12787    0.70257
toocons |   0.05995    0.48555    0.42403    0.58085
pollburd |   0.02056    0.64636    0.17909    0.54973
privown |   0.01190    0.36004    0.01118    0.87010
shutdown |   0.00387   -0.49947    0.48407    0.51620
punish |   0.23087   -0.38692    0.32437    0.69177
preserve |   0.09313   -0.26878    0.07998    0.91269
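The "3 vs. more factors" test reported above can be checked by hand. For p observed variables and k factors, the usual degrees of freedom for this likelihood-ratio test are ((p - k)^2 - (p + k)) / 2, which gives 12 for p = 9 and k = 3. Because 12 is even, the chi-square upper-tail probability has a closed form that needs no statistics library. The sketch below applies both formulas to the printed statistic. The function names are introduced only for this illustration.

```python
import math


def lr_test_df(n_vars, n_factors):
    """Degrees of freedom for testing k factors against more factors."""
    return ((n_vars - n_factors) ** 2 - (n_vars + n_factors)) // 2


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with EVEN df > x), via the closed-form finite sum."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


df = lr_test_df(9, 3)
p_value = chi2_upper_tail_even_df(20.20, df)
print(df, round(p_value, 4))
```

The p-value comes out close to the reported 0.0635, with the small gap attributable to the chi-square statistic being rounded to two decimals. Since it is above 0.05, three factors are not rejected in favor of more at that level.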

rotate, promax
(promax rotation)
Variable |      1          2          3    Uniqueness
-------------+-------------------------------------------
taxbabes |   0.94363    0.04826    0.00168    0.10743
manykids |  -0.36004    0.15240    0.12783    0.85602
lessenvt |   0.12017    0.47899   -0.11160    0.70257
toocons |   0.00568    0.70413    0.19670    0.58085
pollburd |   0.00180    0.60911   -0.12521    0.54973
privown |   0.01370    0.26475   -0.15846    0.87010
shutdown |  -0.06779    0.05460    0.72057    0.51620
punish |   0.18163    0.01425    0.51159    0.69177
preserve |   0.07924   -0.11676    0.20860    0.91269
Figure 8.17, page 280.
graph twoway (scatter nchldn f1, jitter(3)) (lfit nchldn f1), ylabel(0(1)7) xlabel(-3(1)1)
NOTE: Because of the jittering, this graph does not look exactly like the one in the book.
