Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores), and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.
Example 2. A doctor has collected data on cholesterol, blood pressure and weight. She also collected data on the eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products and chocolate they consumed per week). She wants to investigate the relationship between these three health measures and eating habits.
Example 3. A researcher is interested in determining how healthy African violet plants are. She collects data on the average leaf diameter, the mass of the root ball and the average diameter of the blooms, as well as how long the plant has been in its current container. As predictor variables, she measures several elements in the soil, in addition to the amount of light and water each plant receives.
We have a data file, mmreg.dta, with 600 observations on eight variables (plus an identifier, id). The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized test scores in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable, with 1 indicating a female student.
Let's look at the data (note that there are no missing data in this data set).
use http://www.ats.ucla.edu/stat/stata/dae/mmreg, clear
summarize
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
id | 600 300.5 173.3494 1 600
locus_of_c~l | 600 .0965333 .6702799 -2.23 1.36
self_concept | 600 .0049167 .7055125 -2.62 1.19
motivation | 600 .6608333 .3427294 0 1
read | 600 51.90183 10.10298 28.3 76
-------------+--------------------------------------------------------
write | 600 52.38483 9.726455 25.5 67.1
math | 600 51.849 9.414736 31.8 75.5
science | 600 51.76333 9.706179 26 74.2
female | 600 .545 .4983864 0 1
tabulate female
female | Freq. Percent Cum.
------------+-----------------------------------
0 | 273 45.50 45.50
1 | 327 54.50 100.00
------------+-----------------------------------
Total | 600 100.00
correlate locus_of_control self_concept motivation
(obs=600)
| locus_~l self_c~t motiva~n
-------------+---------------------------
locus_of_c~l | 1.0000
self_concept | 0.1712 1.0000
motivation | 0.2451 0.2886 1.0000
correlate read write science female
(obs=600)
| read write science female
-------------+------------------------------------
read | 1.0000
write | 0.6286 1.0000
science | 0.6907 0.5691 1.0000
female | -0.0417 0.2443 -0.1382 1.0000
Technically speaking, we will be conducting a multivariate multiple regression. This regression is "multivariate" because there is more than one outcome variable. It is a "multiple" regression because there is more than one predictor variable. Of course, you can conduct a multivariate regression with only one predictor variable, although that is rare in practice.
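As a sketch in standard matrix notation (the symbols are ours, not Stata's), let n = 600 observations, p = 3 outcomes and q = 4 predictors plus a constant. Write ε_i for row i of the error matrix E. The model, the OLS estimator, and the covariance between coefficients of the same predictor in two different equations can then be written as:

```latex
\begin{aligned}
\underset{n \times p}{Y} &= \underset{n \times (q+1)}{X}\,\underset{(q+1) \times p}{B} + E,
  \qquad \operatorname{Cov}(\varepsilon_i) = \Sigma \quad (p \times p) \\
\hat{B} &= (X^\top X)^{-1} X^\top Y \\
\operatorname{Cov}(\hat{b}_{rj}, \hat{b}_{rk}) &= \sigma_{jk}\,\big[(X^\top X)^{-1}\big]_{rr}
\end{aligned}
```

Column j of the estimated B uses only column j of Y, so each equation's coefficients are ordinary single-outcome OLS estimates. The off-diagonal elements σ_jk of Σ are what link the equations. They enter any test that compares coefficients across equations.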
To conduct a multivariate regression in Stata, we use two commands, manova and mvreg. The manova command indicates whether all of the equations, taken together, are statistically significant. F- and p-values are given for four multivariate criteria: Wilks' lambda, the Lawley-Hotelling trace, Pillai's trace, and Roy's largest root. Next, we use the mvreg command to obtain the coefficients, standard errors, etc., for each of the predictors in each equation. We also show the use of the test command after mvreg. The ability to test hypotheses about coefficients across equations is one of the most compelling reasons for conducting a multivariate regression analysis.
manova locus_of_control self_concept motivation = read write science female, continuous(read write science female)
Number of obs = 600
W = Wilks' lambda L = Lawley-Hotelling trace
P = Pillai's trace R = Roy's largest root
Source | Statistic df F(df1, df2) = F Prob>F
-----------+--------------------------------------------------
Model | W 0.7587 4 12.0 1569.2 14.39 0.0000 a
| P 0.2497 12.0 1785.0 13.51 0.0000 a
| L 0.3070 12.0 1775.0 15.14 0.0000 a
| R 0.2673 4.0 595.0 39.76 0.0000 u
|--------------------------------------------------
Residual | 595
-----------+--------------------------------------------------
read | W 0.9687 1 3.0 593.0 6.38 0.0003 e
| P 0.0313 3.0 593.0 6.38 0.0003 e
| L 0.0323 3.0 593.0 6.38 0.0003 e
| R 0.0323 3.0 593.0 6.38 0.0003 e
|--------------------------------------------------
write | W 0.9710 1 3.0 593.0 5.90 0.0006 e
| P 0.0290 3.0 593.0 5.90 0.0006 e
| L 0.0299 3.0 593.0 5.90 0.0006 e
| R 0.0299 3.0 593.0 5.90 0.0006 e
|--------------------------------------------------
science | W 0.9846 1 3.0 593.0 3.10 0.0264 e
| P 0.0154 3.0 593.0 3.10 0.0264 e
| L 0.0157 3.0 593.0 3.10 0.0264 e
| R 0.0157 3.0 593.0 3.10 0.0264 e
|--------------------------------------------------
female | W 0.9672 1 3.0 593.0 6.69 0.0002 e
| P 0.0328 3.0 593.0 6.69 0.0002 e
| L 0.0339 3.0 593.0 6.69 0.0002 e
| R 0.0339 3.0 593.0 6.69 0.0002 e
|--------------------------------------------------
Residual | 595
-----------+--------------------------------------------------
Total | 599
--------------------------------------------------------------
e = exact, a = approximate, u = upper bound on F
As we can see from the Model section of the output above (under Source), the model is statistically significant regardless of which multivariate criterion is used (all of the p-values are displayed as 0.0000, i.e., less than 0.0001). If the overall model were not statistically significant, you might want to modify it before running mvreg (because coefficients from a non-significant model are usually uninteresting). The lower part of the output shows the multivariate tests for each of the predictor variables. Each of these is statistically significant.
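The four criteria agree in the lower part of the table for a reason. Each predictor is a one-degree-of-freedom hypothesis, so E^-1 H (the hypothesis SSCP matrix relative to the error SSCP matrix) has a single nonzero eigenvalue, and all four statistics are functions of it. The Python sketch below recomputes W, P and the exact F from the printed L values. It also computes the upper-bound F for Roy's root in the Model row. It uses standard textbook formulas, not anything taken from Stata's source. Because it starts from the rounded L values, the last digit can differ slightly from the table.

```python
# Sketch: relations among the four multivariate criteria for a one-df effect.
# Formulas are standard textbook ones, checked against the manova output above.


def criteria_from_eigenvalue(lam):
    """With one hypothesis df, E^-1 H has a single nonzero eigenvalue lam."""
    return {
        "W": 1.0 / (1.0 + lam),  # Wilks' lambda
        "P": lam / (1.0 + lam),  # Pillai's trace
        "L": lam,                # Lawley-Hotelling trace
        "R": lam,                # Roy's largest root, on the eigenvalue scale as printed
    }


def exact_f_one_df(lam, p, df_error):
    """Exact F for a 1-df effect with p outcomes, on (p, df_error - p + 1) df."""
    df1, df2 = p, df_error - p + 1
    return lam * df2 / df1, df1, df2


def roy_upper_bound_f(lam1, p, df_hyp, df_error):
    """Upper-bound F for Roy's largest root, with r = max(p, df_hyp)."""
    r = max(p, df_hyp)
    df2 = df_error - r + df_hyp
    return lam1 * df2 / r, r, df2


# Lawley-Hotelling (L) values copied from the output: p = 3 outcomes, residual df = 595.
reported_L = {"read": 0.0323, "write": 0.0299, "science": 0.0157, "female": 0.0339}

for name, lam in reported_L.items():
    c = criteria_from_eigenvalue(lam)
    f, df1, df2 = exact_f_one_df(lam, p=3, df_error=595)
    print(f"{name:8s} W={c['W']:.4f} P={c['P']:.4f} F({df1}, {df2}) = {f:.2f}")

# Model row: Roy's largest root 0.2673 with 4 hypothesis df.
f_roy, d1, d2 = roy_upper_bound_f(0.2673, p=3, df_hyp=4, df_error=595)
print(f"Model (Roy): F({d1}, {d2}) = {f_roy:.2f}")
```

For a one-df effect, W and P always sum to 1. You can see this in each predictor's rows of the table.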
mvreg locus_of_control self_concept motivation = read write science female
Equation Obs Parms RMSE "R-sq" F P
----------------------------------------------------------------------
locus_of_c~l 600 5 .6100045 0.1773 32.0561 0.0000
self_concept 600 5 .7009231 0.0196 2.967439 0.0191
motivation 600 5 .3304092 0.0768 12.37582 0.0000
------------------------------------------------------------------------------
| Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
locus_of_c~l |
read | .0142859 .003733 3.83 0.000 .0069544 .0216173
write | .0090845 .0037045 2.45 0.014 .0018091 .0163599
science | .007979 .0037862 2.11 0.036 .0005431 .015415
female | .1427696 .055268 2.58 0.010 .0342256 .2513137
_cons | -1.611649 .1562346 -10.32 0.000 -1.918487 -1.304811
-------------+----------------------------------------------------------------
self_concept |
read | .0019048 .0042894 0.44 0.657 -.0065194 .010329
write | .0015355 .0042566 0.36 0.718 -.0068242 .0098953
science | .0015544 .0043505 0.36 0.721 -.0069899 .0100986
female | -.179823 .0635054 -2.83 0.005 -.3045451 -.0551009
_cons | -.1568385 .1795207 -0.87 0.383 -.5094098 .1957328
-------------+----------------------------------------------------------------
motivation |
read | .0051889 .002022 2.57 0.011 .0012178 .00916
write | .0072695 .0020065 3.62 0.000 .0033287 .0112102
science | -.003597 .0020508 -1.75 0.080 -.0076247 .0004307
female | .0275103 .0299359 0.92 0.358 -.0312826 .0863033
_cons | .1819084 .0846245 2.15 0.032 .0157093 .3481075
------------------------------------------------------------------------------
The output from the mvreg command looks very much like the output from the regress command, except that there are three equations (one for each outcome measure) instead of one. The coefficients (and all of the output) are interpreted in exactly the same way as they are for any OLS regression. To be clear, the "R-sq" in the output above corresponds to the R-squared from the regress command, not the adjusted R-squared.
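As a check on that interpretation, the F printed for each equation can be recovered from its "R-sq" with the usual formula for the overall regression F. The adjusted R-squared, which mvreg does not print, follows from the same numbers. The Python sketch below uses the values from the output above (n = 600, k = 4 slopes) and the standard formulas. Small differences from the printed F come from the four-digit rounding of R-sq.

```python
# Sketch: recover each equation's overall F from R-squared, and compute the
# adjusted R-squared that the mvreg header does not show.


def overall_f(r2, n, k):
    """F(k, n - k - 1) for H0: all k slope coefficients are zero."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


def adjusted_r2(r2, n, k):
    """Adjusted R-squared: 1 - (1 - R2)(n - 1)/(n - k - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n, k = 600, 4
r_squared = {"locus_of_control": 0.1773, "self_concept": 0.0196, "motivation": 0.0768}

for name, r2 in r_squared.items():
    print(f"{name:17s} F = {overall_f(r2, n, k):7.3f}   "
          f"adj R-sq = {adjusted_r2(r2, n, k):.4f}")
```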
If you ran a separate regression for each outcome variable, you would get exactly the same coefficients, standard errors, t- and p-values, and confidence intervals as shown above. So why conduct a multivariate regression? One of the advantages of using mvreg is that you can conduct tests of the coefficients across the different equations. (Note that many of these tests can also be performed after the manova command, although the process can be more difficult because a series of contrasts needs to be created.) In the examples below, we test four hypotheses.
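The claim that the coefficients match separate regressions follows from the algebra: the estimator handles each column of the outcome matrix independently. The pure-Python sketch below illustrates this on a small hypothetical data set. The data and the helper names (solve, ols, residuals, corr) are ours, for illustration only. It fits two outcomes jointly and separately and confirms that the coefficients agree. It then computes the one quantity the separate fits never use: the correlation between the two equations' residuals.

```python
# Sketch (hypothetical data, pure Python): a joint fit of several outcomes
# sharing the same predictors gives the same coefficients as separate fits.


def solve(a, rhs_cols):
    """Solve a @ x = b for each column b in rhs_cols (Gauss-Jordan, partial pivoting)."""
    n, m = len(a), len(rhs_cols)
    aug = [[float(v) for v in a[i]] + [float(col[i]) for col in rhs_cols]
           for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    # One solution vector per right-hand side
    return [[aug[i][n + j] for i in range(n)] for j in range(m)]


def ols(X, ys):
    """OLS coefficients for outcomes ys sharing design X (X includes a ones column)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [[sum(row[i] * y[t] for t, row in enumerate(X)) for i in range(k)]
           for y in ys]
    return solve(xtx, xty)


def residuals(X, y, b):
    return [y[t] - sum(bi * xi for bi, xi in zip(b, X[t])) for t in range(len(X))]


def corr(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Hypothetical data: constant plus two predictors, two outcomes.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2, 1, 4, 3, 6, 5, 8, 9]
X = [[1.0, a, b] for a, b in zip(x1, x2)]
y1 = [3.1, 3.9, 6.2, 6.8, 9.1, 9.8, 12.2, 13.5]
y2 = [1.0, 0.5, 2.2, 1.4, 3.1, 2.6, 4.4, 4.9]

B_joint = ols(X, [y1, y2])                         # both outcomes in one fit
B_separate = [ols(X, [y1])[0], ols(X, [y2])[0]]    # one fit per outcome

for bj, bs in zip(B_joint, B_separate):
    assert all(abs(p - q) < 1e-10 for p, q in zip(bj, bs))

# What separate fits ignore: the correlation of residuals across equations.
e1 = residuals(X, y1, B_joint[0])
e2 = residuals(X, y2, B_joint[1])
print("coefficients:", [[round(v, 4) for v in b] for b in B_joint])
print("residual correlation between equations:", round(corr(e1, e2), 3))
```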
test read
( 1) [locus_of_control]read = 0
( 2) [self_concept]read = 0
( 3) [motivation]read = 0
F( 3, 595) = 6.40
Prob > F = 0.0003
test [locus_of_control]read [locus_of_control]write
( 1) [locus_of_control]read = 0
( 2) [locus_of_control]write = 0
F( 2, 595) = 16.77
Prob > F = 0.0000
test [locus_of_control]female = [self_concept]female
( 1) [locus_of_control]female - [self_concept]female = 0
F( 1, 595) = 17.85
Prob > F = 0.0000
test [locus_of_control]science = [self_concept]science, accum
( 1) [locus_of_control]female - [self_concept]female = 0
( 2) [locus_of_control]science - [self_concept]science = 0
F( 2, 595) = 8.93
Prob > F = 0.0002
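Why do these cross-equation tests need mvreg rather than separate regressions? The variance of a difference between two coefficients depends on their covariance, and coefficients from different equations are correlated when the outcomes' residuals are. The Python sketch below works from the female coefficients and standard errors printed by mvreg. It first computes the F you would get by (incorrectly) treating the two equations as independent. It then backs out the covariance implied by the F of 17.85 that test reported. This back-calculation is our own and only approximate, since it starts from rounded output.

```python
# Sketch: cross-equation Wald test for [locus_of_control]female = [self_concept]female.
# Coefficients and standard errors are copied from the mvreg output above.
b_locus, se_locus = 0.1427696, 0.055268
b_self, se_self = -0.179823, 0.0635054
f_reported = 17.85  # F(1, 595) reported by the test command

diff = b_locus - b_self

# Naive F that ignores the covariance between the two estimates.
f_naive = diff ** 2 / (se_locus ** 2 + se_self ** 2)

# Var(b1 - b2) = se1^2 + se2^2 - 2*cov12. With one restriction, F equals the
# Wald statistic, so the reported F implies a value for cov12.
var_implied = diff ** 2 / f_reported
cov_implied = (se_locus ** 2 + se_self ** 2 - var_implied) / 2
corr_implied = cov_implied / (se_locus * se_self)

print(f"naive F = {f_naive:.2f}, reported F = {f_reported:.2f}")
print(f"implied correlation of the two estimates = {corr_implied:.3f}")
```

The implied correlation is positive (about 0.18), so ignoring it overstates the variance of the difference and understates F. With the same predictors in every equation, this correlation should be close to the residual correlation between locus of control and self-concept. That is in line with their raw correlation of 0.1712 shown earlier.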
mvreg locus_of_control self_concept motivation = read write science female
estimates store m1
* estout is user-written; install it once with: ssc install estout
estout m1, cells(b(star fmt(%9.4f)) se(par fmt(%9.4f))) unstack style(fixed)
m1
locus_of_c~l self_concept motivation
b/se b/se b/se
read 0.0143*** 0.0019 0.0052*
(0.0037) (0.0043) (0.0020)
write 0.0091* 0.0015 0.0073***
(0.0037) (0.0043) (0.0020)
science 0.0080* 0.0016 -0.0036
(0.0038) (0.0044) (0.0021)
female 0.1428* -0.1798** 0.0275
(0.0553) (0.0635) (0.0299)
_cons -1.6116*** -0.1568 0.1819*
(0.1562) (0.1795) (0.0846)
Another column could be added to this table (manually) to show the multivariate test for each variable from the manova output. The write-up should mention which equations are statistically significant, as well as which variables in each equation are statistically significant. In many cases, the write-up will focus on the hypotheses tested with the test command. For example, the test of the variable read indicates that its coefficients are not all equal to 0 across the three equations, even though read is not statistically significant in the second (self-concept) equation. Another hypothesis concerns the variables female and science. Taken together, the hypothesis that each of them has the same coefficient in the locus of control and self-concept equations is rejected (F(2, 595) = 8.93, p < .05).
In the analysis above, we conducted a multivariate multiple regression to examine the effect of the variables read, write, science and gender on three outcome measures: locus of control, self-concept and motivation. Each of the three equations was statistically significant, and each of the predictors was statistically significant in at least one equation. After fitting the model, we tested several hypotheses. First, we tested whether the coefficients for read were simultaneously equal to zero in all three equations. This hypothesis was rejected (F(3, 595) = 6.40, p < 0.05), so read has an effect in at least one equation. We also tested whether read and write together had an effect on locus of control, which they did (F(2, 595) = 16.77, p < 0.05). Finally, we tested whether gender and science each had the same coefficient in the locus of control and self-concept equations. Taken together, this hypothesis was rejected (F(2, 595) = 8.93, p < 0.05).
These pages are Copyrighted (c) by UCLA Academic Technology Services