Stata Data Analysis Examples
Multivariate Regression Analysis

Examples of Multivariate Regression Analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables (standardized test scores) and gender for 600 college freshmen. She is interested in how the set of psychological variables relates to the academic variables and gender. In particular, the researcher is interested in how many dimensions are necessary to understand the association between the two sets of variables.

Example 2. A doctor has collected data on cholesterol, blood pressure and weight.  She also collected data on the eating habits of the subjects (e.g., how many ounces of red meat, fish, dairy products and chocolate consumed per week).  She wants to investigate the relationship between the three measures of health and eating habits.

Example 3. A researcher is interested in determining how healthy African Violet plants are.  She collects data on the average leaf diameter, the mass of the root ball and the average diameter of the blooms, as well as how long the plant has been in the current container.  For predictor variables, she measures several elements in the soil, in addition to the amount of light and water each plant receives.

Description of the Data

Let's pursue Example 1 from above.

We have a data file, mmreg.dta, with 600 observations on eight variables. The psychological variables are locus of control, self-concept and motivation. The academic variables are standardized tests in reading, writing, math and science. Additionally, the variable female is a zero-one indicator variable with the one indicating a female student.

Let's look at the data (note that there are no missing data in this data set).
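
A minimal sketch of how one might inspect the data, assuming mmreg.dta sits in the current working directory (the file location is an assumption; adjust the path as needed). Only the variable names that appear in the models below are used.

```stata
* Assumes mmreg.dta is in the current working directory
use mmreg, clear

* Variable names, storage types, and labels for all eight variables
describe

* Means, standard deviations, and ranges for the variables used in the models below
summarize locus_of_control self_concept motivation read write science female

* Frequency table for the zero-one indicator variable
tabulate female

* Pairwise correlations among the three psychological outcomes
correlate locus_of_control self_concept motivation
```

The summarize output also gives a quick check on the claim of no missing data: each variable should report 600 observations.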

Some Strategies You Might Be Tempted To Try

Before we show how you can analyze this with a multivariate regression analysis, let's consider some other methods that you might use.

Multivariate Regression Analysis

Technically speaking, we will be conducting a multivariate multiple regression.  This regression is "multivariate" because there is more than one outcome variable.  It is a "multiple" regression because there is more than one predictor variable.  Of course, you can conduct a multivariate regression with only one predictor variable, although that is rare in practice.
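
For readers who prefer notation, the model can be sketched in matrix form. The symbols Y, X, B, U and Sigma are our own notation (they do not appear in the Stata output), and the dimensions assume the Example 1 setup of 600 observations, three outcomes and four predictors plus a constant.

$$
\underset{600 \times 3}{Y} \;=\; \underset{600 \times 5}{X}\;\underset{5 \times 3}{B} \;+\; \underset{600 \times 3}{U},
\qquad
\hat{B} = (X'X)^{-1} X'Y
$$

Each column of Y is one outcome variable and each column of B holds the coefficients of one equation. The rows of U are the errors for one student across the three equations; they are assumed independent across students with a common 3 x 3 covariance matrix Sigma, which is what allows tests that span equations.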

To conduct a multivariate regression in Stata, you need to use two commands, manova and mvreg.  The manova command will indicate if all of the equations, taken together, are statistically significant.  The F- and p-values for four multivariate criteria are given: Wilks' lambda, Lawley-Hotelling trace, Pillai's trace, and Roy's largest root.  Next, we use the mvreg command to obtain the coefficients, standard errors, etc., for each of the predictors in each model.  We will also show the use of the test command after the mvreg command.  The use of the test command is one of the compelling reasons for conducting a multivariate regression analysis.

manova locus_of_control self_concept motivation = read write science female, continuous(read write science female)
                           Number of obs =     600

                           W = Wilks' lambda      L = Lawley-Hotelling trace
                           P = Pillai's trace     R = Roy's largest root

                  Source |  Statistic     df   F(df1,    df2) =   F   Prob>F
              -----------+--------------------------------------------------
                   Model | W   0.7587      4    12.0  1569.2    14.39 0.0000 a
                         | P   0.2497           12.0  1785.0    13.51 0.0000 a
                         | L   0.3070           12.0  1775.0    15.14 0.0000 a
                         | R   0.2673            4.0   595.0    39.76 0.0000 u
                         |--------------------------------------------------
                Residual |               595
              -----------+--------------------------------------------------
                    read | W   0.9687      1     3.0   593.0     6.38 0.0003 e
                         | P   0.0313            3.0   593.0     6.38 0.0003 e
                         | L   0.0323            3.0   593.0     6.38 0.0003 e
                         | R   0.0323            3.0   593.0     6.38 0.0003 e
                         |--------------------------------------------------
                   write | W   0.9710      1     3.0   593.0     5.90 0.0006 e
                         | P   0.0290            3.0   593.0     5.90 0.0006 e
                         | L   0.0299            3.0   593.0     5.90 0.0006 e
                         | R   0.0299            3.0   593.0     5.90 0.0006 e
                         |--------------------------------------------------
                 science | W   0.9846      1     3.0   593.0     3.10 0.0264 e
                         | P   0.0154            3.0   593.0     3.10 0.0264 e
                         | L   0.0157            3.0   593.0     3.10 0.0264 e
                         | R   0.0157            3.0   593.0     3.10 0.0264 e
                         |--------------------------------------------------
                  female | W   0.9672      1     3.0   593.0     6.69 0.0002 e
                         | P   0.0328            3.0   593.0     6.69 0.0002 e
                         | L   0.0339            3.0   593.0     6.69 0.0002 e
                         | R   0.0339            3.0   593.0     6.69 0.0002 e
                         |--------------------------------------------------
                Residual |               595
              -----------+--------------------------------------------------
                   Total |               599
              --------------------------------------------------------------
                           e = exact, a = approximate, u = upper bound on F
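
The four statistics in the table are all functions of the same eigenvalues, which may help in reading the W, P, L and R rows. As a sketch in our own notation (not part of the Stata output), let S_H be the hypothesis sums-of-squares-and-cross-products matrix for a given source, S_E the residual one, and lambda_1 >= ... >= lambda_s the nonzero eigenvalues of S_E^{-1} S_H:

$$
W = \prod_{i=1}^{s} \frac{1}{1+\lambda_i}, \qquad
P = \sum_{i=1}^{s} \frac{\lambda_i}{1+\lambda_i}, \qquad
L = \sum_{i=1}^{s} \lambda_i, \qquad
R = \lambda_1
$$

For a source with one degree of freedom there is a single eigenvalue, so L and R coincide and the four criteria give the same exact F. Taking the read rows as a check, with lambda_1 = 0.0323:

$$
W = \frac{1}{1.0323} \approx 0.9687, \qquad
P = \frac{0.0323}{1.0323} \approx 0.0313
$$

which matches the values shown above.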

As we can see from the Model section of the output above (under Source), our model is statistically significant regardless of which multivariate criterion is used (all of the p-values are displayed as 0.0000, that is, they are less than 0.0001). If the overall model were not statistically significant, you might want to modify it before running mvreg (because coefficients for a non-significant model are usually uninteresting).  In the lower part of the output we see the multivariate tests for each of the predictor variables.  Each of these is statistically significant.
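
The continuous() option used in the manova command above is the older syntax. A hedged sketch of the equivalent call in factor-variable notation follows, assuming Stata 11 or later; the layout of the output may differ somewhat from what is shown above.

```stata
* Factor-variable form of the manova command above (assumes Stata 11 or later);
* the c. prefix marks each predictor as continuous
manova locus_of_control self_concept motivation = c.read c.write c.science c.female

* In these versions, mvreg typed with no arguments after manova should
* redisplay the fitted model as regression coefficients
mvreg
```

If the c. form is not accepted, the installed version of Stata probably predates factor-variable support, and the continuous() form shown above is the one to use.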

mvreg locus_of_control self_concept motivation = read write science female

Equation          Obs  Parms        RMSE    "R-sq"          F        P
----------------------------------------------------------------------
locus_of_c~l      600      5    .6100045    0.1773    32.0561   0.0000
self_concept      600      5    .7009231    0.0196   2.967439   0.0191
motivation        600      5    .3304092    0.0768   12.37582   0.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
locus_of_c~l |
        read |   .0142859    .003733     3.83   0.000     .0069544    .0216173
       write |   .0090845   .0037045     2.45   0.014     .0018091    .0163599
     science |    .007979   .0037862     2.11   0.036     .0005431     .015415
      female |   .1427696    .055268     2.58   0.010     .0342256    .2513137
       _cons |  -1.611649   .1562346   -10.32   0.000    -1.918487   -1.304811
-------------+----------------------------------------------------------------
self_concept |
        read |   .0019048   .0042894     0.44   0.657    -.0065194     .010329
       write |   .0015355   .0042566     0.36   0.718    -.0068242    .0098953
     science |   .0015544   .0043505     0.36   0.721    -.0069899    .0100986
      female |   -.179823   .0635054    -2.83   0.005    -.3045451   -.0551009
       _cons |  -.1568385   .1795207    -0.87   0.383    -.5094098    .1957328
-------------+----------------------------------------------------------------
motivation   |
        read |   .0051889    .002022     2.57   0.011     .0012178      .00916
       write |   .0072695   .0020065     3.62   0.000     .0033287    .0112102
     science |   -.003597   .0020508    -1.75   0.080    -.0076247    .0004307
      female |   .0275103   .0299359     0.92   0.358    -.0312826    .0863033
       _cons |   .1819084   .0846245     2.15   0.032     .0157093    .3481075
------------------------------------------------------------------------------

The output from the mvreg command looks very much like the output from the regress command, except that there are three equations (one for each outcome measure) instead of one.  The coefficients (and all of the output) are interpreted in exactly the same way as they are for any OLS regression.  To be clear, the "R-sq" in the output above corresponds to the R-squared from the regress command, not the adjusted R-squared.
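
One way to see the correspondence with regress is to fit each equation on its own and compare the results with the matching block of the mvreg table. This is a minimal sketch that assumes mmreg.dta is already in memory.

```stata
* Fit one OLS regression per outcome; each should reproduce the
* corresponding block of the mvreg coefficient table
foreach y of varlist locus_of_control self_concept motivation {
    regress `y' read write science female
}
```

What the separate regressions do not provide is the covariance between coefficients in different equations, so cross-equation hypotheses cannot be tested directly from them.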

If you ran a separate regression for each outcome variable, you would get exactly the same coefficients, standard errors, t- and p-values, and confidence intervals as shown above.  So why conduct a multivariate regression?  One of the advantages of using mvreg is that you can conduct tests of the coefficients across the different models.  (Please note that many of these tests can be performed after the manova command, although the process can be more difficult because a series of contrasts needs to be created.)  In the examples below, we test four hypotheses.

  1. Test the null hypothesis that the coefficient for the variable read equals 0 in all three equations.  (Note that this mostly duplicates the test for the variable read in the manova output above.)
  2. Test the null hypothesis that the coefficients for the variables read and write equal 0 in the equation with the outcome variable locus_of_control.
  3. Test the null hypothesis that the coefficient for the variable female in the equation with the outcome variable locus_of_control equals the coefficient for female in the equation with the outcome variable self_concept.
  4. Test the null hypothesis that the coefficient for the variable female in the equation with the outcome variable locus_of_control equals the coefficient for female in the equation with the outcome variable self_concept, and that the coefficient for the variable science in the equation with the outcome variable locus_of_control equals the coefficient for science in the equation with the outcome variable self_concept.

test read

 ( 1)  [locus_of_control]read = 0
 ( 2)  [self_concept]read = 0
 ( 3)  [motivation]read = 0

       F(  3,   595) =    6.40
            Prob > F =    0.0003

test [locus_of_control]read [locus_of_control]write

 ( 1)  [locus_of_control]read = 0
 ( 2)  [locus_of_control]write = 0

       F(  2,   595) =   16.77
            Prob > F =    0.0000

test [locus_of_control]female = [self_concept]female

 ( 1)  [locus_of_control]female - [self_concept]female = 0

       F(  1,   595) =   17.85
            Prob > F =    0.0000

test [locus_of_control]science = [self_concept]science, accum

 ( 1)  [locus_of_control]female - [self_concept]female = 0
 ( 2)  [locus_of_control]science - [self_concept]science = 0

       F(  2,   595) =    8.93
            Prob > F =    0.0002
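
As a sketch of two related options, the fourth hypothesis can also be specified in a single test command instead of using accum, and lincom can report the size of a cross-equation difference as well as its significance. Both assume the mvreg results above are still the active estimates; the /// line continuation works in a do-file but not when typed interactively.

```stata
* Tests 3 and 4 in one step: each equality constraint goes in its own parentheses
test ([locus_of_control]female = [self_concept]female) ///
     ([locus_of_control]science = [self_concept]science)

* Estimate, standard error, and confidence interval for the difference in the
* female coefficient between the locus_of_control and self_concept equations
lincom [locus_of_control]female - [self_concept]female
```

From the coefficient table, the point estimate of that difference is .1427696 - (-.179823) = .3225926, and the squared t statistic from lincom should agree with the F statistic from the third test.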

Sample Write-Up of the Analysis

There are many ways to write up the results from a multivariate regression analysis.  Below is an example.  We will begin by using the estout command, which can be downloaded from within Stata by typing findit estout (see How can I use the findit command to search for programs and get additional help? for more information about using findit).

mvreg locus_of_control self_concept motivation = read write science female
estimates store m1
estout m1, cells( b(star fmt(%9.4f)) se(par fmt(%9.4f)) ) unstack style(fixed)
                       m1                                   
             locus_of_c~l    self_concept      motivation   
                     b/se            b/se            b/se   
read               0.0143***       0.0019          0.0052*  
                 (0.0037)        (0.0043)        (0.0020)   
write              0.0091*         0.0015          0.0073***
                 (0.0037)        (0.0043)        (0.0020)   
science            0.0080*         0.0016         -0.0036   
                 (0.0038)        (0.0044)        (0.0021)   
female             0.1428*        -0.1798**        0.0275   
                 (0.0553)        (0.0635)        (0.0299)   
_cons             -1.6116***      -0.1568          0.1819*  
                 (0.1562)        (0.1795)        (0.0846)   
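
The table above does not say what the stars mean. A hedged sketch that states the cutoffs explicitly and prints a legend follows; it assumes the results were stored as m1, as in the commands above. Installing from SSC is an alternative to findit and needs an internet connection.

```stata
* One-time install of estout from SSC (an alternative to findit estout)
ssc install estout

* Same table as above, with the star cutoffs spelled out and a legend printed;
* these cutoffs are estout's defaults, so the stars should not change
estout m1, cells( b(star fmt(%9.4f)) se(par fmt(%9.4f)) ) unstack style(fixed) ///
    starlevels(* 0.05 ** 0.01 *** 0.001) legend
```

Stating the cutoffs in the command keeps the table reproducible even if the defaults change in a later release of estout.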

Another column could be added to this table (manually) that would show the multivariate tests for each variable from the manova output.  Mention should be made of which equations are statistically significant, as well as which variables in each equation are statistically significant.  In many cases, the write-up of the results will likely focus on the hypotheses tested with the test command.  For example, the test of the variable read indicates that its coefficient is not simultaneously equal to 0 in all three equations, even though this variable is not statistically significant in the second equation.  Another hypothesis tested concerns the variables female and science.  Taken together, the coefficients for these variables differ between the equations for locus of control and self concept (F(2, 595) = 8.93, p < .05).

In the analysis above, we conducted a multivariate multiple regression to see the effect of the variables read, write, science and gender on three outcome measures:  locus of control, self concept and motivation.  Each of the models was statistically significant, and each of the predictors was statistically significant in at least one model.  After conducting the multivariate multiple regression, we tested several hypotheses.  The first was that the coefficient for the variable read equals zero in all three equations simultaneously, which was rejected (F(3, 595) = 6.40, p < 0.05).  We also tested the hypothesis that the variables read and write together had an effect on the outcome measure locus of control, which they did (F(2, 595) = 16.77, p < 0.05).  Finally, we tested whether the coefficients for gender and science, taken together, differed between the equations for locus of control and self concept, and they did (F(2, 595) = 8.93, p < 0.05).

Cautions, Flies in the Ointment

  • The residuals from the outcome equations are assumed to follow a multivariate normal distribution.
  • Multivariate regression analysis is not recommended for small samples.
  • The outcome variables should be at least moderately correlated for the multivariate regression analysis to make sense.
  • If the outcome variables are dichotomous (0/1), then you will want to use either mvprobit (a user-written command) or biprobit.
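
Two of the cautions above can be checked informally in Stata. The sketch below assumes mmreg.dta is in memory and Stata 11 or later for mvtest; the residual variable names res_locus, res_self and res_motiv are hypothetical names chosen for this illustration.

```stata
* Refit with the corr option to display the correlation matrix of the residuals
* across equations, along with the Breusch-Pagan test of independence
mvreg locus_of_control self_concept motivation = read write science female, corr

* Save the residuals from each equation (new variable names are our own)
predict double res_locus, equation(locus_of_control) residuals
predict double res_self,  equation(self_concept) residuals
predict double res_motiv, equation(motivation) residuals

* Tests of multivariate normality applied to the residuals
mvtest normality res_locus res_self res_motiv, stats(all)
```

With 600 observations, even small departures from normality may be flagged as significant, so these tests are best read alongside plots of the residuals.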

    See Also

    Stata Online Manual

    These pages are Copyrighted (c) by UCLA Academic Technology Services

