UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

Statistical Computing Seminar

Introduction to Multilevel Modeling Using HLM

This seminar covers the basics of two-level hierarchical linear models using HLM 5.05. We will use data files from the High School and Beyond Survey. The data files in SPSS format come with HLM software and are located in subfolder /examples/Chpater2 of HLM folder.

Outline


Model Building

At this point, we will focus on modeling building using HLM pretending that we have already done data cleaning and creating all necessary variables and further more we have created a file that HLM can use to build up models.

Our data file is a subsample from the 1982 High School and Beyond Survey and is used extensively in Hierarchical Linear Models by Raudenbush and Bryk. The data file consists of 7185 students nested in 160 schools. The outcome variable of interest is the student-level math achievement score (MATHACH). Variable SES is the social-economic-status of a student and therefore is at student-level. Variable MEANSES is the group mean of SES and therefore is at school-level. Variable  SECTOR is an indicator variable indicating if a school is public or catholic and is therefore a school-level variable. There are 90 public schools (SECTOR=0) and 70 catholic schools (SECTOR=1) in the sample.

The file that we are going to use is located in Chapter 2 of Examples of HLM folder and is called hsb.ssm.


Model 1: Unconditional Means Model

This model is referred as a one-way random effect ANOVA and is the simplest possible random effect linear model. The motivation for this model is the question on  how much schools vary in their mean mathematics achievement. In terms of equations, we have the following, where rij ~ N(0, σ2) and u0j ~ N(0, τ2),

MATHACHij β0j + rij  
β0j =  γ00 + u0j

 Program:                       HLM 5 Hierarchical Linear and Nonlinear Modeling
 Authors:                       Stephen Raudenbush, Tony Bryk, & Richard Congdon
 Publisher:                     Scientific Software International, Inc. (c) 2000
                                                      techsupport@ssicentral.com
                                                              www.ssicentral.com
 -------------------------------------------------------------------------------
 Module:      HLM2.EXE (5.05.2330.2)
 Date:        21 March 2003, Friday
 Time:        14:43:35
 -------------------------------------------------------------------------------
  SPECIFICATIONS FOR THIS HLM2 RUN                     Fri Mar 21 14:43:35 2003
 -------------------------------------------------------------------------------
  Problem Title: UNCONDITIONAL MODEL
  The data source for this run  = C:\HLM504\EXAMPLES\CHAPTER2\HSB.SSM
  The command file for this run = whlmtemp.hlm
  Output file name              = C:\HLM504\EXAMPLES\CHAPTER2\HLM2.OUT
  The maximum number of level-2 units = 160
  The maximum number of iterations = 100
  Method of estimation: restricted maximum likelihood
 Weighting Specification
 -----------------------
                         Weight
                         Variable
            Weighting?   Name        Normalized?
 Level 1        no                        no
 Level 2        no                        no
  The outcome variable is  MATHACH    
  The model specified for the fixed effects was:
 ----------------------------------------------------
   Level-1                  Level-2
   Coefficients             Predictors
 ----------------------   ---------------
         INTRCPT1, B0      INTRCPT2, G00   
 The model specified for the covariance components was:
 -------------------------------------------------------
         Sigma squared (constant across level-2 units)
         Tau dimensions
               INTRCPT1
 Summary of the model specified (in equation format)
 ---------------------------------------------------
Level-1 Model
	Y = B0 + R
Level-2 Model
	B0 = G00 + U0

 Level-1 OLS regressions
 -----------------------
 Level-2 Unit     INTRCPT1    
 ------------------------------------------------------------------------------
        1224      9.71545    
        1288     13.51080    
        1296      7.63596    
        1308     16.25550    
        1317     13.17769    
        1358     11.20623    
        1374      9.72846    
        1433     19.71914    
        1436     18.11161    
        1461     16.84264    
The average OLS level-1 coefficient for INTRCPT1 =     12.62075
 Least Squares Estimates
 -----------------------
 sigma_squared =     47.31026
 The outcome variable is  MATHACH
 Least-squares estimates of fixed effects
 ----------------------------------------------------------------------------
                                       Standard
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.747853   0.081145   157.099      7184    0.000
 ----------------------------------------------------------------------------
 The outcome variable is  MATHACH
 Least-squares estimates of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.747853   0.239305    53.270      7184    0.000
 ----------------------------------------------------------------------------
 The least-squares likelihood value = -24051.458626
 Deviance =  48102.91725
 Number of estimated parameters =    1
 STARTING VALUES
 ---------------
sigma(0)_squared =     39.14163
 Tau(0)
 INTRCPT1,B0      8.78270 
 Estimation of fixed effects
(Based on starting values of covariance components)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.636711   0.246541    51.256       159    0.000
 ----------------------------------------------------------------------------
The value of the likelihood function at iteration 1 = -2.355841E+004
The value of the likelihood function at iteration 2 = -2.355840E+004
The value of the likelihood function at iteration 3 = -2.355840E+004
Iterations stopped due to small change in likelihood function
******* ITERATION 4 *******
 Sigma_squared =     39.14831
 Tau
 INTRCPT1,B0      8.61431 
Tau (as correlations)
 INTRCPT1,B0  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.901
 ----------------------------------------------------
The value of the likelihood function at iteration 4 = -2.355840E+004

 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.636972   0.244412    51.704       159    0.000
 ----------------------------------------------------------------------------
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.636972   0.243628    51.870       159    0.000
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        2.93501       8.61431   159    1660.23264    0.000
  level-1,       R         6.25686      39.14831
 -----------------------------------------------------------------------------
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 47116.793475
 Number of estimated parameters = 2

Notes:

  1. The model we fit was

    MATHACHij β0j + rij  
    β0j = γ00 + u0j

    Filling in the parameter estimates we get

    MATHACHij β0j + rij  
    β0j =  12.64 + u0j
    V(rij) = 39.15
    V(u0j) = 8.61
     
  2. If we describe our model in terms of a single equation, we will have to substitute the level-2 equation back to level-1 equation. Here is how it will look like in a single equation: MATHACHij  γ00 + u0j + rij.
  3. The estimated between variance,  τ2 corresponds to the term INTRCPT1 in the output of Final estimation of variance components and the estimated within variance, σ2, corresponds to the term level-1 in the same output section.
  4. Based on the covariance estimates, we can compute the intra-class correlation:  8.61431/(8.61431 + 39.14831) = .18. This tells us the portion of the total variance that occurs between schools.
  5. To measure the magnitude of the variation among schools in their mean achievement levels, we can calculate the plausible values range for these means, based on the between variance we obtained from the model: 12.64 ± 1.96*(8.61)1/2 = (6.89, 18.39).
  6. The reliability of the random effect of level-1 intercept is the average reliability of the level-2 units. It measures the overall reliability of the OLS estimates for each of the intercept. 

Model 2: Including Effects of School Level (level 2) Predictors -- predicting mathach from meanses

This model is referred as regression with Means-as-Outcomes by Raudenbush and Bryk. The motivation of this model is the question on if the schools with high MEANSES also have high math achievement. In other words, we want to understand why there is a school difference on mathematics achievement. In terms of regression equations, we have the following.

MATHACHij β0j + rij  
β0j =  γ00 + γ01(MEANSES) + u0j

Sigma_squared =     39.15708
 Tau
 INTRCPT1,B0      2.63870 
Tau (as correlations)
 INTRCPT1,B0  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.740
 ----------------------------------------------------
The value of the likelihood function at iteration 6 = -2.347972E+004
 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.649436   0.149280    84.736       158    0.000
     MEANSES, G01           5.863538   0.361457    16.222       158    0.000
 ----------------------------------------------------------------------------
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.649436   0.148377    85.252       158    0.000
     MEANSES, G01           5.863538   0.320211    18.311       158    0.000
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        1.62441       2.63870   158     633.51745    0.000
  level-1,       R         6.25756      39.15708
 -----------------------------------------------------------------------------
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 46959.446955
 Number of estimated parameters = 2

Notes:

  1. The model we fit was

    MATHACHij β0j + rij  
    β0j =  γ00 + γ01(MEANSES) + u0j

    Filling in the parameter estimates we get

    MATHACHij β0j + rij  
    β0j =  12.65 +5.86(MEANSES) + u0j
    V(rij) = 39.16
    V(u0j) = 2.64
     

  2. In a single equation our model will be written as: MATHACHij γ00 + γ01(MEANSES) + u0j + rij.
  3. The coefficient for the constant is the predicted math achievement when all predictors are 0, so when the average school SES is 0, the students math achievement is predicted to be 12.65. 
  4. A range of plausible values for school means, given that all schools have MEANSES of zero, is 12.65 ± 1.96 *(2.64)1/2 = (9.47, 15.83).
  5. The variance component representing variation between schools decreases greatly (from  8.61 to 2.64). This means that the level-2 variable meanses explains a large portion of the school-to-school variation in mean math achievement. More precisely, the proportion of variance explained by meanses is (8.61 - 2.64)/8.61 = .69, that is about 69% of the explainable variation in school mean math achievement scores can be explained by meanses.
  6. Do school achievement means still vary significantly once MEANSES is controlled? The output of Final estimation of variance components gives the test for the variance component for the INTRCPT1 to be zero with chi-square of 633.52 of 158 degrees of freedom. This is highly significant. Therefore, we conclude that after controlling for MEANSES, significant variation among school mean math achievement still remains to be explained. 
  7. We can also calculate the conditional intraclass correlation conditional on the values of MEANSES. 2.64/(2.64 + 39.16) = .06 measures the degree of dependence among observations within schools that are of the same MEANSES.

Model 3: Including Effects of Student-Level Predictors--predicting mathach from grouped centered student-level ses

This model is referred as a random-coefficient model by Raudenbush and Bryk. Pretend that we run regression of mathach on group centered ses on each school, that is we are going to run 160 regressions.

  1. What would be the average of the 160 regression equations (both intercept and slope)?
  2. How much do the regression equations vary from school to school?
  3. What is the correlation between the intercepts and slopes?

These are some of the questions that motivates the following model.

MATHACHij β0j + β1j (SES - MEANSES) + rij  
β0j =  γ00  + u0j
β1j =  γ10  + u1j

 Sigma_squared =     36.70356
 Tau
 INTRCPT1,B0      8.68087       0.04701 
      SES,B1      0.04701       0.68038 
Tau (as correlations)
 INTRCPT1,B0  1.000  0.019
      SES,B1  0.019  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.908
       SES, B1                        0.260
 ----------------------------------------------------
The value of the likelihood function at iteration 18 = -2.335620E+004
 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.636197   0.244503    51.681       159    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.193157   0.127879    17.150       159    0.000
 ----------------------------------------------------------------------------
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.636197   0.243738    51.843       159    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.193157   0.127846    17.155       159    0.000
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        2.94633       8.68087   159    1770.85120    0.000
      SES slope, U1        0.82485       0.68038   159     213.43769    0.003
  level-1,       R         6.05835      36.70356
 -----------------------------------------------------------------------------
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 46712.398925
 Number of estimated parameters = 4

Notes:

  1. The model we fit was

    MATHACHij β0j + β1j (SES - MEANSES) + rij  
    β0j =  γ00  + u0j
    β1j =  γ10  + u1j

    Filling in the parameter estimates we get

    MATHACHij β0j + β1j (SES - MEANSES) + rij  
    β0j =  12.64  + u0j
    β1j =  2.19 + u1j

    V(rij) = 36.7
    V(u0j) = 8.68
    V(u1j) = .68
     

  2. In a single equation our model will be written as:
    MATHACHij γ00  + u0j + (γ10  + u1j )(SES - MEANSES) + rij
                                =
     
      γ00  + γ10 *(SES - MEANSES) + u0j + u1j *(SES - MEANSES) +  rij
  3. The estimate for the variance of the slope for group centered SES  is  0.68. The p-value is .003. The test being significant tells us that we can not accept the hypothesis that there is no difference in slopes among schools.
  4. The 95% plausible value range for the school means is 12.64 ± 1.96 *(8.68)1/2 = (6.87, 18.41).
  5. The 95% plausible value range for the SES-achievement slope is 2.19 ± 1.96 *(.68)1/2 = (.57, 3.81).
  6. Notice that the residual variance is now 36.70, comparing with the residual variance of 39.15 in the one-way ANOVA with random effects model. We can compute the proportion variance explained at level 1 by (39.15 - 36.70) / 39.15 = .063. This means using student-level SES as a predictor of math achievement reduced the within-school variance by 6.3%.
  7. The correlation between the intercept and the slope is .019. It seems that they are not highly correlated.

Model 4: Including Both Level-1 and Level-2 Predictors --predicting mathach from meanses, sector, cses and the cross level interaction of  meanses and sector with cses

This model is referred as an intercepts and slopes-as-outcomes model by Raudenbush and Bryk. We have examined the variability of the regression equations across schools. Now we will build an explanatory model to account for the variability. That is we want to model the following:

MATHACHij β0j + β1j (SES - MEANSES) + rij  
β0j =  γ00  + γ01(SECTOR) + γ02(MEANSES) + u0j
β1j =  γ10  + γ11(SECTOR) + γ12(MEANSES) + u1j

Sigma_squared =     36.70313
 Tau
 INTRCPT1,B0      2.37996       0.19058 
      SES,B1      0.19058       0.14892 
Tau (as correlations)
 INTRCPT1,B0  1.000  0.320
      SES,B1  0.320  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.733
       SES, B1                        0.073
 ----------------------------------------------------
The value of the likelihood function at iteration 61 = -2.325094E+004
 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.096006   0.198734    60.865       157    0.000
      SECTOR, G01           1.226384   0.306272     4.004       157    0.000
     MEANSES, G02           5.333056   0.369161    14.446       157    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.937981   0.157135    18.697       157    0.000
      SECTOR, G11          -1.640954   0.242905    -6.756       157    0.000
     MEANSES, G12           1.034427   0.302566     3.419       157    0.001
 ----------------------------------------------------------------------------
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.096006   0.173699    69.638       157    0.000
      SECTOR, G01           1.226384   0.308484     3.976       157    0.000
     MEANSES, G02           5.333056   0.334600    15.939       157    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.937981   0.147620    19.902       157    0.000
      SECTOR, G11          -1.640954   0.237401    -6.912       157    0.000
     MEANSES, G12           1.034427   0.332785     3.108       157    0.002
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        1.54271       2.37996   157     605.29503    0.000
      SES slope, U1        0.38590       0.14892   157     162.30867    0.369
  level-1,       R         6.05831      36.70313
 -----------------------------------------------------------------------------
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 46501.875635
 Number of estimated parameters = 4

Notes:

  1. The model we fit was

    MATHACHij β0j + β1j (SES - MEANSES) + rij  
    β0j =  γ00  + γ01(SECTOR) + γ02(MEANSES) + u0j
    β1j =  γ10  + γ11(SECTOR) + γ12(MEANSES) + u1j

    Filling in the parameter estimates we get

    MATHACHij β0j + β1j (SES - MEANSES) + rij  
    β0j =  12.10  + 1.22(SECTOR) + 5.33(MEANSES) + u0j
    β1j =  2.94 + -1.64(SECTOR) + 1.03(MEANSES) + u1j

    V(rij) = 36.7
    V(u0j) = 2.37
    V(u1j) = .15
     

  2. In a single equation our model will be written as:
    MATHACHij γ00  + γ01(MEANSES) + γ02(SECTOR) + u0j
                                    
    + (γ10  + γ11(MEANSES) + γ12(SECTOR) + u1j)* (SES - MEANSES) + rij
     
                         =  γ00  + γ01(MEANSES) + γ02(SECTOR)
                            + γ10*(SES-MEANSES)  + γ11*MEANSES*(SES-MEANSES) + γ12*SECTOR*(SES-MENASES)
                            + u0j + u1j* (SES - MEANSES) + rij
     
  3. The estimate for the variance of the SES slope is .15 with p-value .369. That is the hypothesis that the there is no significant variation among the slope of SES can not be rejected. We may want to use a simpler model where the slope of SES varies nonrandomly with respect to level-2 variable MEANSES and SECTOR. We will show later how to compare the two models.
  4. The correlation between the level-1 intercept and the slope for SES is given as .32 from the earlier part of the output.

Graphing the Results

HLM has some ability to help you visualize the effect of level 2 variables. Before running the model, we have to request to graph the equations. From the pull-down menu, we will have to click on Graph equations and we can simply accept the default option there (there isn't much to begin with). After the running the model, from the pull-down menu File, we can click on Graph equations to obtain different graphs.


Starting HLM and Getting Data into HLM

Now we have seen how HLM builds up and analyzes a model. We are going to see how to enter our data into HLM.

HLM uses an "SSM file" (sufficient statistics matrices) for hierarchical linear models. An SSM file is constructed based on raw data files. It is worth mentioning that HLM does not have any data management capability. That is to say that all the variables in a model have to be created outside HLM, in other statistical packages, such as in SPSS. For example, if you have a categorical variable at level-1 and you want to include it and possibly some interaction terms with other level-1 variables in the model, then you have create all the dummy variables and all the interaction terms before entering your data into HLM. In short, HLM assumes that you have cleaned your data files and have done all the exploratory statistical analysis and ready to do your multilevel analysis.

Data files in SPSS format

Data files in sas7bdat format

Let's say that we have the HS&B file in SAS sas7bdat format, hsb1.sas7bdat and hsb2.sas7bdat. We can follow a similar routine to import the data files. HLM uses DBMSCOPY to import data files of different formats. For example, to import files in .sas7bdat format, the first thing to do is to set the type of data to other non-ASCII data via the File then Preferences pull-down menu.

Following similar steps as described in the example of import SPSS files, we will get to the following window:

When we browse to specify a level-1 file name, we will see the following window and we can then choose the corresponding type of the data

The rest of the routine is fairly straightforward and we will demonstrate during the seminar and skip the minute details here.

Single data file in Stata format using Stata Version 7

What if our data come in one single file with both level-1 variables and level-2 variables in it, for example, in Stata format? One has to collapse the single file at level-2 to get the level-2 file in Stata and sort both files and save them. Then following the similar routine as for SAS format to create an SSM file. 

We also wrote a Stata program stata2hlm that will create an SSM file based on a single Stata file.

. use http://www.ats.ucla.edu/stat/seminars/hlm_mlm/hsb12
. desc
Contains data from hsb12.dta
  obs:         7,185                          
 vars:            11                          13 Mar 2003 09:32
 size:       244,290 (76.1% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
school          str4   %4s                    
minority        byte   %9.0g                  
female          byte   %9.0g                  
ses             float  %9.0g                  
mathach         float  %9.0g                  
size            int    %9.0g                  
sector          byte   %9.0g                  
pracad          float  %9.0g                  
disclim         float  %9.0g                  
himinty         byte   %9.0g                  
meanses         float  %9.0g                  
-------------------------------------------------------------------------------
Sorted by:  school  

And we can list the data.

. list 
<output omitted>

Get stata2hlm (assuming we do not yet have it).

. net from http://www.ats.ucla.edu/stat/stata/ado/analysis/
. net install stata2hlm

Now we can use stata2hlm to convert the file to HLM format.  This assumes

  1. We have to set our path variable so we can call HLM from within Stata.
  2. In HLM we have selected File then Preferences and then checked "other non-ASCII"
. stata2hlm using hsb_stata, id(school) l1(minority female ses mathach) 
    l2(size sector pracad disclim himinty meanses)
file hsb_stata.ssm created.
You can open hsb_stata.ssm file from within HLM
and perform your analysis.

We should  check the statistics on both level-1 and level-2 variables to compare our original Stata data with the values converted to HLM.  We can look at the file  hlm2ssm.sts to see the level-1 and level-2 descriptive statistics for the HLM file as shown below.

. ls
  <dir>   4/03/03 10:29  .                 
  <dir>   4/03/03 10:29  ..                
   0.1k   4/03/03 10:29  createss.rsp      
   0.9k   4/03/03 10:29  HLM2SSM.STS       
  24.7k   4/03/03 10:29  hsb_stata.SSM     
   0.1k   4/03/03 10:29  hsb_stata1.rsp    
  92.1k   4/03/03 10:29  hsb_stata1_l1.dta 
   4.1k   4/03/03 10:29  hsb_stata1_l2.dta
. type hlm2ssm.sts
                      LEVEL-1 DESCRIPTIVE STATISTICS
 VARIABLE NAME       N       MEAN         SD         MINIMUM      MAXIMUM
  MINORITY          7185       0.27       0.45         0.00         1.00
    FEMALE          7185       0.53       0.50         0.00         1.00
       SES          7185       0.00       0.78        -3.76         2.69
   MATHACH          7185      12.75       6.88        -2.83        24.99
                      LEVEL-2 DESCRIPTIVE STATISTICS
 VARIABLE NAME       N       MEAN         SD         MINIMUM      MAXIMUM
      SIZE           160    1097.82     629.51       100.00      2713.00
    SECTOR           160       0.44       0.50         0.00         1.00
    PRACAD           160       0.51       0.26         0.00         1.00
   DISCLIM           160      -0.02       0.98        -2.42         2.76
   HIMINTY           160       0.28       0.45         0.00         1.00
   MEANSES           160      -0.00       0.41        -1.19         0.83

We can cross compare this back to the results of our Stata file as shown below.

. summarize minority female ses mathach

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
    minority |    7185     .274739   .4464137          0          1
      female |    7185    .5281837   .4992398          0          1
         ses |    7185    .0001434   .7793552     -3.758      2.692
     mathach |    7185    12.74785   6.878246     -2.832     24.993
. sort school
. by school: gen studentnum = _n
. summarize size sector pracad disclim himinty meanses if studentnum == 1

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
        size |     160    1097.825   629.5064        100       2713
      sector |     160       .4375   .4976359          0          1
      pracad |     160    .5139375   .2558967          0          1
     disclim |     160    -.015125   .9769777     -2.416      2.756
     himinty |     160        .275   .4479162          0          1
     meanses |     160   -.0001875   .4139731     -1.188       .831

The first set of output matches to the level 1 results from HLM.  The second part of the output selects just one record for each school and thus you can see the results match to the level 2 results for HLM.

By the way, you can make level 2 variables like meanses from within Stata using the egen command, for example.

. egen meanses = mean(ses), by(school)

Analysis of Model Fit

Residual File and Analysis of Model Fit

Let's say that our final model is going to be based on the last model we ran with both the intercept and the slope for SES as random effects. What about the model fit? Are there any potential problems with the model?

1. Residual File

We can make use of the residual file that HLM creates to check on model fit. This time before running the model, we will request for residual file to be created. From the Basic Specifications, click on Create Residual File and the following dialog window will be displayed. We can choose to include extra level-2 variables that are not in the model and we can also choose the residual file type. We will include all the possible level-2 variables in the residual file and we will request that the residual file be created in SPSS format. All the plots and analysis based on the residual file will be preformed in SPSS.

Let's have a look at the header of the residual file for our model. The last six variables are all the level-2 variables in the file and we have requested to have them in the residual file. Each observation corresponds to a level-2 unit. In our case, there will be 160 observations in the residual file, because there are 160 schools.

DATA LIST FIXED RECORDS = 6
 /1 ID NJ CHIPCT MDIST LNTOTVAR OLSRSVAR MDRSVAR (A12,F5,5F11.5)
 /2 EBINTRCP EBSES    (2F11.5)
 /3 OLINTRCP OLSES    (2F11.5)
 /4 FVINTRCP,FVSES   ,(2F11.5)
 /5 PV00 PV10 PV11 (3F11.5)
 /6     SIZE   SECTOR   PRACAD  DISCLIM  HIMINTY  MEANSES ( 6F11.5).
BEGIN DATA
        1224   47    0.01875    0.00336    2.02739    2.01643    2.00545
   -0.07325   -0.00556
   -0.09862    0.01540
    9.81407    2.49318
   0.583149   0.046979   0.049143
  842.00000    0.00000    0.35000    1.59700    0.00000   -0.42800
        1288   25    0.11875    0.14785    1.94945    1.92016    1.89906
    0.45128    0.03914
    0.73235    0.18246
   12.77845    3.07299
   0.897318   0.072951   0.051704
 1855.00000    0.00000    0.27000    0.17400    0.00000    0.12800

Q-Q Plot of CHIPCT against MDIST (Scatter plot of CHIPCT against MDIST)

Under the assumption of normality at level-2, the plot of CHIPCT against MDIST should be a 45 degree line. Otherwise, we have some evidence against the normality assumption. For example, in our plot we see some deviation from the normality at the right end of the plot. The plot also allows us to detect outliers for further investigation.

Histogram of MDRSVAR

Variable MDRSVAR measures the within-school standard deviation for the 160 schools based on the final fitted model. From the histogram we see that there are a few schools that have smaller within-school variance than others.

Q-Q Plot of Standardized MDRSVAR

The purpose of this plot is similar to the previous one.

Potential level-2 predictors

Potentially, there are other level-2 variables that should be in the model. For example, variable PRACAD, the proportion of students in the academic track may be a good candidate for a level-2 variable. To visually see it, we can plot the empirical Bayes residual estimate for the intercept for each school against PRACAD. The evidence that  there is a significant association between the intercept and PRACAD suggests that we may have a model specification error, meaning we have omitted some important variable.

Nonlinearity at level-2

This is very similar to what we usual do with OLS analysis. We plot the residual against a continuous predictor to detect any nonlinearity. In this case, we have the residual for the intercept against the level-2 predictor MEANSES. Curved plot will show evidence of nonlinearity. It seems that we don't have much problem concerning the nonlinearity here. 

2. Test Homogeneity of Level-1 Variances

To test the homogeneity of the level 1 variance we will choose Optional Hypothesis Testing/Estimation from the Optional Specifications pull-down menu. We will then check the box to get the test of homogeneity of level 1 variance as shown below, i.e. a test that the variance of mathach is the same across schools.

The test below suggests that the variance of mathach varies significantly across schools, contrary to the assumptions of of homogeneity of level-1 variance. If the variance of matchach is related to level 1 or level 2 predictors, this can bias the estimates (see Raudenbush and Bryk, pg. 263-267 for more information).

 Test of homogeneity of level-1 variance
 ----------------------------------------
 Chi-square statistic         =    245.76581
 Number of degrees of freedom =  159
 P-value                      = 0.000
3. Multivariate Hypothesis Tests on Fixed Effects

To test the effect of SECTOR on the intercept and on the SES slope simultaneously we will choose Optional Hypothesis Testing /Estimation from the Optional Specifications pull-down menu. We will then click on the 1 button to specify our first hypothesis (see below).

We then fill out the boxes below to indicate we wish to jointly test γ01 = 0 and γ11 =  0 .

 -----------------------------------------------------------------------------
                Results of General Linear Hypothesis Testing
 -----------------------------------------------------------------------------
                                   Coefficients      Contrast
 -----------------------------------------------------------------------------
 For       INTRCPT1, B0
     INTRCPT2, G00                  12.096006      0.000   0.000
       SECTOR, G01                   1.226384      1.000   0.000
      MEANSES, G02                   5.333056      0.000   0.000
 For      SES slope, B1
     INTRCPT2, G10                   2.937981      0.000   0.000
       SECTOR, G11                  -1.640954      0.000   1.000
      MEANSES, G12                   1.034427      0.000   0.000

Chi-square statistic = 66.141819
Degrees of freedom   = 2
P-value              = 0.000000

4. Multivariate Tests of Variance-Covariance Components Specification

From Model 4 that we ran before, we saw that the p-value for the test on variance component of the slope is not significant. This suggests that we may not want to model the slope as randomly varying. A simpler model will be that the slope of variable SES varies nonrandomly on level-2 variables SECTOR and MEANSES. We may want to compare these two models to decide that the simpler model is comparable with the previous one.

 Sigma_squared =     36.76611
 Tau
 INTRCPT1,B0      2.37524 
Tau (as correlations)
 INTRCPT1,B0  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.732
 ----------------------------------------------------
The value of the likelihood function at iteration 6 = -2.325148E+004

 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.096251   0.198643    60.894       157    0.000
      SECTOR, G01           1.224401   0.306117     4.000       157    0.000
     MEANSES, G02           5.336698   0.368978    14.463       157    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.935860   0.150705    19.481      7179    0.000
      SECTOR, G11          -1.642102   0.233097    -7.045      7179    0.000
     MEANSES, G12           1.044120   0.291042     3.588      7179    0.001
 ----------------------------------------------------------------------------
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          12.096251   0.173691    69.642       157    0.000
      SECTOR, G01           1.224401   0.308507     3.969       157    0.000
     MEANSES, G02           5.336698   0.334617    15.949       157    0.000
 For      SES slope, B1
    INTRCPT2, G10           2.935860   0.147580    19.893      7179    0.000
      SECTOR, G11          -1.642102   0.237223    -6.922      7179    0.000
     MEANSES, G12           1.044120   0.332897     3.136      7179    0.002
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        1.54118       2.37524   157     604.29893    0.000
  level-1,       R         6.06351      36.76611
 -----------------------------------------------------------------------------
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 46502.952734
 Number of estimated parameters = 2
 Variance-Covariance components test
 -----------------------------------
 Chi-square statistic         =      1.07710
 Number of degrees of freedom =    2
 P-value                      = >.500

Other Issues

  1. Modeling Heterogeneity of Level-1 Variances
  2. Sometimes, the level-1 variance may be heterogeneous. For example, we may expect that female students and male students have different variances. Thus, we want to model the level-1 variance to be a function of variable female. From pull-down menu Optional Specifications, we can choose Heterogeneous sigma^2. We then have a choice on which variable(s) to choose to model the heterogeneity. Here we picked variable female. From the output of Summary of Model Fit, we see that the model with heterogeneous sigma^2 is a better model (with p-value much less than .05).

    RESULTS FOR HETEROGENEOUS SIGMA-SQUARED
    (macro iteration 4)
     Var(R) = Sigma_squared and
     log(Sigma_squared) = alpha0 + alpha1(FEMALE)
    
     Model for level-1 variance
     --------------------------------------------------------------------
                                          Standard
        Parameter        Coefficient      Error       Z-ratio   P-value
     --------------------------------------------------------------------
     INTRCPT1    ,alpha0     3.66581      0.024717    148.308     0.000
       FEMALE    ,alpha1    -0.12121      0.033936     -3.572     0.001
     --------------------------------------------------------------------
    
    Summary of Model Fit
     -------------------------------------------------------------------
     Model                                Number of         Deviance
                                          Parameters
     -------------------------------------------------------------------
     1. Homogeneous sigma_squared             10          46494.59260
     2. Heterogeneous sigma_squared           11          46482.02598
     -------------------------------------------------------------------
     Model Comparison                 Chi-square       df    P-value
     -------------------------------------------------------------------
     Model 1 vs Model 2                  12.56662       1     0.001
     Tau
     INTRCPT1,B0      2.28785       0.18104 
          SES,B1      0.18104       0.06314 
    
     Standard Errors of Tau
     INTRCPT1,B0      0.35166       0.19435 
          SES,B1      0.19435       0.20673 
    
    Tau (as correlations)
     INTRCPT1,B0  1.000  0.476
          SES,B1  0.476  1.000
     ----------------------------------------------------
      Random level-1 coefficient   Reliability estimate
     ----------------------------------------------------
      INTRCPT1, B0                        0.725
           SES, B1                        0.033
     ----------------------------------------------------
    The value of the likelihood function at iteration 2 = -2.324101E+004
     Final estimation of fixed effects:
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00          12.051613   0.195912    61.515       157    0.000
          SECTOR, G01           1.251659   0.301753     4.148       157    0.000
         MEANSES, G02           5.325620   0.363646    14.645       157    0.000
     For      SES slope, B1
        INTRCPT2, G10           2.953136   0.153404    19.251       157    0.000
          SECTOR, G11          -1.645357   0.236721    -6.951       157    0.000
         MEANSES, G12           1.032397   0.295122     3.498       157    0.001
     ----------------------------------------------------------------------------
     The outcome variable is  MATHACH
     Final estimation of fixed effects
     (with robust standard errors)
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00          12.051613   0.172909    69.699       157    0.000
          SECTOR, G01           1.251659   0.307508     4.070       157    0.000
         MEANSES, G02           5.325620   0.333327    15.977       157    0.000
     For      SES slope, B1
        INTRCPT2, G10           2.953136   0.147969    19.958       157    0.000
          SECTOR, G11          -1.645357   0.239437    -6.872       157    0.000
         MEANSES, G12           1.032397   0.336393     3.069       157    0.003
     ----------------------------------------------------------------------------
    
     Final estimation of variance components:
     -----------------------------------------------------------------------------
     Random Effect           Standard      Variance     df    Chi-square  P-value
                             Deviation     Component
     -----------------------------------------------------------------------------
     INTRCPT1,       U0        1.51257       2.28785   157     599.60884    0.000
          SES slope, U1        0.25128       0.06314   157     162.54539    0.364
     -----------------------------------------------------------------------------
  3. Models Without a Level-1 Intercept
  4. Sometimes, we may want to exclude the intercept from our model. For example, we may have a level-1 categorical variable and we want to include all the categories of this variable in the model. To this end, we have to exclude the intercept, otherwise our model will be over-identified. 


     

  5. Constraints on Fixed Effects
  6. Choose Optional Specifications then Constrain Gammas and then we can constrain, for example, the effect of γ02 to be equal to γ12.

     Summary of the model specified (in equation format)
     ---------------------------------------------------
    Level-1 Model
    	Y = B0 + B1*(SES) + R
    Level-2 Model
    	B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
    	B1 = G10 + G11*(SECTOR) + G12*(MEANSES) + U1
    	G02 = G12
     Final estimation of fixed effects:
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00          11.741608   0.222122    52.861       157    0.000
          SECTOR, G01           2.016592   0.335380     6.013       157    0.000
         MEANSES, G02   *       2.654259   0.237376    11.182       157    0.000
     For      SES slope, B1
        INTRCPT2, G10           3.159636   0.163404    19.336       158    0.000
          SECTOR, G11          -2.096300   0.249567    -8.400       158    0.000
     ----------------------------------------------------------------------------
     The "*" gammas have been constrained.  See the table on the header page.
     The outcome variable is  MATHACH
     Final estimation of fixed effects
     (with robust standard errors)
     ----------------------------------------------------------------------------
                                           Standard             Approx.
        Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
     ----------------------------------------------------------------------------
     For       INTRCPT1, B0
        INTRCPT2, G00          11.741608   0.208945    56.195       157    0.000
          SECTOR, G01           2.016592   0.331788     6.078       157    0.000
         MEANSES, G02   *       2.654259   0.253568    10.468       157    0.000
     For      SES slope, B1
        INTRCPT2, G10           3.159636   0.158565    19.926       158    0.000
          SECTOR, G11          -2.096300   0.240992    -8.699       158    0.000
     ----------------------------------------------------------------------------
     The "*" gammas have been constrained.  See the table on the header page.
    
    
     Final estimation of variance components:
     -----------------------------------------------------------------------------
     Random Effect           Standard      Variance     df    Chi-square  P-value
                             Deviation     Component
     -----------------------------------------------------------------------------
     INTRCPT1,       U0        1.84024       3.38650   157     787.30257    0.000
          SES slope, U1        0.60746       0.36901   157     192.72982    0.027
      level-1,       R         6.06228      36.75129
     -----------------------------------------------------------------------------
    
     Statistics for current covariance components model
     --------------------------------------------------
     Deviance                       = 46567.491216
     Number of estimated parameters = 9
  7. Level-1 variable as a random effect but not a fixed effect

    Sometimes, we may want to model a level-1 variable only as a random effect. For example, the effect of gender, on average, is not significant, as possibly shown below. In this case, we may not want to estimate the fixed effect of variable female. Instead, we only use variable female to model the variance.

 Summary of the model specified (in equation format)
 ---------------------------------------------------
Level-1 Model
	Y = B0 + B1*(FEMALE) + B2*(SES) + R
Level-2 Model
	B0 = G00 + G01*(SECTOR) + G02*(MEANSES) + U0
	B1 = U1
	B2 = G20 + G21*(SECTOR) + G22*(MEANSES) + U2

Level-1 OLS regressions
 -----------------------
 Level-2 Unit     INTRCPT1      FEMALE slope       SES slope  
 ------------------------------------------------------------------------------
        1224     10.94364        -2.06161         2.64284    
        1288     13.01384         1.12946         3.35269    
        1296      8.51189        -1.35628         1.00858    
        1358     11.32162        -0.31470         4.98310    
        1374     10.94983        -2.63062         3.85439    
        1436     19.48721        -4.03507         2.58733    
        1461     17.50503        -1.15047         6.28572    
 Sigma_squared =     36.32580
 Standard Error of Sigma_squared =      0.62429
 Tau
 INTRCPT1,B0      3.56382      -2.00755       0.33526 
   FEMALE,B1     -2.00755       2.32409      -0.23060 
      SES,B2      0.33526      -0.23060       0.10378 
 Standard Errors of Tau
 INTRCPT1,B0      0.62400       0.57492       0.26957 
   FEMALE,B1      0.57492       0.72587       0.28483 
      SES,B2      0.26957       0.28483       0.21197 
Tau (as correlations)
 INTRCPT1,B0  1.000 -0.698  0.551
   FEMALE,B1 -0.698  1.000 -0.470
      SES,B2  0.551 -0.470  1.000
 ----------------------------------------------------
  Random level-1 coefficient   Reliability estimate
 ----------------------------------------------------
  INTRCPT1, B0                        0.643
    FEMALE, B1                        0.384
       SES, B2                        0.052
 ----------------------------------------------------
Note: The reliability estimates reported above are based on only 123 of 160
units that had sufficient data for computation.  Fixed effects and variance
components are based on all the data.
The value of the likelihood function at iteration 289 = -2.323661E+004
 Final estimation of fixed effects:
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          11.933404   0.187148    63.764       157    0.000
      SECTOR, G01           1.180992   0.293859     4.019       157    0.000
     MEANSES, G02           5.281064   0.353685    14.932       157    0.000
 For      SES slope, B2
    INTRCPT2, G20           2.875972   0.155149    18.537       157    0.000
      SECTOR, G21          -1.615421   0.239356    -6.749       157    0.000
     MEANSES, G22           1.033760   0.298472     3.464       157    0.001
 ----------------------------------------------------------------------------
 The outcome variable is  MATHACH
 Final estimation of fixed effects
 (with robust standard errors)
 ----------------------------------------------------------------------------
                                       Standard             Approx.
    Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
 ----------------------------------------------------------------------------
 For       INTRCPT1, B0
    INTRCPT2, G00          11.933404   0.173443    68.803       157    0.000
      SECTOR, G01           1.180992   0.302841     3.900       157    0.000
     MEANSES, G02           5.281064   0.330903    15.960       157    0.000
 For      SES slope, B2
    INTRCPT2, G20           2.875972   0.146171    19.675       157    0.000
      SECTOR, G21          -1.615421   0.236022    -6.844       157    0.000
     MEANSES, G22           1.033760   0.328717     3.145       157    0.002
 ----------------------------------------------------------------------------
 Final estimation of variance components:
 -----------------------------------------------------------------------------
 Random Effect           Standard      Variance     df    Chi-square  P-value
                         Deviation     Component
 -----------------------------------------------------------------------------
 INTRCPT1,       U0        1.88781       3.56382   120     326.18770    0.000
   FEMALE slope, U1        1.52450       2.32409   123     187.64867    0.000
      SES slope, U2        0.32214       0.10378   120     122.76674    0.413
  level-1,       R         6.02709      36.32580
 -----------------------------------------------------------------------------
Note: The chi-square statistics reported above are based on only 123 of 160
units that had sufficient data for computation.  Fixed effects and variance
components are based on all the data.
 Statistics for current covariance components model
 --------------------------------------------------
 Deviance                       = 46473.216028
 Number of estimated parameters = 13

Conclusion

HLM has some very nice strengths for the analysis of multilevel data. 


How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California