### Statistical Computing Seminar

#### Introduction to Multilevel Modeling Using HLM

This seminar covers the basics of two-level hierarchical linear models using HLM 6.04. The single data set used in this seminar is created from the two SPSS data sets hsb1.sav and hsb2.sav, which come with the HLM software in the folder \HLM6\Examples\Chapter2.

#### Outline

• Input Data and Creating the "MDM" File
  • from a level-1 and a level-2 SPSS file
  • from a single SPSS file
  • from a level-1 and a level-2 SAS data file

• Exploratory Data Analysis
  • summary statistics
  • data-based graphs

• Model Building
  • unconditional means model
  • regression with means-as-outcomes
  • random-coefficient model
  • intercepts and slopes-as-outcomes model

• Hypothesis Testing, Model Fit
  • Multivariate hypothesis tests on fixed effects
  • Multivariate tests of variance-covariance components specification
  • Model-based graphs

• Other Issues
  • Modeling Heterogeneity of Level-1 Variances
  • Models Without a Level-1 Intercept
  • Constraints on Fixed Effects

#### Starting HLM and Getting Data into HLM

The data file used for this presentation is a subsample from the 1982 High School and Beyond Survey and is used extensively in Hierarchical Linear Models by Raudenbush and Bryk. It consists of 7185 students nested in 160 schools. Here is a list of 15 or so rows from the data file.

Let's list all the variables used in this presentation.

• id: school id, the linking variable to define the 2-level structure
• mathach: student-level math achievement score, continuous outcome variable
• student-level: female and ses (socioeconomic status)
• school-level: schtype, the school type (0 = public, 1 = private), and meanses (ses aggregated to the school level)

HLM 6 uses an "MDM" file (Multivariate Data Matrix) for hierarchical linear models. An MDM file is a binary file and is constructed based on an MDM template file. A template file is an ASCII file containing information on the location and the structure of the data files. Once the MDM file is created, HLM does not need the original data files anymore for the subsequent analyses. This enables HLM to perform very efficient calculations for the models.

It is worth mentioning that HLM has no data management capability: most of the variables in a model have to be created outside HLM, in another statistical package such as SPSS. For example, if you have a categorical level-1 variable and want to include it in the model, possibly with interactions with other level-1 variables, you have to create all the dummy variables and interaction terms before entering your data into HLM. In short, HLM assumes that you have cleaned your data files, have done all the exploratory analysis, and are ready to do your multilevel analysis.
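A minimal Python sketch of this preparation step follows. It works on in-memory records and uses a hypothetical three-category level-1 variable `track` with toy values. Neither `track` nor the helper name `add_derived_columns` is part of the HS&B files or of HLM. The sketch adds one 0/1 dummy per non-reference category and multiplies each dummy by `ses` to form an interaction term. It then drops the original character column, because HLM needs numeric variables apart from the ID.

```python
# Sketch: derive dummy variables and dummy-by-ses interactions before HLM import.
# Assumption: records are dicts; "track" is a hypothetical categorical variable.

def add_derived_columns(records, category, levels, reference, interact_with):
    """Return copies of records with 0/1 dummies for every non-reference level
    of `category` and each dummy multiplied by `interact_with`."""
    out = []
    for rec in records:
        new = dict(rec)  # copy so the input records stay unchanged
        for level in levels:
            if level == reference:
                continue  # the reference category gets no dummy
            dummy = f"{category}_{level}"
            new[dummy] = 1 if rec[category] == level else 0
            new[f"{dummy}_x_{interact_with}"] = new[dummy] * rec[interact_with]
        del new[category]  # drop the character column; HLM needs numeric data
        out.append(new)
    return out


students = [  # toy values, not HS&B data
    {"id": "1", "track": "academic", "ses": 0.5, "mathach": 14.0},
    {"id": "1", "track": "general", "ses": -1.0, "mathach": 9.0},
    {"id": "2", "track": "vocational", "ses": 0.25, "mathach": 11.0},
]
prepared = add_derived_columns(
    students, "track", ["academic", "general", "vocational"], "general", "ses")
for row in prepared:
    print(row)
```

The resulting columns can then be written out to a file that HLM reads. In this seminar, SPSS plays that role.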

1. Creating MDM from level-1 and level-2 data files in SPSS format

The HLM website has many examples, including some detailed ones with screenshots, showing how to create an MDM file from SPSS input files.

• Two data sets are usually required for a two-level model: a level-1 data file and a level-2 data file. The two files are linked by a common level-2 id variable.
• Level-1 cases must be grouped together by their level-2 id. A usual strategy is to sort both the level-1 data file and the level-2 data file by the level-2 id variable and save them before entering them into HLM.
• The ID variable can be either numeric or character.
• All other variables in the data file must be numeric.
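The requirements listed above can be checked before the files ever reach HLM. The Python sketch below does this under stated assumptions: records are dicts and `id` is the level-2 ID, and the function name `prepare_two_files` is hypothetical. The function sorts both files by the level-2 ID and rejects non-numeric values in any variable other than the ID. It also runs two extra sanity checks of our own, which are not documented HLM rules: one level-2 record per ID, and the same set of IDs in both files.

```python
# Sketch: sort and check level-1 and level-2 records before an HLM two-file import.
# Assumption: each record is a dict of column name -> value; "id" is the level-2 ID.

def prepare_two_files(level1, level2, id_var="id"):
    """Sort both files by the level-2 ID and run basic consistency checks."""
    # Sorting keeps all level-1 cases of a unit together.
    level1 = sorted(level1, key=lambda rec: rec[id_var])
    level2 = sorted(level2, key=lambda rec: rec[id_var])

    # Apart from the ID (numeric or character), every variable must be numeric.
    for rec in level1 + level2:
        for name, value in rec.items():
            if name != id_var and not isinstance(value, (int, float)):
                raise ValueError(f"{name} has non-numeric value {value!r}")

    # Extra sanity checks (our own choice, not HLM rules).
    ids2 = [rec[id_var] for rec in level2]
    if len(ids2) != len(set(ids2)):
        raise ValueError("duplicate IDs in the level-2 file")
    if {rec[id_var] for rec in level1} != set(ids2):
        raise ValueError("level-1 and level-2 IDs do not match")
    return level1, level2


students = [{"id": 2, "ses": 0.5, "mathach": 10.0},
            {"id": 1, "ses": -0.2, "mathach": 8.0},
            {"id": 2, "ses": 1.1, "mathach": 15.0}]
schools = [{"id": 2, "schtype": 1, "meanses": 0.8},
           {"id": 1, "schtype": 0, "meanses": -0.2}]
sorted_students, sorted_schools = prepare_two_files(students, schools)
print([rec["id"] for rec in sorted_students])  # [1, 2, 2]
```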

2. Creating MDM from a single SPSS data file

One improvement in HLM 6.x is that it allows the use of a single data file containing both the level-1 and level-2 variables. The single data set should be sorted by the level-2 id variable. The steps are basically the same as for separate level-1 and level-2 data files, except that the same data file is used twice: once for level 1 and once for level 2. HLM figures out that it has to aggregate the single data file to get the level-2 variables. If the single file is huge, it might be more efficient to use the two-file approach.

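For level-1, we choose ID as the level-2 id and MINORITY, FEMALE, SES and MATHACH as the level-1 variables.

For level-2, we choose ID as the level-2 id and SECTOR and MEANSES as the level-2 variables.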

The last steps consist of a couple of clicks: Make MDM => Check Stats => Done.
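HLM performs the aggregation of a single file itself. The Python sketch below only illustrates what extracting level-2 records from one ID-sorted file involves, and why the sort order matters. It assumes each level-2 variable (here SECTOR and MEANSES) is constant within a school and raises an error otherwise. HLM's own handling of such cases may differ, and the helper name is hypothetical.

```python
from itertools import groupby


def level2_from_single_file(records, level2_vars, id_var="id"):
    """Collapse an ID-sorted single file to one record per level-2 unit."""
    result, seen = [], set()
    # groupby only merges *adjacent* rows, which is why the file must be sorted.
    for unit_id, rows in groupby(records, key=lambda rec: rec[id_var]):
        if unit_id in seen:
            raise ValueError(f"{id_var}={unit_id} appears in two blocks; sort the file")
        seen.add(unit_id)
        rows = list(rows)
        unit = {id_var: unit_id}
        for var in level2_vars:
            values = {rec[var] for rec in rows}
            if len(values) != 1:
                raise ValueError(f"{var} is not constant within {id_var}={unit_id}")
            unit[var] = values.pop()
        result.append(unit)
    return result


rows = [{"id": 1, "sector": 0, "meanses": -0.4, "ses": -0.1},  # toy values
        {"id": 1, "sector": 0, "meanses": -0.4, "ses": -0.7},
        {"id": 2, "sector": 1, "meanses": 0.3, "ses": 0.3}]
print(level2_from_single_file(rows, ["sector", "meanses"]))
```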

3. Creating MDM from level-1 and level-2 data files in SAS format

Let's say that we have the HS&B files in SAS .sas7bdat format, hsb1.sas7bdat and hsb2.sas7bdat. We can follow a similar routine to import them. HLM uses DBMSCOPY to import data files in other formats. For example, to import .sas7bdat files, we first set the data type to other non-ASCII data via the File => Preferences pull-down menu.

We follow steps similar to those for importing SPSS files and choose the right data file type when we click "Browse". This brings us to the following window:

The rest of the routine is fairly straightforward. We will demonstrate it during the seminar and skip the details here.

4. What files have been created?

Let's now go back to the approach of using a single SPSS input file and find out what files have been created and how to use them in the future. Here is the list of files that are created during the process of creating the MDM file:

The MDM file test.mdm can be opened directly in HLM for analyses. The template file deserves special mention: test.mdmt is an ASCII file, and here is what it contains:

```
#HLM2 MDM CREATION TEMPLATE
growthmodel:n
rawdattype:spss
l1fname:C:\Data\for_hlm.sav
l2fname:C:\Data\for_hlm.sav
l1missing:n
timeofdeletion:now
mdmname:test.mdm
*begin l1vars
level2id:ID
MINORITY
FEMALE
SES
MATHACH
*end l1vars
*begin l2vars
level2id:ID
SECTOR
MEANSES
*end l2vars
```

If we just want to add a few new variables from the original data file, we can open this template file from within HLM or edit the template file directly.
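Editing the template directly can be scripted. The Python sketch below adds a level-1 variable to the `*begin l1vars` ... `*end l1vars` block of a template with the layout shown above. The helper name is hypothetical, and the sketch assumes the template always uses exactly those marker lines. The new variable must also exist in the data file named on the `l1fname` line. Because the MDM file is binary and built from the template, it still has to be re-created from the edited template within HLM.

```python
# Sketch: add a level-1 variable to the text of an HLM2 MDM template (.mdmt).
# Assumption: the template uses the "*begin l1vars" / "*end l1vars" layout shown above.

def add_level1_variable(template_text, name):
    """Append `name` to the l1vars block unless it is already listed there."""
    lines = template_text.splitlines()
    start = lines.index("*begin l1vars")     # raises ValueError if missing
    end = lines.index("*end l1vars", start)  # first end marker after the start
    if name not in lines[start + 1:end]:
        lines.insert(end, name)  # new variable goes last in the l1vars block
    return "\n".join(lines) + "\n"


template = """#HLM2 MDM CREATION TEMPLATE
mdmname:test.mdm
*begin l1vars
level2id:ID
SES
MATHACH
*end l1vars
*begin l2vars
level2id:ID
MEANSES
*end l2vars
"""
print(add_level1_variable(template, "FEMALE"))
```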

The .STS file contains the descriptive statistics and is useful in checking if the data file used in creating the MDM file is what we think it is.

                      LEVEL-1 DESCRIPTIVE STATISTICS
 VARIABLE NAME       N       MEAN         SD         MINIMUM      MAXIMUM
MINORITY          7185       0.27       0.45         0.00         1.00
FEMALE          7185       0.53       0.50         0.00         1.00
SES          7185       0.00       0.78        -3.76         2.69
MATHACH          7185      12.75       6.88        -2.83        24.99

                      LEVEL-2 DESCRIPTIVE STATISTICS
 VARIABLE NAME       N       MEAN         SD         MINIMUM      MAXIMUM
SECTOR           160       0.44       0.50         0.00         1.00
MEANSES           160      -0.00       0.41        -1.19         0.83
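To cross-check the .STS file against the source data, the same five statistics can be computed independently. The Python sketch below does this for one variable. It assumes the SD uses the n - 1 denominator (`statistics.stdev`); we have not confirmed HLM's convention, though with 7185 cases the difference is negligible. The values are toy data.

```python
import statistics


def describe(name, values):
    """N, mean, SD, minimum and maximum for one variable, as in the .STS file."""
    return {"name": name, "N": len(values), "MEAN": statistics.mean(values),
            "SD": statistics.stdev(values),  # assumption: n - 1 denominator
            "MINIMUM": min(values), "MAXIMUM": max(values)}


female = [0, 1, 1, 0]  # toy values, not the HS&B data
row = describe("FEMALE", female)
print(f"{row['name']:<10}{row['N']:>6}{row['MEAN']:>10.2f}{row['SD']:>10.2f}"
      f"{row['MINIMUM']:>10.2f}{row['MAXIMUM']:>10.2f}")
```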

#### Exploratory Data Analysis

HLM offers some really nice data-based graphs. It is always a good idea to plot our data before constructing our models.

1. Box-whisker plot

2. Scatter plot
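HLM draws these plots itself. To show what a per-school box-whisker plot summarizes, the Python sketch below computes each school's five-number summary of mathach: minimum, quartiles, and maximum. HLM's exact quartile and whisker conventions are not stated here, so the "inclusive" quartile method is an assumption. Schools with fewer than two students are skipped.

```python
import statistics
from collections import defaultdict


def school_box_summaries(records, id_var="id", outcome="mathach"):
    """Five-number summary (min, Q1, median, Q3, max) of `outcome` per school."""
    by_school = defaultdict(list)
    for rec in records:
        by_school[rec[id_var]].append(rec[outcome])
    summaries = {}
    for school, values in by_school.items():
        if len(values) < 2:
            continue  # too few students to compute quartiles
        q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        summaries[school] = (min(values), q1, median, q3, max(values))
    return summaries


records = [{"id": "A", "mathach": v} for v in [1, 2, 3, 4, 5]]  # toy values
records.append({"id": "B", "mathach": 7})
print(school_box_summaries(records))
```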

#### Model Building

Model 1: Unconditional Means Model

This model is referred to as a one-way random-effects ANOVA and is the simplest possible random-effects linear model. It is motivated by the question of how much schools vary in their mean mathematics achievement. In terms of equations, we have the following, where $r_{ij} \sim N(0, \sigma^2)$ and $u_{0j} \sim N(0, \tau^2)$:

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j}
\end{aligned}
$$

  The data source for this run  = C:\Data\test.mdm
The command file for this run = whlmtemp.hlm
Output file name              = C:\Data\hlm2.txt
The maximum number of level-1 units = 7185
The maximum number of level-2 units = 160
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
 Weighting Specification
-----------------------
Weight
Variable
Weighting?   Name        Normalized?
Level 1        no
Level 2        no
Precision      no       
  The outcome variable is  MATHACH
  The model specified for the fixed effects was:
----------------------------------------------------
   Level-1                  Level-2
Coefficients             Predictors
----------------------   ---------------
INTRCPT1, B0      INTRCPT2, G00

 The model specified for the covariance components was:
---------------------------------------------------------
         Sigma squared (constant across level-2 units)
         Tau dimensions
INTRCPT1

 Summary of the model specified (in equation format)
---------------------------------------------------
Level-1 Model
	Y = B0 + R
Level-2 Model
B0 = G00 + U0
Iterations stopped due to small change in likelihood function
******* ITERATION 4 *******
 Sigma_squared =     39.14831
 Tau
INTRCPT1,B0      8.61431

Tau (as correlations)
INTRCPT1,B0  1.000
 ----------------------------------------------------
Random level-1 coefficient   Reliability estimate
----------------------------------------------------
INTRCPT1, B0                        0.901
----------------------------------------------------
The value of the likelihood function at iteration 4 = -2.355840E+004
The outcome variable is  MATHACH
 Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.636972   0.244412    51.704       159    0.000
----------------------------------------------------------------------------
 The outcome variable is  MATHACH
 Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.636972   0.243628    51.870       159    0.000
----------------------------------------------------------------------------

 Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
INTRCPT1,       U0        2.93501       8.61431   159    1660.23259    0.000
level-1,       R         6.25686      39.14831
-----------------------------------------------------------------------------

 Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 47116.793477
Number of estimated parameters = 2

Notes:

1. The model we fit was

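$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j}
\end{aligned}
$$

Filling in the parameter estimates we get

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= 12.64 + u_{0j} \\
V(r_{ij}) &= 39.15 \\
V(u_{0j}) &= 8.61
\end{aligned}
$$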

2. To describe the model as a single equation, we substitute the level-2 equation into the level-1 equation. The result, as shown in the HLM "mixed" window, is $\text{MATHACH}_{ij} = \gamma_{00} + u_{0j} + r_{ij}$.
3. The estimated between-school variance, $\tau^2$, corresponds to INTRCPT1 in the Final estimation of variance components output. The estimated within-school variance, $\sigma^2$, corresponds to level-1 in the same section.
4. From the variance estimates, we can compute the intra-class correlation: $8.61431/(8.61431 + 39.14831) = .18$. This is the proportion of the total variance that lies between schools.
5. To measure how much schools vary in their mean achievement, we can calculate a 95% plausible values range for the school means from the between-school variance: $12.64 \pm 1.96\sqrt{8.61} = (6.89, 18.39)$.
6. The reliability of the random level-1 intercept is the average reliability across the level-2 units. It measures the overall reliability of the OLS estimates of the school intercepts (in this model, the school means).
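The intra-class correlation and plausible values range in notes 4 and 5 can be reproduced in a few lines of Python. The sketch uses the full-precision variance components for the ICC. For the range, it uses the rounded values 12.64 and 8.61, as note 5 does.

```python
import math

# Variance components from the Final estimation of variance components table
tau00 = 8.61431    # between-school variance (INTRCPT1, U0)
sigma2 = 39.14831  # within-school variance (level-1, R)

# Intra-class correlation: share of the total variance lying between schools
icc = tau00 / (tau00 + sigma2)

# 95% plausible values range for school means, using the rounded values of note 5
gamma00, tau00_rounded = 12.64, 8.61
half_width = 1.96 * math.sqrt(tau00_rounded)
low, high = gamma00 - half_width, gamma00 + half_width

print(f"ICC = {icc:.2f}")                            # ICC = 0.18
print(f"plausible range = ({low:.2f}, {high:.2f})")  # plausible range = (6.89, 18.39)
```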

Model 2: Including Effects of School Level (level 2) Predictors -- predicting mathach from meanses

Raudenbush and Bryk refer to this model as regression with means-as-outcomes. It is motivated by the question of whether schools with high MEANSES also have high math achievement. In other words, we want to understand why schools differ in mathematics achievement. In terms of regression equations, we have the following:

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + u_{0j}
\end{aligned}
$$

 Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.649436   0.149280    84.736       158    0.000
MEANSES, G01           5.863538   0.361457    16.222       158    0.000
----------------------------------------------------------------------------
 The outcome variable is  MATHACH
 Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.649436   0.148377    85.252       158    0.000
MEANSES, G01           5.863538   0.320211    18.311       158    0.000
----------------------------------------------------------------------------


 Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
INTRCPT1,       U0        1.62441       2.63870   158     633.51744    0.000
level-1,       R         6.25756      39.15708
-----------------------------------------------------------------------------

 Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 46959.446959
Number of estimated parameters = 2

Notes:

1. The model we fit was

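$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + u_{0j}
\end{aligned}
$$

Filling in the parameter estimates we get

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + r_{ij} \\
\beta_{0j} &= 12.65 + 5.86\,\text{MEANSES}_j + u_{0j} \\
V(r_{ij}) &= 39.16 \\
V(u_{0j}) &= 2.64
\end{aligned}
$$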

2. In a single equation our model will be written as: $\text{MATHACH}_{ij} = \gamma_{00} + \gamma_{01}\,\text{MEANSES}_j + u_{0j} + r_{ij}$.
3. The intercept is the predicted math achievement when all predictors are 0. For a school with a mean SES of 0, the predicted mean math achievement is 12.65.
4. A 95% range of plausible values for the school means, among schools with meanses of zero, is $12.65 \pm 1.96\sqrt{2.64} = (9.47, 15.83)$.
5. The variance component representing variation between schools decreases greatly (from 8.61 to 2.64). This means that the level-2 variable meanses explains a large portion of the school-to-school variation in mean math achievement. More precisely, the proportion of between-school variance explained by meanses is $(8.61 - 2.64)/8.61 = .69$. That is, about 69% of the between-school variance in mean math achievement is explained by meanses.
6. Do school achievement means still vary significantly once meanses is controlled? The Final estimation of variance components output tests whether the INTRCPT1 variance component is zero, with a chi-square of 633.52 on 158 degrees of freedom. This is highly significant. We therefore conclude that, after controlling for meanses, significant variation among school mean math achievement remains to be explained.
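The arithmetic in notes 4 and 5 can be reproduced in Python with the rounded variance components used in the text. The sketch shows that the proportion of between-school variance explained is a simple ratio of the two models' $\tau^2$ estimates.

```python
import math

tau_unconditional = 8.61  # Model 1 between-school variance (rounded)
tau_conditional = 2.64    # Model 2 between-school variance (rounded)
gamma00 = 12.65           # Model 2 intercept (rounded)

# Proportion of between-school variance accounted for by MEANSES
explained = (tau_unconditional - tau_conditional) / tau_unconditional

# 95% plausible range of school means among schools with MEANSES = 0
half_width = 1.96 * math.sqrt(tau_conditional)
low, high = gamma00 - half_width, gamma00 + half_width

print(f"proportion explained = {explained:.2f}")     # proportion explained = 0.69
print(f"plausible range = ({low:.2f}, {high:.2f})")  # plausible range = (9.47, 15.83)
```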

Model 3: Including Effects of Student-Level Predictors -- predicting mathach from student-level ses

Raudenbush and Bryk refer to this model as a random-coefficient model. Suppose that we ran a regression of mathach on ses separately in each school, that is, 160 regressions.

1. What would be the average of the 160 regression equations (both intercept and slope)?
2. How much do the regression equations vary from school to school?
3. What is the correlation between the intercepts and slopes?

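These are some of the questions that motivate the following model.

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{SES}_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{aligned}
$$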

 Sigma_squared =     36.82835

Tau
INTRCPT1,B0      4.82978      -0.15399
SES,B1     -0.15399       0.41828

Tau (as correlations)
INTRCPT1,B0  1.000 -0.108
SES,B1 -0.108  1.000

----------------------------------------------------
Random level-1 coefficient   Reliability estimate
----------------------------------------------------
INTRCPT1, B0                        0.797
SES, B1                        0.179
----------------------------------------------------

The value of the likelihood function at iteration 21 = -2.331928E+004
The outcome variable is  MATHACH

Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.664935   0.189874    66.702       159    0.000
For      SES slope, B1
INTRCPT2, G10           2.393878   0.118278    20.240       159    0.000
----------------------------------------------------------------------------

The outcome variable is  MATHACH

Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.664935   0.189251    66.921       159    0.000
For      SES slope, B1
INTRCPT2, G10           2.393878   0.117697    20.339       159    0.000
----------------------------------------------------------------------------

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
INTRCPT1,       U0        2.19768       4.82978   159     905.26472    0.000
SES slope, U1        0.64675       0.41828   159     216.21178    0.002
level-1,       R         6.06864      36.82835
-----------------------------------------------------------------------------

Statistics for current covariance components model
------------------------------------------------
Deviance                       = 46638.560929
Number of estimated parameters = 4

Notes:

1. The model we fit was

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{SES}_{ij} + r_{ij} \\
\beta_{0j} &= \gamma_{00} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + u_{1j}
\end{aligned}
$$

Filling in the parameter estimates we get

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}\,\text{SES}_{ij} + r_{ij} \\
\beta_{0j} &= 12.66 + u_{0j} \\
\beta_{1j} &= 2.39 + u_{1j} \\
V(r_{ij}) &= 36.83 \\
V(u_{0j}) &= 4.83 \\
V(u_{1j}) &= 0.42
\end{aligned}
$$

2. In a single equation our model will be written as:

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \gamma_{00} + u_{0j} + (\gamma_{10} + u_{1j})\,\text{SES}_{ij} + r_{ij} \\
&= \gamma_{00} + \gamma_{10}\,\text{SES}_{ij} + u_{0j} + u_{1j}\,\text{SES}_{ij} + r_{ij}
\end{aligned}
$$

3. The estimated variance of the ses slope is 0.42, with a p-value of .002. Because the test is significant, we reject the hypothesis that the ses slopes do not vary across schools.
4. The 95% plausible value range for the school means when ses is zero is $12.66 \pm 1.96\sqrt{4.83} = (8.35, 16.97)$.
5. The 95% plausible value range for the SES-achievement slope is $2.39 \pm 1.96\sqrt{0.42} = (1.12, 3.66)$.
6. Notice that the residual variance is now 36.83, compared with 39.15 in the one-way random-effects ANOVA model. The proportion of variance explained at level 1 is $(39.15 - 36.83)/39.15 = .059$. Using student-level SES as a predictor of math achievement therefore reduced the within-school variance by about 6%.
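The two plausible value ranges and the level-1 proportion of variance explained can be checked with a small Python helper (the function name is our own). The level-1 ratio uses the unrounded $\sigma^2$ estimates from the two outputs, 39.14831 and 36.82835, which gives about 0.059, or roughly 6%.

```python
import math


def plausible_range(mean, variance, z=1.96):
    """mean +/- z * sqrt(variance): covers about 95% of schools if normal."""
    half_width = z * math.sqrt(variance)
    return round(mean - half_width, 2), round(mean + half_width, 2)


print(plausible_range(12.66, 4.83))  # school means at ses = 0: (8.35, 16.97)
print(plausible_range(2.39, 0.42))   # ses slopes: (1.12, 3.66)

# Level-1 variance explained, from the unrounded sigma^2 estimates
sigma2_model1, sigma2_model3 = 39.14831, 36.82835
r2_level1 = (sigma2_model1 - sigma2_model3) / sigma2_model1
print(f"{r2_level1:.3f}")  # 0.059
```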

Model 4: Including Both Level-1 and Level-2 Predictors -- predicting mathach from meanses, schtype, group-centered ses and the cross-level interactions of meanses and schtype with group-centered ses

Raudenbush and Bryk refer to this model as an intercepts and slopes-as-outcomes model. Having examined the variability of the regression equations across schools, we are now ready to build our final model based on our theory and our preliminary analyses.

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}(\text{SES}_{ij} - \text{MEANSES}_j) + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{SCHTYPE}_j + \gamma_{02}\,\text{MEANSES}_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{SCHTYPE}_j + \gamma_{12}\,\text{MEANSES}_j + u_{1j}
\end{aligned}
$$
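The level-1 predictor here is ses centered at its school mean. The Python sketch below shows group-mean centering on toy records; the helper name and the `ses_gc` column name are hypothetical. It centers at the mean of each school's level-1 records. If MEANSES was computed some other way, for example from a larger sample of each school, then SES - MEANSES would not be exactly the same as this centered value.

```python
import statistics
from collections import defaultdict


def group_mean_center(records, id_var="id", var="ses"):
    """Return copies of records with `<var>_gc` = var minus its school mean."""
    values_by_school = defaultdict(list)
    for rec in records:
        values_by_school[rec[id_var]].append(rec[var])
    school_means = {school: statistics.fmean(v) for school, v in values_by_school.items()}

    centered = []
    for rec in records:
        new = dict(rec)  # leave the input records unchanged
        new[f"{var}_gc"] = rec[var] - school_means[rec[id_var]]
        centered.append(new)
    return centered


students = [{"id": 1, "ses": 1.0}, {"id": 1, "ses": 3.0}, {"id": 2, "ses": -0.5}]
print([rec["ses_gc"] for rec in group_mean_center(students)])  # [-1.0, 1.0, 0.0]
```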


 Sigma_squared =     36.70313

Tau
INTRCPT1,B0      2.37996       0.19058
SES,B1      0.19058       0.14892

Tau (as correlations)
INTRCPT1,B0  1.000  0.320
SES,B1  0.320  1.000

----------------------------------------------------
Random level-1 coefficient   Reliability estimate
----------------------------------------------------
INTRCPT1, B0                        0.733
SES, B1                        0.073
----------------------------------------------------

The value of the likelihood function at iteration 61 = -2.325094E+004
The outcome variable is  MATHACH

Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.096006   0.198734    60.865       157    0.000
SCHTYPE, G01           1.226384   0.306272     4.004       157    0.000
MEANSES, G02           5.333056   0.369161    14.446       157    0.000
For      SES slope, B1
INTRCPT2, G10           2.937981   0.157135    18.697       157    0.000
SCHTYPE, G11          -1.640954   0.242905    -6.756       157    0.000
MEANSES, G12           1.034427   0.302566     3.419       157    0.001
----------------------------------------------------------------------------

The outcome variable is  MATHACH

Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00          12.096006   0.173699    69.638       157    0.000
SCHTYPE, G01           1.226384   0.308484     3.976       157    0.000
MEANSES, G02           5.333056   0.334600    15.939       157    0.000
For      SES slope, B1
INTRCPT2, G10           2.937981   0.147620    19.902       157    0.000
SCHTYPE, G11          -1.640954   0.237401    -6.912       157    0.000
MEANSES, G12           1.034427   0.332785     3.108       157    0.003
----------------------------------------------------------------------------

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
INTRCPT1,       U0        1.54271       2.37996   157     605.29503    0.000
SES slope, U1        0.38590       0.14892   157     162.30867    0.369
level-1,       R         6.05831      36.70313
-----------------------------------------------------------------------------

Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 46501.875643
Number of estimated parameters = 4
Notes:
1. The model we fit was

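$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}(\text{SES}_{ij} - \text{MEANSES}_j) + r_{ij} \\
\beta_{0j} &= \gamma_{00} + \gamma_{01}\,\text{SCHTYPE}_j + \gamma_{02}\,\text{MEANSES}_j + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{SCHTYPE}_j + \gamma_{12}\,\text{MEANSES}_j + u_{1j}
\end{aligned}
$$

Filling in the parameter estimates we get

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{0j} + \beta_{1j}(\text{SES}_{ij} - \text{MEANSES}_j) + r_{ij} \\
\beta_{0j} &= 12.10 + 1.23\,\text{SCHTYPE}_j + 5.33\,\text{MEANSES}_j + u_{0j} \\
\beta_{1j} &= 2.94 - 1.64\,\text{SCHTYPE}_j + 1.03\,\text{MEANSES}_j + u_{1j} \\
V(r_{ij}) &= 36.70 \\
V(u_{0j}) &= 2.38 \\
V(u_{1j}) &= 0.15
\end{aligned}
$$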

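2. In a single equation our model will be written as:

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \gamma_{00} + \gamma_{01}\,\text{SCHTYPE}_j + \gamma_{02}\,\text{MEANSES}_j + u_{0j} \\
&\quad + (\gamma_{10} + \gamma_{11}\,\text{SCHTYPE}_j + \gamma_{12}\,\text{MEANSES}_j + u_{1j})(\text{SES}_{ij} - \text{MEANSES}_j) + r_{ij} \\
&= \gamma_{00} + \gamma_{01}\,\text{SCHTYPE}_j + \gamma_{02}\,\text{MEANSES}_j + \gamma_{10}(\text{SES}_{ij} - \text{MEANSES}_j) \\
&\quad + \gamma_{11}\,\text{SCHTYPE}_j(\text{SES}_{ij} - \text{MEANSES}_j) + \gamma_{12}\,\text{MEANSES}_j(\text{SES}_{ij} - \text{MEANSES}_j) \\
&\quad + u_{0j} + u_{1j}(\text{SES}_{ij} - \text{MEANSES}_j) + r_{ij}
\end{aligned}
$$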

3. The estimated variance of the SES slope is .15, with a p-value of .369. We therefore cannot reject the hypothesis that the slope of group-centered ses does not vary across schools. We may want to use a simpler model in which the slope of SES varies non-randomly with the level-2 variables meanses and schtype. We will show later how to compare the two models.
4. The correlation between the level-1 intercept and the SES slope is .32, as shown under Tau (as correlations) earlier in the output.
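To make the cross-level terms concrete, the Python sketch below plugs the Model 4 fixed effects into the two level-2 equations. It computes the model-implied intercept and group-centered SES slope for a public and a private school with MEANSES = 0, setting the random effects $u_{0j}$ and $u_{1j}$ to zero. The helper name is our own. At MEANSES = 0, the SES slope is estimated to be much flatter in private schools (about 1.30) than in public schools (about 2.94).

```python
# Model 4 fixed effects from the "Final estimation of fixed effects" table
G = {"G00": 12.096006, "G01": 1.226384, "G02": 5.333056,
     "G10": 2.937981, "G11": -1.640954, "G12": 1.034427}


def predicted_coefficients(schtype, meanses):
    """Model-implied intercept and group-centered SES slope for a school,
    with the random effects u0j and u1j set to zero."""
    b0 = G["G00"] + G["G01"] * schtype + G["G02"] * meanses
    b1 = G["G10"] + G["G11"] * schtype + G["G12"] * meanses
    return b0, b1


for schtype, label in [(0, "public"), (1, "private")]:
    b0, b1 = predicted_coefficients(schtype, meanses=0.0)
    print(f"{label:<8} intercept = {b0:.2f}, SES slope = {b1:.2f}")
# public   intercept = 12.10, SES slope = 2.94
# private  intercept = 13.32, SES slope = 1.30
```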

#### Hypothesis Testing, Model Fit and Diagnostics

1. Multivariate Hypothesis Tests on Fixed Effects

We will test the effect of schtype on the intercept and on the slope of ses simultaneously. This will be a test of two degrees of freedom.

Click on the box labeled "1" and then fill out the boxes below to indicate that we wish to test jointly that $\gamma_{01} = 0$ and $\gamma_{11} = 0$.

                Results of General Linear Hypothesis Testing
-----------------------------------------------------------------------------
Coefficients      Contrast
-----------------------------------------------------------------------------
For       INTRCPT1, B0
INTRCPT2, G00                  12.096006      0.000   0.000
SCHTYPE, G01                   1.226384      1.000   0.000
MEANSES, G02                   5.333056      0.000   0.000
For      SES slope, B1
INTRCPT2, G10                   2.937981      0.000   0.000
SCHTYPE, G11                  -1.640954      0.000   2.000
MEANSES, G12                   1.034427      0.000   0.000

Chi-square statistic = 60.596880
Degrees of freedom   = 2
P-value              = 0.000000
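As we understand it, this chi-square is a Wald-type statistic, $W = (C\hat\gamma)'(C\hat V C')^{-1}(C\hat\gamma)$. Here $C$ is the contrast matrix and $\hat V$ is the covariance matrix of the fixed-effect estimates. HLM does not print $\hat V$'s off-diagonal terms, so the Python sketch below assumes a diagonal $\hat V$ built from the model-based standard errors. Under that simplification it gives about 61.7 rather than HLM's 60.60. With 2 degrees of freedom, the chi-square upper tail is exactly $e^{-W/2}$. Multiplying a row of $C$ by a nonzero constant (the output above shows 2.000 for G11) does not change $W$.

```python
import math

# Model 4 fixed effects (G00, G01, G02, G10, G11, G12) and model-based standard errors
gammas = [12.096006, 1.226384, 5.333056, 2.937981, -1.640954, 1.034427]
std_errors = [0.198734, 0.306272, 0.369161, 0.157135, 0.242905, 0.302566]

# ASSUMPTION: diagonal covariance matrix of the estimates. HLM uses the full
# matrix, whose off-diagonal terms are not printed, so results differ slightly.
cov = [[se * se if i == j else 0.0 for j in range(6)] for i, se in enumerate(std_errors)]

contrast = [[0, 1, 0, 0, 0, 0],  # G01 = 0 (SCHTYPE effect on the intercept)
            [0, 0, 0, 0, 1, 0]]  # G11 = 0 (SCHTYPE effect on the SES slope)


def wald_chi_square(c, g, v):
    """W = (C g)' (C V C')^{-1} (C g) for a two-row contrast matrix C."""
    k = len(g)
    cg = [sum(c[r][a] * g[a] for a in range(k)) for r in range(2)]
    m = [[sum(c[r][a] * v[a][b] * c[s][b] for a in range(k) for b in range(k))
          for s in range(2)] for r in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    return sum(cg[r] * inv[r][s] * cg[s] for r in range(2) for s in range(2))


w = wald_chi_square(contrast, gammas, cov)
p = math.exp(-w / 2)  # chi-square upper tail; this closed form holds only for 2 df
print(f"chi-square = {w:.2f}, df = 2, p = {p:.2e}")
```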

2. Multivariate Tests of Variance-Covariance Components Specification

In Model 4, the variance of the slope of group-centered ses is not very large and is not statistically significant. This suggests that we may not want to model the slope of group-centered ses as random. A simpler model is one in which the slope of ses varies non-randomly with the level-2 variables schtype and meanses. We may want to compare these two models to decide whether the simpler model fits about as well as the previous one.

• REML (restricted maximum likelihood) vs. FML (full maximum likelihood)
  • REML and FML usually produce similar estimates of the level-1 residual variance ($\sigma^2$), but there can be noticeable differences in the variance-covariance matrix of the random effects.
  • REML is the default estimation method in HLM.
  • If the number of level-2 units is large, the difference will be small.
  • If the number of level-2 units is small, FML variance estimates will be smaller than REML estimates, leading to artificially short confidence intervals and overly liberal significance tests.
• Nested models
  • If the fixed effects are the same and one model only has fewer random effects, either REML or FML deviances can be used for likelihood ratio tests.
  • If one model has fewer fixed effects (and possibly fewer random effects), use FML to compare the models with likelihood ratio tests.

To compare two models, we first fit one model and note its deviance (which is just -2 times the log likelihood). Then, before running the second model, we enter that deviance under Hypothesis Testing.

 Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
INTRCPT1,       U0        1.54118       2.37524   157     604.29895    0.000
level-1,       R         6.06351      36.76611
-----------------------------------------------------------------------------

 Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 46502.952743
Number of estimated parameters = 2

 Statistics for current covariance components model
--------------------------------------------------
Deviance                       = 46501.875643
Number of estimated parameters = 4
 Variance-Covariance components test
-----------------------------------
Chi-square statistic         =      1.07710
Number of degrees of freedom =    2
P-value                      = >.500
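The variance-covariance components test is a likelihood ratio test. The chi-square is the difference in deviances, and the degrees of freedom are the difference in the number of estimated covariance parameters. The fixed effects of the two models are identical, so comparing REML deviances is appropriate here, as noted in the list above. The Python sketch below reproduces HLM's numbers; for 2 degrees of freedom the upper tail has the closed form $e^{-\chi^2/2}$.

```python
import math

# Deviances (-2 log likelihood, REML) and numbers of covariance parameters
deviance_fixed_slope, params_fixed_slope = 46502.952743, 2    # ses slope not random
deviance_random_slope, params_random_slope = 46501.875643, 4  # Model 4

chi2 = deviance_fixed_slope - deviance_random_slope
df = params_random_slope - params_fixed_slope
p = math.exp(-chi2 / 2)  # chi-square upper tail; this closed form holds only for df = 2

print(f"chi-square = {chi2:.5f}, df = {df}, p = {p:.2f}")  # chi-square = 1.07710, df = 2, p = 0.58
```

A p-value of about 0.58 agrees with HLM's ">.500": the simpler model with a non-random ses slope fits about as well.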

3. Model-based Graphs

HLM 6 offers many model-based graphs. The graphs below are based on the following model.

Level-1 equation graphing

Level-2 EB/OLS coefficient confidence intervals

#### Other Issues

1. Modeling Heterogeneity of Level-1 Variances

Sometimes, the level-1 variance might be heterogeneous. For example, we may expect that female students and male students have different variances. Thus, we want to model the level-1 variance to be a function of variable female.

From the pull-down menu Other Settings => Estimation Settings => Heterogeneous sigma^2, we can choose which variable(s) to use to model the heterogeneity. Here we picked the variable female.

 RESULTS FOR HETEROGENEOUS SIGMA-SQUARED
(macro iteration 4)

Var(R) = Sigma_squared and
log(Sigma_squared) = alpha0 + alpha1(FEMALE)

Model for level-1 variance
--------------------------------------------------------------------
Standard
Parameter        Coefficient      Error       Z-ratio   P-value
--------------------------------------------------------------------
INTRCPT1    ,alpha0     3.66570      0.024718    148.301     0.000
FEMALE    ,alpha1    -0.12106      0.033936     -3.567     0.001
--------------------------------------------------------------------

Summary of Model Fit

-------------------------------------------------------------------
Model                                Number of         Deviance
Parameters
-------------------------------------------------------------------
1. Homogeneous sigma_squared             10          46494.59261
2. Heterogeneous sigma_squared           11          46482.09334
-------------------------------------------------------------------
Model Comparison                 Chi-square       df    P-value
-------------------------------------------------------------------
Model 1 vs Model 2                  12.49926       1     0.001
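The estimated log-linear model for the level-1 variance implies a separate $\sigma^2$ for each sex, and the model comparison is again a likelihood ratio test, here with 1 degree of freedom. The Python sketch below computes both; "Model 1" and "Model 2" in the table above refer to the homogeneous and heterogeneous fits. HLM's table prints the p-value as 0.001. The upper-tail probability computed here is smaller, about 0.0004. Both indicate that the heterogeneous model fits better.

```python
import math

# Level-1 variance model: log(sigma^2) = alpha0 + alpha1 * FEMALE
alpha0, alpha1 = 3.66570, -0.12106
sigma2_male = math.exp(alpha0)             # FEMALE = 0
sigma2_female = math.exp(alpha0 + alpha1)  # FEMALE = 1

# Likelihood ratio test: homogeneous (10 parameters) vs heterogeneous (11)
chi2 = 46494.59261 - 46482.09334
df = 11 - 10
p = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail for exactly 1 df

print(f"sigma^2: male {sigma2_male:.2f}, female {sigma2_female:.2f}")  # male 39.08, female 34.63
print(f"chi-square = {chi2:.5f}, df = {df}, p = {p:.4f}")
```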

2. Models Without a Level-1 Intercept

Sometimes we may want to exclude the intercept from our model. For example, we may have a level-1 categorical variable and want to include all of its categories in the model. In that case we have to exclude the intercept; otherwise the model will be over-parameterized. To illustrate, we create another binary variable, male (= 1 - female). As mentioned before, HLM has no data management facility, so we have to create this variable outside HLM. We chose SPSS for this task and modified the template file created earlier to create a new MDM file.
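Reading the specification off the output below (HLM's B1, B2 and U1, U2), the no-intercept model appears to be the following. Because FEMALE + MALE = 1 for every student, $\beta_{1j}$ and $\beta_{2j}$ are the expected math achievement of female and male students in school $j$. Likewise, $\gamma_{10}$ and $\gamma_{20}$ are the corresponding expected values in public schools (SCHTYPE = 0).

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{1j}\,\text{FEMALE}_{ij} + \beta_{2j}\,\text{MALE}_{ij} + r_{ij} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{SCHTYPE}_j + u_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{21}\,\text{SCHTYPE}_j + u_{2j}
\end{aligned}
$$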

 The outcome variable is  MATHACH

Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For   FEMALE slope, B1
INTRCPT2, G10          10.684432   0.298122    35.839       158    0.000
SCHTYPE, G11           2.932540   0.446512     6.568       158    0.000
For     MALE slope, B2
INTRCPT2, G20          12.174859   0.322616    37.738       158    0.000
SCHTYPE, G21           2.597771   0.487027     5.334       158    0.000
----------------------------------------------------------------------------

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
FEMALE slope, U1        2.41260       5.82064   121     481.99916    0.000
MALE slope, U2        2.64370       6.98917   121     483.25462    0.000
level-1,       R         6.22438      38.74285
-----------------------------------------------------------------------------

3. Constraints on Fixed Effects

Let's say that we believe the effect of schtype is the same for female and male students. We then need to impose the constraint $\gamma_{11} = \gamma_{21}$.
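Under this constraint, the level-1 model is unchanged and the two level-2 equations share a single SCHTYPE coefficient. HLM reports that shared coefficient once, under the FEMALE slope, and marks it with "*" in the output below:

$$
\begin{aligned}
\text{MATHACH}_{ij} &= \beta_{1j}\,\text{FEMALE}_{ij} + \beta_{2j}\,\text{MALE}_{ij} + r_{ij} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11}\,\text{SCHTYPE}_j + u_{1j} \\
\beta_{2j} &= \gamma_{20} + \gamma_{11}\,\text{SCHTYPE}_j + u_{2j}
\end{aligned}
$$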

 The outcome variable is  MATHACH
 Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard             Approx.
Fixed Effect         Coefficient   Error      T-ratio   d.f.     P-value
----------------------------------------------------------------------------
For   FEMALE slope, B1
INTRCPT2, G10          10.723664   0.295717    36.263       158    0.000
SCHTYPE, G11   *       2.804823   0.417646     6.716       158    0.000
For     MALE slope, B2
INTRCPT2, G20          12.103608   0.313462    38.613       159    0.000
----------------------------------------------------------------------------
The "*" gammas have been constrained.  See the table on the header page.


 Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect           Standard      Variance     df    Chi-square  P-value
Deviation     Component
-----------------------------------------------------------------------------
FEMALE slope, U1        2.40847       5.80071   121     484.11557    0.000
MALE slope, U2        2.63048       6.91943   121     483.35444    0.000
level-1,       R         6.22449      38.74426
-----------------------------------------------------------------------------

#### Remarks

HLM has some very nice features for multilevel data analysis, including

• a very intuitive interface for specifying the model in a multi-equation format;
• easy creation of cross-level interactions;
• many data-based and model-based graphs;
• latent variable regression;
• support for multiply imputed data;
• support for sampling weights.

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.