The data file used for this presentation is a subsample from the 1982 High School and Beyond Survey and is used extensively in Hierarchical Linear Models by Raudenbush and Bryk. It consists of 7185 students nested in 160 schools. Here is a list of 15 or so rows from the data file.

Let's list all the variables used in this presentation.
- id: school id, the linking variable to define the 2-level structure
- mathach: student-level math achievement score, continuous outcome variable
- student-level: female and ses, the socioeconomic status at the student level
- school-level: schtype school type (0 = public and 1 = private) and meanses (ses aggregated to school level)
HLM 6 uses an "MDM" file (Multivariate Data Matrix) for hierarchical linear models. An MDM file is a binary file and is constructed based on an MDM template file. A template file is an ASCII file containing information on the location and the structure of the data files. Once the MDM file is created, HLM does not need the original data files anymore for the subsequent analyses. This enables HLM to perform very efficient calculations for the models.
It is worth mentioning that HLM does not have any data management capability. Most of the variables in a model therefore have to be created outside HLM, in another statistical package such as SPSS. For example, if you have a categorical variable at level-1 and you want to include it in the model, possibly with some interaction terms with other level-1 variables, then you have to create all the dummy variables and all the interaction terms before entering your data into HLM. In short, HLM assumes that you have cleaned your data files, have done all the exploratory statistical analysis, and are ready to do your multilevel analysis.
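As a rough illustration of the kind of preparation that has to happen before the data reach HLM, the sketch below builds a dummy variable, an interaction term, and a school-level mean in plain Python. The example rows, the helper name `prepare_rows`, and the derived column names `male` and `female_x_ses` are made up for illustration; in practice this step would be done in SPSS or a similar package.

```python
from collections import defaultdict

def prepare_rows(rows):
    """Add derived columns and sort by school id (hypothetical helper)."""
    # Accumulate the sum and count of ses per school id.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row["id"]][0] += row["ses"]
        totals[row["id"]][1] += 1

    prepared = []
    for row in rows:
        ses_sum, n = totals[row["id"]]
        new_row = dict(row)
        new_row["male"] = 1 - row["female"]                    # dummy variable
        new_row["female_x_ses"] = row["female"] * row["ses"]   # interaction term
        new_row["meanses"] = ses_sum / n                       # ses aggregated to school
        prepared.append(new_row)
    # sorted() is stable, so students keep their order within a school.
    return sorted(prepared, key=lambda r: r["id"])

# Made-up example rows, not taken from the HS&B file.
rows = [
    {"id": 2, "female": 0, "ses": 0.5},
    {"id": 1, "female": 1, "ses": -1.0},
    {"id": 1, "female": 0, "ses": 1.0},
]
prepared = prepare_rows(rows)
```

The result is sorted by the level-2 id, which is also what HLM expects when a single input file is used.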
1. Creating MDM from a level-1 and a level-2 data file in SPSS format
The HLM website has many examples, including some detailed ones with screen shots, on how to create an MDM file from an SPSS input file.
2. Creating MDM from a single SPSS data file
One improvement in HLM 6.x is that it allows the use of a single data file containing both the level-1 and level-2 variables. The single data set should be sorted by the level-2 id variable, and the steps are basically the same as those for separate level-1 and level-2 data files, except that the same data file is used twice, once for level-1 and once for level-2. HLM will figure out that it has to aggregate the single data file to get the level-2 variables. If the single file is huge, it might be more efficient to use the two-file approach.
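Before handing a single file to HLM, it can be worth checking that it really is sorted by the level-2 id and that the variables intended for level-2 do not change within a school. The following is a minimal sketch of such a check, not part of HLM; the function name `check_single_file` and the example rows are hypothetical.

```python
def check_single_file(rows, id_var, level2_vars):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Check 1: rows must already be sorted by the level-2 id.
    ids = [row[id_var] for row in rows]
    if ids != sorted(ids):
        problems.append("rows are not sorted by " + id_var)

    # Check 2: level-2 variables should be constant within each level-2 unit.
    first_seen = {}
    flagged = set()
    for row in rows:
        values = tuple(row[v] for v in level2_vars)
        expected = first_seen.setdefault(row[id_var], values)
        if values != expected and row[id_var] not in flagged:
            flagged.add(row[id_var])
            problems.append("level-2 values vary within %s=%s" % (id_var, row[id_var]))
    return problems

# Made-up rows for illustration only.
good = [
    {"id": 1, "sector": 0, "meanses": -0.4},
    {"id": 1, "sector": 0, "meanses": -0.4},
    {"id": 2, "sector": 1, "meanses": 0.2},
]
bad = [
    {"id": 2, "sector": 1, "meanses": 0.2},
    {"id": 1, "sector": 0, "meanses": -0.4},
    {"id": 1, "sector": 1, "meanses": -0.4},
]
good_problems = check_single_file(good, "id", ["sector", "meanses"])
bad_problems = check_single_file(bad, "id", ["sector", "meanses"])
```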

For level-1, we choose these variables:

For level-2, we choose these variables:

The last steps consist of a couple of clicks: Make MDM => Check Stats => Done.
3. Creating MDM from a level-1 and a level-2 data file in SAS format
Let's say that we have the HS&B files in SAS format, hsb1.sas7bdat and hsb2.sas7bdat. We can follow a similar routine to import the data files. HLM uses DBMSCOPY to import data files of different formats. For example, to import files in .sas7bdat format, the first thing to do is to set the type of data to other non-ASCII data via the File then Preferences pull-down menu.

Following steps similar to those described in the SPSS import example, and choosing the right data file type when we "Browse" for the files, we get to the following window:

The rest of the routine is fairly straightforward; we will demonstrate it during the seminar and skip the minute details here.
4. What files have been created?
Let's now go back to the approach of using a single SPSS input file and find out what files have been created and how to use them in the future. Here is the list of files that are created during the process of creating the MDM file:

The MDM file test.mdm can be opened directly in HLM for analyses. The file worth pointing out is the template file. The template file test.mdmt is an ASCII file, and here is what it contains:
#HLM2 MDM CREATION TEMPLATE
growthmodel:n
rawdattype:spss
l1fname:C:\Data\for_hlm.sav
l2fname:C:\Data\for_hlm.sav
l1missing:n
timeofdeletion:now
mdmname:test.mdm
*begin l1vars
level2id:ID
MINORITY
FEMALE
SES
MATHACH
*end l1vars
*begin l2vars
level2id:ID
SECTOR
MEANSES
*end l2vars
If we just want to add a few new variables from the original data file, we can open this template file from within HLM or edit the template file directly.
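If the template is edited outside HLM, the change amounts to adding a variable name inside the relevant block. The sketch below shows one way this could be scripted. It assumes the template stores one entry per line, uses a shortened template (the file-path lines are left out), and the helper `add_level1_variable` and the new variable name `MALE` are hypothetical; adding the variable through the HLM interface is equally valid.

```python
def add_level1_variable(template_text, name):
    """Insert `name` before '*end l1vars' unless it is already listed."""
    lines = template_text.splitlines()
    begin = lines.index("*begin l1vars")   # raises ValueError if the block is missing
    end = lines.index("*end l1vars")
    if name not in lines[begin + 1:end]:
        lines.insert(end, name)
    return "\n".join(lines)

# Shortened template text; assumes one entry per line (an assumption).
template = "\n".join([
    "#HLM2 MDM CREATION TEMPLATE",
    "mdmname:test.mdm",
    "*begin l1vars",
    "level2id:ID",
    "MINORITY",
    "FEMALE",
    "SES",
    "MATHACH",
    "*end l1vars",
    "*begin l2vars",
    "level2id:ID",
    "SECTOR",
    "MEANSES",
    "*end l2vars",
])
updated = add_level1_variable(template, "MALE")
```

The new variable must of course exist in the data file named in the template before the MDM file is rebuilt.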
The .STS file contains the descriptive statistics and is useful in checking if the data file used in creating the MDM file is what we think it is.
LEVEL-1 DESCRIPTIVE STATISTICS
VARIABLE NAME N MEAN SD MINIMUM MAXIMUM
MINORITY 7185 0.27 0.45 0.00 1.00
FEMALE 7185 0.53 0.50 0.00 1.00
SES 7185 0.00 0.78 -3.76 2.69
MATHACH 7185 12.75 6.88 -2.83 24.99
LEVEL-2 DESCRIPTIVE STATISTICS
VARIABLE NAME N MEAN SD MINIMUM MAXIMUM
SECTOR 160 0.44 0.50 0.00 1.00
MEANSES 160 -0.00 0.41 -1.19 0.83
HLM offers some really nice data-based graphs. It is always a good idea to plot our data before constructing our models.

1. Box-whisker plot


2. Scatter plot


Model 1: Unconditional Means Model
This model is referred to as a one-way random-effects ANOVA and is the simplest possible random-effects linear model. The motivation for this model is the question of how much schools vary in their mean mathematics achievement. In terms of equations, we have the following, where $r_{ij} \sim N(0, \sigma^2)$ and $u_{0j} \sim N(0, \tau^2)$:
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}$$


The data source for this run = C:\Data\test.mdm
The command file for this run = whlmtemp.hlm
Output file name = C:\Data\hlm2.txt
The maximum number of level-1 units = 7185
The maximum number of level-2 units = 160
The maximum number of iterations = 100
Method of estimation: restricted maximum likelihood
Weighting Specification
-----------------------
Weight
Variable
Weighting? Name Normalized?
Level 1 no
Level 2 no
Precision no
The outcome variable is MATHACH
The model specified for the fixed effects was:
----------------------------------------------------
Level-1 Level-2
Coefficients Predictors
---------------------- ---------------
INTRCPT1, B0 INTRCPT2, G00
The model specified for the covariance components was:
---------------------------------------------------------
Sigma squared (constant across level-2 units)
Tau dimensions
INTRCPT1
Summary of the model specified (in equation format)
---------------------------------------------------
Level-1 Model
Y = B0 + R
Level-2 Model
B0 = G00 + U0
Iterations stopped due to small change in likelihood function
******* ITERATION 4 *******
Sigma_squared = 39.14831
Tau
INTRCPT1,B0 8.61431
Tau (as correlations)
INTRCPT1,B0 1.000
----------------------------------------------------
Random level-1 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1, B0 0.901
----------------------------------------------------
The value of the likelihood function at iteration 4 = -2.355840E+004
The outcome variable is MATHACH
Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.636972 0.244412 51.704 159 0.000
----------------------------------------------------------------------------
The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.636972 0.243628 51.870 159 0.000
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
INTRCPT1, U0 2.93501 8.61431 159 1660.23259 0.000
level-1, R 6.25686 39.14831
-----------------------------------------------------------------------------
Statistics for current covariance components model
--------------------------------------------------
Deviance = 47116.793477
Number of estimated parameters = 2
Notes:
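One quantity commonly derived from this output is the intraclass correlation, the share of the total variance in MATHACH that lies between schools. The short sketch below computes it from the two variance components reported above; the variable names are ours, not HLM's.

```python
# Variance components copied from the Model 1 output above.
tau00 = 8.61431    # between-school variance of the intercept (INTRCPT1, U0)
sigma2 = 39.14831  # within-school variance (level-1, R)

# Intraclass correlation: between-school variance over total variance.
icc = tau00 / (tau00 + sigma2)
print("Intraclass correlation: %.3f" % icc)
```

With these estimates, roughly 18 percent of the variance in math achievement is between schools.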
Model 2: Including Effects of School Level (level 2) Predictors -- predicting mathach from meanses
This model is referred to as regression with means-as-outcomes by Raudenbush and Bryk. The motivation for this model is the question of whether schools with high MEANSES also have high math achievement. In other words, we want to understand why schools differ in mathematics achievement. In terms of regression equations, we have the following.
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{MEANSES}) + u_{0j}$$


Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.649436 0.149280 84.736 158 0.000
MEANSES, G01 5.863538 0.361457 16.222 158 0.000
----------------------------------------------------------------------------
The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.649436 0.148377 85.252 158 0.000
MEANSES, G01 5.863538 0.320211 18.311 158 0.000
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
INTRCPT1, U0 1.62441 2.63870 158 633.51744 0.000
level-1, R 6.25756 39.15708
-----------------------------------------------------------------------------
Statistics for current covariance components model
--------------------------------------------------
Deviance = 46959.446959
Number of estimated parameters = 2
Notes:
Filling in the parameter estimates we get
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + r_{ij}$$
$$\beta_{0j} = 12.65 + 5.86(\mathrm{MEANSES}) + u_{0j}$$
$$V(r_{ij}) = 39.16$$
$$V(u_{0j}) = 2.64$$
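Comparing the intercept variance here with the one from the unconditional means model (Model 1) gives a rough idea of how much of the between-school variance MEANSES accounts for. The sketch below does this arithmetic with the variance components printed in the two outputs; the variable names are illustrative.

```python
# Intercept variances (INTRCPT1, U0) from the two outputs.
tau00_model1 = 8.61431   # Model 1, no predictors
tau00_model2 = 2.63870   # Model 2, MEANSES at level 2
sigma2_model2 = 39.15708 # level-1 variance in Model 2

# Proportion of between-school variance accounted for by MEANSES.
explained = (tau00_model1 - tau00_model2) / tau00_model1

# Conditional intraclass correlation after controlling for MEANSES.
conditional_icc = tau00_model2 / (tau00_model2 + sigma2_model2)

print("Proportion of between-school variance explained: %.3f" % explained)
print("Conditional intraclass correlation: %.3f" % conditional_icc)
```

The level-1 variance is essentially unchanged from Model 1, as expected when only a school-level predictor is added.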
Model 3: Including Effects of Student-Level Predictors--predicting mathach from student-level ses
This model is referred to as a random-coefficient model by Raudenbush and Bryk. Imagine that we regress mathach on ses within each school, that is, that we run 160 separate regressions.
Questions about how these 160 regression equations vary across schools motivate the following model.
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{SES}) + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + u_{1j}$$


Sigma_squared = 36.82835
Tau
INTRCPT1,B0 4.82978 -0.15399
SES,B1 -0.15399 0.41828
Tau (as correlations)
INTRCPT1,B0 1.000 -0.108
SES,B1 -0.108 1.000
----------------------------------------------------
Random level-1 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1, B0 0.797
SES, B1 0.179
----------------------------------------------------
The value of the likelihood function at iteration 21 = -2.331928E+004
The outcome variable is MATHACH
Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.664935 0.189874 66.702 159 0.000
For SES slope, B1
INTRCPT2, G10 2.393878 0.118278 20.240 159 0.000
----------------------------------------------------------------------------
The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.664935 0.189251 66.921 159 0.000
For SES slope, B1
INTRCPT2, G10 2.393878 0.117697 20.339 159 0.000
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
INTRCPT1, U0 2.19768 4.82978 159 905.26472 0.000
SES slope, U1 0.64675 0.41828 159 216.21178 0.002
level-1, R 6.06864 36.82835
-----------------------------------------------------------------------------
Statistics for current covariance components model
------------------------------------------------
Deviance = 46638.560929
Number of estimated parameters = 4
Notes:
Filling in the parameter estimates we get
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{SES}) + r_{ij}$$
$$\beta_{0j} = 12.66 + u_{0j}$$
$$\beta_{1j} = 2.39 + u_{1j}$$
$$V(r_{ij}) = 36.83$$
$$V(u_{0j}) = 4.83$$
$$V(u_{1j}) = 0.42$$
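Two follow-up calculations help in reading this output: the intercept-slope correlation implied by the Tau matrix, and an approximate range within which most school-specific SES slopes would fall. The sketch below assumes the school slopes are normally distributed around the average slope, so the range is only a rough guide; the variable names are ours.

```python
import math

# Estimates copied from the Model 3 output above.
tau00 = 4.82978     # variance of school intercepts
tau11 = 0.41828     # variance of school SES slopes
tau01 = -0.15399    # covariance of intercepts and slopes
gamma10 = 2.393878  # average SES slope

# Correlation between intercepts and slopes, as in "Tau (as correlations)".
corr = tau01 / math.sqrt(tau00 * tau11)

# Approximate 95% range of school-specific slopes (normality assumed).
slope_sd = math.sqrt(tau11)
low = gamma10 - 1.96 * slope_sd
high = gamma10 + 1.96 * slope_sd

print("Intercept-slope correlation: %.3f" % corr)
print("Approximate 95%% range of slopes: %.2f to %.2f" % (low, high))
```

The whole range is positive, which suggests that higher ses goes with higher math achievement in nearly all schools, though the strength of the relationship varies.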
Model 4: Including Both Level-1 and Level-2 Predictors --predicting mathach from meanses, schtype, group-centered ses and the cross level interaction of meanses and schtype with group-centered ses.
This model is referred to as an intercepts- and slopes-as-outcomes model by Raudenbush and Bryk. We have examined the variability of the regression equations across schools. Now we are ready to build our final model based on our theory and our preliminary analyses.
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{SES} - \mathrm{MEANSES}) + r_{ij}$$
$$\beta_{0j} = \gamma_{00} + \gamma_{01}(\mathrm{SCHTYPE}) + \gamma_{02}(\mathrm{MEANSES}) + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11}(\mathrm{SCHTYPE}) + \gamma_{12}(\mathrm{MEANSES}) + u_{1j}$$


Sigma_squared = 36.70313
Tau
INTRCPT1,B0 2.37996 0.19058
SES,B1 0.19058 0.14892
Tau (as correlations)
INTRCPT1,B0 1.000 0.320
SES,B1 0.320 1.000
----------------------------------------------------
Random level-1 coefficient Reliability estimate
----------------------------------------------------
INTRCPT1, B0 0.733
SES, B1 0.073
----------------------------------------------------
The value of the likelihood function at iteration 61 = -2.325094E+004
The outcome variable is MATHACH
Final estimation of fixed effects:
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.096006 0.198734 60.865 157 0.000
SCHTYPE, G01 1.226384 0.306272 4.004 157 0.000
MEANSES, G02 5.333056 0.369161 14.446 157 0.000
For SES slope, B1
INTRCPT2, G10 2.937981 0.157135 18.697 157 0.000
SCHTYPE, G11 -1.640954 0.242905 -6.756 157 0.000
MEANSES, G12 1.034427 0.302566 3.419 157 0.001
----------------------------------------------------------------------------
The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.096006 0.173699 69.638 157 0.000
SCHTYPE, G01 1.226384 0.308484 3.976 157 0.000
MEANSES, G02 5.333056 0.334600 15.939 157 0.000
For SES slope, B1
INTRCPT2, G10 2.937981 0.147620 19.902 157 0.000
SCHTYPE, G11 -1.640954 0.237401 -6.912 157 0.000
MEANSES, G12 1.034427 0.332785 3.108 157 0.003
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
INTRCPT1, U0 1.54271 2.37996 157 605.29503 0.000
SES slope, U1 0.38590 0.14892 157 162.30867 0.369
level-1, R 6.05831 36.70313
-----------------------------------------------------------------------------
Statistics for current covariance components model
--------------------------------------------------
Deviance = 46501.875643
Number of estimated parameters = 4
Notes:
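Filling in the parameter estimates we get
$$\mathrm{MATHACH}_{ij} = \beta_{0j} + \beta_{1j}(\mathrm{SES} - \mathrm{MEANSES}) + r_{ij}$$
$$\beta_{0j} = 12.10 + 1.23(\mathrm{SCHTYPE}) + 5.33(\mathrm{MEANSES}) + u_{0j}$$
$$\beta_{1j} = 2.94 - 1.64(\mathrm{SCHTYPE}) + 1.03(\mathrm{MEANSES}) + u_{1j}$$
$$V(r_{ij}) = 36.70$$
$$V(u_{0j}) = 2.38$$
$$V(u_{1j}) = 0.15$$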
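To see what the cross-level interaction means in practice, it helps to plug values into the fitted equations. The sketch below evaluates the expected intercept and SES slope for a public and a private school at a given MEANSES, with the random effects set to zero; the function names are hypothetical and the coefficients are the fixed-effect estimates from the output above.

```python
# Fixed-effect estimates from the Model 4 output.
G00, G01, G02 = 12.096006, 1.226384, 5.333056   # intercept equation
G10, G11, G12 = 2.937981, -1.640954, 1.034427   # SES slope equation

def expected_intercept(schtype, meanses):
    """Expected school mean of MATHACH (random effect u0j set to 0)."""
    return G00 + G01 * schtype + G02 * meanses

def expected_ses_slope(schtype, meanses):
    """Expected slope of group-centered ses (random effect u1j set to 0)."""
    return G10 + G11 * schtype + G12 * meanses

# schtype: 0 = public, 1 = private; meanses = 0 is an average-ses school.
public_slope = expected_ses_slope(0, 0.0)
private_slope = expected_ses_slope(1, 0.0)
print("SES slope, public:  %.2f" % public_slope)
print("SES slope, private: %.2f" % private_slope)
```

At average MEANSES, the expected SES slope in private schools is less than half of that in public schools, which is what the negative coefficient on SCHTYPE in the slope equation expresses.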
1. Multivariate Hypothesis Tests on Fixed Effects

We will test the effect of schtype on the intercept and on the slope of ses simultaneously. This will be a test of two degrees of freedom.

Click on the box labeled "1" and then fill out the boxes below to indicate we wish to test jointly that γ01 = 0 and γ11 = 0 .

Results of General Linear Hypothesis Testing
-----------------------------------------------------------------------------
Coefficients Contrast
-----------------------------------------------------------------------------
For INTRCPT1, B0
INTRCPT2, G00 12.096006 0.000 0.000
SCHTYPE, G01 1.226384 1.000 0.000
MEANSES, G02 5.333056 0.000 0.000
For SES slope, B1
INTRCPT2, G10 2.937981 0.000 0.000
SCHTYPE, G11 -1.640954 0.000 2.000
MEANSES, G12 1.034427 0.000 0.000
Chi-square statistic = 60.596880
Degrees of freedom = 2
P-value = 0.000000
2. Multivariate Tests of Variance-Covariance Components Specification
From Model 4, which we ran before, we saw that the variance of the slope of group-centered ses is small (0.14892) and not statistically significant (p = 0.369). This suggests that we may not need to model the slope of group-centered ses as a random effect. A simpler model lets the slope of ses vary non-randomly, as a function of the level-2 variables schtype and meanses only. We can compare these two models to decide whether the simpler model fits about as well as the previous one.
To compare two models, we have to obtain the deviance (which is just -2*log likelihood) for the first model and enter it under Hypothesis Testing before running the second model.

Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
INTRCPT1, U0 1.54118 2.37524 157 604.29895 0.000
level-1, R 6.06351 36.76611
-----------------------------------------------------------------------------
Statistics for current covariance components model
--------------------------------------------------
Deviance = 46502.952743
Number of estimated parameters = 2


Statistics for current covariance components model
--------------------------------------------------
Deviance = 46501.875643
Number of estimated parameters = 4
Variance-Covariance components test
-----------------------------------
Chi-square statistic = 1.07710
Number of degrees of freedom = 2
P-value = >.500
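The test statistic reported here can be reproduced by hand from the two deviances. The sketch below does so in plain Python. It relies on the fact that, for 2 degrees of freedom, the upper-tail probability of a chi-square variable is exp(-x/2), so no statistics library is needed; for other degrees of freedom a chi-square routine from a statistics package would be required.

```python
import math

# Deviances and parameter counts from the two outputs above.
deviance_simple, n_params_simple = 46502.952743, 2   # fixed ses slope
deviance_full, n_params_full = 46501.875643, 4       # random ses slope

chi_square = deviance_simple - deviance_full
df = n_params_full - n_params_simple

# Closed-form upper-tail probability, valid only for df == 2.
if df != 2:
    raise ValueError("closed form below only holds for 2 degrees of freedom")
p_value = math.exp(-chi_square / 2.0)

print("Chi-square = %.5f, df = %d, p = %.3f" % (chi_square, df, p_value))
```

The p-value is well above 0.5, in line with the ">.500" that HLM prints, so the simpler model with a non-random ses slope fits about as well.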
3. Model-based Graphs
HLM 6 offers many model-based graphs. The graphs below are based on the following model.


Level 1 equation Graphing:


Level-2 EB/OLS coefficient confidence intervals


1. Modeling Heterogeneity of Level-1 Variances
Sometimes the level-1 variance is heterogeneous. For example, we may expect female and male students to have different variances, in which case we want to model the level-1 variance as a function of the variable female.

From the pull-down menu, choose Other Settings => Estimation Settings => Heterogeneous sigma^2. We then choose which variable(s) to use to model the heterogeneity. Here we picked the variable female.

RESULTS FOR HETEROGENEOUS SIGMA-SQUARED
(macro iteration 4)
Var(R) = Sigma_squared and
log(Sigma_squared) = alpha0 + alpha1(FEMALE)
Model for level-1 variance
--------------------------------------------------------------------
Standard
Parameter Coefficient Error Z-ratio P-value
--------------------------------------------------------------------
INTRCPT1 ,alpha0 3.66570 0.024718 148.301 0.000
FEMALE ,alpha1 -0.12106 0.033936 -3.567 0.001
--------------------------------------------------------------------
Summary of Model Fit
-------------------------------------------------------------------
Model Number of Deviance
Parameters
-------------------------------------------------------------------
1. Homogeneous sigma_squared 10 46494.59261
2. Heterogeneous sigma_squared 11 46482.09334
-------------------------------------------------------------------
Model Comparison Chi-square df P-value
-------------------------------------------------------------------
Model 1 vs Model 2 12.49926 1 0.001
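Because the model is for the log of the level-1 variance, the coefficients are easier to interpret after exponentiating. The sketch below converts the two estimates above into implied level-1 variances for male and female students; the variable names are ours.

```python
import math

# Coefficients from "Model for level-1 variance" above.
alpha0 = 3.66570    # INTRCPT1
alpha1 = -0.12106   # FEMALE

# log(sigma^2) = alpha0 + alpha1 * FEMALE, so sigma^2 = exp(...).
var_male = math.exp(alpha0)             # FEMALE = 0
var_female = math.exp(alpha0 + alpha1)  # FEMALE = 1
ratio = var_female / var_male           # equals exp(alpha1)

print("Level-1 variance, male:   %.2f" % var_male)
print("Level-1 variance, female: %.2f" % var_female)
print("Female/male variance ratio: %.3f" % ratio)
```

The implied level-1 variance for female students is about 11 percent smaller than for male students, and the model comparison above indicates that this difference is statistically significant.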
2. Models Without a Level-1 Intercept
Sometimes we may want to exclude the intercept from our model. For example, we may have a level-1 categorical variable and want to include all of its categories in the model. In that case we have to exclude the intercept; otherwise the model will be over-parameterized. Here we create another binary variable, male (= 1 - female). As mentioned before, since HLM does not have any data management facility, we have to create this variable outside HLM. We chose SPSS for this task and modified the template file created earlier to create a new MDM file.




The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For FEMALE slope, B1
INTRCPT2, G10 10.684432 0.298122 35.839 158 0.000
SCHTYPE, G11 2.932540 0.446512 6.568 158 0.000
For MALE slope, B2
INTRCPT2, G20 12.174859 0.322616 37.738 158 0.000
SCHTYPE, G21 2.597771 0.487027 5.334 158 0.000
----------------------------------------------------------------------------
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
FEMALE slope, U1 2.41260 5.82064 121 481.99916 0.000
MALE slope, U2 2.64370 6.98917 121 483.25462 0.000
level-1, R 6.22438 38.74285
-----------------------------------------------------------------------------
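With no level-1 intercept, the coefficients on female and male are themselves group means, so the fixed effects can be read as expected math achievement for each gender and school type. The sketch below tabulates these four expected values from the estimates above, with the random effects set to zero; the dictionary layout and names are illustrative only.

```python
# Fixed-effect estimates from the no-intercept model output above.
G10, G11 = 10.684432, 2.932540   # FEMALE slope: intercept, SCHTYPE
G20, G21 = 12.174859, 2.597771   # MALE slope: intercept, SCHTYPE

def expected_mathach(female, schtype):
    """Expected MATHACH for a student (random effects set to 0).

    female: 1 for female, 0 for male; schtype: 0 = public, 1 = private.
    """
    male = 1 - female
    return female * (G10 + G11 * schtype) + male * (G20 + G21 * schtype)

expected = {
    ("female", "public"): expected_mathach(1, 0),
    ("female", "private"): expected_mathach(1, 1),
    ("male", "public"): expected_mathach(0, 0),
    ("male", "private"): expected_mathach(0, 1),
}
for (gender, school), value in expected.items():
    print("%-6s %-7s %.2f" % (gender, school, value))
```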
3. Constraints on Fixed Effects
Let's say that we believe that the effect of schtype is the same for female and male students. We then need to impose the constraint γ11 = γ21.



The outcome variable is MATHACH
Final estimation of fixed effects
(with robust standard errors)
----------------------------------------------------------------------------
Standard Approx.
Fixed Effect Coefficient Error T-ratio d.f. P-value
----------------------------------------------------------------------------
For FEMALE slope, B1
INTRCPT2, G10 10.723664 0.295717 36.263 158 0.000
SCHTYPE, G11 * 2.804823 0.417646 6.716 158 0.000
For MALE slope, B2
INTRCPT2, G20 12.103608 0.313462 38.613 159 0.000
----------------------------------------------------------------------------
The "*" gammas have been constrained. See the table on the header page.
Final estimation of variance components:
-----------------------------------------------------------------------------
Random Effect Standard Variance df Chi-square P-value
Deviation Component
-----------------------------------------------------------------------------
FEMALE slope, U1 2.40847 5.80071 121 484.11557 0.000
MALE slope, U2 2.63048 6.91943 121 483.35444 0.000
level-1, R 6.22449 38.74426
-----------------------------------------------------------------------------
HLM has some very nice features for multilevel data analysis.
These pages are Copyrighted (c) by UCLA Academic Technology Services