
Regression with Stata
Chapter 3: Self Assessment Answers

1. Using the elemapi2 data file ( use http://www.ats.ucla.edu/stat/stata/examples/ara/elemapi2 ) convert the variable ell into 2 categories using the following coding, 0-25 on ell becomes 0, and 26-100 on ell becomes 1. Use this recoded version of ell to predict api00 and interpret the results.

Answer 1.
We first load the elemapi2 data file.

We convert ell into a 0/1 variable called ell_bin.

We tabulate ell_bin to see that the recoding looks OK.

We now include ell_bin in the regression model.
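The four steps above can be sketched as follows. This is a minimal reconstruction, not necessarily the commands originally shown on this page; it assumes that ell holds whole-number percentages and uses the URL exactly as given in the question.

```stata
* Load the data (URL copied from the question; it may no longer resolve)
use http://www.ats.ucla.edu/stat/stata/examples/ara/elemapi2, clear

* Copy ell, then collapse the copy into two categories
generate ell_bin = ell
recode ell_bin (0/25 = 0) (26/100 = 1)

* Check that the recoding looks OK
tabulate ell_bin

* Predict api00 from the recoded variable
regress api00 ell_bin
```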

The coefficient for _cons is the mean api score for the schools where ell_bin is coded 0 (low number of English language learners). The coefficient for ell_bin is the mean api score for the schools with a high number of English language learners minus the mean api score for the schools with a low number of English language learners. When broken into these two categories, the schools with a high number of English language learners score 207 points lower on the api than the schools with a low number of English language learners.

2. Convert the variable ell into 3 categories, coding those scoring 0-14 on ell as 1, those scoring 15-41 as 2, and those scoring 42-100 as 3. Do an analysis predicting api00 from the ell variable converted to a 1/2/3 variable. Interpret the results.

Answer 2.
First we create the categorical variable called ell_cat.
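One way to create this variable is sketched below. It assumes the elemapi2 data are already in memory and that ell holds whole-number percentages; the cutpoints are the ones stated in the question.

```stata
* Copy ell, then collapse the copy into three categories
generate ell_cat = ell
recode ell_cat (0/14 = 1) (15/41 = 2) (42/100 = 3)
```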

We check the creation of ell_cat using the tabulate command below.

tabulate ell_cat

    ell_cat |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        136       34.00       34.00
          2 |        129       32.25       66.25
          3 |        135       33.75      100.00
------------+-----------------------------------
      Total |        400      100.00

We use xi with the regress command to perform this analysis, and this creates two dummy codes with category 1 (low number of English language learners) as the reference category.
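A sketch of the command described above, assuming ell_cat has been created as in the previous step:

```stata
* xi expands i.ell_cat into dummy variables, omitting the lowest category (1)
xi: regress api00 i.ell_cat

* Newer Stata releases also accept factor-variable notation without xi,
*   regress api00 i.ell_cat
* but the coefficients are then labeled differently from the _Iell_cat_#
* names used in the interpretation below.
```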

The _cons represents the mean for the reference category, when ell_cat is coded 1.  The coefficient for _Iell_cat_2 is the difference in the mean api score between the ell_cat=2 group and the reference group, ell_cat=1, and this difference is significant. The schools with a middle amount of English language learners score 141 points lower on their api score as compared to the schools with low amounts of English language learners. The coefficient for _Iell_cat_3 is the difference in the api scores for the ell_cat=3 group and the reference group, and this is significant as well.  The schools with high amounts of English language learners score about 257 points lower than schools with low amounts of English language learners.

3. Do a regression analysis predicting api00 from yr_rnd and the ell variable converted to a 0/1 variable. Then create an interaction term and run the analysis again. Interpret the results of these analyses.

Answer 3.
We use the regress command to perform this analysis below.
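A minimal sketch of this model, assuming ell_bin from question 1 is still in the data set:

```stata
* Main effects only: year-round status and the 0/1 version of ell
regress api00 yr_rnd ell_bin
```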

These results indicate that  year round schools (yr_rnd=1) score about 77 points lower on the api test than non-year round schools (yr_rnd=0).   Also, schools with high numbers of English language learners score about 182 points lower on the api test than the schools with low numbers of English language learners.   Both of these effects are significant.

Now we include an interaction term in the analysis.
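The interaction term can be built by hand as the product of the two 0/1 variables. The name yr_ell matches the term discussed below; the rest is a sketch under the same assumptions as before.

```stata
* The product equals 1 only for year-round schools with high ell
generate yr_ell = yr_rnd * ell_bin

* Main effects plus the interaction
regress api00 yr_rnd ell_bin yr_ell
```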

The main effects of yr_rnd and ell_bin are still significant, but the interaction term yr_ell is not significant.   This suggests that the effects we described in the analysis above are consistent across the levels of yr_rnd and ell_bin.  In other words, we can say that the effect of ell_bin is much the same for the year round schools as for the non-year round schools. 

We could also have run this analysis using the anova command, which can be much more convenient for models like these.

And we can use the adjust command to get the means for the cells.   You can relate the coefficients from the regression model to the means below.   For example, the _cons is the mean for the cell where all the variables are 0, and so forth.
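The anova fit and the cell means can be sketched as follows. This version assumes Stata 11 or later, where factor-variable notation is available and margins is the current counterpart of adjust; the older syntax in the comments is given from memory and should be checked against the documentation for your release.

```stata
* ## expands to both main effects plus their interaction
anova api00 yr_rnd##ell_bin

* Predicted mean of api00 in each of the four cells
margins yr_rnd#ell_bin

* Older releases (assumed syntax):
*   anova api00 yr_rnd ell_bin yr_rnd*ell_bin
*   adjust, by(yr_rnd ell_bin)
```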

4. Do a regression analysis predicting api00 from ell coded as 0/1 (from question 1) and some_col, and the interaction of these two variables. Interpret the results, including showing a graph of the results.

Answer 4.
We create an interaction term and run the analysis.
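A sketch of these two steps, using the name ell_col that the interpretation below refers to and assuming ell_bin from question 1 is in the data set:

```stata
* Interaction of the 0/1 variable with the continuous predictor
generate ell_col = ell_bin * some_col

* Main effects plus the interaction
regress api00 ell_bin some_col ell_col
```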

We make a graph to help in the interpretation.
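One way to draw such a graph in Stata 8 or later is sketched below. It assumes the interaction model is the most recent estimation; pred_bin is a hypothetical variable name, and the line patterns are chosen to match the description that follows.

```stata
* Predicted api00 from the most recent regression
predict pred_bin

* One fitted line per group: solid for low ell, dotted for high ell
* (the /// line continuation works in a do-file, not at the command line)
twoway (line pred_bin some_col if ell_bin == 0, sort lpattern(solid)) ///
       (line pred_bin some_col if ell_bin == 1, sort lpattern(dot)),   ///
       legend(order(1 "ell_bin = 0" 2 "ell_bin = 1"))                  ///
       ytitle("Predicted api00")
```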

The graph helps us visually understand the interaction represented by ell_col.  We can see that the regression lines between some_col and api00 are not parallel -- specifically, the line for the schools with a low number of English language learners has a downward slope, and the line for the schools with a large number of English language learners has an upward slope.  From the regression equation, we see that the slope of the line when ell_bin is 0 (low number of English language learners) is -1.44.  This corresponds to the solid regression line we see in the above graph. The difference between the slopes for the schools with a high number of English language learners and the schools with a low number of English language learners is 4.62.  In order to get the slopes for the schools with a high number of English language learners we would add 4.62 to -1.44 and that yields 3.18, so this is the slope for the line for the schools with the high number of English language learners.  This corresponds to the dotted regression line that we see in the above graph.
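The slope for the high group can also be obtained directly from the fitted model, rather than by adding rounded coefficients by hand. This sketch assumes the regression with ell_col is still the most recent estimation.

```stata
* Slope of some_col when ell_bin = 1, with a standard error and confidence interval
lincom some_col + ell_col

* The same sum computed from the rounded coefficients quoted above
display -1.44 + 4.62
```

The advantage of lincom over the hand calculation is that it also tests whether the slope for the high group differs from zero.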

5. Use the variable ell converted into 3 categories (from question 2) and predict api00 from ell in 3 categories, from some_col, and from the interaction of these two variables. Interpret the results, including showing a graph.

Answer 5.
We use the xi command with regress to perform the analysis looking at the effect of some_col and ell_cat and the interaction.
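A sketch of this command, assuming ell_cat from question 2 is in the data set:

```stata
* i.ell_cat*some_col asks xi for the ell_cat dummies, some_col,
* and the dummy-by-some_col interaction terms
xi: regress api00 i.ell_cat*some_col
```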

To help interpretation, let's make a graph of the predicted values.
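As in question 4, the graph can be sketched from the predicted values. This assumes the xi regression is the most recent estimation and Stata 8 or later; pred_cat is a hypothetical variable name.

```stata
* Predicted api00 from the most recent regression
predict pred_cat

* One fitted line per ell_cat group
* (the /// line continuation works in a do-file, not at the command line)
twoway (line pred_cat some_col if ell_cat == 1, sort) ///
       (line pred_cat some_col if ell_cat == 2, sort) ///
       (line pred_cat some_col if ell_cat == 3, sort), ///
       legend(order(1 "ell_cat = 1" 2 "ell_cat = 2" 3 "ell_cat = 3")) ///
       ytitle("Predicted api00")
```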

We can use the information in the graph and in the regression equation to help interpret these results. First, looking at the graph, we see that the three regression lines are not parallel. For the schools with a low number of English language learners (when ell_cat is 1) the regression line has a downward slope, for the schools with a middle number of English language learners (when ell_cat is 2) the regression line is nearly flat, and for the schools with a high number of English language learners (when ell_cat is 3) the regression line has an upward slope. We can use the regression model to compute the exact slopes of all three of these regression lines. Since group 1 is the reference category, the slope for that regression line is the coefficient for some_col, which is -2.05.

The coefficient for _IellXsome~2 (2.48) tells us how much we need to add to -2.05 to get the slope for the second group. When we add 2.48 to -2.05 we get .43, the slope for the second group. Because the coefficient for _IellXsome~2 is significant, we can say that the slope for group 1 is significantly different from the slope for group 2.

The coefficient for _IellXsome~3 (5.11) tells us how much we need to add to -2.05 to get the slope for the third group. When we add 5.11 to -2.05 we get 3.06, the slope for the third group. Because the coefficient for _IellXsome~3 is significant, we can say that the slope for group 1 is significantly different from the slope for group 3.
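The two sums can be checked with display. As a further cross-check, a model with all group dummies and all group-by-some_col interactions gives the same slope estimates as separate regressions within each group, so the per-group fits below should reproduce the three slopes (their standard errors will differ, because each fit uses only its own group's residual variance). This is a sketch, assuming ell_cat is in the data set.

```stata
* Sums of the rounded coefficients quoted above
display -2.05 + 2.48
display -2.05 + 5.11

* Separate simple regressions, one per ell_cat group
bysort ell_cat: regress api00 some_col
```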


These pages are Copyrighted (c) by UCLA Academic Technology Services

