3.1 Regression with a 0/1 variable
The simplest example of a categorical predictor in a regression analysis is a 0/1
variable, also called a dummy variable. Let's use the variable yr_rnd as
an example of a dummy variable. We can include a dummy variable as a predictor in a
regression analysis as shown below.
GET FILE='C:\spssreg\elemapi2.sav'.
regression
/dep api00
/method = enter yr_rnd.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | year round school(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .475(a) | .226 | .224 | 125.300 |
a Predictors: (Constant), year round school
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 1825000.563 | 1 | 1825000.563 | 116.241 | .000(a) |
| | Residual | 6248671.435 | 398 | 15700.179 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), year round school
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 684.539 | 7.140 | | 95.878 | .000 |
| | year round school | -160.506 | 14.887 | -.475 | -10.782 | .000 |
a Dependent Variable: api 2000
Using a 0/1 variable as a predictor may seem odd at first, but it is a legitimate analysis. What do the results mean?
Let's go back to basics and write out the regression equation that this model implies.
api00 = constant + Byr_rnd * yr_rnd
where constant is the intercept and Byr_rnd
represents the coefficient for the variable yr_rnd. Filling in the
values from the regression output, we get
api00 = 684.539 + -160.5064 * yr_rnd
If a school is not a year-round school (i.e., yr_rnd is 0), the
regression equation simplifies to
api00 = constant + 0 * Byr_rnd
api00 = 684.539 + 0 * -160.5064
api00 = 684.539
If a school is a year-round school (i.e., yr_rnd is 1), the regression equation simplifies to
api00 = constant + 1 * Byr_rnd
api00 = 684.539 + 1 * -160.5064
api00 = 524.0326
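The two substitutions above can be checked with a few lines of Python (a sketch run outside SPSS, not part of the original analysis). The function name predict_api00 is our own; the constant and coefficient are copied from the regression output, with the coefficient taken to four decimals as in the text.

```python
# Coefficients copied from the regression output above.
CONSTANT = 684.539
B_YR_RND = -160.5064

def predict_api00(yr_rnd):
    """Predicted api00 for a school with yr_rnd coded 0 or 1."""
    return CONSTANT + B_YR_RND * yr_rnd

for code, label in [(0, "non-year-round"), (1, "year-round")]:
    print(f"{label}: {predict_api00(code):.4f}")
```

With a single 0/1 predictor the model can produce only these two predicted values, one per group.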
We can graph the observed values and the predicted values using the igraph
command as shown below. Although yr_rnd has only 2 values, we can still
draw a regression line showing the relationship between yr_rnd and api00.
Based on the results above, the predicted value for non-year-round
schools is 684.539 and the predicted value for year-round schools is
524.033. The slope of the line is negative, which makes sense since the
coefficient for yr_rnd is negative (-160.5064). Note that the "type = scale"
option is needed here because yr_rnd is defined as an ordinal variable in the
dataset.
IGRAPH
/X1 = VAR(yr_rnd) TYPE = scale
/Y = VAR (api00) TYPE = SCALE
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/CATORDER VAR(yr_rnd) (ASCENDING VALUES OMITEMPTY)
/SCATTER COINCIDENT = NONE.

Let's compare these predicted values to the mean api00 scores for the
year-round and non-year-round schools.
MEANS
TABLES=api00 BY yr_rnd.
Case Processing Summary
| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| api 2000 * year round school | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
Report
api 2000
| year round school | Mean | N | Std. Deviation |
| No | 684.54 | 308 | 132.113 |
| Yes | 524.03 | 92 | 98.916 |
| Total | 647.62 | 400 | 142.249 |
As you can see, the regression equation predicts that the value of api00
for a school is the mean of its group, that is, the mean for year-round schools
or the mean for non-year-round schools.
Let's relate these predicted values back to the regression equation. For the
non-year-round schools, the mean is the same as the intercept (684.539). The
coefficient for yr_rnd is the amount we need to add to get the mean for
the year-round schools, i.e., we need to add -160.5064 to get 524.0326, the mean for the
year-round schools. In other words, Byr_rnd is the mean api00
score for the year-round schools minus the mean api00 score for the non-year-round
schools, i.e., mean(year-round) - mean(non-year-round).
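To see why the intercept and slope must equal the reference-group mean and the difference in means, here is a minimal Python sketch that fits a least-squares line to a 0/1 predictor. The scores are hypothetical values invented for illustration (they are not from elemapi2.sav), and simple_ols is our own helper, not an SPSS function.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical scores, NOT taken from elemapi2.sav.
dummy = [0, 0, 0, 0, 1, 1, 1]
score = [700.0, 650.0, 720.0, 690.0, 500.0, 540.0, 520.0]

intercept, slope = simple_ols(dummy, score)
mean0 = sum(s for d, s in zip(dummy, score) if d == 0) / dummy.count(0)
mean1 = sum(s for d, s in zip(dummy, score) if d == 1) / dummy.count(1)

print("intercept vs. mean of group 0:", intercept, mean0)
print("slope vs. difference in means:", slope, mean1 - mean0)
```

Up to floating-point error, the intercept matches the mean of the group coded 0 and the slope matches mean(group 1) - mean(group 0), whatever scores are used.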
It may be surprising to note that this regression analysis with a single dummy variable
is the same as doing a t-test comparing the mean api00 for the year-round
schools with that for the non-year-round schools (see below). The t-value below (in the "Equal variances assumed" row)
has the same magnitude as the t-value for yr_rnd in the regression above; the sign differs only because the t-test subtracts the means in the opposite order. This is because
Byr_rnd compares the year-round and non-year-round schools (since
the coefficient is mean(year-round) - mean(non-year-round)).
T-TEST
GROUPS=yr_rnd(0 1)
/VARIABLES=api00.
Group Statistics
| | year round school | N | Mean | Std. Deviation | Std. Error Mean |
| api 2000 | No | 308 | 684.54 | 132.113 | 7.528 |
| | Yes | 92 | 524.03 | 98.916 | 10.313 |
Independent Samples Test
(F and the first Sig. column belong to Levene's Test for Equality of Variances; the remaining columns belong to the t-test for Equality of Means.)
| | | Levene's F | Levene's Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
| api 2000 | Equal variances assumed | 20.539 | .000 | 10.782 | 398 | .000 | 160.51 | 14.887 | 131.239 | 189.774 |
| | Equal variances not assumed | | | 12.571 | 197.215 | .000 | 160.51 | 12.768 | 135.327 | 185.686 |
Since a two-group t-test (equal variances assumed) is equivalent to a one-way ANOVA, we can get the same results using the oneway
command as well. Note that in SPSS, when you click on "Analyze"
and "Compare Means," you can select a one-way ANOVA. The
code for conducting a one-way ANOVA is shown below. After this analysis, however,
we will use the glm (for general linear model) command instead of the oneway
command.
ONEWAY
api00 BY yr_rnd.
ANOVA
api 2000
| | Sum of Squares | df | Mean Square | F | Sig. |
| Between Groups | 1825000.563 | 1 | 1825000.563 | 116.241 | .000 |
| Within Groups | 6248671.435 | 398 | 15700.179 | | |
| Total | 8073671.998 | 399 | | | |
Remember that if you square the t-value, you will get the
F-value: 10.7815**2 = 116.24074 , showing another way in which the t-test
is the same as the ANOVA test.
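As a rough check of this equivalence, the pooled-variance t statistic can be rebuilt from the Group Statistics table alone. The Python sketch below is an illustration outside SPSS. Because its inputs are the rounded means and standard deviations from the table, the results should agree with 10.782 and 116.241 only up to rounding.

```python
import math

# Group summaries copied from the Group Statistics table (No = 0, Yes = 1).
n0, mean0, sd0 = 308, 684.54, 132.113
n1, mean1, sd1 = 92, 524.03, 98.916

# Pooled variance, as used by the "equal variances assumed" t-test.
df = n0 + n1 - 2
pooled_var = ((n0 - 1) * sd0 ** 2 + (n1 - 1) * sd1 ** 2) / df
se_diff = math.sqrt(pooled_var * (1 / n0 + 1 / n1))

t = (mean0 - mean1) / se_diff
print(f"pooled variance = {pooled_var:.1f} on {df} df")
print(f"t = {t:.3f}, t squared = {t * t:.2f}")
```

The pooled variance is the same quantity as the Within Groups mean square in the ANOVA table, which is why squaring t gives F.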
3.2 Regression with a 1/2 variable
A categorical predictor variable does not have to be coded 0/1 to be used in a
regression model. It is easier to understand and interpret the results from a model with
dummy variables, but a variable coded 1/2 yields essentially the same
results.
Let's make a copy of the variable yr_rnd called yr_rnd2
that is coded 1/2, 1=non year-round and 2=year-round.
compute yr_rnd2 = yr_rnd.
recode yr_rnd2 (0=1) (1=2).
execute.
REGRESSION
/DEPENDENT api00
/METHOD=ENTER yr_rnd2.
<some output omitted to save space>
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 845.045 | 19.353 | | 43.664 | .000 |
| | YR_RND2 | -160.506 | 14.887 | -.475 | -10.782 | .000 |
a Dependent Variable: api 2000
Note that the coefficient for yr_rnd2 is the same as the coefficient for yr_rnd.
So, whether you code yr_rnd as 0/1 or as 1/2, the regression
coefficient works out to be the same. However, the intercept is a bit less
intuitive. When we used yr_rnd, the intercept was the mean for the non
year-rounds. When using yr_rnd2, the intercept is the mean for the non
year-rounds minus Byr_rnd2, i.e., 684.539 - (-160.506) = 845.045.
Note that you can use 0/1 or 1/2 coding and the results for the coefficient come out
the same, but the interpretation of constant in the regression equation is different. It
is often easier to interpret the estimates for 0/1 coding.
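The shift in the intercept follows from simple algebra: adding 1 to every value of the predictor leaves the slope unchanged and moves the intercept by minus one slope. The Python sketch below (an illustration outside SPSS, with function names of our own choosing) confirms that both codings give identical predictions.

```python
# Coefficients from the yr_rnd (0/1) model above.
CONST_01 = 684.539
B = -160.506

# Recoding x to x + 1 leaves the slope alone and shifts the intercept by -B.
CONST_12 = CONST_01 - B

def predict_01(yr_rnd):
    """Prediction under 0/1 coding."""
    return CONST_01 + B * yr_rnd

def predict_12(yr_rnd2):
    """Prediction under 1/2 coding."""
    return CONST_12 + B * yr_rnd2

print("intercept under 1/2 coding:", round(CONST_12, 3))
for yr_rnd in (0, 1):
    print(yr_rnd, predict_01(yr_rnd), predict_12(yr_rnd + 1))
```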
In summary, these results indicate that the api00 scores are
significantly different depending on the type of school, year-round
vs. non-year-round. Non-year-round schools have
significantly higher scores. Based on the regression results, non-year-round
schools have scores that are, on average, 160.5 points higher than those of year-round
schools.
3.3 Regression with a 1/2/3 variable
3.3.1 Manually Creating Dummy Variables
Say that we would like to examine the relationship between the amount of poverty and
api scores. We don't have a measure of poverty, but we can use mealcat as
a proxy for a measure of poverty. You might be tempted to try including mealcat in a regression like this.
regression
/dependent api00
/method=enter mealcat.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | Percentage free meals in 3 categories(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .867(a) | .752 | .752 | 70.908 |
a Predictors: (Constant), Percentage free meals in 3 categories
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6072527.519 | 1 | 6072527.519 | 1207.742 | .000(a) |
| | Residual | 2001144.479 | 398 | 5028.001 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), Percentage free meals in 3 categories
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 950.987 | 9.422 | | 100.935 | .000 |
| | Percentage free meals in 3 categories | -150.553 | 4.332 | -.867 | -34.753 | .000 |
a Dependent Variable: api 2000
This model estimates only the linear effect of mealcat on api00,
but mealcat is not an interval variable. Instead, you will want to code the variable so
that all the information concerning the three levels is accounted for.
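One way to see what the linear model gives up is to compare its three fitted values with the observed group means. The Python sketch below is an illustration outside SPSS; the coefficients come from the output above, and the group means are the ones reported by the MEANS command later in this section.

```python
# Coefficients from the regression of api00 on mealcat treated as numeric.
CONSTANT = 950.987
B_MEALCAT = -150.553

# Observed group means, as reported by MEANS later in this section.
observed = {1: 805.72, 2: 639.39, 3: 504.38}

for level, mean in observed.items():
    fitted = CONSTANT + B_MEALCAT * level
    print(f"mealcat={level}: fitted {fitted:.2f}, observed {mean:.2f}")

# The steps between adjacent observed means are unequal,
# which a single slope cannot reproduce.
step_12 = observed[1] - observed[2]
step_23 = observed[2] - observed[3]
print("drop from group 1 to 2:", round(step_12, 2))
print("drop from group 2 to 3:", round(step_23, 2))
```

The single slope forces the same drop between every pair of adjacent categories, while the observed drops differ; dummy coding removes that restriction.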
You can dummy code mealcat like this.
* Each dummy is set to 0 for cases with a valid mealcat, then to 1 for its own level.
if (not missing(mealcat)) mealcat1 = 0.
if mealcat = 1 mealcat1 = 1.
if (not missing(mealcat)) mealcat2 = 0.
if mealcat = 2 mealcat2 = 1.
if (not missing(mealcat)) mealcat3 = 0.
if mealcat = 3 mealcat3 = 1.
execute.
We now have created mealcat1 that is 1 if mealcat is
1, and 0 otherwise. Likewise, mealcat2 is 1 if mealcat
is 2, and 0 otherwise; and likewise mealcat3 was created. We can see this
below.
list mealcat mealcat1 mealcat2 mealcat3
/cases from 1 to 10.
MEALCAT MEALCAT1 MEALCAT2 MEALCAT3
2 .00 1.00 .00
3 .00 .00 1.00
3 .00 .00 1.00
3 .00 .00 1.00
3 .00 .00 1.00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
Number of cases read: 10 Number of cases listed: 10
We can now use two of these dummy variables (mealcat2 and mealcat3)
in the regression analysis.
regression
/dependent api00
/method = enter mealcat2 mealcat3.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEALCAT3, MEALCAT2(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .869(a) | .755 | .754 | 70.612 |
a Predictors: (Constant), MEALCAT3, MEALCAT2
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), MEALCAT3, MEALCAT2
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 805.718 | 6.169 | | 130.599 | .000 |
| | MEALCAT2 | -166.324 | 8.708 | -.550 | -19.099 | .000 |
| | MEALCAT3 | -301.338 | 8.629 | -1.007 | -34.922 | .000 |
a Dependent Variable: api 2000
We can test the overall differences among the three groups by using the /method
= test subcommand, as
shown below. This shows that the overall differences among the three groups are
significant, with an F value of 611.121 and a p value reported as .000 (i.e., p < .001).
regression
/dependent api00
/method = test (mealcat2 mealcat3).
Variables Entered/Removed(a)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEALCAT3, MEALCAT2 | . | Test |
a Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .869(a) | .755 | .754 | 70.612 |
a Predictors: (Constant), MEALCAT3, MEALCAT2
ANOVA(c)
| Model | | | Sum of Squares | df | Mean Square | F | Sig. | R Square Change |
| 1 | Subset Tests | MEALCAT2, MEALCAT3 | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) | .755 |
| | Regression | | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(b) | |
| | Residual | | 1979474.328 | 397 | 4986.081 | | | |
| | Total | | 8073671.997 | 399 | | | | |
a Tested against the full model.
b Predictors in the Full Model: (Constant), MEALCAT3, MEALCAT2.
c Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 805.718 | 6.169 | | 130.599 | .000 |
| | MEALCAT2 | -166.324 | 8.708 | -.550 | -19.099 | .000 |
| | MEALCAT3 | -301.338 | 8.629 | -1.007 | -34.922 | .000 |
a Dependent Variable: api 2000
The interpretation of the coefficients is much like that for the binary variables. Group 1 is
the omitted group, so the constant is the mean for group 1. The coefficient for mealcat2
is the mean for group 2 minus the mean of the omitted group (group 1), and the coefficient for
mealcat3
is the mean of group 3 minus the mean of group 1. You can verify this by comparing the
coefficients with the means of the groups, shown below.
MEANS
TABLES=api00 BY mealcat.
Case Processing Summary
| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| api 2000 * Percentage free meals in 3 categories | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
Report
api 2000
| Percentage free meals in 3 categories | Mean | N | Std. Deviation |
| 0-46% free meals | 805.72 | 131 | 65.669 |
| 47-80% free meals | 639.39 | 132 | 82.135 |
| 81-100% free meals | 504.38 | 137 | 62.727 |
| Total | 647.62 | 400 | 142.249 |
Based on these results, we can say that the three groups differ in their api00
scores, and that in particular group 2 is significantly different from group 1 (because mealcat2
was significant) and group 3 is significantly different from group 1 (because mealcat3
was significant).
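The link between the coefficients and the group means can be written as a small Python function (a sketch outside SPSS; dummy_coefficients is a name we made up). The means are the two-decimal values from the Report table, so the results match the regression output only up to rounding.

```python
# Group means of api00 copied from the MEANS Report table above.
means = {1: 805.72, 2: 639.39, 3: 504.38}

def dummy_coefficients(group_means, reference):
    """Constant and dummy coefficients implied by the group means
    when `reference` is the omitted group."""
    constant = group_means[reference]
    coefs = {g: m - constant for g, m in group_means.items() if g != reference}
    return constant, coefs

constant, coefs = dummy_coefficients(means, reference=1)
print(constant)            # compare with (Constant) = 805.718
print(round(coefs[2], 2))  # compare with MEALCAT2 = -166.324
print(round(coefs[3], 2))  # compare with MEALCAT3 = -301.338
```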
3.3.2 Using Do Loops
We can use the do repeat command to do the work for us to create the indicator
(dummy) variables. This method is particularly useful when you need to create many
indicator variables.
DO REPEAT A=mealcat1 mealcat2 mealcat3
/B=1 2 3.
COMPUTE A=(mealcat=B).
END REPEAT.
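For readers who want to see the logic of the loop spelled out, here is a rough Python analogue (not SPSS code). It uses the ten mealcat values shown in the LIST output above and ignores missing values, which SPSS would propagate as missing.

```python
from collections import Counter

# The first ten mealcat values from the LIST output above.
mealcat = [2, 3, 3, 3, 3, 1, 1, 1, 1, 1]

# One indicator per level, mirroring COMPUTE A=(mealcat=B) inside DO REPEAT.
indicators = {
    f"mealcat{level}": [int(value == level) for value in mealcat]
    for level in (1, 2, 3)
}

# Cross-tabulate each indicator against mealcat, as a crosstab would.
for name, column in indicators.items():
    table = Counter(zip(mealcat, column))
    print(name, sorted(table.items()))
```

Each case should have exactly one indicator equal to 1, which is the pattern the crosstabs are meant to confirm.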
We will then do a crosstab to verify that our indicator variables were created
correctly.
crosstab /tables = mealcat by mealcat1
/tables = mealcat by mealcat2
/tables = mealcat by mealcat3.
Case Processing Summary
| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
| Percentage free meals in 3 categories * MEALCAT1 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
| Percentage free meals in 3 categories * MEALCAT2 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
| Percentage free meals in 3 categories * MEALCAT3 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
Percentage free meals in 3 categories * MEALCAT1 Crosstabulation
Count
| Percentage free meals in 3 categories | MEALCAT1 = .00 | MEALCAT1 = 1.00 | Total |
| 0-46% free meals | | 131 | 131 |
| 47-80% free meals | 132 | | 132 |
| 81-100% free meals | 137 | | 137 |
| Total | 269 | 131 | 400 |
Percentage free meals in 3 categories * MEALCAT2 Crosstabulation
Count
| Percentage free meals in 3 categories | MEALCAT2 = .00 | MEALCAT2 = 1.00 | Total |
| 0-46% free meals | 131 | | 131 |
| 47-80% free meals | | 132 | 132 |
| 81-100% free meals | 137 | | 137 |
| Total | 268 | 132 | 400 |
Percentage free meals in 3 categories * MEALCAT3 Crosstabulation
Count
| Percentage free meals in 3 categories | MEALCAT3 = .00 | MEALCAT3 = 1.00 | Total |
| 0-46% free meals | 131 | | 131 |
| 47-80% free meals | 132 | | 132 |
| 81-100% free meals | | 137 | 137 |
| Total | 263 | 137 | 400 |
What if we wanted a different group to be the reference group? For example, let's omit group 3.
regression
/dependent api00
/method = enter mealcat1 mealcat2.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEALCAT2, MEALCAT1(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .869(a) | .755 | .754 | 70.612 |
a Predictors: (Constant), MEALCAT2, MEALCAT1
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 504.380 | 6.033 | | 83.606 | .000 |
| | MEALCAT1 | 301.338 | 8.629 | .995 | 34.922 | .000 |
| | MEALCAT2 | 135.014 | 8.612 | .447 | 15.677 | .000 |
a Dependent Variable: api 2000
With group 3 omitted, the constant is now the mean of group 3, the coefficient for mealcat1
is the mean of group 1 minus the mean of group 3, and the coefficient for mealcat2 is the mean of group 2 minus the mean of group 3. We see that both of
these coefficients are significant, indicating that group 1 is significantly different from
group 3 and group 2 is significantly different from group 3.
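Changing the reference group changes the coefficients but not the fitted group means. As a sketch of this (in Python, outside SPSS, with variable names of our own), the coefficients of the model that omits group 3 can be derived from those of the earlier model that omitted group 1.

```python
# Coefficients from the model that omits group 1 (mealcat2 and mealcat3 entered).
const_ref1 = 805.718
b_mealcat2 = -166.324
b_mealcat3 = -301.338

# Predicted group means are the same under either choice of reference group.
mean1 = const_ref1
mean2 = const_ref1 + b_mealcat2
mean3 = const_ref1 + b_mealcat3

# Re-express the same means with group 3 as the reference.
const_ref3 = mean3
b_mealcat1_ref3 = mean1 - mean3
b_mealcat2_ref3 = mean2 - mean3

print(round(const_ref3, 3), round(b_mealcat1_ref3, 3), round(b_mealcat2_ref3, 3))
```

The three printed values can be compared with the constant and the mealcat1 and mealcat2 coefficients in the output above.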
3.3.3 Using the glm command
We can also do this analysis using the glm command. The benefit of
the glm command is that we don't need to manually create dummy
variables, and it gives us the test of the overall effect of mealcat
without needing to subsequently use the /method = test subcommand as we did with the regression
command.
glm
api00 by mealcat.
Between-Subjects Factors
| | Value | Value Label | N |
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |
a R Squared = .755 (Adjusted R Squared = .754)
We can use the /print=parameter statement with the glm
command to obtain the parameter estimates. Note that the estimates are
based on dummy coding with the last (third) category omitted, and correspond to
the results shown above where the third category was omitted.
glm
api00 by mealcat
/print=parameter.
Between-Subjects Factors
| | Value | Value Label | N |
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |
a R Squared = .755 (Adjusted R Squared = .754)
Parameter Estimates
Dependent Variable: api 2000
| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
| Intercept | 504.380 | 6.033 | 83.606 | .000 | 492.519 | 516.240 |
| [MEALCAT=1] | 301.338 | 8.629 | 34.922 | .000 | 284.374 | 318.302 |
| [MEALCAT=2] | 135.014 | 8.612 | 15.677 | .000 | 118.083 | 151.945 |
| [MEALCAT=3] | 0(a) | . | . | . | . | . |
a This parameter is set to zero because it is redundant.
Note that the parameter estimates are the same as those from the regression command with mealcat1 and mealcat2,
because in both cases the last category (category 3) is the omitted reference category.
3.3.4 Other coding schemes
It is generally very convenient to use dummy coding, but that is not the only kind of
coding that can be used. As you have seen, when you use dummy coding one of the groups
becomes the reference group and all of the other groups are compared to that group. This
may not be the most interesting set of comparisons. Below is a list of the
types of coding schemes that SPSS will create for you. You can access
these through the pull-down menus, or you can request them on the /CONTRAST
subcommand when using GLM (described later). First, we show you how to
manually create the codes.
Deviation(refcat): The deviations from the grand mean.
Difference: The difference or reverse Helmert contrast - compare levels of a factor with the mean of the previous levels of the factor.
Simple(refcat): Compare each level of a factor to the last level.
Helmert: Compare levels of a factor with the mean of the subsequent levels of
the factor.
Polynomial: Orthogonal polynomial contrasts.
Repeated: Adjacent levels of a factor.
Special: A user-defined contrast.
Let's create a variable that compares group 1 with 2 and another variable that compares
group 2 with 3, and include those variables in the regression model. In
other words, we wish to create coefficients that are comparisons of successive groups
(i.e., the first comparison comparing group 1 vs. 2, and
the second comparison comparing group 2 vs. 3). Below we show how to
manually generate
a coding scheme that forms these 2 comparisons.
if mealcat = 1 grp1 = .667.
if mealcat = 2 grp1 = -.333.
if mealcat = 3 grp1 = -.333.
if mealcat = 1 grp2 = .333.
if mealcat = 2 grp2 = .333.
if mealcat = 3 grp2 = -.667.
execute.
regression
/dep = api00
/method = enter grp1 grp2.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | GRP2, GRP1(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .869(a) | .755 | .754 | 70.612 |
a Predictors: (Constant), GRP2, GRP1
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), GRP2, GRP1
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 649.820 | 3.531 | | 184.016 | .000 |
| | GRP1 | 166.324 | 8.708 | .549 | 19.099 | .000 |
| | GRP2 | 135.014 | 8.612 | .451 | 15.677 | .000 |
a Dependent Variable: api 2000
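To see why this coding produces successive differences, the Python sketch below (an illustration outside SPSS) plugs the differences of adjacent group means back into the coding and checks that each group mean is reproduced. It uses exact thirds in place of .667 and .333, and the two-decimal means from the earlier MEANS output, so the constant it prints is the unweighted average of the three group means; the 649.820 in the output above differs slightly because the codes there were rounded.

```python
# Exact versions of the .667 / -.333 codes used for grp1 and grp2.
codes = {
    1: (2 / 3, 1 / 3),
    2: (-1 / 3, 1 / 3),
    3: (-1 / 3, -2 / 3),
}

# Group means of api00 from the MEANS output (rounded to two decimals).
means = {1: 805.72, 2: 639.39, 3: 504.38}

# Coefficients implied by the coding: differences of adjacent group means.
b_grp1 = means[1] - means[2]
b_grp2 = means[2] - means[3]
constant = sum(means.values()) / 3

print("constant:", round(constant, 2))
print("b_grp1:", round(b_grp1, 2), "b_grp2:", round(b_grp2, 2))

# Check that constant + b_grp1*grp1 + b_grp2*grp2 reproduces each group mean.
fitted = {
    group: constant + b_grp1 * grp1 + b_grp2 * grp2
    for group, (grp1, grp2) in codes.items()
}
for group in means:
    print(group, round(fitted[group], 2), means[group])
```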
We can perform this same series of comparisons much more easily using the glm command with the /contrast subcommand.
glm
api00 by mealcat
/contrast (mealcat)=repeated
/print = parameter TEST(LMATRIX).
Between-Subjects Factors
| | Value | Value Label | N |
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |
a R Squared = .755 (Adjusted R Squared = .754)
Parameter Estimates
Dependent Variable: api 2000
| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
| Intercept | 504.380 | 6.033 | 83.606 | .000 | 492.519 | 516.240 |
| [MEALCAT=1] | 301.338 | 8.629 | 34.922 | .000 | 284.374 | 318.302 |
| [MEALCAT=2] | 135.014 | 8.612 | 15.677 | .000 | 118.083 | 151.945 |
| [MEALCAT=3] | 0(a) | . | . | . | . | . |
a This parameter is set to zero because it is redundant.
Intercept
| Parameter | Contrast L1 |
| Intercept | 1.000 |
| [MEALCAT=1] | .333 |
| [MEALCAT=2] | .333 |
| [MEALCAT=3] | .333 |
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.
MEALCAT
| Parameter | Contrast L2 | Contrast L3 |
| Intercept | 0 | 0 |
| [MEALCAT=1] | 1 | 0 |
| [MEALCAT=2] | 0 | 1 |
| [MEALCAT=3] | -1 | -1 |
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.
Contrast Coefficients (L' Matrix)
Percentage free meals in 3 categories Repeated Contrast
| Parameter | Level 1 vs. Level 2 | Level 2 vs. Level 3 |
| Intercept | 0 | 0 |
| [MEALCAT=1] | 1 | 0 |
| [MEALCAT=2] | -1 | 1 |
| [MEALCAT=3] | 0 | -1 |
The default display of this matrix is the transpose of the corresponding L matrix.
Contrast Results (K Matrix)
Dependent Variable: api 2000
| Repeated Contrast | Statistic | api 2000 |
| Level 1 vs. Level 2 | Contrast Estimate | 166.324 |
| | Hypothesized Value | 0 |
| | Difference (Estimate - Hypothesized) | 166.324 |
| | Std. Error | 8.708 |
| | Sig. | .000 |
| | 95% Confidence Interval for Difference, Lower Bound | 149.203 |
| | 95% Confidence Interval for Difference, Upper Bound | 183.444 |
| Level 2 vs. Level 3 | Contrast Estimate | 135.014 |
| | Hypothesized Value | 0 |
| | Difference (Estimate - Hypothesized) | 135.014 |
| | Std. Error | 8.612 |
| | Sig. | .000 |
| | 95% Confidence Interval for Difference, Lower Bound | 118.083 |
| | 95% Confidence Interval for Difference, Upper Bound | 151.945 |
Test Results
Dependent Variable: api 2000
| Source | Sum of Squares | df | Mean Square | F | Sig. |
| Contrast | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
If you compare the contrast estimates (equivalently, the coefficients for grp1 and grp2 in the regression above) with the means, you can verify that the first
(Level 1 vs. Level 2) is the mean of group 1 (0-46% free meals) minus the mean of group 2 (47-80% free meals), and the second
(Level 2 vs. Level 3) is the mean of group 2 minus the mean of group 3 (81-100% free meals). Both of these
comparisons are significant, indicating that group 1 significantly differs from group 2,
and group 2 significantly differs from group 3.
MEANS
TABLES=api00 BY mealcat.
Case Processing Summary
| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
| api 2000 * Percentage free meals in 3 categories | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
Report
api 2000
| Percentage free meals in 3 categories | Mean | N | Std. Deviation |
| 0-46% free meals | 805.72 | 131 | 65.669 |
| 47-80% free meals | 639.39 | 132 | 82.135 |
| 81-100% free meals | 504.38 | 137 | 62.727 |
| Total | 647.62 | 400 | 142.249 |
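The contrast estimates in the glm output can also be recovered directly from the parameter estimates: each estimate is the sum of the parameters weighted by one column of the Contrast Coefficients (L' Matrix) table. A minimal Python sketch of that arithmetic (outside SPSS, with variable names of our own) follows.

```python
# GLM parameter estimates: Intercept, [MEALCAT=1], [MEALCAT=2], [MEALCAT=3].
params = [504.380, 301.338, 135.014, 0.0]

# Columns of the Contrast Coefficients (L' Matrix) table for the repeated contrast.
contrasts = {
    "Level 1 vs. Level 2": [0, 1, -1, 0],
    "Level 2 vs. Level 3": [0, 0, 1, -1],
}

# Each contrast estimate is the weighted sum of the parameter estimates.
estimates = {
    name: sum(weight * b for weight, b in zip(weights, params))
    for name, weights in contrasts.items()
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

The printed values can be compared with the Contrast Estimate rows of the Contrast Results (K Matrix) table.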
3.4 Regression with two categorical predictors
Previously we looked at using yr_rnd to predict api00,
and we have also looked at mealcat using the regression command, as repeated below.
regression
/dep api00
/method = enter mealcat1 mealcat2.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEALCAT2, MEALCAT1(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .869(a) | .755 | .754 | 70.612 |
a Predictors: (Constant), MEALCAT2, MEALCAT1
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 504.380 | 6.033 | | 83.606 | .000 |
| | MEALCAT1 | 301.338 | 8.629 | .995 | 34.922 | .000 |
| | MEALCAT2 | 135.014 | 8.612 | .447 | 15.677 | .000 |
a Dependent Variable: api 2000
We can include both yr_rnd and mealcat together in the same model.
regression
/dep api00
/method = enter yr_rnd mealcat1 mealcat2.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEALCAT2, year round school, MEALCAT1(a) | . | Enter |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .876(a) | .767 | .765 | 68.893 |
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6194144.303 | 3 | 2064714.768 | 435.017 | .000(a) |
| | Residual | 1879527.694 | 396 | 4746.282 | | |
| | Total | 8073671.997 | 399 | | | |
a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1
b Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 526.330 | 7.585 | | 69.395 | .000 |
| | year round school | -42.960 | 9.362 | -.127 | -4.589 | .000 |
| | MEALCAT1 | 281.683 | 9.446 | .930 | 29.821 | .000 |
| | MEALCAT2 | 117.946 | 9.189 | .390 | 12.836 | .000 |
a Dependent Variable: api 2000
We can test the overall effect of mealcat with the /method = test() subcommand; the test
is significant.
regression
/dep api00
/method = enter yr_rnd
/method = test(mealcat1 mealcat2).
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | year round school(a) | . | Enter |
| 2 | MEALCAT2, MEALCAT1 | . | Test |
a All requested variables entered.
b Dependent Variable: api 2000
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .475(a) | .226 | .224 | 125.300 |
| 2 | .876(b) | .767 | .765 | 68.893 |
a Predictors: (Constant), year round school
b Predictors: (Constant), year round school, MEALCAT2, MEALCAT1
ANOVA(d)
| Model | | | Sum of Squares | df | Mean Square | F | Sig. | R Square Change |
| 1 | Regression | | 1825000.563 | 1 | 1825000.563 | 116.241 | .000(a) | |
| | Residual | | 6248671.435 | 398 | 15700.179 | | | |
| | Total | | 8073671.997 | 399 | | | | |
| 2 | Subset Tests | MEALCAT1, MEALCAT2 | 4369143.740 | 2 | 2184571.870 | 460.270 | .000(b) | .541 |
| | Regression | | 6194144.303 | 3 | 2064714.768 | 435.017 | .000(c) | |
| | Residual | | 1879527.694 | 396 | 4746.282 | | | |
| | Total | | 8073671.997 | 399 | | | | |
a Predictors: (Constant), year round school
b Tested against the full model.
c Predictors in the Full Model: (Constant), year round school, MEALCAT2, MEALCAT1.
d Dependent Variable: api 2000
Coefficients(a)
| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
| 1 | (Constant) | 684.539 | 7.140 | | 95.878 | .000 |
| | year round school | -160.506 | 14.887 | -.475 | -10.782 | .000 |
| 2 | (Constant) | 526.330 | 7.585 | | 69.395 | .000 |
| | year round school | -42.960 | 9.362 | -.127 | -4.589 | .000 |
| | MEALCAT1 | 281.683 | 9.446 | .930 | 29.821 | .000 |
| | MEALCAT2 | 117.946 | 9.189 | .390 | 12.836 | .000 |
a Dependent Variable: api 2000
Excluded Variables(b)
| Model | | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance |
| 1 | MEALCAT1 | .697(a) | 23.132 | .000 | .758 | .914 |
| | MEALCAT2 | -.138(a) | -3.106 | .002 | -.154 | .962 |
a Predictors in the Model: (Constant), year round school
b Dependent Variable: api 2000
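The Subset Tests row can be reproduced by hand from the two models' sums of squares. The Python sketch below (an illustration outside SPSS, with variable names of our own) computes the change in regression sum of squares when mealcat1 and mealcat2 are added to yr_rnd, and divides it by the residual mean square of the full model.

```python
# Sums of squares and residual mean square from the ANOVA table above.
ss_reg_full = 6194144.303      # yr_rnd, mealcat1, mealcat2
ss_reg_reduced = 1825000.563   # yr_rnd only
ss_total = 8073671.997
ms_resid_full = 4746.282       # residual mean square of the full model (396 df)
n_tested = 2                   # mealcat1 and mealcat2

ss_change = ss_reg_full - ss_reg_reduced
f_change = (ss_change / n_tested) / ms_resid_full
r2_change = ss_change / ss_total

print(f"SS change = {ss_change:.3f}")
print(f"F = {f_change:.2f}, R square change = {r2_change:.3f}")
```

These figures can be compared with the Sum of Squares, F, and R Square Change entries in the Subset Tests row.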
Because this model has only main effects (no interactions), you can interpret Byr_rnd
as the difference between the year-round and non-year-round groups, adjusting for mealcat. The
coefficient for mealcat1 (which we will call Bmealcat1) is the difference between mealcat=1 and mealcat=3, and Bmealcat2 is
the difference between mealcat=2 and mealcat=3.
Let's dig below the surface and see how the coefficients relate to the predicted
values. Let's view the cells formed by crossing yr_rnd and mealcat
and number the cells from cell1 to cell6.
            mealcat=1     mealcat=2     mealcat=3
yr_rnd=0    cell1         cell2         cell3
yr_rnd=1    cell4         cell5         cell6
With respect to mealcat, the group mealcat=3 is the
reference category, and with respect to yr_rnd the group yr_rnd=0
is the reference category. As a result, cell3 is the reference cell. The constant is the
predicted value for this cell.
The coefficient for yr_rnd is the difference between cell3
and cell6. Since this model has only main effects, it is also the
difference between cell2 and cell5, or between cell1 and cell4. In other words, Byr_rnd
is the amount you add to the predicted value when you go from non-year-round to year-round
schools.
The coefficient for mealcat1 is the predicted difference between
cell1 and cell3. Since this model only has main effects, it is also the predicted
difference between cell4 and cell6. Likewise, Bmealcat2 is the
predicted difference between cell2 and cell3, and also the predicted difference between
cell5 and cell6.
So, the predicted values, in terms of the coefficients, would be
            mealcat=1     mealcat=2     mealcat=3
-----------------------------------------------
yr_rnd=0    intercept     intercept     intercept
            +BMealCat1    +BMealCat2
-----------------------------------------------
yr_rnd=1    intercept     intercept     intercept
            +Byr_rnd      +Byr_rnd      +Byr_rnd
            +BMealCat1    +BMealCat2
We should note that if you computed the predicted values for each cell, they would not
exactly match the means in the 6 cells. The predicted means would be close to the
observed means in the cells, but not exactly the same. This is because our model
only has main effects and assumes that the difference between cell1 and cell4 is exactly
the same as the difference between cells 2 and 5, which is the same as the difference
between cells 3 and 6. Since the observed values don't follow this pattern, there is
some discrepancy between the predicted means and observed means.
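As a rough check on the layout above, the sketch below plugs the model 2 coefficients from the Coefficients table into the six cells. Python is used here only as a calculator and is not part of the SPSS run; the names coef and predicted_cell are ours, chosen for illustration.

```python
# Coefficients of the main-effects model (model 2 in the Coefficients table).
coef = {"intercept": 526.330, "yr_rnd": -42.960,
        "mealcat1": 281.683, "mealcat2": 117.946}


def predicted_cell(yr_rnd, mealcat):
    """Predicted api00 for one yr_rnd x mealcat cell (mealcat=3 is the reference)."""
    value = coef["intercept"] + coef["yr_rnd"] * yr_rnd
    if mealcat == 1:
        value += coef["mealcat1"]
    elif mealcat == 2:
        value += coef["mealcat2"]
    return round(value, 3)


# Print the 2 x 3 table of predicted values, one row per level of yr_rnd.
for yr in (0, 1):
    print(f"yr_rnd={yr}", [predicted_cell(yr, m) for m in (1, 2, 3)])
```

Every entry in the yr_rnd=1 row is the entry above it shifted by Byr_rnd (-42.960), which is exactly the equal-differences assumption that a main-effects model imposes.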
3.4.2 Using the glm command
We can run the same analysis using the glm command with just main
effects. Because SPSS's default is to include all main effects and
interactions in the model, to get just the main effects, you need to include the
/design statement and specify just the main effects, as shown
below.
glm
api00 BY yr_rnd mealcat
/DESIGN = yr_rnd mealcat
/print=parameter TEST(LMATRIX).
Between-Subjects Factors
|
Value Label |
N |
| year round school |
0 |
No |
308 |
| 1 |
Yes |
92 |
| Percentage free meals in 3 categories |
1 |
0-46% free meals |
131 |
| 2 |
47-80% free meals |
132 |
| 3 |
81-100% free meals |
137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source |
Type III Sum of Squares |
df |
Mean Square |
F |
Sig. |
| Corrected Model |
6194144.303(a) |
3 |
2064714.768 |
435.017 |
.000 |
| Intercept |
104733334.071 |
1 |
104733334.071 |
22066.395 |
.000 |
| YR_RND |
99946.633 |
1 |
99946.633 |
21.058 |
.000 |
| MEALCAT |
4369143.740 |
2 |
2184571.870 |
460.270 |
.000 |
| Error |
1879527.694 |
396 |
4746.282 |
|
|
| Total |
175839633.000 |
400 |
|
|
|
| Corrected Total |
8073671.997 |
399 |
|
|
|
| a R Squared = .767 (Adjusted R Squared = .765)
|
Parameter Estimates
Dependent Variable: api 2000
|
B |
Std. Error |
t |
Sig. |
95% Confidence Interval |
| Parameter |
Lower Bound |
Upper Bound |
| Intercept |
483.370 |
7.457 |
64.821 |
.000 |
468.710 |
498.030 |
| [YR_RND=0] |
42.960 |
9.362 |
4.589 |
.000 |
24.555 |
61.365 |
| [YR_RND=1] |
0(a) |
. |
. |
. |
. |
. |
| [MEALCAT=1] |
281.683 |
9.446 |
29.821 |
.000 |
263.113 |
300.253 |
| [MEALCAT=2] |
117.946 |
9.189 |
12.836 |
.000 |
99.881 |
136.011 |
| [MEALCAT=3] |
0(a) |
. |
. |
. |
. |
. |
| a This parameter is set to zero because it is redundant.
|
Intercept
|
Contrast |
| Parameter |
L1 |
| Intercept |
1.000 |
| [YR_RND=0] |
.500 |
| [YR_RND=1] |
.500 |
| [MEALCAT=1] |
.333 |
| [MEALCAT=2] |
.333 |
| [MEALCAT=3] |
.333 |
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.
|
YR_RND
|
Contrast |
| Parameter |
L2 |
| Intercept |
0 |
| [YR_RND=0] |
1 |
| [YR_RND=1] |
-1 |
| [MEALCAT=1] |
0 |
| [MEALCAT=2] |
0 |
| [MEALCAT=3] |
0 |
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.
|
MEALCAT
|
Contrast |
| Parameter |
L4 |
L5 |
| Intercept |
0 |
0 |
| [YR_RND=0] |
0 |
0 |
| [YR_RND=1] |
0 |
0 |
| [MEALCAT=1] |
1 |
0 |
| [MEALCAT=2] |
0 |
1 |
| [MEALCAT=3] |
-1 |
-1 |
The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.
|
In summary, these results indicate that the difference between year-round and non-year-round
students is significant, and that the differences among the three mealcat
groups are significant.
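The glm parameter estimates look different from the regression coefficients because glm sets the last level of each factor to zero, so its reference cell is yr_rnd=1, mealcat=3 rather than yr_rnd=0, mealcat=3. The sketch below, with variable names of our own choosing and Python used only as a calculator, shows how the two sets of numbers map onto each other.

```python
# Regression coefficients (reference cell: yr_rnd=0, mealcat=3).
reg_constant = 526.330
reg_yr_rnd = -42.960

# glm zeroes out [YR_RND=1] and [MEALCAT=3], so its intercept is the
# predicted value for the cell yr_rnd=1, mealcat=3.
glm_intercept = round(reg_constant + reg_yr_rnd, 3)

# The glm parameter [YR_RND=0] is yr_rnd=0 minus yr_rnd=1, i.e. the
# regression coefficient with its sign flipped.
glm_yr_rnd_0 = round(-reg_yr_rnd, 3)

print(glm_intercept, glm_yr_rnd_0)
```

The mealcat parameters are identical in both outputs because mealcat=3 is the reference category in both parameterizations.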
3.5 Categorical predictor with interactions
3.5.1 Manually coding an interaction
Let's perform the same analysis that we performed above. This time let's include the
interaction of mealcat by yr_rnd, represented by the terms yrmeal1 and yrmeal2 (the products
of yr_rnd with mealcat1 and mealcat2, respectively).
We can test the overall interaction with the /method = test subcommand. This interaction effect is not
significant, with an F of 1.116 and a p value of .329.
regression
/dep api00
/method = enter yr_rnd mealcat1 mealcat2
/method = test(yrmeal1 yrmeal2).
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
MEALCAT2, year round school, MEALCAT1(a) |
. |
Enter |
| 2 |
YRMEAL1, YRMEAL2 |
. |
Test |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.876(a) |
.767 |
.765 |
68.893 |
| 2 |
.877(b) |
.769 |
.766 |
68.873 |
| a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1 |
| b Predictors: (Constant), MEALCAT2, year round school, MEALCAT1, YRMEAL1, YRMEAL2
|
ANOVA(d)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
R Square Change |
| 1 |
Regression |
6194144.303 |
3 |
2064714.768 |
435.017 |
.000(a) |
|
| Residual |
1879527.694 |
396 |
4746.282 |
|
|
|
| Total |
8073671.997 |
399 |
|
|
|
|
| 2 |
Subset Tests |
YRMEAL1, YRMEAL2 |
10583.519 |
2 |
5291.759 |
1.116 |
.329(b) |
.001 |
| Regression |
6204727.822 |
5 |
1240945.564 |
261.609 |
.000(c) |
|
| Residual |
1868944.176 |
394 |
4743.513 |
|
|
|
| Total |
8073671.997 |
399 |
|
|
|
|
| a Predictors: (Constant), MEALCAT2, year round school, MEALCAT1 |
| b Tested against the full model. |
| c Predictors in the Full Model: (Constant), MEALCAT2, year round school, MEALCAT1, YRMEAL1, YRMEAL2. |
| d Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
526.330 |
7.585 |
|
69.395 |
.000 |
| year round school |
-42.960 |
9.362 |
-.127 |
-4.589 |
.000 |
| MEALCAT1 |
281.683 |
9.446 |
.930 |
29.821 |
.000 |
| MEALCAT2 |
117.946 |
9.189 |
.390 |
12.836 |
.000 |
| 2 |
(Constant) |
521.493 |
8.414 |
|
61.978 |
.000 |
| year round school |
-33.493 |
11.771 |
-.099 |
-2.845 |
.005 |
| MEALCAT1 |
288.193 |
10.443 |
.952 |
27.597 |
.000 |
| MEALCAT2 |
123.781 |
10.552 |
.410 |
11.731 |
.000 |
| YRMEAL1 |
-40.764 |
29.231 |
-.038 |
-1.395 |
.164 |
| YRMEAL2 |
-18.248 |
22.256 |
-.024 |
-.820 |
.413 |
| a Dependent Variable: api 2000
|
Excluded Variables(b)
|
Beta In |
t |
Sig. |
Partial Correlation |
Collinearity Statistics |
| Model |
Tolerance |
| 1 |
YRMEAL1 |
-.033(a) |
-1.249 |
.212 |
-.063 |
.846 |
| YRMEAL2 |
-.016(a) |
-.535 |
.593 |
-.027 |
.695 |
| a Predictors in the Model: (Constant), MEALCAT2, year round school, MEALCAT1 |
| b Dependent Variable: api 2000
|
It is important to note how the meaning of the coefficients changes in the presence of
these interaction terms. For example, in the prior model, with only main effects, we could
interpret Byr_rnd as the difference between the year-round and non-year-round students. However, now that we have added the interaction terms, Byr_rnd
represents the difference between cell3 and cell6, or the difference between the
year-round and non-year-round students when mealcat=3 (because mealcat=3 was the omitted
group). The presence of an interaction would imply that the difference between year-round
and non-year-round students depends on the level of mealcat.
The interaction terms Byrmeal1
and Byrmeal2 represent the extent to which the difference between
the year-round and non-year-round students changes when mealcat=1 and when mealcat=2 (as
compared to the reference group, mealcat=3). For example, the term Byrmeal1
represents the difference between year-round and non-year-round for mealcat=1 vs. the
difference for mealcat=3. In other words, Byrmeal1 in this design is (cell4-cell1) -
(cell6-cell3), or it represents how much the effect of yr_rnd differs
between mealcat=1 and mealcat=3.
Below we have shown the predicted values for the six cells in terms of the coefficients
in the model. If you compare this to the main effects model, you will see that the
predicted values are the same except for the addition of yrmeal1
(in cell 4) and yrmeal2 (in cell 5).
            mealcat=1     mealcat=2     mealcat=3
-------------------------------------------------
yr_rnd=0    intercept     intercept     intercept
            +BMealCat1    +BMealCat2
-------------------------------------------------
yr_rnd=1    intercept     intercept     intercept
            +Byr_rnd      +Byr_rnd      +Byr_rnd
            +BMealCat1    +BMealCat2
            +BYrMeal1     +BYrMeal2
It can be very tricky to interpret these interaction terms if you wish to form specific
comparisons. For example, if you wanted to perform a test of the simple main effect of yr_rnd
when mealcat=1, i.e., to compare cell1 with cell4, you would want to compare
intercept + BMealCat1 vs. intercept + Byr_rnd + BMealCat1 +
BYrMeal1. As we will see, such tests can be more easily done via glm.
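To make the cell formulas concrete, the following sketch fills them in with the model 2 coefficients from the Coefficients table. It is only an arithmetic illustration in Python (not SPSS syntax), and the helper names predicted_cell and simple_effect_of_yr_rnd are hypothetical.

```python
# Coefficients of the interaction model (model 2 in the Coefficients table).
coef = {"intercept": 521.493, "yr_rnd": -33.493,
        "mealcat1": 288.193, "mealcat2": 123.781,
        "yrmeal1": -40.764, "yrmeal2": -18.248}


def predicted_cell(yr_rnd, mealcat):
    """Predicted api00 for one cell; mealcat=3 and yr_rnd=0 are the references."""
    m1 = 1 if mealcat == 1 else 0   # dummy for mealcat=1
    m2 = 1 if mealcat == 2 else 0   # dummy for mealcat=2
    value = (coef["intercept"] + coef["yr_rnd"] * yr_rnd
             + coef["mealcat1"] * m1 + coef["mealcat2"] * m2
             + coef["yrmeal1"] * yr_rnd * m1
             + coef["yrmeal2"] * yr_rnd * m2)
    return round(value, 3)


def simple_effect_of_yr_rnd(mealcat):
    """Year-round minus non-year-round prediction within one mealcat group."""
    return round(predicted_cell(1, mealcat) - predicted_cell(0, mealcat), 3)


for m in (1, 2, 3):
    print(f"mealcat={m}", predicted_cell(0, m), predicted_cell(1, m),
          simple_effect_of_yr_rnd(m))
```

For mealcat=1 the simple effect works out to Byr_rnd + BYrMeal1. Because this model has one parameter per cell, the predicted values reproduce the observed cell means, up to rounding of the printed coefficients.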
3.5.2 Using glm
Constructing these interactions can be somewhat easier when using the glm
command. As you see below, the glm command gives us the test of the overall
main effects and interactions without the need to perform a subsequent test.
The /emmeans statement below tells SPSS to produce the estimated mean for each
combination of yr_rnd and mealcat and to compare the levels of yr_rnd within each level of mealcat.
glm
api00 by yr_rnd mealcat
/EMMEANS TABLES(yr_rnd*mealcat) COMPARE(yr_rnd).
Between-Subjects Factors
|
Value Label |
N |
| year round school |
0 |
No |
308 |
| 1 |
Yes |
92 |
| Percentage free meals in 3 categories |
1 |
0-46% free meals |
131 |
| 2 |
47-80% free meals |
132 |
| 3 |
81-100% free meals |
137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source |
Type III Sum of Squares |
df |
Mean Square |
F |
Sig. |
| Corrected Model |
6204727.822(a) |
5 |
1240945.564 |
261.609 |
.000 |
| Intercept |
56354756.653 |
1 |
56354756.653 |
11880.384 |
.000 |
| YR_RND |
99617.371 |
1 |
99617.371 |
21.001 |
.000 |
| MEALCAT |
1796232.798 |
2 |
898116.399 |
189.336 |
.000 |
| YR_RND * MEALCAT |
10583.519 |
2 |
5291.759 |
1.116 |
.329 |
| Error |
1868944.176 |
394 |
4743.513 |
|
|
| Total |
175839633.000 |
400 |
|
|
|
| Corrected Total |
8073671.997 |
399 |
|
|
|
| a R Squared = .769 (Adjusted R Squared = .766)
|
Estimates
Dependent Variable: api 2000
|
Mean |
Std. Error |
95% Confidence Interval |
| year round school |
Percentage free meals in 3 categories |
Lower Bound |
Upper Bound |
| No |
0-46% free meals |
809.685 |
6.185 |
797.526 |
821.845 |
| 47-80% free meals |
645.274 |
6.367 |
632.755 |
657.792 |
| 81-100% free meals |
521.493 |
8.414 |
504.950 |
538.035 |
| Yes |
0-46% free meals |
735.429 |
26.032 |
684.250 |
786.607 |
| 47-80% free meals |
593.533 |
17.783 |
558.572 |
628.495 |
| 81-100% free meals |
488.000 |
8.232 |
471.816 |
504.184 |
Pairwise Comparisons
Dependent Variable: api 2000
|
Mean Difference (I-J) |
Std. Error |
Sig.(a) |
95% Confidence Interval for Difference(a) |
| Percentage free meals in 3 categories |
(I) year round school |
(J) year round school |
Lower Bound |
Upper Bound |
| 0-46% free meals |
No |
Yes |
74.257(*) |
26.756 |
.006 |
21.654 |
126.860 |
| Yes |
No |
-74.257(*) |
26.756 |
.006 |
-126.860 |
-21.654 |
| 47-80% free meals |
No |
Yes |
51.740(*) |
18.889 |
.006 |
14.605 |
88.875 |
| Yes |
No |
-51.740(*) |
18.889 |
.006 |
-88.875 |
-14.605 |
| 81-100% free meals |
No |
Yes |
33.493(*) |
11.771 |
.005 |
10.350 |
56.635 |
| Yes |
No |
-33.493(*) |
11.771 |
.005 |
-56.635 |
-10.350 |
| Based on estimated marginal means |
| * The mean difference is significant at the .050 level. |
| a Adjustment for multiple comparisons: Least Significant Difference (equivalent to no adjustments).
|
Univariate Tests
Dependent Variable: api 2000
| Percentage free meals in 3 categories |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 0-46% free meals |
Contrast |
36536.101 |
1 |
36536.101 |
7.702 |
.006 |
| Error |
1868944.176 |
394 |
4743.513 |
|
|
| 47-80% free meals |
Contrast |
35592.534 |
1 |
35592.534 |
7.503 |
.006 |
| Error |
1868944.176 |
394 |
4743.513 |
|
|
| 81-100% free meals |
Contrast |
38401.517 |
1 |
38401.517 |
8.096 |
.005 |
| Error |
1868944.176 |
394 |
4743.513 |
|
|
| Each F tests the simple effects of year round school within each level combination of the other effects shown. These tests are based on the linearly independent pairwise comparisons among the estimated marginal means.
|
Although this section has focused on how to handle analyses involving interactions,
these particular results show no indication of interaction. We could decide to omit
interaction terms from future analyses, having found the interactions to be
non-significant. This would simplify future analyses; however, including the interaction
term can be useful to assure readers that the interaction term is non-significant.
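The overall interaction test reported in this section (F = 1.116) can be reproduced from the sums of squares in the ANOVA tables. The sketch below is plain arithmetic in Python with variable names of our own choosing; it assumes the usual subset F test, in which the mean square for the added terms is divided by the residual mean square of the full model.

```python
# Values copied from the ANOVA table of the regression with the test subcommand.
ss_interaction = 10583.519   # Subset Tests row for YRMEAL1, YRMEAL2
df_interaction = 2
ms_error_full = 4743.513     # residual mean square of the full model (df = 394)
ss_total = 8073671.997       # total sum of squares

# F for the two interaction terms tested together.
f_change = (ss_interaction / df_interaction) / ms_error_full

# Share of total variation added by the interaction terms.
r2_change = ss_interaction / ss_total

print(round(f_change, 3), round(r2_change, 3))
```

The same sum of squares, F, and error term appear in the YR_RND * MEALCAT row of the glm output, which is why the two approaches give the same test.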
3.6 Continuous and Categorical variables
3.6.1 Using regress
Say that we wish to analyze both continuous and categorical variables in one analysis.
For example, let's include yr_rnd and some_col in the
same analysis. We will save the predicted values for use in just a moment.
regress
/dep = api00
/method = enter yr_rnd some_col
/save pre.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college, year round school(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary(b)
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.507(a) |
.257 |
.253 |
122.951 |
| a Predictors: (Constant), parent some college, year round school |
| b Dependent Variable: api 2000
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
2072201.839 |
2 |
1036100.919 |
68.539 |
.000(a) |
| Residual |
6001470.159 |
397 |
15117.053 |
|
|
| Total |
8073671.997 |
399 |
|
|
|
| a Predictors: (Constant), parent some college, year round school |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
637.858 |
13.503 |
|
47.237 |
.000 |
| year round school |
-149.159 |
14.875 |
-.442 |
-10.027 |
.000 |
| parent some college |
2.236 |
.553 |
.178 |
4.044 |
.000 |
| a Dependent Variable: api 2000
|
Residuals Statistics(a)
|
Minimum |
Maximum |
Mean |
Std. Deviation |
N |
| Predicted Value |
488.70 |
787.65 |
647.62 |
72.066 |
400 |
| Residual |
-276.04 |
293.20 |
.00 |
122.643 |
400 |
| Std. Predicted Value |
-2.205 |
1.943 |
.000 |
1.000 |
400 |
| Std. Residual |
-2.245 |
2.385 |
.000 |
.997 |
400 |
| a Dependent Variable: api 2000
|
Let's graph the predicted values by some_col.
GRAPH
/SCATTERPLOT(BIVAR)=some_col WITH pre_1.
The coefficient for some_col indicates that for every unit increase in some_col
the api00 score is predicted to increase by 2.24 units. This is the slope
of the lines shown in the above graph. The graph has two lines, one for the year-round
students and one for the non-year-round students. The coefficient for yr_rnd
is -149.16, indicating that as yr_rnd increases by 1 unit, the api00
score is expected to decrease by about 149 units. As you can see in the graph, the top
line is about 150 units higher than the lower line. You can see that the intercept is about 638
and that is where the upper line crosses the Y axis when X is 0. The lower line crosses
the Y axis about 149 units lower, at about 489.
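The two parallel lines can be written out directly from the Coefficients table. The sketch below is an illustration in Python (used only as a calculator); the function name predicted_api00 is ours.

```python
# Coefficients from the model with yr_rnd and some_col.
constant, b_yr_rnd, b_some_col = 637.858, -149.159, 2.236


def predicted_api00(yr_rnd, some_col):
    """Predicted api00 for a school with the given yr_rnd (0/1) and some_col."""
    return round(constant + b_yr_rnd * yr_rnd + b_some_col * some_col, 3)


# Where each line crosses the Y axis (some_col = 0).
print(predicted_api00(0, 0), predicted_api00(1, 0))

# The vertical gap between the lines is the same at any value of some_col.
gap_at_0 = round(predicted_api00(1, 0) - predicted_api00(0, 0), 3)
gap_at_40 = round(predicted_api00(1, 40) - predicted_api00(0, 40), 3)
print(gap_at_0, gap_at_40)
```

Because the model contains no interaction, the gap equals Byr_rnd everywhere, which is what makes the two lines parallel.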
3.6.2 Using glm
We can run this analysis using the glm command. The glm command
assumes that the variables named after BY are categorical; thus, we need to enter some_col
as a covariate (after the WITH keyword) to specify that some_col is a
continuous variable.
If we square the t-values from the regress command (above), we would find that they
match the F-values of the glm command.
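The relationship between the two outputs rests on the fact that, for a term with one degree of freedom, the F statistic equals the square of the t statistic. A small sketch of that arithmetic follows, using the t-values printed in the regression output above; Python is used only as a calculator, and the dictionary names are ours.

```python
# t-values from the Coefficients table of the regress output.
t_values = {"yr_rnd": -10.027, "some_col": 4.044}

# Squaring each t gives the F we would expect for that 1-df term.
f_values = {name: round(t ** 2, 2) for name, t in t_values.items()}

print(f_values)
```

Since the printed t-values are themselves rounded, the squares may differ from the F-values in a glm table in the last digit.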
3.7 Interactions of Continuous by 0/1 Categorical variables
Above we showed an analysis that looked at the relationship between some_col and
api00 and also included yr_rnd. We saw that this produced a graph
where we saw the relationship between some_col and api00 but there were two
regression lines, one higher than the other but with equal slopes. Such a model
assumed that the slope was the same for the two groups.
Perhaps the slope might be different for these groups. Let's run
the regressions separately for these two groups beginning with the non-year-round schools.
COMPUTE filt=(yr_rnd=0).
FILTER BY filt.
regress
/dep = api00
/method = enter some_col.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.126(a) |
.016 |
.013 |
131.278 |
| a Predictors: (Constant), parent some college
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
84700.858 |
1 |
84700.858 |
4.915 |
.027(a) |
| Residual |
5273591.675 |
306 |
17233.960 |
|
|
| Total |
5358292.532 |
307 |
|
|
|
| a Predictors: (Constant), parent some college |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
655.110 |
15.237 |
|
42.995 |
.000 |
| parent some college |
1.409 |
.636 |
.126 |
2.217 |
.027 |
| a Dependent Variable: api 2000
|
IGRAPH
/X1 = VAR(some_col)
/Y = VAR (api00)
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/SCATTER COINCIDENT = NONE.
Likewise, let's look at the year-round schools.
COMPUTE filt=(yr_rnd=1).
FILTER BY filt.
regress
/dep = api00
/method = enter some_col.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.648(a) |
.420 |
.413 |
75.773 |
| a Predictors: (Constant), parent some college
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
373644.064 |
1 |
373644.064 |
65.078 |
.000(a) |
| Residual |
516734.838 |
90 |
5741.498 |
|
|
| Total |
890378.902 |
91 |
|
|
|
| a Predictors: (Constant), parent some college |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
407.039 |
16.515 |
|
24.647 |
.000 |
| parent some college |
7.403 |
.918 |
.648 |
8.067 |
.000 |
| a Dependent Variable: api 2000
|
IGRAPH
/X1 = VAR(some_col)
/Y = VAR (api00)
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/SCATTER COINCIDENT = NONE.

Note that the slope of the regression line looks much steeper for the year-round
schools than for the non-year-round schools. This is confirmed by the regression equations,
which show the slope for the year-round schools to be higher (7.4) than for the non-year-round
schools (1.4). We can test whether these slopes are significantly different from each
other by including the interaction of some_col by yr_rnd, an interaction of a
continuous variable by a categorical variable.
3.7.1 Computing interactions manually
We will start by manually computing the interaction of some_col by yr_rnd.
Let's start fresh and reload the elemapi2 data file to clear out any variables we
had previously created.
GET FILE='C:\spssreg\elemapi2.sav'.
Next, let's make a variable that is the interaction of some college (some_col)
and year-round schools (yr_rnd) called yrXsome.
compute yrXsome = yr_rnd*some_col.
execute.
We can now run the regression that tests whether the coefficient for some_col is
significantly different for year-round schools and non-year-round schools. Indeed, the
yrXsome interaction effect is significant. We will save the predicted values so that we can
make a graph of the regression lines for the two types of schools and see
how different they are.
regress
/dep = api00
/method = enter some_col yr_rnd yrXsome
/save pre.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
YRXSOME, parent some college, year round school(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary(b)
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.532(a) |
.283 |
.277 |
120.922 |
| a Predictors: (Constant), YRXSOME, parent some college, year round school |
| b Dependent Variable: api 2000
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
2283345.485 |
3 |
761115.162 |
52.053 |
.000(a) |
| Residual |
5790326.513 |
396 |
14622.037 |
|
|
| Total |
8073671.997 |
399 |
|
|
|
| a Predictors: (Constant), YRXSOME, parent some college, year round school |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
655.110 |
14.035 |
|
46.677 |
.000 |
| parent some college |
1.409 |
.586 |
.112 |
2.407 |
.017 |
| year round school |
-248.071 |
29.859 |
-.735 |
-8.308 |
.000 |
| YRXSOME |
5.993 |
1.577 |
.330 |
3.800 |
.000 |
| a Dependent Variable: api 2000
|
Residuals Statistics(a)
|
Minimum |
Maximum |
Mean |
Std. Deviation |
N |
| Predicted Value |
407.04 |
749.54 |
647.62 |
75.648 |
400 |
| Residual |
-275.12 |
279.25 |
.00 |
120.466 |
400 |
| Std. Predicted Value |
-3.180 |
1.347 |
.000 |
1.000 |
400 |
| Std. Residual |
-2.275 |
2.309 |
.000 |
.996 |
400 |
| a Dependent Variable: api 2000
|
We can graph the predicted values for the two types of schools by some_col.
You can see how the two lines have quite different slopes, consistent with the fact that
the yrXsome interaction was significant.
GRAPH
/SCATTERPLOT(BIVAR)=some_col WITH pre_1 BY yr_rnd.

We can replot the same graph including the data points. You will need
to double-click on the graph that is produced by the code below to add the
regression lines to the graph.
GRAPH
/SCATTERPLOT(BIVAR)=some_col WITH api00 BY yr_rnd.

Let's quickly run the regressions again where we performed separate regressions for the
two groups.
Non-year-round
COMPUTE filt=(yr_rnd=0).
FILTER BY filt.
regress
/dep = api00
/method = enter some_col.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.126(a) |
.016 |
.013 |
131.278 |
| a Predictors: (Constant), parent some college
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
84700.858 |
1 |
84700.858 |
4.915 |
.027(a) |
| Residual |
5273591.675 |
306 |
17233.960 |
|
|
| Total |
5358292.532 |
307 |
|
|
|
| a Predictors: (Constant), parent some college |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
655.110 |
15.237 |
|
42.995 |
.000 |
| parent some college |
1.409 |
.636 |
.126 |
2.217 |
.027 |
| a Dependent Variable: api 2000
|
Year-round
COMPUTE filt=(yr_rnd=1).
FILTER BY filt.
regress
/dep = api00
/method = enter some_col.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.648(a) |
.420 |
.413 |
75.773 |
| a Predictors: (Constant), parent some college
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
373644.064 |
1 |
373644.064 |
65.078 |
.000(a) |
| Residual |
516734.838 |
90 |
5741.498 |
|
|
| Total |
890378.902 |
91 |
|
|
|
| a Predictors: (Constant), parent some college |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
407.039 |
16.515 |
|
24.647 |
.000 |
| parent some college |
7.403 |
.918 |
.648 |
8.067 |
.000 |
| a Dependent Variable: api 2000
|
Now, let's show the regression for both types of schools with the interaction
term.
filter off.
regress
/dep = api00
/method = enter some_col yr_rnd yrXsome
/save pre.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
YRXSOME, parent some college, year round school(a) |
. |
Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary(b)
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.532(a) |
.283 |
.277 |
120.922 |
| a Predictors: (Constant), YRXSOME, parent some college, year round school |
| b Dependent Variable: api 2000
|
ANOVA(b)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
| 1 |
Regression |
2283345.485 |
3 |
761115.162 |
52.053 |
.000(a) |
| Residual |
5790326.513 |
396 |
14622.037 |
|
|
| Total |
8073671.997 |
399 |
|
|
|
| a Predictors: (Constant), YRXSOME, parent some college, year round school |
| b Dependent Variable: api 2000
|
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
655.110 |
14.035 |
|
46.677 |
.000 |
| parent some college |
1.409 |
.586 |
.112 |
2.407 |
.017 |
| year round school |
-248.071 |
29.859 |
-.735 |
-8.308 |
.000 |
| YRXSOME |
5.993 |
1.577 |
.330 |
3.800 |
.000 |
| a Dependent Variable: api 2000
|
Residuals Statistics(a)
|
Minimum |
Maximum |
Mean |
Std. Deviation |
N |
| Predicted Value |
407.04 |
749.54 |
647.62 |
75.648 |
400 |
| Residual |
-275.12 |
279.25 |
.00 |
120.466 |
400 |
| Std. Predicted Value |
-3.180 |
1.347 |
.000 |
1.000 |
400 |
| Std. Residual |
-2.275 |
2.309 |
.000 |
.996 |
400 |
| a Dependent Variable: api 2000
|
Note that the coefficient for some_col in the combined analysis is the same as the coefficient for some_col
for the non-year-round schools. This is because non-year-round schools are the reference
group. Then, the coefficient for the yrXsome interaction in the
combined analysis is the Bsome_col
for the year round schools (7.4) minus Bsome_col for the non year round schools
(1.41), yielding 5.99. This interaction is the difference in the slopes of some_col
for the two types of schools, and this is why this is useful for testing whether the
regression lines for the two types of schools are equal. If the two types of schools had
the same regression coefficient for some_col, then the coefficient for
the yrXsome
interaction would be 0. In this case, the difference is significant, indicating that the
regression lines are significantly different.
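The link between the separate regressions and the combined model can be checked with a little arithmetic. In the sketch below (Python as a calculator, names of our own choosing), the coefficients come from the two separate Coefficients tables shown earlier in this section.

```python
# Coefficients from the two separate regressions of api00 on some_col.
non_year_round = {"constant": 655.110, "some_col": 1.409}
year_round = {"constant": 407.039, "some_col": 7.403}

# In the combined model, the yr_rnd coefficient is the difference in intercepts
# and the yrXsome coefficient is the difference in slopes.
implied_yr_rnd = round(year_round["constant"] - non_year_round["constant"], 3)
implied_yrxsome = round(year_round["some_col"] - non_year_round["some_col"], 3)

print(implied_yr_rnd, implied_yrxsome)
```

The implied intercept difference matches the yr_rnd coefficient in the combined model (-248.071), and the implied slope difference agrees with the yrXsome coefficient (5.993) up to rounding of the printed slopes.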
So, if we look at the graph of the two regression lines we can see the
difference in the slopes of the regression lines (see graph below).
Indeed, we can see that the non-year round schools (the solid line) have a
smaller slope (1.4) than the slope for the year
round schools (7.4). The difference between these slopes is 5.99, the
coefficient for yrXsome.
GRAPH
/SCATTERPLOT(BIVAR)=some_col WITH pre_1 BY yr_rnd.

3.7.2 Computing interactions with glm
We can also run a model just like the model we showed above using the glm
command.
We include the terms yr_rnd, some_col, and the interaction yr_rnd*some_col.
glm
api00 BY yr_rnd WITH some_col
/DESIGN = some_col yr_rnd yr_rnd*some_col.
Between-Subjects Factors
|
Value Label |
N |
| year round school |
0 |
No |
308 |
| 1 |
Yes |
92 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source |
Type III Sum of Squares |
df |
Mean Square |
F |
Sig. |
| Corrected Model |
2283345.485(a) |
3 |
761115.162 |
52.053 |
.000 |
| Intercept |
18502483.537 |
1 |
18502483.537 |
1265.383 |
.000 |
| SOME_COL |
456473.187 |
1 |
456473.187 |
31.218 |
.000 |
| YR_RND |
1009279.986 |
1 |
1009279.986 |
69.025 |
.000 |
| YR_RND * SOME_COL |
211143.646 |
1 |
211143.646 |
14.440 |
.000 |
| Error |
5790326.513 |
396 |
14622.037 |
|
|
| Total |
175839633.000 |
400 |
|
|
|
| Corrected Total |
8073671.997 |
399 |
|
|
|
| a R Squared = .283 (Adjusted R Squared = .277)
|
As we illustrated above, we can save the predicted values
and graph the separate regression lines. These commands are omitted.
In this section we found that the relationship between some_col and api00
depended on whether the student was from a year-round school or from a non-year-round
school. For the students from year-round schools, the relationship between some_col
and api00 was significantly stronger than for those from non-year-round
schools. In general, this type of analysis allows you to test whether the strength
of the relationship between two continuous variables varies based on the categorical
variable.
3.8 Continuous and Categorical variables, interaction with 1/2/3 variable
The prior examples showed how to do regressions with a continuous variable and a
categorical variable that has 2 levels. These examples will extend this further by
using a categorical variable with 3 levels, mealcat.
3.8.1 Using regress
We can run a model with some_col mealcat and
the interaction of these two variables.
GET FILE='C:\spssreg\elemapi2.sav'.
* Dummy-code mealcat: each comparison is 1 if true, 0 if false, and missing if mealcat is missing.
compute mealcat1 = (mealcat = 1).
compute mealcat2 = (mealcat = 2).
compute mealcat3 = (mealcat = 3).
compute smc1 = mealcat1*some_col.
compute smc2 = mealcat2*some_col.
compute smc3 = mealcat3*some_col.
execute.
regress
/dep = api00
/method = enter mealcat2 mealcat3 some_col
/method = test (smc2 smc3)
/save pre.
Variables Entered/Removed(b)
| Model |
Variables Entered |
Variables Removed |
Method |
| 1 |
parent some college, MEALCAT2, MEALCAT3(a) |
. |
Enter |
| 2 |
SMC3, SMC2 |
. |
Test |
| a All requested variables entered. |
| b Dependent Variable: api 2000
|
Model Summary(c)
| Model |
R |
R Square |
Adjusted R Square |
Std. Error of the Estimate |
| 1 |
.870(a) |
.757 |
.756 |
70.332 |
| 2 |
.877(b) |
.769 |
.767 |
68.733 |
| a Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3 |
| b Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3, SMC3, SMC2 |
| c Dependent Variable: api 2000
|
ANOVA(d)
| Model |
Sum of Squares |
df |
Mean Square |
F |
Sig. |
R Square Change |
| 1 |
Regression |
6114838.708 |
3 |
2038279.569 |
412.061 |
.000(a) |
|
| Residual |
1958833.290 |
396 |
4946.549 |
|
|
|
| Total |
8073671.997 |
399 |
|
|
|
|
| 2 |
Subset Tests |
SMC2, SMC3 |
97468.169 |
2 |
48734.084 |
10.316 |
.000(b) |
.012 |
| Regression |
6212306.876 |
5 |
1242461.375 |
262.995 |
.000(c) |
|
| Residual |
1861365.121 |
394 |
4724.277 |
|
|
|
| Total |
8073671.997 |
399 |
|
|
|
|
| a Predictors: (Constant), parent some college, MEALCAT2, MEALCAT3 |
| b Tested against the full model. |
| c Predictors in the Full Model: (Constant), parent some college, MEALCAT2, MEALCAT3, SMC3, SMC2. |
| d Dependent Variable: api 2000
|
Coefficients(a)
| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 791.179 | 9.403 | | 84.143 | .000 |
| | MEALCAT2 | -168.132 | 8.719 | -.556 | -19.284 | .000 |
| | MEALCAT3 | -296.436 | 8.923 | -.990 | -33.221 | .000 |
| | parent some college | .683 | .334 | .054 | 2.043 | .042 |
| 2 | (Constant) | 825.894 | 11.992 | | 68.871 | .000 |
| | MEALCAT2 | -239.030 | 18.665 | -.791 | -12.806 | .000 |
| | MEALCAT3 | -344.948 | 17.057 | -1.152 | -20.223 | .000 |
| | parent some college | -.947 | .487 | -.076 | -1.944 | .053 |
| | SMC2 | 3.141 | .729 | .286 | 4.307 | .000 |
| | SMC3 | 2.607 | .896 | .149 | 2.910 | .004 |
| a Dependent Variable: api 2000 |
Excluded Variables(b)
| Model | Variable | Beta In | t | Sig. | Partial Correlation | Collinearity Statistics: Tolerance |
| 1 | SMC2 | .215(a) | 3.455 | .001 | .171 | .153 |
| | SMC3 | .069(a) | 1.412 | .159 | .071 | .258 |
| a Predictors in the Model: (Constant), parent some college, MEALCAT2, MEALCAT3 |
| b Dependent Variable: api 2000 |
Casewise Diagnostics(a)
| Case Number | Std. Residual | api 2000 |
| 226 | -3.593 | 386 |
| a Dependent Variable: api 2000 |
Residuals Statistics(a)
| | Minimum | Maximum | Mean | Std. Deviation | N |
| Predicted Value | 480.95 | 825.89 | 647.62 | 124.779 | 400 |
| Residual | -246.93 | 201.23 | .00 | 68.301 | 400 |
| Std. Predicted Value | -1.336 | 1.429 | .000 | 1.000 | 400 |
| Std. Residual | -3.593 | 2.928 | .000 | .994 | 400 |
| a Dependent Variable: api 2000 |
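The subset test for smc2 and smc3 reported in the ANOVA table can be checked by hand from the residual sums of squares of the two models. The following is a minimal sketch in Python (not SPSS), using only numbers copied from the output above; the variable names are our own and are not part of the SPSS output.

```python
# Residual sums of squares and degrees of freedom copied from the ANOVA table.
ss_resid_reduced = 1958833.290   # model 1: mealcat2 mealcat3 some_col
ss_resid_full = 1861365.121      # model 2: adds smc2 and smc3
df_resid_full = 394
ss_total = 8073671.997
n_tested = 2                     # number of interaction terms being tested

# Sum of squares explained by adding the interaction terms.
ss_change = ss_resid_reduced - ss_resid_full
# F = (change in SS per tested term) / (mean square error of the full model).
ms_change = ss_change / n_tested
ms_error = ss_resid_full / df_resid_full
f_change = ms_change / ms_error
# Change in R-square is the share of total variation the new terms explain.
r2_change = ss_change / ss_total

print(f"SS change = {ss_change:.3f}")
print(f"F({n_tested}, {df_resid_full}) = {f_change:.3f}")
print(f"R-square change = {r2_change:.3f}")
```

The computed values should agree, up to rounding, with the Subset Tests row of the ANOVA table (sum of squares 97468.169, F of 10.316, R Square Change of .012).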
These results indicate that the overall interaction is indeed significant (F = 10.316 with
2 and 394 degrees of freedom, p < .001). This means that the slopes of the regression lines
for the three groups differ significantly. As we have done before, let's graph the predicted
values so we can see how the regression lines differ.
Because we had three groups, we get three regression lines, one for each category of mealcat.
GRAPH
/LINE(MULTIPLE)MEAN(pre_1) BY some_col BY mealcat.

Group 1 was the omitted group, therefore the slope of the line for group 1 is the
coefficient for some_col, which is -.947. Indeed, this line has a downward
slope. If we add the coefficient for some_col to the coefficient for
smc2
we get the slope for group 2, i.e., 3.141 + -.947 yields 2.194.
Indeed, group 2 shows an upward slope. Likewise, if we add the coefficient for some_col
to the coefficient for smc3 we get the slope for group 3, i.e., 2.607 +
-.947 yields 1.660. So, the slopes for the 3 groups are
group 1: -0.947
group 2: 2.194
group 3: 1.660
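The arithmetic above can be written out as a short calculation. This is a sketch in Python (not SPSS) that rebuilds the intercept and slope of each group's line from the model 2 coefficients; the dictionary `b` and the function `group_line` are hypothetical names introduced only for this illustration.

```python
# Model 2 coefficients copied from the Coefficients table (group 1 is omitted).
b = {
    "const": 825.894, "mealcat2": -239.030, "mealcat3": -344.948,
    "some_col": -0.947, "smc2": 3.141, "smc3": 2.607,
}

def group_line(group):
    """Return (intercept, slope) of the fitted line for mealcat group 1, 2 or 3."""
    # The omitted group has no dummy or interaction term, so .get() falls back to 0.
    intercept = b["const"] + b.get(f"mealcat{group}", 0.0)
    slope = b["some_col"] + b.get(f"smc{group}", 0.0)
    return intercept, slope

lines = {g: group_line(g) for g in (1, 2, 3)}
for g, (intercept, slope) in lines.items():
    print(f"group {g}: api00 = {intercept:.3f} + {slope:.3f} * some_col")
```

At some_col = 0 the group 1 and group 3 lines give 825.894 and 480.946, which match the maximum and minimum predicted values in the Residuals Statistics table.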
The test of the coefficient for smc2 tested whether the slope for
group 2 differed from the slope for group 1, and indeed this was significant (p < .001).
Likewise, the test of the coefficient for smc3 tested whether the slope for group 3
differed from the slope for group 1, and indeed this was significant (p = .004). What did
the test of the coefficient for some_col test? This coefficient is the slope for
group 1, so this tested whether the slope for group 1 (-0.947) was significantly
different from 0 (p = .053). This is probably not an interesting test.
The comparisons in the above analyses don't seem to be as interesting as comparing
group 1 vs. 2 and then comparing group 2 vs. 3. These successive comparisons seem
much more interesting. We can do this by making group 2 the omitted group, and then
each group would be compared to group 2.
regress
/dep = api00
/method = enter mealcat1 mealcat3 some_col smc1 smc3.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1(a) | . | Enter |
| a All requested variables entered. |
| b Dependent Variable: api 2000 |
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .877(a) | .769 | .767 | 68.733 |
| a Predictors: (Constant), SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1 |
ANOVA(b)
| Model | Source | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 6212306.876 | 5 | 1242461.375 | 262.995 | .000(a) |
| | Residual | 1861365.121 | 394 | 4724.277 | | |
| | Total | 8073671.997 | 399 | | | |
| a Predictors: (Constant), SMC3, parent some college, MEALCAT1, MEALCAT3, SMC1 |
| b Dependent Variable: api 2000 |
Coefficients(a)
| Model | Term | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 586.864 | 14.303 | | 41.030 | .000 |
| | MEALCAT1 | 239.030 | 18.665 | .790 | 12.806 | .000 |
| | MEALCAT3 | -105.918 | 18.754 | -.354 | -5.648 | .000 |
| | parent some college | 2.194 | .543 | .175 | 4.043 | .000 |
| | SMC1 | -3.141 | .729 | -.270 | -4.307 | .000 |
| | SMC3 | -.534 | .927 | -.030 | -.576 | .565 |
| a Dependent Variable: api 2000 |
Now, the test of smc1 tests whether the slope for group 1 differs
from the slope for group 2, and it does (p < .001). Then, the test of smc3 tests whether the
slope for group 3 significantly differs from the slope for group 2, and it does not
(p = .565). This makes sense given the graph and given the estimates of the slopes that we
have: -.947 is significantly different from 2.194, but 2.194 is not significantly different
from 1.660.
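Changing the omitted group does not change the fitted lines, only how they are expressed as coefficients. As a rough illustration, the Python sketch below (not SPSS) starts from the three group lines implied by the group-1-omitted model and re-expresses them with group 2 omitted; the function name `reparameterize` and the dictionary layout are assumptions made for this example.

```python
# Fitted line (intercept, slope) for each mealcat group, derived from the
# group-1-omitted model: const + mealcatK and some_col + smcK.
lines = {
    1: (825.894, -0.947),
    2: (825.894 - 239.030, -0.947 + 3.141),
    3: (825.894 - 344.948, -0.947 + 2.607),
}

def reparameterize(lines, ref):
    """Express the same fitted lines as dummy-coded coefficients with `ref` omitted."""
    ref_intercept, ref_slope = lines[ref]
    # The constant and some_col coefficient describe the reference group's line.
    coefs = {"const": ref_intercept, "some_col": ref_slope}
    for g, (intercept, slope) in lines.items():
        if g != ref:
            # Dummy = difference in intercepts; interaction = difference in slopes.
            coefs[f"mealcat{g}"] = intercept - ref_intercept
            coefs[f"smc{g}"] = slope - ref_slope
    return coefs

ref2 = reparameterize(lines, ref=2)
for name, value in ref2.items():
    print(f"{name:9s} {value:9.3f}")
```

The printed values can be compared with the B column of the Coefficients table above; small differences in the third decimal place are possible because the inputs are rounded.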
3.8.2 Using glm
We can perform the same analysis using the glm command, as shown below.
The glm command gives us somewhat less flexibility since we cannot choose which group is
the omitted group.
GET FILE='C:\spssreg\elemapi2.sav'.
glm
api00 by mealcat with some_col
/design = some_col mealcat some_col*mealcat
/print = parameter TEST(LMATRIX).
Between-Subjects Factors
| Factor | Value | Value Label | N |
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |
Tests of Between-Subjects Effects
Dependent Variable: api 2000
| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
| Corrected Model | 6212306.876(a) | 5 | 1242461.375 | 262.995 | .000 |
| Intercept | 34188885.021 | 1 | 34188885.021 | 7236.850 | .000 |
| SOME_COL | 36366.366 | 1 | 36366.366 | 7.698 | .006 |
| MEALCAT | 2012065.492 | 2 | 1006032.746 | 212.950 | .000 |
| MEALCAT * SOME_COL | 97468.169 | 2 | 48734.084 | 10.316 | .000 |
| Error | 1861365.121 | 394 | 4724.277 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |
| a R Squared = .769 (Adjusted R Squared = .767) |
Parameter Estimates
Dependent Variable: api 2000
| Parameter | B | Std. Error | t | Sig. | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
| Intercept | 480.946 | 12.131 | 39.647 | .000 | 457.097 | 504.795 |
| SOME_COL | 1.660 | .752 | 2.208 | .028 | .182 | 3.138 |
| [MEALCAT=1] | 344.948 | 17.057 | 20.223 | .000 | 311.413 | 378.483 |
| [MEALCAT=2] | 105.918 | 18.754 | 5.648 | .000 | 69.046 | 142.789 |
| [MEALCAT=3] | 0(a) | . | . | . | . | . |
| [MEALCAT=1] * SOME_COL | -2.607 | .896 | -2.910 | .004 | -4.369 | -.846 |
| [MEALCAT=2] * SOME_COL | .534 | .927 | .576 | .565 | -1.289 | 2.357 |
| [MEALCAT=3] * SOME_COL | 0(a) | . | . | . | . | . |
| a This parameter is set to zero because it is redundant. |
<some output omitted to save space>
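Because the glm command omits the third category, and the analysis we showed
above omitted the second category, the parameter estimates will not be the same. You can
compare the glm results with the regression results above and see that the parameter
estimates differ. Because group 3 is dropped, it is the reference category and all
comparisons are made with group 3. For example, the estimate for [MEALCAT=2] * SOME_COL
(.534) is the difference in slopes between groups 2 and 3, which is the same comparison as
smc3 in the previous regression (-.534) with the sign reversed.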
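Although the parameter estimates differ, all three parameterizations describe the same fitted model. One way to see this is to compute predicted values from each set of coefficients. The Python sketch below (not SPSS) does so; the `models` dictionary, the `predict` function, and the value some_col = 20 are hypothetical choices made only for illustration.

```python
# Coefficients copied from the three sets of output above, keyed by omitted group.
models = {
    1: {"const": 825.894, "some_col": -0.947,          # regress, group 1 omitted
        "dummy": {2: -239.030, 3: -344.948}, "inter": {2: 3.141, 3: 2.607}},
    2: {"const": 586.864, "some_col": 2.194,           # regress, group 2 omitted
        "dummy": {1: 239.030, 3: -105.918}, "inter": {1: -3.141, 3: -0.534}},
    3: {"const": 480.946, "some_col": 1.660,           # glm, group 3 omitted
        "dummy": {1: 344.948, 2: 105.918}, "inter": {1: -2.607, 2: 0.534}},
}

def predict(model, group, some_col):
    """Predicted api00 for one mealcat group at a given value of some_col."""
    # The omitted group contributes no dummy or interaction term.
    intercept = model["const"] + model["dummy"].get(group, 0.0)
    slope = model["some_col"] + model["inter"].get(group, 0.0)
    return intercept + slope * some_col

for group in (1, 2, 3):
    preds = [predict(models[ref], group, 20) for ref in (1, 2, 3)]
    print(f"group {group}:", [round(p, 2) for p in preds])
```

Within each group the three predictions should agree up to rounding of the printed coefficients, which is why the choice of omitted group affects the interpretation of the coefficients but not the fit.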
These analyses showed that the relationship between some_col and api00
varied, depending on the level of mealcat. In comparing group 1 with group 2,
the coefficient for some_col was significantly different, but there was no
significant difference in the coefficient for some_col in comparing groups 2 and 3.
3.9 Summary
This chapter covered four techniques for analyzing data with categorical variables: 1)
manually constructing indicator variables, 2) using a do-loop, 3) using the regress
command, and 4) using the glm command.
Each method has its advantages and disadvantages, as described below.
Manually constructing indicator variables can be very tedious and even error-prone. For
very simple models, it is not very difficult to create your own indicator variables, but
if you have categorical variables with many levels and/or interactions of categorical
variables, it can be laborious to manually create indicator variables. However, the
advantage is that you can have quite a bit of control over how the variables are created
and the terms that are entered into the model.
A do-loop will allow you to create many indicator variables very quickly.
However, there are some restrictions regarding the naming of the variables and
all of the variables must have the same two values (i.e., zero and one or one
and two).
The regress command is useful when you want to test one or a group of
variables together.
The glm command is useful for those times when you want to use a
particular coding scheme to perform certain types of tests, such as comparing
each level of a variable to the previous level. The glm command in
SPSS will create the appropriate codes for the variables and display the coding
scheme in the output.
3.10 For more information
See the following web pages for more
information and resources on regression with categorical predictors in SPSS.
These pages are Copyrighted (c) by UCLA Academic Technology Services