3.1 Regression with a 0/1 variable
The simplest example of a categorical predictor in a regression analysis is a 0/1
variable, also called a dummy variable. Let's use the variable yr_rnd as
an example of a dummy variable. We can include a dummy variable as a predictor in a
regression analysis as shown below.
GET FILE='C:\spssreg\elemapi2.sav'.
regression
/dep api00
/method = enter yr_rnd.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | year round school(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .475(a) | .226 | .224 | 125.300 |

a Predictors: (Constant), year round school

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 1825000.563 | 1 | 1825000.563 | 116.241 | .000(a) |
| | Residual | 6248671.435 | 398 | 15700.179 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), year round school
b Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 684.539 | 7.140 | | 95.878 | .000 |
| | year round school | -160.506 | 14.887 | -.475 | -10.782 | .000 |

a Dependent Variable: api 2000
This may seem odd at first, but it is a legitimate analysis. What do the results mean?
Let's go back to basics and write out the regression equation that this model implies.
api00 = constant + Byr_rnd * yr_rnd
where constant is the intercept and we use Byr_rnd
to represent the coefficient for variable yr_rnd. Filling in the
values from the regression equation, we get
api00 = 684.539 + -160.5064 * yr_rnd
If a student is not in year-round school (i.e., yr_rnd is 0) the
regression equation would simplify to
api00 = constant + 0 * Byr_rnd
api00 = 684.539 + 0 * -160.5064
api00 = 684.539
If a student is in a year-round school, the regression equation would simplify to
api00 = constant + 1 * Byr_rnd
api00 = 684.539 + 1 * -160.5064
api00 = 524.0326
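The two substitutions above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic, not SPSS output: the helper name predict_api00 is made up for illustration, and the two coefficients are copied from the regression equation above.

```python
# Coefficients copied from the regression equation above.
CONSTANT = 684.539
B_YR_RND = -160.5064

def predict_api00(yr_rnd):
    """Predicted api00 for a 0/1 value of yr_rnd (hypothetical helper)."""
    return CONSTANT + B_YR_RND * yr_rnd

non_year_round = predict_api00(0)   # the coefficient drops out
year_round = predict_api00(1)       # constant plus the coefficient
print(round(non_year_round, 4), round(year_round, 4))
```

Because yr_rnd only takes the values 0 and 1, these are the only two predictions the model can make.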
We can graph the observed values and the predicted values using the igraph
command as shown below. Although yr_rnd only has 2 values, we can still
draw a regression line showing the relationship between yr_rnd and api00.
Based on the results above, we see that the predicted value for non-year round
schools is 684.539 and the predicted value for the year round schools is
524.033, and the slope of the line is negative, which makes sense since the
coefficient for yr_rnd was negative (-160.5064). Note that the "type = scale"
option is needed here because yr_rnd is an ordinal variable in the
dataset.
IGRAPH
/X1 = VAR(yr_rnd) TYPE = scale
/Y = VAR (api00) TYPE = SCALE
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/CATORDER VAR(yr_rnd) (ASCENDING VALUES OMITEMPTY)
/SCATTER COINCIDENT = NONE.

Let's compare these predicted values to the mean api00 scores for the
year-round and non-year-round students.
MEANS
TABLES=api00 BY yr_rnd.
Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| api 2000 * year round school | 400 | 100.0% | 0 | .0% | 400 | 100.0% |

Report (api 2000)

| year round school | Mean | N | Std. Deviation |
|---|---|---|---|
| No | 684.54 | 308 | 132.113 |
| Yes | 524.03 | 92 | 98.916 |
| Total | 647.62 | 400 | 142.249 |
As you can see, the regression equation predicts that the value of api00
will be the mean of the relevant group, year-round school
or non-year-round school.
Let's relate these predicted values back to the regression equation. For the
non-year-round students, their mean is the same as the intercept (684.539). The
coefficient for yr_rnd is the amount we need to add to get the mean for
the year-round students, i.e., we need to add -160.5064 to get 524.0326, the mean for the
year-round students. In other words, Byr_rnd is the mean api00
score for the year-round students minus the mean api00 score for the non year-round
students, i.e., mean(year-round) - mean(non year-round).
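The claim that the intercept is the mean of the 0 group and the coefficient is the difference in means holds for any 0/1 predictor. The sketch below fits a least-squares line by hand to a tiny toy data set. The five scores are made up for illustration and are not taken from elemapi2.sav, and simple_ols is a hypothetical helper.

```python
from statistics import mean

# Toy data (made up for illustration, NOT from elemapi2.sav).
yr_rnd = [0, 0, 0, 1, 1]
api00 = [700.0, 650.0, 690.0, 520.0, 540.0]

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

intercept, slope = simple_ols(yr_rnd, api00)
mean_no = mean(y for x, y in zip(yr_rnd, api00) if x == 0)
mean_yes = mean(y for x, y in zip(yr_rnd, api00) if x == 1)

print(intercept, mean_no)            # intercept vs. mean of the 0 group
print(slope, mean_yes - mean_no)     # slope vs. difference in means
```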
It may be surprising to note that this regression analysis with a single dummy variable
is the same as doing a t-test comparing the mean api00 for the year-round
students with the non year-round students (see below). You can see that the t-value below
is the same in magnitude as the t-value for yr_rnd in the regression above; the sign differs
only because the t-test subtracts the groups in the opposite order. This is because
Byr_rnd compares the year-rounds and non year-rounds (since
the coefficient is mean(year round)-mean(non year-round)).
T-TEST
GROUPS=yr_rnd(0 1)
/VARIABLES=api00.
Group Statistics

| | year round school | N | Mean | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| api 2000 | No | 308 | 684.54 | 132.113 | 7.528 |
| | Yes | 92 | 524.03 | 98.916 | 10.313 |

Independent Samples Test (api 2000)

| | Levene's Test F | Levene's Test Sig. | t | df | Sig. (2-tailed) | Mean Difference | Std. Error Difference | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper |
|---|---|---|---|---|---|---|---|---|---|
| Equal variances assumed | 20.539 | .000 | 10.782 | 398 | .000 | 160.51 | 14.887 | 131.239 | 189.774 |
| Equal variances not assumed | | | 12.571 | 197.215 | .000 | 160.51 | 12.768 | 135.327 | 185.686 |
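The "Equal variances assumed" row can be rebuilt from the Group Statistics table alone. The sketch below uses the standard pooled-variance formula for an independent-samples t-test. The inputs are the rounded values printed above, so the results agree with SPSS only to about two or three decimals.

```python
from math import sqrt

# Group statistics copied from the Group Statistics table above.
n_no, mean_no, sd_no = 308, 684.54, 132.113
n_yes, mean_yes, sd_yes = 92, 524.03, 98.916

# Pooled variance, as used when equal variances are assumed.
df = n_no + n_yes - 2
pooled_var = ((n_no - 1) * sd_no**2 + (n_yes - 1) * sd_yes**2) / df

# Standard error of the difference in means, and the t statistic.
se_diff = sqrt(pooled_var * (1 / n_no + 1 / n_yes))
t_value = (mean_no - mean_yes) / se_diff

print(df, round(se_diff, 3), round(t_value, 2))
```

The standard error matches the standard error of the yr_rnd coefficient in the regression (14.887), and the pooled variance is close to the residual mean square (15700.179). The t-test reports the difference as No minus Yes, so t is positive here. The regression coefficient is Yes minus No, so its t has the same size with a negative sign.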
Since a two-group t-test (with equal variances assumed) is equivalent to a one-way ANOVA,
we can get the same results using the oneway
command as well. Note that in SPSS, when you click on "analyze"
and "compare means," you can select a one-way ANOVA test. The
code for conducting a one-way ANOVA is shown below. After this analysis, however,
we will use the glm (for general linear model) command instead of the oneway
command.
ONEWAY
api00 BY yr_rnd.
ANOVA (api 2000)

| | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 1825000.563 | 1 | 1825000.563 | 116.241 | .000 |
| Within Groups | 6248671.435 | 398 | 15700.179 | | |
| Total | 8073671.998 | 399 | | | |
Remember that if you square the t-value, you will get the
F-value: 10.7815**2 = 116.24074, showing another way in which the t-test
is the same as the ANOVA test.
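As a quick arithmetic check, the F-value can be obtained two ways from the numbers printed above: by squaring the t-value, or by dividing the between-groups mean square by the within-groups mean square. This sketch only re-does that arithmetic.

```python
# Values copied from the output above.
t_value = -10.7815           # t for yr_rnd in the regression
ms_between = 1825000.563     # Mean Square, Between Groups
ms_within = 15700.179        # Mean Square, Within Groups

f_from_t = t_value ** 2              # the sign of t does not matter
f_from_ms = ms_between / ms_within   # definition of F in the ANOVA table

print(round(f_from_t, 3), round(f_from_ms, 3))
```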
3.2 Regression with a 1/2 variable
A categorical predictor variable does not have to be coded 0/1 to be used in a
regression model. It is easier to understand and interpret the results from a model with
dummy variables, but a variable coded 1/2 yields essentially the same
results.
Let's make a copy of the variable yr_rnd called yr_rnd2
that is coded 1/2, 1=non year-round and 2=year-round.
compute yr_rnd2 = yr_rnd.
recode yr_rnd2 (0=1) (1=2).
execute.
REGRESSION
/DEPENDENT api00
/METHOD=ENTER yr_rnd2.
<some output omitted to save space>
Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 845.045 | 19.353 | | 43.664 | .000 |
| | YR_RND2 | -160.506 | 14.887 | -.475 | -10.782 | .000 |

a Dependent Variable: api 2000
Note that the coefficient for yr_rnd is the same as the coefficient for yr_rnd2.
So, you can see that if you code yr_rnd as 0/1 or as 1/2, the regression
coefficient works out to be the same. However, the intercept is a bit less
intuitive. When we used yr_rnd, the intercept was the mean for the non
year-rounds. When using yr_rnd2, the intercept is the mean for the non
year-rounds minus Byr_rnd2, i.e., 684.539 - (-160.506) = 845.045.
Note that you can use 0/1 or 1/2 coding and the results for the coefficient come out
the same, but the interpretation of constant in the regression equation is different. It
is often easier to interpret the estimates for 0/1 coding.
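The change in the intercept follows from substituting yr_rnd = yr_rnd2 - 1 into the original equation: constant + B * (yr_rnd2 - 1) = (constant - B) + B * yr_rnd2. The sketch below checks this with the printed coefficients. The helper intercept_after_shift is a made-up name for illustration.

```python
# Intercept and slope under 0/1 coding, copied from the first regression.
A_01, B = 684.539, -160.506

def intercept_after_shift(intercept, slope, shift):
    """Intercept when the predictor is recoded as x_new = x + shift."""
    return intercept - slope * shift

# Recoding 0/1 as 1/2 adds 1 to the predictor.
a_12 = intercept_after_shift(A_01, B, 1)

# The prediction for a year-round school is the same under either coding.
pred_01 = A_01 + B * 1    # year-round coded 1
pred_12 = a_12 + B * 2    # year-round coded 2

print(round(a_12, 3), round(pred_01, 3), round(pred_12, 3))
```

The slope is untouched because adding a constant to a predictor does not change how far apart the two groups are.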
In summary, these results indicate that the api00 scores are
significantly different for the students depending on the type of school they attend, year
round school vs. non-year round school. Those who attend non-year round school have
significantly higher scores. Based on the regression results, those who attend non-year
round schools have scores that are 160.5 points higher than those who attend year-round
schools.
3.3 Regression with a 1/2/3 variable
3.3.1 Manually Creating Dummy Variables
Say that we would like to examine the relationship between the amount of poverty and
api scores. We don't have a measure of poverty, but we can use mealcat as
a proxy for a measure of poverty. You might be tempted to try including mealcat in a regression like this.
regression
/dependent api00
/method=enter mealcat.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | Percentage free meals in 3 categories(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .867(a) | .752 | .752 | 70.908 |

a Predictors: (Constant), Percentage free meals in 3 categories

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6072527.519 | 1 | 6072527.519 | 1207.742 | .000(a) |
| | Residual | 2001144.479 | 398 | 5028.001 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), Percentage free meals in 3 categories
b Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 950.987 | 9.422 | | 100.935 | .000 |
| | Percentage free meals in 3 categories | -150.553 | 4.332 | -.867 | -34.753 | .000 |

a Dependent Variable: api 2000
This is looking at the linear effect of mealcat on api00,
but mealcat is not an interval variable. Instead, you will want to code the variable so
that all the information concerning the three levels is accounted for.
You can dummy code mealcat like this.
compute mealcat1 = 0.
if mealcat = 1 mealcat1 = 1.
compute mealcat2 = 0.
if mealcat = 2 mealcat2 = 1.
compute mealcat3 = 0.
if mealcat = 3 mealcat3 = 1.
execute.
We now have created mealcat1 that is 1 if mealcat is
1, and 0 otherwise. Likewise, mealcat2 is 1 if mealcat
is 2, and 0 otherwise; and likewise mealcat3 was created. We can see this
below.
list mealcat mealcat1 mealcat2 mealcat3
/cases from 1 to 10.
MEALCAT MEALCAT1 MEALCAT2 MEALCAT3
2 .00 1.00 .00
3 .00 .00 1.00
3 .00 .00 1.00
3 .00 .00 1.00
3 .00 .00 1.00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
1 1.00 .00 .00
Number of cases read: 10 Number of cases listed: 10
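For readers who want to see the same recoding logic outside SPSS, here is a minimal Python sketch. It uses the ten mealcat values from the listing above. The helper name dummy is made up for illustration.

```python
# First ten mealcat values, copied from the listing above.
mealcat = [2, 3, 3, 3, 3, 1, 1, 1, 1, 1]

def dummy(values, level):
    """Return 1 where the value equals `level`, 0 otherwise."""
    return [1 if v == level else 0 for v in values]

mealcat1 = dummy(mealcat, 1)
mealcat2 = dummy(mealcat, 2)
mealcat3 = dummy(mealcat, 3)

# Print the same four columns as the SPSS listing.
for row in zip(mealcat, mealcat1, mealcat2, mealcat3):
    print(*row)
```

Each case has exactly one dummy equal to 1, which is why only two of the three dummies can be entered in a regression that also has a constant.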
We can now use two of these dummy variables (mealcat2 and mealcat3)
in the regression analysis.
regression
/dependent api00
/method = enter mealcat2 mealcat3.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MEALCAT3, MEALCAT2(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .869(a) | .755 | .754 | 70.612 |

a Predictors: (Constant), MEALCAT3, MEALCAT2

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), MEALCAT3, MEALCAT2
b Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 805.718 | 6.169 | | 130.599 | .000 |
| | MEALCAT2 | -166.324 | 8.708 | -.550 | -19.099 | .000 |
| | MEALCAT3 | -301.338 | 8.629 | -1.007 | -34.922 | .000 |

a Dependent Variable: api 2000
We can test the overall differences among the three groups by using the /method
= test statement as shown below. This shows that the overall differences among the
three groups are significant, with an F value of 611.121 and a p value of .000.
regression
/dependent api00
/method = test (mealcat2 mealcat3).
Variables Entered/Removed(a)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MEALCAT3, MEALCAT2 | . | Test |

a Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .869(a) | .755 | .754 | 70.612 |

a Predictors: (Constant), MEALCAT3, MEALCAT2

ANOVA(c)

| Model | | | Sum of Squares | df | Mean Square | F | Sig. | R Square Change |
|---|---|---|---|---|---|---|---|---|
| 1 | Subset Tests | MEALCAT2, MEALCAT3 | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) | .755 |
| | Regression | | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(b) | |
| | Residual | | 1979474.328 | 397 | 4986.081 | | | |
| | Total | | 8073671.997 | 399 | | | | |

a Tested against the full model.
b Predictors in the Full Model: (Constant), MEALCAT3, MEALCAT2.
c Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 805.718 | 6.169 | | 130.599 | .000 |
| | MEALCAT2 | -166.324 | 8.708 | -.550 | -19.099 | .000 |
| | MEALCAT3 | -301.338 | 8.629 | -1.007 | -34.922 | .000 |

a Dependent Variable: api 2000
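The overall F value and the R Square in this output can be rebuilt from the sums of squares and degrees of freedom in the ANOVA table. The sketch below only re-does that arithmetic with the printed values.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_regression, df_regression = 6094197.670, 2
ss_residual, df_residual = 1979474.328, 397

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F compares the two mean squares; R Square is the share of total variation.
f_value = ms_regression / ms_residual
r_square = ss_regression / (ss_regression + ss_residual)

print(round(f_value, 3), round(r_square, 3))
```

The regression uses 2 degrees of freedom because three groups need only two dummy variables.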
The interpretation of the coefficients is much like that for the binary variables. Group 1 is
the omitted group, so the constant is the mean for group 1. The coefficient for mealcat2
is the mean for group 2 minus the mean of the omitted group (group 1), and the coefficient for
mealcat3
is the mean of group 3 minus the mean of group 1. You can verify this by comparing the
coefficients with the means of the groups, shown below.
MEANS
TABLES=api00 BY mealcat.
Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| api 2000 * Percentage free meals in 3 categories | 400 | 100.0% | 0 | .0% | 400 | 100.0% |

Report (api 2000)

| Percentage free meals in 3 categories | Mean | N | Std. Deviation |
|---|---|---|---|
| 0-46% free meals | 805.72 | 131 | 65.669 |
| 47-80% free meals | 639.39 | 132 | 82.135 |
| 81-100% free meals | 504.38 | 137 | 62.727 |
| Total | 647.62 | 400 | 142.249 |
Based on these results, we can say that the three groups differ in their api00
scores, and that in particular group 2 is significantly different from group 1 (because mealcat2
was significant) and group 3 is significantly different from group 1 (because mealcat3
was significant).
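The link between the group means and the dummy-variable coefficients can be checked directly. The sketch below starts from the three means in the Report table and computes what the constant and coefficients should be when group 1 is omitted. The means are rounded to two decimals in the output, so the results match the regression only to about two decimals.

```python
# Group means copied from the MEANS output above.
means = {1: 805.72, 2: 639.39, 3: 504.38}

reference = 1                       # the omitted group
constant = means[reference]         # the constant is the reference mean

# Each coefficient is that group's mean minus the reference mean.
coefs = {
    group: round(m - constant, 2)
    for group, m in means.items()
    if group != reference
}

print(constant, coefs)
```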
3.3.2 Using Do Loops
We can use the do repeat command to do the work for us to create the indicator
(dummy) variables. This method is particularly useful when you need to create many
indicator variables.
DO REPEAT A=mealcat1 mealcat2 mealcat3
/B=1 2 3.
COMPUTE A=(mealcat=B).
END REPEAT.
We will then do a crosstab to verify that our indicator variables were created
correctly.
crosstab /tables = mealcat by mealcat1
/tables = mealcat by mealcat2
/tables = mealcat by mealcat3.
Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| Percentage free meals in 3 categories * MEALCAT1 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
| Percentage free meals in 3 categories * MEALCAT2 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |
| Percentage free meals in 3 categories * MEALCAT3 | 400 | 100.0% | 0 | .0% | 400 | 100.0% |

Percentage free meals in 3 categories * MEALCAT1 Crosstabulation (Count)

| Percentage free meals in 3 categories | MEALCAT1 = .00 | MEALCAT1 = 1.00 | Total |
|---|---|---|---|
| 0-46% free meals | | 131 | 131 |
| 47-80% free meals | 132 | | 132 |
| 81-100% free meals | 137 | | 137 |
| Total | 269 | 131 | 400 |

Percentage free meals in 3 categories * MEALCAT2 Crosstabulation (Count)

| Percentage free meals in 3 categories | MEALCAT2 = .00 | MEALCAT2 = 1.00 | Total |
|---|---|---|---|
| 0-46% free meals | 131 | | 131 |
| 47-80% free meals | | 132 | 132 |
| 81-100% free meals | 137 | | 137 |
| Total | 268 | 132 | 400 |

Percentage free meals in 3 categories * MEALCAT3 Crosstabulation (Count)

| Percentage free meals in 3 categories | MEALCAT3 = .00 | MEALCAT3 = 1.00 | Total |
|---|---|---|---|
| 0-46% free meals | 131 | | 131 |
| 47-80% free meals | 132 | | 132 |
| 81-100% free meals | | 137 | 137 |
| Total | 263 | 137 | 400 |
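The same loop-then-verify pattern can be sketched in Python: build one indicator per level in a loop, then tabulate each indicator against the original variable. The eight mealcat values below are made up for illustration and are not from the data file.

```python
from collections import Counter

# Hypothetical mealcat values, made up for illustration.
mealcat = [1, 1, 2, 3, 3, 2, 1, 3]

# One indicator per level, built in a loop (the role DO REPEAT plays above).
indicators = {
    f"mealcat{level}": [int(v == level) for v in mealcat]
    for level in (1, 2, 3)
}

# Crosstab check: count each (mealcat, indicator) pair.
for name, column in indicators.items():
    table = Counter(zip(mealcat, column))
    print(name, sorted(table.items()))
```

In a correct coding, every case with an indicator value of 1 has the matching mealcat level, just as in the crosstabulations above.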
What if we wanted a different group to be the reference group? For example, let's omit group 3.
regression
/dependent api00
/method = enter mealcat1 mealcat2.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MEALCAT2, MEALCAT1(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .869(a) | .755 | .754 | 70.612 |

a Predictors: (Constant), MEALCAT2, MEALCAT1

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 504.380 | 6.033 | | 83.606 | .000 |
| | MEALCAT1 | 301.338 | 8.629 | .995 | 34.922 | .000 |
| | MEALCAT2 | 135.014 | 8.612 | .447 | 15.677 | .000 |

a Dependent Variable: api 2000
With group 3 omitted, the constant is now the mean of group 3, the coefficient for mealcat1
is the mean of group 1 minus the mean of group 3, and the coefficient for mealcat2 is the mean of
group 2 minus the mean of group 3. We see that both of
these coefficients are significant, indicating that group 1 is significantly different from
group 3 and group 2 is significantly different from group 3.
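Changing the reference group does not change the fitted group means, only how they are expressed. The sketch below starts from the coefficients of the earlier model with group 1 omitted, recovers the three group means, and then re-expresses them with group 3 as the reference. All inputs are copied from the Coefficients tables in this chapter.

```python
# Model with group 1 omitted (constant, mealcat2, mealcat3).
const_ref1 = 805.718
b_ref1 = {2: -166.324, 3: -301.338}

# Group means implied by that model.
group_means = {1: const_ref1}
group_means.update({g: const_ref1 + b for g, b in b_ref1.items()})

# Same means, re-expressed with group 3 as the reference group.
const_ref3 = group_means[3]
b_ref3 = {g: group_means[g] - const_ref3 for g in (1, 2)}

print(round(const_ref3, 3), {g: round(b, 3) for g, b in b_ref3.items()})
```

The results agree with the constant and the mealcat1 and mealcat2 coefficients in the output above.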
3.3.3 Using the glm command
We can also do this analysis using the glm command. The benefit of
the glm command is that we don't need to manually create dummy
variables, and it gives us the test of the overall effect of mealcat
without needing to subsequently use the /method = test statement as we did with the regression
command.
glm
api00 by mealcat.
Between-Subjects Factors

| | | Value Label | N |
|---|---|---|---|
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |

Tests of Between-Subjects Effects
Dependent Variable: api 2000

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |

a R Squared = .755 (Adjusted R Squared = .754)
We can use the /print=parameter statement with the glm
command to obtain the parameter estimates. Note that the estimates are
based on dummy coding with the last (third) category omitted, and correspond to
the results shown above where the third category was omitted.
glm
api00 by mealcat
/print=parameter.
Between-Subjects Factors

| | | Value Label | N |
|---|---|---|---|
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |

Tests of Between-Subjects Effects
Dependent Variable: api 2000

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |

a R Squared = .755 (Adjusted R Squared = .754)

Parameter Estimates
Dependent Variable: api 2000

| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Intercept | 504.380 | 6.033 | 83.606 | .000 | 492.519 | 516.240 |
| [MEALCAT=1] | 301.338 | 8.629 | 34.922 | .000 | 284.374 | 318.302 |
| [MEALCAT=2] | 135.014 | 8.612 | 15.677 | .000 | 118.083 | 151.945 |
| [MEALCAT=3] | 0(a) | . | . | . | . | . |

a This parameter is set to zero because it is redundant.
Note that the parameter estimates are the same as in the regression with mealcat1 and mealcat2,
because in both cases the last category (category 3) is the omitted reference group.
3.3.4 Other coding schemes
It is generally very convenient to use dummy coding, but that is not the only kind of
coding that can be used. As you have seen, when you use dummy coding one of the groups
becomes the reference group and all of the other groups are compared to that group. This
may not be the most interesting set of comparisons. Below is a list of the
types of coding schemes that SPSS will create for you. You can access
these through the pull-down menus, or you can request them on the /CONTRAST
statement when using GLM (described later). First, we show you how to
manually create the codes.
Deviation(refcat): The deviations from the grand mean.
Difference: The difference or reverse Helmert contrast - compare levels of a factor with the mean of the previous levels of the factor.
Simple(refcat): Compare each level of a factor to a reference level (the last level by default).
Helmert: Compare levels of a factor with the mean of the subsequent levels of
the factor.
Polynomial: Orthogonal polynomial contrasts.
Repeated: Adjacent levels of a factor.
Special: A user-defined contrast.
Let's create a variable that compares group 1 with 2 and another variable that compares
group 2 with 3, and include those variables in the regression model. In
other words, we wish to create coefficients that are comparisons of successive groups,
starting with group 1 (i.e., the first comparison is group 1 vs. 2, and
the second comparison is group 2 vs. 3). Below we show how to
manually generate a coding scheme that forms these 2 comparisons.
if mealcat = 1 grp1 = .667.
if mealcat = 2 grp1 = -.333.
if mealcat = 3 grp1 = -.333.
if mealcat = 1 grp2 = .333.
if mealcat = 2 grp2 = .333.
if mealcat = 3 grp2 = -.667.
execute.
regression
/dep = api00
/method = enter grp1 grp2.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | GRP2, GRP1(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .869(a) | .755 | .754 | 70.612 |

a Predictors: (Constant), GRP2, GRP1

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), GRP2, GRP1
b Dependent Variable: api 2000

Coefficients(a)

| Model | | B (Unstandardized) | Std. Error (Unstandardized) | Beta (Standardized) | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | (Constant) | 649.820 | 3.531 | | 184.016 | .000 |
| | GRP1 | 166.324 | 8.708 | .549 | 19.099 | .000 |
| | GRP2 | 135.014 | 8.612 | .451 | 15.677 | .000 |

a Dependent Variable: api 2000
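To see why these particular codes give the successive-group comparisons, note that a model with a constant and two codes for three groups reproduces the three group means exactly. The sketch below writes the codes as exact thirds (an assumption: .667 and .333 in the syntax are treated as rounded versions of 2/3 and 1/3) and checks that coefficients equal to mean(group 1) - mean(group 2) and mean(group 2) - mean(group 3) reproduce each group mean. The means are the rounded values from the MEANS output.

```python
from fractions import Fraction

# Codes (grp1, grp2) for each mealcat level, as exact thirds.
codes = {
    1: (Fraction(2, 3), Fraction(1, 3)),
    2: (Fraction(-1, 3), Fraction(1, 3)),
    3: (Fraction(-1, 3), Fraction(-2, 3)),
}

# Group means from the MEANS output, held as exact decimals.
means = {1: Fraction("805.72"), 2: Fraction("639.39"), 3: Fraction("504.38")}

# Candidate coefficients: successive differences and the average of the means.
b_grp1 = means[1] - means[2]
b_grp2 = means[2] - means[3]
constant = sum(means.values()) / 3

# Fitted value for each group under this coding.
fitted = {
    g: constant + c1 * b_grp1 + c2 * b_grp2
    for g, (c1, c2) in codes.items()
}

print(float(b_grp1), float(b_grp2), float(constant))
print(fitted == means)
```

With exact thirds the constant is the unweighted average of the three group means (about 649.83). The output above shows 649.820, which appears to come from the rounded codes: 805.718 - .667 * 166.324 - .333 * 135.014 is approximately 649.820.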
We can perform this same series of comparisons much more easily using the glm command with the contrast statement.
glm
api00 by mealcat
/contrast (mealcat)=repeated
/print = parameter TEST(LMATRIX).
Between-Subjects Factors

| | | Value Label | N |
|---|---|---|---|
| Percentage free meals in 3 categories | 1 | 0-46% free meals | 131 |
| | 2 | 47-80% free meals | 132 |
| | 3 | 81-100% free meals | 137 |

Tests of Between-Subjects Effects
Dependent Variable: api 2000

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Corrected Model | 6094197.670(a) | 2 | 3047098.835 | 611.121 | .000 |
| Intercept | 168847142.059 | 1 | 168847142.059 | 33863.695 | .000 |
| MEALCAT | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
| Total | 175839633.000 | 400 | | | |
| Corrected Total | 8073671.997 | 399 | | | |

a R Squared = .755 (Adjusted R Squared = .754)

Parameter Estimates
Dependent Variable: api 2000

| Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| Intercept | 504.380 | 6.033 | 83.606 | .000 | 492.519 | 516.240 |
| [MEALCAT=1] | 301.338 | 8.629 | 34.922 | .000 | 284.374 | 318.302 |
| [MEALCAT=2] | 135.014 | 8.612 | 15.677 | .000 | 118.083 | 151.945 |
| [MEALCAT=3] | 0(a) | . | . | . | . | . |

a This parameter is set to zero because it is redundant.
Intercept

| Parameter | Contrast L1 |
|---|---|
| Intercept | 1.000 |
| [MEALCAT=1] | .333 |
| [MEALCAT=2] | .333 |
| [MEALCAT=3] | .333 |

The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

MEALCAT

| Parameter | Contrast L2 | Contrast L3 |
|---|---|---|
| Intercept | 0 | 0 |
| [MEALCAT=1] | 1 | 0 |
| [MEALCAT=2] | 0 | 1 |
| [MEALCAT=3] | -1 | -1 |

The default display of this matrix is the transpose of the corresponding L matrix. Based on Type III Sums of Squares.

Contrast Coefficients (L' Matrix)
Percentage free meals in 3 categories Repeated Contrast

| Parameter | Level 1 vs. Level 2 | Level 2 vs. Level 3 |
|---|---|---|
| Intercept | 0 | 0 |
| [MEALCAT=1] | 1 | 0 |
| [MEALCAT=2] | -1 | 1 |
| [MEALCAT=3] | 0 | -1 |

The default display of this matrix is the transpose of the corresponding L matrix.

Contrast Results (K Matrix)
Dependent Variable: api 2000

| Repeated Contrast | Statistic | | api 2000 |
|---|---|---|---|
| Level 1 vs. Level 2 | Contrast Estimate | | 166.324 |
| | Hypothesized Value | | 0 |
| | Difference (Estimate - Hypothesized) | | 166.324 |
| | Std. Error | | 8.708 |
| | Sig. | | .000 |
| | 95% Confidence Interval for Difference | Lower Bound | 149.203 |
| | | Upper Bound | 183.444 |
| Level 2 vs. Level 3 | Contrast Estimate | | 135.014 |
| | Hypothesized Value | | 0 |
| | Difference (Estimate - Hypothesized) | | 135.014 |
| | Std. Error | | 8.612 |
| | Sig. | | .000 |
| | 95% Confidence Interval for Difference | Lower Bound | 118.083 |
| | | Upper Bound | 151.945 |

Test Results
Dependent Variable: api 2000

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Contrast | 6094197.670 | 2 | 3047098.835 | 611.121 | .000 |
| Error | 1979474.328 | 397 | 4986.081 | | |
If you compare the parameter estimates with the means you can verify that B1
(i.e., 0-46% free meals) is the mean of group 1 minus group 2, and B2
(i.e., 47-80% free meals) is the mean of group 2 minus group 3. Both of these
comparisons are significant, indicating that group 1 significantly differs from group 2,
and group 2 significantly differs from group 3.
MEANS
TABLES=api00 BY mealcat.
Case Processing Summary

| | Included N | Included Percent | Excluded N | Excluded Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| api 2000 * Percentage free meals in 3 categories | 400 | 100.0% | 0 | .0% | 400 | 100.0% |

Report (api 2000)

| Percentage free meals in 3 categories | Mean | N | Std. Deviation |
|---|---|---|---|
| 0-46% free meals | 805.72 | 131 | 65.669 |
| 47-80% free meals | 639.39 | 132 | 82.135 |
| 81-100% free meals | 504.38 | 137 | 62.727 |
| Total | 647.62 | 400 | 142.249 |
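The contrast estimates reported by glm are linear combinations of the parameter estimates, with weights taken from the columns of the Contrast Coefficients (L' Matrix) table. The sketch below multiplies those weights by the printed parameter estimates. It only re-does the arithmetic and does not call SPSS.

```python
# GLM parameter estimates, in the order
# Intercept, [MEALCAT=1], [MEALCAT=2], [MEALCAT=3].
params = [504.380, 301.338, 135.014, 0.0]

# Columns of the Contrast Coefficients (L' Matrix) table.
contrasts = {
    "Level 1 vs. Level 2": [0, 1, -1, 0],
    "Level 2 vs. Level 3": [0, 0, 1, -1],
}

# Each contrast estimate is the weighted sum of the parameters.
estimates = {
    name: sum(w * b for w, b in zip(weights, params))
    for name, weights in contrasts.items()
}

for name, value in estimates.items():
    print(name, round(value, 3))
```

These agree with the Contrast Estimate rows in the Contrast Results table and, up to rounding, with the differences between adjacent group means in the Report table.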
3.4 Regression with two categorical predictors
Previously we looked at using yr_rnd to predict api00,
and we have also looked at mealcat using the regression command.
regression
/dep api00
/method = enter mealcat1 mealcat2.
Variables Entered/Removed(b)

| Model | Variables Entered | Variables Removed | Method |
|---|---|---|---|
| 1 | MEALCAT2, MEALCAT1(a) | . | Enter |

a All requested variables entered.
b Dependent Variable: api 2000

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .869(a) | .755 | .754 | 70.612 |

a Predictors: (Constant), MEALCAT2, MEALCAT1

ANOVA(b)

| Model | | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| 1 | Regression | 6094197.670 | 2 | 3047098.835 | 611.121 | .000(a) |
| | Residual | 1979474.328 | 397 | 4986.081 | | |
| | Total | 8073671.997 | 399 | | | |

a Predictors: (Constant), MEALCAT2, MEALCAT1
b Dependent Variable: api 2000
Coefficients(a)
|
Unstandardized Coefficients |
Standardized Coefficients |
t |
Sig. |
| Model |
B |
Std. Error |
Beta |
| 1 |
(Constant) |
504.380 |
6.033 |
|
83.606 |
.000 |
| MEALCAT1 |
301.338 |
8.629 |
.995 |
34.922 |
.000 |
| MEALCAT2 |
|
|---|