SPSS Textbook Examples
Applied Regression Analysis by John Fox
Chapter 11: Unusual and influential data
page 269 The regressions at the bottom of the page.
GET FILE='D:\davis.sav'.
For the uncorrected data:
compute measf = measwt*female.
execute.
regression
/dep=reptwt
/method=enter measwt female measf.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEASF, Measured Weight, Gender, 0=male, 1=female(a) | . | Enter |
a. All requested variables entered.
b. Dependent Variable: Reported Weight
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .942(a) | .887 | .886 | 4.661 |
a. Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 30654.729 | 3 | 10218.243 | 470.408 | .000(a) |
| | Residual | 3888.254 | 179 | 21.722 | | |
| | Total | 34542.984 | 182 | | | |
a. Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b. Dependent Variable: Reported Weight
Coefficients(a)
| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 1.359 | 3.277 | | .415 | .679 |
| | Measured Weight | .990 | .043 | 1.099 | 23.236 | .000 |
| | Gender, 0=male, 1=female | 39.964 | 3.929 | 1.447 | 10.171 | .000 |
| | MEASF | -.725 | .056 | -1.611 | -12.957 | .000 |
a. Dependent Variable: Reported Weight
For the corrected data:
compute nmwt = measwt.
if subject=12 nmwt=57.
if subject=12 measht=166.
compute measwtf = nmwt*female.
execute.
regression
/dep=reptwt
/method=enter nmwt female measwtf.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEASWTF, NMWT, Gender, 0=male, 1=female(a) | . | Enter |
a. All requested variables entered.
b. Dependent Variable: Reported Weight
Model Summary
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .987(a) | .974 | .973 | 2.243 |
a. Predictors: (Constant), MEASWTF, NMWT, Gender, 0=male, 1=female
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 33642.345 | 3 | 11214.115 | 2228.780 | .000(a) |
| | Residual | 900.639 | 179 | 5.032 | | |
| | Total | 34542.984 | 182 | | | |
a. Predictors: (Constant), MEASWTF, NMWT, Gender, 0=male, 1=female
b. Dependent Variable: Reported Weight
Coefficients(a)
| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 1.359 | 1.577 | | .861 | .390 |
| | NMWT | .990 | .021 | .962 | 48.279 | .000 |
| | Gender, 0=male, 1=female | 1.983 | 2.450 | .072 | .809 | .420 |
| | MEASWTF | -5.668E-02 | .038 | -.119 | -1.474 | .142 |
a. Dependent Variable: Reported Weight
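Because each model contains a gender dummy and its product with weight, every coefficient table implies one fitted line for men and another for women. The following is a minimal sketch in Python (not part of the original SPSS run) that reads those lines off the coefficients printed above. The function name `group_lines` is made up for illustration.

```python
def group_lines(const, b_weight, b_female, b_interaction):
    """Return ((intercept, slope) for men, (intercept, slope) for women) for
    reptwt = const + b_weight*wt + b_female*female + b_interaction*(wt*female)."""
    men = (const, b_weight)                                  # female = 0
    women = (const + b_female, b_weight + b_interaction)     # female = 1
    return men, women

# Coefficients copied from the two Coefficients tables above.
uncorrected = group_lines(1.359, 0.990, 39.964, -0.725)
corrected = group_lines(1.359, 0.990, 1.983, -5.668e-02)

for label, (men, women) in [("uncorrected", uncorrected), ("corrected", corrected)]:
    print(f"{label}: men {men[0]:.3f} + {men[1]:.3f}*wt, "
          f"women {women[0]:.3f} + {women[1]:.3f}*wt")
```

With the uncorrected data the implied slope for women is about 0.265. After the correction to subject 12 it is about 0.933, close to the slope of 0.990 for men.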
page 270 Figure 11.2 Davis's data on reported and measured weight for women (F) and men (M), showing the least-squares linear regression line for each group (the broken line for men, the solid line for women). The outlying observation has a substantial effect on the fitted line for women.
NOTE: We do not know how to get SPSS to plot two regression lines on the same graph, so the graphs have been done separately.
USE ALL.
COMPUTE filter_$=(female=1).
VARIABLE LABEL filter_$ 'female=1 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
IGRAPH
/X1 = VAR(measwt)
/Y = VAR(reptwt)
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/SCATTER COINCIDENT = NONE.
USE ALL.
COMPUTE filter_$=(female=0).
VARIABLE LABEL filter_$ 'female=0 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
IGRAPH
/X1 = VAR(measwt)
/Y = VAR(reptwt)
/FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
/SCATTER COINCIDENT = NONE.
* Turn the filter off so that later commands use all cases.
filter off.
use all.
page 271 The largest hat value (middle of the page).
NOTE: Various statistics, including DFBETAs, studentized
residuals and covariance ratios, are discussed in this section. Those statistics have been calculated and displayed
by the regression command below.
GET FILE='D:\davis.sav'.
compute rptfem = reptwt*female.
execute.
regression
/dep=measwt
/method=enter reptwt female rptfem
/save lev sres dfbeta covratio.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | RPTFEM, Reported Weight, Gender, 0=male, 1=female(a) | . | Enter |
a. All requested variables entered.
b. Dependent Variable: Measured Weight
Model Summary(b)
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .837(a) | .700 | .695 | 8.449 |
a. Predictors: (Constant), RPTFEM, Reported Weight, Gender, 0=male, 1=female
b. Dependent Variable: Measured Weight
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 29786.378 | 3 | 9928.793 | 139.071 | .000(a) |
| | Residual | 12779.436 | 179 | 71.393 | | |
| | Total | 42565.814 | 182 | | | |
a. Predictors: (Constant), RPTFEM, Reported Weight, Gender, 0=male, 1=female
b. Dependent Variable: Measured Weight
Coefficients(a)
| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 1.794 | 5.924 | | .303 | .762 |
| | Reported Weight | .969 | .076 | .873 | 12.681 | .000 |
| | Gender, 0=male, 1=female | 2.074 | 9.297 | .068 | .223 | .824 |
| | RPTFEM | -9.525E-03 | .147 | -.018 | -.065 | .948 |
a. Dependent Variable: Measured Weight
Casewise Diagnostics(a)
| Case Number | Std. Residual | Measured Weight |
| 12 | 12.830 | 166 |
a. Dependent Variable: Measured Weight
Residuals Statistics(a)
| | Minimum | Maximum | Mean | Std. Deviation | N |
| Predicted Value | 43.20 | 121.94 | 66.22 | 12.793 | 183 |
| Std. Predicted Value | -1.799 | 4.355 | .000 | 1.000 | 183 |
| Standard Error of Predicted Value | .841 | 3.743 | 1.183 | .402 | 183 |
| Adjusted Predicted Value | 43.49 | 122.66 | 66.24 | 12.811 | 183 |
| Residual | -7.66 | 108.41 | .00 | 8.380 | 183 |
| Std. Residual | -.906 | 12.830 | .000 | .992 | 183 |
| Stud. Residual | -.935 | 12.895 | -.001 | .997 | 183 |
| Deleted Residual | -8.15 | 109.50 | -.01 | 8.475 | 183 |
| Stud. Deleted Residual | -.935 | 48.221 | .192 | 3.580 | 183 |
| Mahal. Distance | .810 | 34.720 | 2.984 | 3.619 | 183 |
| Cook's Distance | .000 | .421 | .003 | .031 | 183 |
| Centered Leverage Value | .004 | .191 | .016 | .020 | 183 |
a. Dependent Variable: Measured Weight
NOTE: SPSS values for the leverage do not exactly match those
obtained by Fox.
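One likely reason for the mismatch is that SPSS saves the centered leverage (the output tables label it "Centered Leverage Value"), which leaves out the 1/n contribution of the constant. Adding 1/n gives the conventional hat value. The sketch below, in Python rather than SPSS, applies that conversion to the largest centered leverage in the table above. The helper name `hat_from_centered` is hypothetical.

```python
def hat_from_centered(centered_leverage, n):
    """Convert a centered leverage value to a conventional hat value."""
    return centered_leverage + 1.0 / n

n, k = 183, 3                      # cases and predictors (excluding the constant)
max_hat = hat_from_centered(0.191, n)   # .191 is the maximum in the table above
mean_hat = (k + 1) / n             # average hat value for a model with a constant

print(round(max_hat, 3), round(mean_hat, 4))
```

The same offset shows up in the table's mean: the mean centered leverage of .016 is k/n = 3/183, whereas the average hat value is (k+1)/n = 4/183, about .0219.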
compute measf = measwt*female.
execute.
regression
/dep=reptwt
/method=enter measwt female measf
/save lev.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | MEASF, Measured Weight, Gender, 0=male, 1=female(a) | . | Enter |
a. All requested variables entered.
b. Dependent Variable: Reported Weight
Model Summary(b)
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .942(a) | .887 | .886 | 4.661 |
a. Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b. Dependent Variable: Reported Weight
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 30654.729 | 3 | 10218.243 | 470.408 | .000(a) |
| | Residual | 3888.254 | 179 | 21.722 | | |
| | Total | 34542.984 | 182 | | | |
a. Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b. Dependent Variable: Reported Weight
Coefficients(a)
| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | 1.359 | 3.277 | | .415 | .679 |
| | Measured Weight | .990 | .043 | 1.099 | 23.236 | .000 |
| | Gender, 0=male, 1=female | 39.964 | 3.929 | 1.447 | 10.171 | .000 |
| | MEASF | -.725 | .056 | -1.611 | -12.957 | .000 |
a. Dependent Variable: Reported Weight
Casewise Diagnostics(a)
| Case Number | Std. Residual | Reported Weight |
| 12 | -6.270 | 56 |
| 115 | 3.342 | 77 |
a. Dependent Variable: Reported Weight
Residuals Statistics(a)
| | Minimum | Maximum | Mean | Std. Deviation | N |
| Predicted Value | 51.64 | 119.15 | 65.62 | 12.978 | 183 |
| Std. Predicted Value | -1.078 | 4.124 | .000 | 1.000 | 183 |
| Standard Error of Predicted Value | .464 | 3.939 | .618 | .306 | 183 |
| Adjusted Predicted Value | 51.99 | 158.24 | 66.01 | 14.583 | 183 |
| Residual | -29.22 | 15.58 | .00 | 4.622 | 183 |
| Std. Residual | -6.270 | 3.342 | .000 | .992 | 183 |
| Stud. Residual | -11.728 | 3.392 | -.029 | 1.241 | 183 |
| Deleted Residual | -102.24 | 16.04 | -.39 | 8.643 | 183 |
| Stud. Deleted Residual | -24.304 | 3.497 | -.096 | 2.008 | 183 |
| Mahal. Distance | .808 | 128.987 | 2.984 | 9.774 | 183 |
| Cook's Distance | .000 | 85.927 | .474 | 6.352 | 183 |
| Centered Leverage Value | .004 | .709 | .016 | .054 | 183 |
a. Dependent Variable: Reported Weight
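The table reports two studentized residuals. "Stud. Residual" is the internally studentized residual, and "Stud. Deleted Residual" is the externally studentized (deleted) version, which is the one usually compared with a t distribution. The two are linked by a standard identity, sketched here in Python as a check on the printed minimum values. The function name is made up for illustration, and k counts predictors excluding the constant.

```python
import math

def deleted_from_studentized(r, n, k):
    """Externally studentized (deleted) residual computed from the
    internally studentized residual r, for n cases and k predictors."""
    return r * math.sqrt((n - k - 2) / (n - k - 1 - r * r))

# Minimum "Stud. Residual" from the table above (case 12), n = 183, k = 3.
t = deleted_from_studentized(-11.728, n=183, k=3)
print(round(t, 2))
```

The result agrees with the tabled minimum "Stud. Deleted Residual" of -24.304 up to rounding of the inputs.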
page 284 Figure 11.5 Partial-regression plots for Duncan's regression of occupational prestige on the income (a) and educational levels (b) of 45 U.S. occupations in 1950. Three potentially influential observations (ministers, railroad conductors, and railroad engineers) are identified on the plots. The partial-regression plot for the intercept A is not shown.
NOTE: To get the regression line on the plot, run the code below, double-click on the resulting graph, select chart from the menu at the top, select options, click on fit line total, select fit options, select regression, then click on continue and OK. We do not know how to add the regression line using code.
GET FILE='D:\duncan.sav'.
regression
/dep=prestige
/method=enter income educ
/partialplot all
/sav sresid lev cook.
Variables Entered/Removed(b)
| Model | Variables Entered | Variables Removed | Method |
| 1 | Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950(a) | . | Enter |
a. All requested variables entered.
b. Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti
Model Summary(b)
| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
| 1 | .910(a) | .828 | .820 | 13.369 |
a. Predictors: (Constant), Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950
b. Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti
ANOVA(b)
| Model | | Sum of Squares | df | Mean Square | F | Sig. |
| 1 | Regression | 36180.946 | 2 | 18090.473 | 101.216 | .000(a) |
| | Residual | 7506.699 | 42 | 178.731 | | |
| | Total | 43687.644 | 44 | | | |
a. Predictors: (Constant), Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950
b. Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti
Coefficients(a)
| Model | | Unstandardized B | Unstandardized Std. Error | Standardized Beta | t | Sig. |
| 1 | (Constant) | -6.065 | 4.272 | | -1.420 | .163 |
| | Percent of males in occupation earning $3500 or more in 1950 | .599 | .120 | .464 | 5.003 | .000 |
| | Percent of males in occupation in 1950 who were high-school graduates | .546 | .098 | .516 | 5.555 | .000 |
a. Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti
Residuals Statistics(a)
| | Minimum | Maximum | Mean | Std. Deviation | N |
| Predicted Value | 1.95 | 96.42 | 47.69 | 28.676 | 45 |
| Std. Predicted Value | -1.595 | 1.699 | .000 | 1.000 | 45 |
| Standard Error of Predicted Value | 2.077 | 6.935 | 3.334 | .903 | 45 |
| Adjusted Predicted Value | .81 | 97.04 | 47.59 | 28.701 | 45 |
| Residual | -29.54 | 34.64 | .00 | 13.062 | 45 |
| Std. Residual | -2.209 | 2.591 | .000 | .977 | 45 |
| Stud. Residual | -2.272 | 2.849 | .003 | 1.019 | 45 |
| Deleted Residual | -31.24 | 41.89 | .10 | 14.249 | 45 |
| Stud. Deleted Residual | -2.397 | 3.135 | .007 | 1.056 | 45 |
| Mahal. Distance | .084 | 10.862 | 1.956 | 1.928 | 45 |
| Cook's Distance | .000 | .566 | .032 | .090 | 45 |
| Centered Leverage Value | .002 | .247 | .044 | .044 | 45 |
a. Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti
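To judge whether the largest Cook's distance in the table above is noteworthy, one commonly cited rule of thumb flags cases with D greater than 4/(n - k - 1). This cutoff is an assumption brought in for illustration, not something the SPSS output reports, and other texts prefer different thresholds. A minimal Python sketch, with a hypothetical helper name:

```python
def cooks_cutoff(n, k):
    """Rule-of-thumb cutoff 4 / (n - k - 1) for Cook's distance,
    with n cases and k predictors (excluding the constant)."""
    return 4.0 / (n - k - 1)

n, k = 45, 2            # Duncan's data: 45 occupations, income and education
cutoff = cooks_cutoff(n, k)
max_cook = 0.566        # maximum Cook's Distance from the table above

print(round(cutoff, 3), max_cook > cutoff)
```

Under this rule the cutoff is about .095, so the maximum of .566 is well above it.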
page 285 Figure 11.6 "Bubble plot" of Cook's D, studentized residuals, and hat values, for Duncan's regression of occupational prestige on income and education. Each point is plotted as a circle with area proportional to D. Horizontal reference lines are drawn at studentized residuals of 0 and ±2; vertical reference lines are drawn at hat values of 2h̄ and 3h̄ (two and three times the average hat value). Several observations are identified on the plot: ministers and conductors have large hat values and relatively large residuals; reporters have a relatively large residual, but a small hat value; railroad engineers have a large hat value, but a small residual.
NOTE: A proportional bubble plot can be made with the igraph command. After running the command, you will need to double-click on the graph to open the chart editor and then make several modifications. We show below a regular scatterplot, which will suffice in many instances, and then the proportional bubble plot.
GRAPH
/SCATTERPLOT(BIVAR)=lev_1 WITH sre_1.
IGRAPH
/X1 = VAR(lev_1)
/Y = VAR(sre_1)
/SIZE = VAR(coo_1) TYPE = SCALE
/SCALERANGE = VAR(sre_1) MIN=-2.500000 MAX=5.000000
/SCATTER COINCIDENT = NONE.
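The figure caption places vertical reference lines at two and three times the average hat value. The plots above use lev_1, which SPSS saves as centered leverage, so the equivalent positions on that axis are shifted down by 1/n. The Python sketch below computes both sets of values for Duncan's regression (n = 45, k = 2 predictors). The helper name `leverage_cutoffs` is hypothetical, and the 1/n adjustment assumes lev_1 is the "Centered Leverage Value" reported in the output.

```python
def leverage_cutoffs(n, k, multiples=(2, 3)):
    """Return (multiple, hat-value cutoff, centered-leverage cutoff) tuples."""
    mean_hat = (k + 1) / n                 # average hat value
    cutoffs = []
    for m in multiples:
        hat_cut = m * mean_hat
        cutoffs.append((m, hat_cut, hat_cut - 1.0 / n))
    return cutoffs

for m, hat_cut, centered_cut in leverage_cutoffs(n=45, k=2):
    print(f"{m} x mean hat: hat value {hat_cut:.4f}, "
          f"centered leverage {centered_cut:.4f}")
```

On the lev_1 axis the lines would fall at roughly .111 and .178. The largest centered leverage in the residual statistics, .247, lies beyond both.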

page 288 Table 11.1 Data on the 1907 Romanian Peasant Rebellion:
I, intensity of the rebellion (corrected from the original); C, commercialization of agriculture; T, traditionalism; M, market forces;
and G, inequality of land tenure.
GET FILE='D:\chirot.sav'.
list county rebel agric trad market inequal.
COUNTY REBEL AGRIC TRAD MARKET INEQUAL
1 -1.39 13.80 86.20 6.20 .60
2 .65 20.40 86.70 2.90 .72
3 1.89 27.60 79.30 16.90 .66
4 -.15 18.60 90.10 3.40 .74
5 -.86 17.20 84.50 9.00 .70
6 .11 21.50 81.50 5.20 .60
7 -.51 11.60 82.60 5.10 .52
8 -.86 20.40 82.40 6.30 .64
9 -.24 19.50 87.50 4.80 .68
10 -.77 8.90 85.60 9.50 .58
11 -.24 25.80 82.20 10.90 .68
12 -1.57 24.10 83.50 8.40 .74
13 -.51 22.00 88.30 6.20 .70
14 -1.57 24.20 84.90 6.10 .62
15 -.51 30.60 76.10 1.30 .76
16 -1.13 33.90 85.50 5.80 .70
17 -1.22 28.60 84.20 2.90 .58
18 -1.22 36.50 78.10 4.30 .72
19 -.86 40.90 84.40 2.30 .64
20 -1.39 6.80 76.30 3.60 .58
21 2.81 41.90 89.70 6.60 .66
22 -1.04 25.40 83.20 2.50 .68
23 1.57 30.50 80.20 4.10 .76
24 4.32 48.20 91.00 4.20 .70
25 3.79 46.00 90.50 3.70 .68
26 3.79 45.10 85.50 5.10 .64
27 -1.75 12.50 83.80 7.20 .50
28 .82 39.30 85.60 4.90 .60
29 2.59 47.70 87.60 5.20 .58
30 -.86 15.20 87.30 10.80 .42
31 -1.84 11.70 82.30 81.70 .42
32 -1.84 25.60 80.10 68.40 .26
Number of cases read: 32 Number of cases listed: 32
These pages are Copyrighted (c) by UCLA Academic Technology Services