UCLA Academic Technology Services

SPSS Textbook Examples
Applied Regression Analysis by John Fox
Chapter 11: Unusual and influential data

page 269 The regressions at the bottom of the page.

GET FILE='D:\davis.sav'.

For the uncorrected data:

compute measf = measwt*female.
execute.

regression 
 /dep=reptwt 
 /method=enter measwt female measf. 
Variables Entered/Removed(b)
Model Variables Entered Variables Removed Method
1 MEASF, Measured Weight, Gender, 0=male, 1=female(a) . Enter
a All requested variables entered.
b Dependent Variable: Reported Weight

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .942(a) .887 .886 4.661
a Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female

ANOVA(b)
Model           Sum of Squares    df   Mean Square          F  Sig.
1  Regression        30654.729     3     10218.243    470.408  .000(a)
   Residual           3888.254   179        21.722
   Total             34542.984   182
a Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b Dependent Variable: Reported Weight

Coefficients(a)
                                    Unstandardized  Standardized
                                      Coefficients  Coefficients
Model                                B  Std. Error          Beta         t    Sig.
1  (Constant)                    1.359       3.277                    .415    .679
   Measured Weight                .990        .043         1.099    23.236    .000
   Gender, 0=male, 1=female     39.964       3.929         1.447    10.171    .000
   MEASF                         -.725        .056        -1.611   -12.957    .000
a Dependent Variable: Reported Weight

For the corrected data:

compute nmwt = measwt.
if subject=12 nmwt=57.
if subject=12 measht=166.
compute measwtf = nmwt*female.
execute.

regression 
 /dep=reptwt 
 /method=enter nmwt female measwtf. 
Variables Entered/Removed(b)
Model Variables Entered Variables Removed Method
1 MEASWTF, NMWT, Gender, 0=male, 1=female(a) . Enter
a All requested variables entered.
b Dependent Variable: Reported Weight

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .987(a) .974 .973 2.243
a Predictors: (Constant), MEASWTF, NMWT, Gender, 0=male, 1=female

ANOVA(b)
Model           Sum of Squares    df   Mean Square          F  Sig.
1  Regression        33642.345     3     11214.115   2228.780  .000(a)
   Residual            900.639   179         5.032
   Total             34542.984   182
a Predictors: (Constant), MEASWTF, NMWT, Gender, 0=male, 1=female
b Dependent Variable: Reported Weight

Coefficients(a)
                                    Unstandardized  Standardized
                                      Coefficients  Coefficients
Model                                B  Std. Error          Beta         t    Sig.
1  (Constant)                    1.359       1.577                    .861    .390
   NMWT                           .990        .021          .962    48.279    .000
   Gender, 0=male, 1=female      1.983       2.450          .072      .809    .420
   MEASWTF                  -5.668E-02        .038         -.119    -1.474    .142
a Dependent Variable: Reported Weight

page 270 Figure 11.2 Davis's data on reported and measured weight for women (F) and men (M), showing the least-squares linear regression line for each group (the broken line for men, the solid line for women). The outlying observation has a substantial effect on the fitted line for women.

NOTE: We do not know how to get SPSS to plot two regression lines on the same graph, so the graphs have been done separately.
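
One partial workaround, offered as an untested sketch rather than a confirmed solution: the legacy GRAPH command accepts a BY variable in a bivariate scatterplot, which puts both groups on a single chart with different markers. The fit line for each group would still have to be added in the chart editor, whose menu wording varies by SPSS version.

```spss
* Sketch only: one scatterplot with markers set by gender.
* Assumes no filter is active, so both groups are plotted.
FILTER OFF.
USE ALL.
GRAPH
  /SCATTERPLOT(BIVAR)=measwt WITH reptwt BY female.
```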

USE ALL.
COMPUTE filter_$=(female=1).
VARIABLE LABEL filter_$ 'female=1 (FILTER)'.
VALUE LABELS filter_$  0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

IGRAPH
 /X1 = VAR(measwt)
 /Y = VAR(reptwt)
 /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
 /SCATTER COINCIDENT = NONE.
[Interactive graph: reported weight against measured weight for women, with the fitted regression line]
  
USE ALL.
COMPUTE filter_$=(female=0).
VARIABLE LABEL filter_$ 'female=0 (FILTER)'.
VALUE LABELS filter_$  0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

IGRAPH
 /X1 = VAR(measwt)
 /Y = VAR(reptwt)
 /FITLINE METHOD = REGRESSION LINEAR LINE = TOTAL MEFFECT
 /SCATTER COINCIDENT = NONE.
[Interactive graph: reported weight against measured weight for men, with the fitted regression line]
  
filter off.
use all.
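
The interaction model on page 269 implies a separate line for each group, so the lines in Figure 11.2 can also be read off the uncorrected coefficients. For men, reptwt = 1.359 + .990*measwt. For women, the intercept is 1.359 + 39.964 = 41.323 and the slope is .990 - .725 = .265. As a cross-check, the sketch below (not part of the original example) uses SPLIT FILE to fit the simple regression within each gender:

```spss
* Sketch: fit reptwt on measwt separately for men and women.
* FILTER OFF makes sure an earlier filter does not drop one group.
filter off.
use all.
sort cases by female.
split file layered by female.
regression
 /dep=reptwt
 /method=enter measwt.
split file off.
```

The two sets of coefficients should reproduce the group-specific intercepts and slopes above, up to rounding.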

page 271 The largest hat value (middle of the page).

NOTE: Various statistics, including DFBETAs, studentized residuals and covariance ratios, are discussed in this section. Those statistics have been calculated and displayed by the regression command below.

GET FILE='D:\davis.sav'.

compute rptfem = reptwt*female.
execute.

regression 
 /dep=measwt 
 /method=enter reptwt female rptfem
 /save lev sres dfbeta covratio.
Variables Entered/Removed(b)
Model Variables Entered Variables Removed Method
1 RPTFEM, Reported Weight, Gender, 0=male, 1=female(a) . Enter
a All requested variables entered.
b Dependent Variable: Measured Weight

Model Summary(b)
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .837(a) .700 .695 8.449
a Predictors: (Constant), RPTFEM, Reported Weight, Gender, 0=male, 1=female
b Dependent Variable: Measured Weight

ANOVA(b)
Model           Sum of Squares    df   Mean Square          F  Sig.
1  Regression        29786.378     3      9928.793    139.071  .000(a)
   Residual          12779.436   179        71.393
   Total             42565.814   182
a Predictors: (Constant), RPTFEM, Reported Weight, Gender, 0=male, 1=female
b Dependent Variable: Measured Weight

Coefficients(a)
                                    Unstandardized  Standardized
                                      Coefficients  Coefficients
Model                                B  Std. Error          Beta         t    Sig.
1  (Constant)                    1.794       5.924                    .303    .762
   Reported Weight                .969        .076          .873    12.681    .000
   Gender, 0=male, 1=female      2.074       9.297          .068      .223    .824
   RPTFEM                   -9.525E-03        .147         -.018     -.065    .948
a Dependent Variable: Measured Weight

Casewise Diagnostics(a)
Case Number Std. Residual Measured Weight
12 12.830 166
a Dependent Variable: Measured Weight

Residuals Statistics(a)

Minimum Maximum Mean Std. Deviation N
Predicted Value 43.20 121.94 66.22 12.793 183
Std. Predicted Value -1.799 4.355 .000 1.000 183
Standard Error of Predicted Value .841 3.743 1.183 .402 183
Adjusted Predicted Value 43.49 122.66 66.24 12.811 183
Residual -7.66 108.41 .00 8.380 183
Std. Residual -.906 12.830 .000 .992 183
Stud. Residual -.935 12.895 -.001 .997 183
Deleted Residual -8.15 109.50 -.01 8.475 183
Stud. Deleted Residual -.935 48.221 .192 3.580 183
Mahal. Distance .810 34.720 2.984 3.619 183
Cook's Distance .000 .421 .003 .031 183
Centered Leverage Value .004 .191 .016 .020 183
a Dependent Variable: Measured Weight
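
The saved diagnostics can be inspected directly. A minimal sketch, assuming the /save subcommand above named the new variables sre_1 (studentized residual), lev_1 (centered leverage) and cov_1 (covariance ratio), which is the default naming for the first set saved in a session. The cutoff of 2 is a common rule of thumb, not a value taken from this page:

```spss
* List cases whose studentized residual exceeds 2 in absolute value.
* TEMPORARY limits the selection to the next procedure only.
temporary.
select if (abs(sre_1) > 2).
list variables=subject measwt reptwt sre_1 lev_1 cov_1.
```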
 

NOTE: SPSS values for the leverage do not exactly match those obtained by Fox.

compute measf = measwt*female.
execute.

regression 
 /dep=reptwt 
 /method=enter measwt female measf 
 /save lev. 
Variables Entered/Removed(b)
Model Variables Entered Variables Removed Method
1 MEASF, Measured Weight, Gender, 0=male, 1=female(a) . Enter
a All requested variables entered.
b Dependent Variable: Reported Weight

Model Summary(b)
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .942(a) .887 .886 4.661
a Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b Dependent Variable: Reported Weight

ANOVA(b)
Model           Sum of Squares    df   Mean Square          F  Sig.
1  Regression        30654.729     3     10218.243    470.408  .000(a)
   Residual           3888.254   179        21.722
   Total             34542.984   182
a Predictors: (Constant), MEASF, Measured Weight, Gender, 0=male, 1=female
b Dependent Variable: Reported Weight

Coefficients(a)
                                    Unstandardized  Standardized
                                      Coefficients  Coefficients
Model                                B  Std. Error          Beta         t    Sig.
1  (Constant)                    1.359       3.277                    .415    .679
   Measured Weight                .990        .043         1.099    23.236    .000
   Gender, 0=male, 1=female     39.964       3.929         1.447    10.171    .000
   MEASF                         -.725        .056        -1.611   -12.957    .000
a Dependent Variable: Reported Weight

Casewise Diagnostics(a)
Case Number Std. Residual Reported Weight
12 -6.270 56
115 3.342 77
a Dependent Variable: Reported Weight

Residuals Statistics(a)

Minimum Maximum Mean Std. Deviation N
Predicted Value 51.64 119.15 65.62 12.978 183
Std. Predicted Value -1.078 4.124 .000 1.000 183
Standard Error of Predicted Value .464 3.939 .618 .306 183
Adjusted Predicted Value 51.99 158.24 66.01 14.583 183
Residual -29.22 15.58 .00 4.622 183
Std. Residual -6.270 3.342 .000 .992 183
Stud. Residual -11.728 3.392 -.029 1.241 183
Deleted Residual -102.24 16.04 -.39 8.643 183
Stud. Deleted Residual -24.304 3.497 -.096 2.008 183
Mahal. Distance .808 128.987 2.984 9.774 183
Cook's Distance .000 85.927 .474 6.352 183
Centered Leverage Value .004 .709 .016 .054 183
a Dependent Variable: Reported Weight
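
A likely reason for the leverage mismatch noted above is that SPSS saves centered leverage values, which equal the ordinary hat values minus 1/n. Adding 1/n back (n = 183 here) converts them: the largest centered value, .709, becomes .709 + 1/183 = .714 after rounding. A sketch, assuming the leverage saved by this second regression was named lev_2:

```spss
* Convert centered leverage to ordinary hat values by adding 1/n.
* lev_2 is the assumed name of the leverage saved just above.
compute hat = lev_2 + 1/183.
execute.
descriptives variables=hat
 /statistics=mean min max.
```

The mean of hat should equal (k+1)/n = 4/183, about .022, where k = 3 is the number of predictors.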

page 284 Figure 11.5 Partial-regression plots for Duncan's regression of occupational prestige on the income (a) and educational levels (b) of 45 U.S. occupations in 1950. Three potentially influential observations (ministers, railroad conductors, and railroad engineers) are identified on the plots. The partial-regression plot for the intercept A is not shown.

NOTE: To get the regression line on the plot, run the code below, double-click the resulting graph, select Chart from the menu at the top, select Options, click Fit Line Total, select Fit Options, select Regression, then click Continue and OK. We do not know how to add the regression line using code.

GET FILE='D:\duncan.sav'.

regression 
 /dep=prestige 
 /method=enter income educ 
 /partialplot all 
 /sav sresid lev cook.  
Variables Entered/Removed(b)
Model Variables Entered Variables Removed Method
1 Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950(a) . Enter
a All requested variables entered.
b Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti

Model Summary(b)
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .910(a) .828 .820 13.369
a Predictors: (Constant), Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950
b Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti

ANOVA(b)
Model           Sum of Squares    df   Mean Square          F  Sig.
1  Regression        36180.946     2     18090.473    101.216  .000(a)
   Residual           7506.699    42       178.731
   Total             43687.644    44
a Predictors: (Constant), Percent of males in occupation in 1950 who were high-school graduates, Percent of males in occupation earning $3500 or more in 1950
b Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti

Coefficients(a)
                                    Unstandardized  Standardized
                                      Coefficients  Coefficients
Model                                B  Std. Error          Beta         t    Sig.
1  (Constant)                   -6.065       4.272                  -1.420    .163
   Percent of males in occupation earning $3500 or more in 1950
                                  .599        .120          .464     5.003    .000
   Percent of males in occupation in 1950 who were high-school graduates
                                  .546        .098          .516     5.555    .000
a Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti

Residuals Statistics(a)

Minimum Maximum Mean Std. Deviation N
Predicted Value 1.95 96.42 47.69 28.676 45
Std. Predicted Value -1.595 1.699 .000 1.000 45
Standard Error of Predicted Value 2.077 6.935 3.334 .903 45
Adjusted Predicted Value .81 97.04 47.59 28.701 45
Residual -29.54 34.64 .00 13.062 45
Std. Residual -2.209 2.591 .000 .977 45
Stud. Residual -2.272 2.849 .003 1.019 45
Deleted Residual -31.24 41.89 .10 14.249 45
Stud. Deleted Residual -2.397 3.135 .007 1.056 45
Mahal. Distance .084 10.862 1.956 1.928 45
Cook's Distance .000 .566 .032 .090 45
Centered Leverage Value .002 .247 .044 .044 45
a Dependent Variable: Percent of raters in NORC study rating occupation as excellent or good in presti

Prestige by income partial regression plot
Prestige by educ partial regression plot

page 285 Figure 11.6 "Bubble plot" of Cook's D, studentized residuals, and hat values for Duncan's regression of occupational prestige on income and education. Each point is plotted as a circle with area proportional to D. Horizontal reference lines are drawn at studentized residuals of 0 and ±2; vertical reference lines are drawn at twice and three times the average hat value. Several observations are identified on the plot: ministers and conductors have large hat values and relatively large residuals; reporters have a relatively large residual but a small hat value; railroad engineers have a large hat value but a small residual.

NOTE: A proportional bubble plot can be made with the IGRAPH command. After running the command, double-click the graph to open the chart editor and make several modifications. We show below a regular scatterplot, which will suffice in many instances, and then the proportional bubble plot.

GRAPH
  /SCATTERPLOT(BIVAR)=lev_1 WITH sre_1.

[Graph: scatterplot of sre_1 against lev_1]

IGRAPH 
 /X1 = VAR(lev_1)
 /Y = VAR(sre_1)
 /SIZE = VAR(coo_1) TYPE = SCALE
 /SCALERANGE = VAR(sre_1) MIN=-2.500000 MAX=5.000000 
 /SCATTER COINCIDENT = NONE.
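
The reference lines in Figure 11.6 can also be turned into flags. A minimal sketch, assuming lev_1, sre_1 and coo_1 are the variables saved by the regression above and that the vertical lines sit at twice and three times the average hat value, (k+1)/n = 3/45. Because SPSS saves centered leverage, 1/45 is added back first. The names hat, hi_lev and big_res are made up for illustration:

```spss
* Ordinary hat value from SPSS centered leverage (n = 45).
compute hat = lev_1 + 1/45.
* Flag hat values above twice the average, 2*(3/45).
compute hi_lev = (hat > 2*3/45).
* Flag studentized residuals beyond +-2.
compute big_res = (abs(sre_1) > 2).
execute.
temporary.
select if (hi_lev = 1 or big_res = 1).
list variables=hat sre_1 coo_1 hi_lev big_res.
```

Note that sresid saves the internally studentized residual; the deleted (externally studentized) version is requested with sdresid on the /save subcommand.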

page 288 Table 11.1 Data on the 1907 Romanian Peasant Rebellion: I, intensity of the rebellion (corrected from the original); C, commercialization of agriculture; T, traditionalism; M, market forces; and G, inequality of land tenure.

GET FILE='D:\chirot.sav'.

list county rebel agric trad market inequal.
   COUNTY     REBEL     AGRIC      TRAD    MARKET   INEQUAL

        1     -1.39     13.80     86.20      6.20       .60
        2       .65     20.40     86.70      2.90       .72
        3      1.89     27.60     79.30     16.90       .66
        4      -.15     18.60     90.10      3.40       .74
        5      -.86     17.20     84.50      9.00       .70
        6       .11     21.50     81.50      5.20       .60
        7      -.51     11.60     82.60      5.10       .52
        8      -.86     20.40     82.40      6.30       .64
        9      -.24     19.50     87.50      4.80       .68
       10      -.77      8.90     85.60      9.50       .58
       11      -.24     25.80     82.20     10.90       .68
       12     -1.57     24.10     83.50      8.40       .74
       13      -.51     22.00     88.30      6.20       .70
       14     -1.57     24.20     84.90      6.10       .62
       15      -.51     30.60     76.10      1.30       .76
       16     -1.13     33.90     85.50      5.80       .70
       17     -1.22     28.60     84.20      2.90       .58
       18     -1.22     36.50     78.10      4.30       .72
       19      -.86     40.90     84.40      2.30       .64
       20     -1.39      6.80     76.30      3.60       .58
       21      2.81     41.90     89.70      6.60       .66
       22     -1.04     25.40     83.20      2.50       .68
       23      1.57     30.50     80.20      4.10       .76
       24      4.32     48.20     91.00      4.20       .70
       25      3.79     46.00     90.50      3.70       .68
       26      3.79     45.10     85.50      5.10       .64
       27     -1.75     12.50     83.80      7.20       .50
       28       .82     39.30     85.60      4.90       .60
       29      2.59     47.70     87.60      5.20       .58
       30      -.86     15.20     87.30     10.80       .42
       31     -1.84     11.70     82.30     81.70       .42
       32     -1.84     25.60     80.10     68.40       .26


Number of cases read:  32    Number of cases listed:  32
 

These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California