### SPSS Annotated Output Regression Analysis

This page shows an example regression analysis with footnotes explaining the output.  These data (hsb2) were collected on 200 high school students and are scores on various tests, including science, math, reading and social studies (socst).  The variable female is a dichotomous variable coded 1 if the student was female and 0 if male.

In the syntax below, the get file command is used to load the data into SPSS.  In quotes, you need to specify where the data file is located on your computer.  In the regression command, the statistics subcommand must come before the dependent subcommand.  You list the independent variables after the equals sign on the method subcommand.  The statistics subcommand is not needed to run the regression, but on it we can specify options that we would like to have included in the output.

Please note that SPSS sometimes includes footnotes as part of the output.  We have left those intact and have started ours with the next letter of the alphabet.

get file "c:\hsb2.sav".

regression
/statistics coeff outs r anova ci
/dependent science
/method = enter math female socst read.

#### Variables in the model

c.  Model - SPSS allows you to specify multiple models in a single regression command.  This tells you the number of the model being reported.

d.  Variables Entered - SPSS allows you to enter variables into a regression in blocks, and it allows stepwise regression.  Hence, you need to know which variables were entered into the current regression.  If you did not block your independent variables or use stepwise regression, this column should list all of the independent variables that you specified.

e.  Variables Removed - This column lists the variables that were removed from the current regression.  Usually, this column will be empty unless you did a stepwise regression.

f.  Method - This column tells you the method that SPSS used to run the regression.  "Enter" means that each independent variable was entered in the usual fashion, all together in a single step.  If you did a stepwise regression, the entry in this column would tell you that.

#### Overall Model Fit

b.  Model - SPSS allows you to specify multiple models in a single regression command.  This tells you the number of the model being reported.

c.  R - R is the square root of R-Squared and is the correlation between the observed and predicted values of the dependent variable.

d. R-Square - This is the proportion of variance in the dependent variable (science) which can be explained by the independent variables (math, female, socst and read).  This is an overall measure of the strength of association and does not reflect the extent to which any particular independent variable is associated with the dependent variable.

e. Adjusted R-square - This is an adjustment of the R-squared that penalizes the addition of extraneous predictors to the model.  Adjusted R-squared is computed using the formula 1 - (1 - Rsq)(N - 1)/(N - k - 1), where N is the number of observations and k is the number of predictors.
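The adjustment formula can be checked by hand outside SPSS. The following is a minimal Python sketch, not SPSS syntax; the function name `adjusted_r_squared` is hypothetical, N = 200 and k = 4 come from this example, and the R-squared value of 0.50 is made up purely for illustration.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Return 1 - (1 - Rsq)(N - 1)/(N - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than coefficients (N > k + 1)")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# N = 200 students and k = 4 predictors come from this example.
# The R-squared of 0.50 is a made-up value, NOT the value in the output.
example = adjusted_r_squared(0.50, n=200, k=4)
print(round(example, 4))
```

Because (N - 1)/(N - k - 1) is greater than 1 whenever k is at least 1, the adjusted value is never larger than the unadjusted R-squared.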

f. Std. Error of the Estimate - This is also referred to as the root mean squared error.  It is the standard deviation of the error term and the square root of the Mean Square for the Residuals in the ANOVA table (see below).
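As a quick check of that relationship, the sketch below (Python, outside SPSS) takes the square root of the residual mean square quoted in the ANOVA notes on this page (51.096). The variable names are ours, and the result is only as precise as that rounded input.

```python
import math

# Mean Square (Residual) as quoted in the ANOVA table notes on this page.
ms_residual = 51.096

# The Std. Error of the Estimate is the square root of that mean square.
std_error_of_estimate = math.sqrt(ms_residual)
print(round(std_error_of_estimate, 3))
```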

#### Anova Table

c.  Model - SPSS allows you to specify multiple models in a single regression command.  This tells you the number of the model being reported.

d. Regression, Residual, Total - These are the sources of variance in the outcome variable. The Total variance is partitioned into the variance which can be explained by the independent variables (Regression) and the variance which is not explained by the independent variables (Residual).

e. Sum of Squares - These are the Sums of Squares associated with the three sources of variance: Regression, Residual and Total. The Total Sum of Squares is the sum of the part explained by the independent variables (Regression) and the part not explained by them (Residual).

f. df - These are the degrees of freedom associated with the sources of variance.  The Total variance has N - 1 degrees of freedom, here 200 - 1 = 199.  The Regression degrees of freedom correspond to the number of coefficients estimated minus 1.  Including the intercept, there are 5 coefficients, so the Regression has 5 - 1 = 4 degrees of freedom.  The Residual degrees of freedom are the Total DF minus the Regression DF, 199 - 4 = 195.
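The degrees-of-freedom bookkeeping above can be written out as a short Python sketch (outside SPSS). The helper name `anova_df` is hypothetical; the inputs are the 200 students and 4 predictors of this example.

```python
def anova_df(n_obs: int, n_predictors: int) -> dict:
    """Degrees of freedom for a regression ANOVA table with an intercept."""
    total = n_obs - 1
    # Coefficients estimated (predictors plus intercept), minus 1.
    regression = (n_predictors + 1) - 1
    residual = total - regression
    return {"regression": regression, "residual": residual, "total": total}


df = anova_df(n_obs=200, n_predictors=4)
print(df)
```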

g. Mean Square - These are the Mean Squares, the Sum of Squares divided by their respective DF.

h. F and Sig. - This is the F-statistic and the p-value associated with it.  The F-statistic is the Mean Square (Regression) divided by the Mean Square (Residual): 2385.93/51.096 = 46.695. The p-value is compared to some alpha level in testing the null hypothesis that all of the model coefficients (other than the intercept) are 0.
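The arithmetic in the ANOVA table can be reproduced from the two mean squares and the degrees of freedom quoted on this page. The Python sketch below (outside SPSS) recomputes F and then, as a rough consistency check, backs out the sums of squares and R-squared. Because the inputs are rounded, the reconstructed sums of squares are approximate and are not copied from the SPSS output.

```python
# Values quoted in the ANOVA notes on this page.
ms_regression = 2385.93
ms_residual = 51.096
df_regression, df_residual = 4, 195

# F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

# Mean Square = Sum of Squares / df, so Sum of Squares = Mean Square * df.
# These are approximations built from rounded mean squares.
ss_regression = ms_regression * df_regression
ss_residual = ms_residual * df_residual
ss_total = ss_regression + ss_residual

# R-squared is the explained share of the total sum of squares.
r_squared = ss_regression / ss_total

print(round(f_statistic, 3), round(r_squared, 3))
```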

#### Parameter Estimates

b.  Model - SPSS allows you to specify multiple models in a single regression command.  This tells you the number of the model being reported.

c. This column shows the predictor variables (constant, math, female, socst, read).  The first variable (constant) represents the constant, also referred to in textbooks as the Y intercept, the height of the regression line when it crosses the Y axis.  In other words, this is the predicted value of science when all of the predictors are 0.

d. B - These are the values for the regression equation for predicting the dependent variable from the independent variables. The regression equation is presented in many different ways, for example:

Ypredicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4

The column of estimates provides the values for b0, b1, b2, b3 and b4 for this equation.

math - The coefficient for math is .389.  So for every unit increase in math, a 0.39 unit increase in science is predicted, holding all other variables constant.

female - The coefficient for female is -2.010.  So for every unit increase in female, we expect a 2.010 unit decrease in the science score, holding all other variables constant.  Because female is coded 0/1 (0=male, 1=female), the interpretation is easy: for females, the predicted science score would be about 2 points lower than for males.

socst - The coefficient for socst is .050.  So for every unit increase in socst, we expect an approximately .05 point increase in the science score, holding all other variables constant.

read - The coefficient for read is .335.  So for every unit increase in read, we expect a .34 point increase in the science score, holding all other variables constant.
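To make the "holding all other variables constant" reading concrete, here is a minimal Python sketch (outside SPSS) of the regression equation using the four slopes quoted above. The intercept is not stated in this text, so it is passed in as a parameter and the value used below is a placeholder; the function name and the test scores of 50 are likewise hypothetical. The differences between predictions do not depend on the intercept.

```python
# Slopes quoted in the text above.
COEFS = {"math": 0.389, "female": -2.010, "socst": 0.050, "read": 0.335}


def predict_science(intercept, math_score, female, socst_score, read_score):
    """Ypredicted = b0 + b1*math + b2*female + b3*socst + b4*read."""
    return (intercept
            + COEFS["math"] * math_score
            + COEFS["female"] * female
            + COEFS["socst"] * socst_score
            + COEFS["read"] * read_score)


# Placeholder only: the real intercept is in the SPSS output, not in this text.
HYPOTHETICAL_INTERCEPT = 10.0

base = predict_science(HYPOTHETICAL_INTERCEPT, 50, 0, 50, 50)
one_more_math = predict_science(HYPOTHETICAL_INTERCEPT, 51, 0, 50, 50)
female_student = predict_science(HYPOTHETICAL_INTERCEPT, 50, 1, 50, 50)

# Differences equal the slopes, whatever intercept is used.
print(round(one_more_math - base, 3))
print(round(female_student - base, 3))
```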

e. Std. Error - These are the standard errors associated with the coefficients.

f. Beta - These are the standardized coefficients.  These are the coefficients that you would obtain if you standardized all of the variables in the regression, including the dependent and all of the independent variables, and ran the regression.  By standardizing the variables before running the regression, you have put all of the variables on the same scale, and you can compare the magnitude of the coefficients to see which one has more of an effect.  In this output, you will also notice that the larger betas are associated with the larger t-values and lower p-values.
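The link between B and Beta can be illustrated with a small Python sketch (outside SPSS). It uses made-up data and a single predictor to keep the arithmetic visible: the slope obtained after z-scoring both variables equals the unstandardized slope multiplied by sd(x)/sd(y). The same rescaling, B times sd(x)/sd(y), applies to each predictor in a multiple regression. The data and helper names are hypothetical and are not taken from hsb2.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def zscores(values):
    """Standardize to mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up scores for illustration only.
x = [40, 45, 50, 55, 60, 65]
y = [42, 50, 47, 58, 55, 63]

b = ols_slope(x, y)                              # unstandardized B
beta_from_b = b * stdev(x) / stdev(y)            # rescaled B
beta_from_z = ols_slope(zscores(x), zscores(y))  # slope on z-scores

print(round(beta_from_b, 6) == round(beta_from_z, 6))
```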

g. t and Sig. - These are the t-statistics and their associated 2-tailed p-values used in testing whether a given coefficient is significantly different from zero. A p-value displayed as 0.000 means that it is smaller than 0.0005, not that it is exactly zero. Using an alpha of 0.05:

The coefficient for math (0.389) is significantly different from 0 because its p-value is 0.000, which is smaller than 0.05.

The coefficient for female (-2.010) is not significantly different from 0 because its p-value is 0.051, which is larger than 0.05.

The coefficient for socst (0.050) is not significantly different from 0 because its p-value is larger than 0.05.

The coefficient for read (0.335) is significantly different from 0 because its p-value is 0.000, which is smaller than 0.05.

The intercept is significantly different from 0 at the 0.05 alpha level.

h. 95% Confidence Limit for B Lower Bound and Upper Bound - These are the 95% confidence intervals for the coefficients.  The confidence intervals are related to the p-values such that the coefficient will not be statistically significant at the 0.05 level if the 95% confidence interval includes 0.  These confidence intervals can help you to put the estimate from the coefficient into perspective by seeing how much the value could vary.
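The correspondence between the confidence interval and the t-test can be sketched in Python (outside SPSS). The interval is B plus or minus a critical t value times the standard error. The critical value 1.972 is an assumed approximation for a two-tailed 5% test with 195 residual degrees of freedom, taken from standard t tables, and the coefficient and standard error pairs below are hypothetical and do not come from the SPSS output.

```python
# Approximate two-tailed 5% critical value of t with 195 df (assumption).
T_CRIT_195 = 1.972


def confidence_interval(b, se, t_crit=T_CRIT_195):
    """Return (lower, upper) bounds: b -/+ t_crit * se."""
    half_width = t_crit * se
    return (b - half_width, b + half_width)


def excludes_zero(interval):
    lower, upper = interval
    return lower > 0 or upper < 0


# Hypothetical (B, Std. Error) pairs, for illustration only.
examples = [(0.5, 0.2), (0.5, 0.3)]

results = []
for b, se in examples:
    ci = confidence_interval(b, se)
    t = b / se  # the t-statistic is B divided by its standard error
    # Both criteria give the same decision at alpha = 0.05.
    results.append((excludes_zero(ci), abs(t) > T_CRIT_195))
    print(b, se, tuple(round(v, 3) for v in ci), round(t, 3))
```

In the first pair the interval lies entirely above 0 and |t| exceeds the critical value; in the second the interval straddles 0 and |t| falls short, so the two checks agree.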
