This page shows an example regression analysis with footnotes explaining the output. These data (hsb2) were collected on 200 high school students and are scores on various tests, including science, math, reading and social studies (socst). The variable female is a dichotomous variable coded 1 if the student was female and 0 if male.
In the code below, the data = option on the proc reg statement tells SAS where to find the SAS data set to be used in the analysis. On the model statement, we specify the regression model that we want to run, with the dependent variable (in this case, science) on the left of the equals sign, and the independent variables on the right-hand side. We use the clb option after the slash on the model statement to get the 95% confidence limits of the parameter estimates. The quit statement is included because proc reg is an interactive procedure, and quit tells SAS not to expect another proc reg statement immediately.
proc reg data = "d:\hsb2";
  model science = math female socst read / clb;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: science science score

Analysis of Variance

                             Sum of         Mean
Source             DF       Squares       Square   F Value   Pr > F
Model               4    9543.72074   2385.93019     46.69   <.0001
Error             195    9963.77926     51.09630
Corrected Total   199         19507

Root MSE          7.14817    R-Square   0.4892
Dependent Mean   51.85000    Adj R-Sq   0.4788
Coeff Var        13.78624

Parameter Estimates

                                         Parameter   Standard
Variable    Label                  DF     Estimate      Error   t Value   Pr > |t|
Intercept   Intercept               1     12.32529    3.19356      3.86     0.0002
math        math score              1      0.38931    0.07412      5.25     <.0001
female                              1     -2.00976    1.02272     -1.97     0.0508
socst       social studies score    1      0.04984    0.06223      0.80     0.4241
read        reading score           1      0.33530    0.07278      4.61     <.0001

Parameter Estimates

Variable    Label                  DF   95% Confidence Limits
Intercept   Intercept               1    6.02694    18.62364
math        math score              1    0.24312     0.53550
female                              1   -4.02677     0.00724
socst       social studies score    1   -0.07289     0.17258
read        reading score           1    0.19177     0.47883
Analysis of Variance

                                Sum of         Mean
Source[a]          DF[b]    Squares[c]    Square[d]   F Value[e]   Pr > F[f]
Model                  4    9543.72074   2385.93019        46.69      <.0001
Error                195    9963.77926     51.09630
Corrected Total      199         19507
a. Source - Looking at the breakdown of variance in the outcome variable, these are the categories we will examine: Model, Error, and Corrected Total. The Total variance is partitioned into the variance which can be explained by the independent variables (Model) and the variance which is not explained by the independent variables (Error).
b. DF - These are the degrees of freedom associated with the sources of variance. The total variance has N-1 degrees of freedom (here, 200 - 1 = 199). The model degrees of freedom correspond to the number of coefficients estimated minus 1. Including the intercept, there are 5 coefficients, so the model has 5 - 1 = 4 degrees of freedom. The Error degrees of freedom is the DF total minus the DF model, 199 - 4 = 195.
c. Sum of Squares - These are the Sum of Squares associated with the three sources of variance, Total, Model and Error.
d. Mean Square - These are the Mean Squares, the Sum of Squares divided by their respective DF.
e. F Value - This is the F-statistic: the Mean Square Model (2385.93019) divided by the Mean Square Error (51.09630), yielding F=46.69.
f. Pr > F - This is the p-value associated with the above F-statistic. It is used in testing the null hypothesis that all of the model coefficients (other than the intercept) are 0.
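The arithmetic in footnotes b through e can be checked by hand. The following is a minimal Python sketch (not part of the SAS run) that starts from the two sums of squares printed in the table and rebuilds the degrees of freedom, mean squares and F value. The variable names are illustrative only.

```python
# Sums of squares copied from the Analysis of Variance table.
ss_model = 9543.72074
ss_error = 9963.77926
n_obs = 200    # students in hsb2
n_coef = 5     # intercept + math, female, socst, read

# Degrees of freedom (footnote b).
df_total = n_obs - 1
df_model = n_coef - 1
df_error = df_total - df_model

# Mean squares (footnote d) and the F value (footnote e).
ss_total = ss_model + ss_error
ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error

print("DF model, error, total:", df_model, df_error, df_total)
print("Total sum of squares:", round(ss_total, 1))
print("Mean squares:", round(ms_model, 3), round(ms_error, 4))
print("F value:", round(f_value, 2))
```

The p-value in footnote f requires the F distribution with 4 and 195 degrees of freedom, which this sketch does not compute.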
Root MSE[g]           7.14817    R-Square[j]   0.4892
Dependent Mean[h]    51.85000    Adj R-Sq[k]   0.4788
Coeff Var[i]         13.78624
g. Root MSE - Root MSE is the standard deviation of the error term, and is the square root of the Mean Square Error.
h. Dependent Mean - This is the mean of the dependent variable.
i. Coeff Var - This is the coefficient of variation, which is a unit-less measure of variation in the data. It is the root MSE divided by the mean of the dependent variable, multiplied by 100: (100*(7.15/51.85) = 13.79).
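As a quick check of footnotes g and i, the short Python sketch below (an illustration, not SAS output) recomputes Root MSE and the coefficient of variation from the Mean Square Error and the dependent mean reported above.

```python
import math

# Values copied from the output above.
ms_error = 51.09630        # Mean Square Error
dependent_mean = 51.85000  # mean of science

root_mse = math.sqrt(ms_error)               # footnote g
coeff_var = 100 * root_mse / dependent_mean  # footnote i

print("Root MSE:", round(root_mse, 4))
print("Coeff Var:", round(coeff_var, 2))
```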
j. R-Square - R-Squared is the proportion of variance in the dependent variable (science) which can be explained by the independent variables (math, female, socst and read). This is an overall measure of the strength of association and does not reflect the extent to which any particular independent variable is associated with the dependent variable.
k. Adj R-Sq - This is an adjustment of the R-squared that penalizes the addition of extraneous predictors to the model. Adjusted R-squared is computed using the formula 1 - (1 - Rsq)*((N - 1)/(N - k - 1)), where N is the number of observations and k is the number of predictors.
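Both R-Square and Adj R-Sq can be recovered from the Analysis of Variance table. The Python sketch below is only an illustration. It takes R-squared as the Model sum of squares divided by the total sum of squares, then applies the adjustment formula from footnote k with N = 200 and k = 4.

```python
# Sums of squares copied from the Analysis of Variance table.
ss_model = 9543.72074
ss_error = 9963.77926
n_obs = 200  # N, number of observations
k = 4        # number of predictors: math, female, socst, read

# R-squared: share of the total sum of squares explained by the model.
r_sq = ss_model / (ss_model + ss_error)

# Adjusted R-squared, following the formula in footnote k.
adj_r_sq = 1 - (1 - r_sq) * (n_obs - 1) / (n_obs - k - 1)

print("R-Square:", round(r_sq, 4))
print("Adj R-Sq:", round(adj_r_sq, 4))
```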
Parameter Estimates

                                               Parameter    Standard
Variable[l]   Label[m]               DF[n]   Estimate[o]    Error[p]   t Value[q]   Pr > |t|[r]
Intercept     Intercept                  1      12.32529     3.19356         3.86        0.0002
math          math score                 1       0.38931     0.07412         5.25        <.0001
female                                   1      -2.00976     1.02272        -1.97        0.0508
socst         social studies score       1       0.04984     0.06223         0.80        0.4241
read          reading score              1       0.33530     0.07278         4.61        <.0001

Parameter Estimates

Variable[l]   Label[m]               DF[n]   95% Confidence Limits[s]
Intercept     Intercept                  1    6.02694    18.62364
math          math score                 1    0.24312     0.53550
female                                   1   -4.02677     0.00724
socst         social studies score       1   -0.07289     0.17258
read          reading score              1    0.19177     0.47883
l. Variable - This column shows the predictor variables (Intercept, math, female, socst, read). The first refers to the model intercept, the height of the regression line where it crosses the Y axis. In other words, this is the predicted value of science when all of the predictor variables are 0.
m. Label - This column gives the label for the variable. Usually, variable labels are added when the data set is created so that it is clear what the variable is (as the name of the variable can sometimes be ambiguous). SAS has labeled the variable Intercept for us by default. Note that this variable is not added to the data set.
n. DF - This column gives the degrees of freedom associated with each independent variable. All continuous variables have one degree of freedom, as do binary variables (such as female).
o. Parameter Estimates - These are the values for the regression equation for predicting the dependent variable from the independent variables. The regression equation is presented in many different ways, for example:
Ypredicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4
The column of estimates provides the values for b0, b1, b2, b3 and b4 for this equation.
math - The coefficient is 0.38931. So for every unit increase in math, a 0.38931 unit increase in science is predicted, holding all other variables constant.
female - The coefficient is -2.00976. So for every unit increase in female, we expect a 2.00976 unit decrease in the science score, holding all other variables constant. Since female is coded 0/1 (0=male, 1=female), the interpretation is more simply: for females, the predicted science score would be about 2 points lower than for males, holding all other variables constant.
socst - The coefficient for socst is 0.04984. So for every unit increase in socst, we expect an approximately .05 point increase in the science score, holding all other variables constant.
read - The coefficient for read is 0.33530. So for every unit increase in read, we expect a .34 point increase in the science score, holding all other variables constant.
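To make the regression equation concrete, the Python sketch below plugs the estimates from the Parameter Estimates table into Ypredicted = b0 + b1*x1 + b2*x2 + b3*x3 + b4*x4. The two students and their scores of 50 are hypothetical values chosen for illustration. They are not taken from hsb2, and the function name is likewise made up for this example.

```python
# Estimates copied from the Parameter Estimates table.
COEF = {
    "Intercept": 12.32529,
    "math": 0.38931,
    "female": -2.00976,
    "socst": 0.04984,
    "read": 0.33530,
}

def predict_science(math_score, female, socst, read):
    """Predicted science score from the fitted regression equation."""
    return (COEF["Intercept"]
            + COEF["math"] * math_score
            + COEF["female"] * female
            + COEF["socst"] * socst
            + COEF["read"] * read)

# Hypothetical students with identical test scores, differing only in female.
pred_male = predict_science(50, 0, 50, 50)
pred_female = predict_science(50, 1, 50, 50)

print("Predicted science, male:  ", round(pred_male, 3))
print("Predicted science, female:", round(pred_female, 3))
print("Difference (male - female):", round(pred_male - pred_female, 3))
```

Because the two hypothetical students differ only in female, the gap between their predictions equals the size of the female coefficient, about 2 points.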
p. Standard Error - These are the standard errors associated with the coefficients.
q. t Value - These are the t-statistics used in testing whether a given coefficient is significantly different from zero.
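Each t value is the parameter estimate divided by its standard error. The Python sketch below, offered as an illustration rather than SAS output, recomputes the t Value column from the Estimate and Standard Error columns above.

```python
# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (12.32529, 3.19356),
    "math": (0.38931, 0.07412),
    "female": (-2.00976, 1.02272),
    "socst": (0.04984, 0.06223),
    "read": (0.33530, 0.07278),
}

# t value = estimate / standard error
t_values = {name: est / se for name, (est, se) in estimates.items()}

for name, t in t_values.items():
    print(f"{name:<10} t = {t:6.2f}")
```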
r. Pr > |t| - This column shows the 2-tailed p-values used in testing the null hypothesis that the coefficient (parameter) is 0. Using an alpha of 0.05:
The coefficient for math (0.38931) is significantly different from 0 because its p-value (<.0001) is smaller than 0.05.
The coefficient for female (-2.00976) is not statistically significantly different from 0 because its p-value (0.0508) is slightly larger than 0.05.
The coefficient for socst (0.04984) is not statistically significantly different from 0 because its p-value (0.4241) is larger than 0.05.
The coefficient for read (0.33530) is statistically significant because its p-value (<.0001) is less than 0.05.
The intercept is significantly different from 0 at the 0.05 alpha level because its p-value (0.0002) is smaller than 0.05.
s. 95% Confidence Limits - These are the 95% confidence intervals for the coefficients. The confidence intervals are related to the p-values such that the coefficient will not be statistically significant at alpha = 0.05 if the confidence interval includes 0. For example, the interval for female (-4.02677 to 0.00724) includes 0, which agrees with its p-value of 0.0508. These confidence intervals can help you to put the estimate from the coefficient into perspective by seeing how much the value could vary.
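The confidence limits can be approximated as the estimate plus or minus a critical t value times the standard error. The Python sketch below assumes a critical value of about 1.9722 for 195 error degrees of freedom. That number is an assumption for this illustration, chosen to be consistent with the limits printed above, and is not something SAS reports in this output. Small differences in the last decimal place come from rounding of the printed estimates and standard errors.

```python
# Assumed two-sided 95% critical t value for 195 error degrees of freedom.
T_CRIT = 1.9722

# (estimate, standard error) pairs copied from the Parameter Estimates table.
estimates = {
    "Intercept": (12.32529, 3.19356),
    "math": (0.38931, 0.07412),
    "female": (-2.00976, 1.02272),
    "socst": (0.04984, 0.06223),
    "read": (0.33530, 0.07278),
}

# Lower and upper limits: estimate -/+ T_CRIT * standard error.
limits = {
    name: (est - T_CRIT * se, est + T_CRIT * se)
    for name, (est, se) in estimates.items()
}

for name, (lower, upper) in limits.items():
    includes_zero = lower <= 0 <= upper
    print(f"{name:<10} {lower:9.4f} {upper:9.4f}  includes 0: {includes_zero}")
```

Under this approximation the intervals for female and socst include 0, matching the two coefficients that were not significant at the 0.05 level.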
These pages are Copyrighted (c) by UCLA Academic Technology Services