
FAQ: Why are R² and F so large for models without a constant?

When I run my OLS regression model with a constant I get an R² of about 0.35 and an F-ratio around 100. When I run the same model without a constant the R² is 0.97 and the F-ratio is over 7,000. Why are R² and F-ratio so large for models without a constant?

Let's begin by going over what it means to run an OLS regression without a constant (intercept). A regression without a constant implies that the regression line should run through the origin, i.e., the point where both the response variable and predictor variable equal zero. Let's look at a scatterplot that has both the regular regression line (dashed line) and a line without the constant (solid line).

As you can see, the "true" regression line is different from the no-constant line. How, then, can the no-constant model have a larger R² and F-ratio than a model with a constant?

To answer this question, let's start with a review of how the R² and F-ratio for OLS regression models are computed.

R² = SSmodel/SStotal

F = (SSmodel/dfmodel)/(SSresidual/dfresidual)

Both of these quantities depend upon the value of SSmodel. Even the sum of squares residual is defined as SSresidual = SStotal - SSmodel.

Next, let's see how each of these sums of squares is defined. For these equations we will use Yhat for the predicted value of the response variable Y and Ybar for the mean value of Y.

SStotal = Σ(Y_i - Ybar)²

SSmodel = Σ(Yhat_i - Ybar)²

SSresidual = Σ(Y_i - Yhat_i)²
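
As a rough illustration of these definitions, the sketch below fits a one-predictor model with a constant and computes the three sums of squares, R² and F from them. The data values, the helper name fit_with_constant and the degrees of freedom shown (1 for the model, n - 2 for the residual) are assumptions made for this example, not part of the original page.

```python
def fit_with_constant(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [10, 12, 11, 14, 13]

intercept, slope = fit_with_constant(x, y)
y_hat = [intercept + slope * xi for xi in x]
y_bar = sum(y) / len(y)

# The three sums of squares, all measured around the mean of Y.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_model = sum((yh - y_bar) ** 2 for yh in y_hat)
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

df_model = 1                # one predictor
df_residual = len(y) - 2    # n minus (one predictor + the constant)

r2 = ss_model / ss_total
f_ratio = (ss_model / df_model) / (ss_residual / df_residual)

print(f"SStotal={ss_total:.2f} SSmodel={ss_model:.2f} SSresidual={ss_residual:.2f}")
print(f"R2={r2:.3f} F={f_ratio:.3f}")
```

For these made-up numbers SStotal is 10, SSmodel is 6.4 and SSresidual is 3.6, so SSresidual = SStotal - SSmodel holds and R² comes out at 0.64.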

When you run the regression without a constant in the model, you are in effect declaring that the mean of Y equals zero (Ybar = 0): the sums of squares are computed around zero rather than around the mean of Y. If we substitute zero for Ybar we get,

SStotal = Σ(Y_i - 0)² = Σ(Y_i)²

SSmodel = Σ(Yhat_i - 0)² = Σ(Yhat_i)²

SSresidual = Σ(Y_i - Yhat_i)²

Unless the mean of Y is exactly zero, or very close to zero, the sums of squares for the model, the residual and the total will typically be much larger without the constant in the model than with it included. This is clearly the case for the model and total sums of squares, but the formula for SSresidual looks unchanged. SSresidual is larger because the predicted values, Yhat, no longer fall on the "true" least squares regression line: the fitted line is forced through the origin, so the sum of squares of these pseudo-residuals is no longer the minimum that the unconstrained line achieves.
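
The comparison can be sketched numerically. The example below fits the same hypothetical data with and without a constant and prints the three sums of squares side by side, measured around Ybar in the first case and around zero in the second. It also checks the identity Σ(Y_i)² = Σ(Y_i - Ybar)² + n·Ybar², which accounts for the growth in SStotal. The data and the helper name sums_of_squares are assumptions made for illustration.

```python
def sums_of_squares(y, y_hat, center):
    """Return (total, model, residual) sums of squares measured around `center`."""
    total = sum((yi - center) ** 2 for yi in y)
    model = sum((yh - center) ** 2 for yh in y_hat)
    residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return total, model, residual


# Hypothetical data; the mean of Y (12) is far from zero.
x = [1, 2, 3, 4, 5]
y = [10, 12, 11, 14, 13]
n = len(y)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Model with a constant: the usual least-squares line.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
with_const = sums_of_squares(y, [intercept + slope * xi for xi in x], y_bar)

# Model without a constant: least-squares line forced through the origin.
slope0 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
no_const = sums_of_squares(y, [slope0 * xi for xi in x], 0.0)

for name, a, b in zip(("total", "model", "residual"), with_const, no_const):
    print(f"SS{name:<9} with constant: {a:8.2f}   no constant: {b:8.2f}")

# The gap between the two total sums of squares equals n * Ybar^2.
gap = no_const[0] - with_const[0]
print(f"gap in SStotal = {gap:.2f}, n * Ybar^2 = {n * y_bar ** 2:.2f}")
```

With these numbers SStotal grows from 10 to 730, SSmodel from 6.40 to 642.62, and SSresidual from 3.60 to 87.38. The gap in SStotal is exactly n·Ybar² = 720.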

Now, when we compute R² and the F-ratio we get,

R² = Σ(Yhat_i)²/Σ(Y_i)²

F = (Σ(Yhat_i)²/dfmodel)/(Σ(Y_i - Yhat_i)²/dfresidual)

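Both of these values will generally be larger than the values computed from a model that includes a constant, and neither can be interpreted in the usual way. For example, this R² is not the proportion of the variance in Y that the model explains.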
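
To see the effect on the two statistics themselves, here is a minimal sketch that computes R² and F for a one-predictor model both ways. The function name r2_and_f, the data, and the residual degrees of freedom used (n - 2 with a constant, n - 1 without) are assumptions for this example. The no-constant branch follows the Ybar = 0 formulas above.

```python
def r2_and_f(x, y, constant=True):
    """R-squared and F-ratio for a one-predictor OLS fit.

    With constant=False the line is forced through the origin and the
    sums of squares are taken around zero instead of the mean of Y.
    """
    n = len(y)
    if constant:
        x_bar, y_bar = sum(x) / n, sum(y) / n
        slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
                 / sum((xi - x_bar) ** 2 for xi in x))
        y_hat = [y_bar + slope * (xi - x_bar) for xi in x]
        center, df_residual = y_bar, n - 2
    else:
        slope = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
        y_hat = [slope * xi for xi in x]
        center, df_residual = 0.0, n - 1

    ss_total = sum((yi - center) ** 2 for yi in y)
    ss_model = sum((yh - center) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    df_model = 1  # one predictor in both cases
    r2 = ss_model / ss_total
    f_ratio = (ss_model / df_model) / (ss_residual / df_residual)
    return r2, f_ratio


# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [10, 12, 11, 14, 13]

r2_c, f_c = r2_and_f(x, y, constant=True)
r2_nc, f_nc = r2_and_f(x, y, constant=False)
print(f"with constant:    R2={r2_c:.3f}  F={f_c:.2f}")
print(f"without constant: R2={r2_nc:.3f}  F={f_nc:.2f}")
```

On this small data set R² rises from 0.640 to 0.880 and F from 5.33 to 29.42 when the constant is dropped, even though the line through the origin fits the points worse.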

These pages are Copyrighted (c) by UCLA Academic Technology Services

