SAS Library
Overview of SAS PROC REG


This page was adapted from a page titled PROC REG Summary created by Professor Michael Friendly of York University. We thank Professor Friendly for permission to adapt and distribute this page via our web site.


The REG procedure fits linear regression models by least squares. The following statements are used with the REG procedure:

     PROC REG options;
        MODEL dependents=regressors / options;
        VAR variables;
        FREQ variable;
        WEIGHT variable;
        ID variable;
        OUTPUT OUT=SASdataset keyword=names...;
        PLOT yvariable*xvariable = symbol ...;
        RESTRICT linear_equation,...;
        TEST linear_equation,...;
        MTEST linear_equation,...;
        BY variables;

The PROC REG statement is always accompanied by one or more MODEL statements that specify the regression models. One OUTPUT statement may follow each MODEL statement. Several RESTRICT, TEST, and MTEST statements may follow each MODEL statement. The WEIGHT, FREQ, and ID statements are optional and are specified once for the entire PROC step. The statements are described below.

PROC REG Statement

   PROC REG options;

These options may be specified on the PROC REG statement:

DATA=SASdataset
names the SAS data set to be used by PROC REG. If DATA= is not specified, REG uses the most recently created SAS data set.
OUTEST=SASdataset
requests that parameter estimates be output to this data set.
OUTSSCP=SASdataset
requests that the sums of squares and crossproducts matrix be output to this TYPE=SSCP data set.
NOPRINT
suppresses the normal printed output.
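SIMPLE
prints simple descriptive statistics (sum, mean, variance, standard deviation, and uncorrected sum of squares) for each variable used in REG.
ALL
requests many different printouts. It is equivalent to specifying the ALL option on every MODEL statement, and it also turns on SIMPLE.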
COVOUT
outputs the covariance matrices for the parameter estimates to the OUTEST data set. This option is valid only if OUTEST= is also specified.
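
As an illustration of these options, the following sketch fits one model and saves the parameter estimates together with their covariance matrix. It assumes the SASHELP.CLASS sample data set that ships with SAS; the output data set name EST is arbitrary.

     /* Fit Weight on Height and Age, print simple statistics,     */
     /* and save the estimates and their covariances to WORK.EST.  */
     proc reg data=sashelp.class outest=est covout simple;
        model weight = height age;
     run;
     quit;

     /* Inspect the saved estimates and covariance rows. */
     proc print data=est;
     run;

Because PROC REG is an interactive procedure, the QUIT statement is used here to end it explicitly.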

MODEL Statement

   label: MODEL dependents = regressors / options;

After the keyword MODEL, the dependent (response) variables are specified, followed by an equal sign and the regressor variables. Variables specified in the MODEL statement must be variables in the data set being analyzed. The label is optional.
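
A minimal sketch of labeled MODEL statements follows. The data set (SASHELP.CLASS) and the labels FULL and REDUCED are assumptions chosen for illustration.

     proc reg data=sashelp.class;
        /* Two models fit in the same PROC step; each label */
        /* identifies its model in the printed output.      */
        full:    model weight = height age;
        reduced: model weight = height;
     run;
     quit;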

FREQ Statement

   FREQ variable;

If a variable in your data set represents the frequency of occurrence for the other values in the observation, include the variable's name in a FREQ statement. The procedure then treats the data set as if each observation appears n times, where n is the value of the FREQ variable for the observation. The total number of observations will be considered equal to the sum of the FREQ variable when the procedure determines degrees of freedom for significance probabilities.
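
The following sketch shows the FREQ statement with a small, made-up data set in which COUNT records how many times each (x, y) pair occurred. All data values and names are hypothetical.

     data grouped;
        input x y count;
        datalines;
     1 2.1 3
     2 3.9 1
     3 6.2 2
     4 7.8 4
     ;
     run;

     proc reg data=grouped;
        model y = x;
        freq count;    /* each row is treated as COUNT observations */
     run;
     quit;

The COUNT values sum to 10, so under the rule described above the procedure should treat the analysis as having 10 observations, not 4, when it computes degrees of freedom.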

WEIGHT Statement

   WEIGHT variable;

A WEIGHT statement names a variable on the input data set whose values are relative weights for a weighted least-squares fit. If the weight value is proportional to the reciprocal of the variance for each observation, then the weighted estimates are the best linear unbiased estimates (BLUE).
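
A sketch of a weighted fit, assuming each observation comes with a known (or separately estimated) variance. The data values and the variable names VAR_Y and W are hypothetical.

     data wls;
        input x y var_y;
        w = 1 / var_y;   /* weight proportional to the reciprocal of the variance */
        datalines;
     1 2.0 0.5
     2 4.1 1.0
     3 5.8 2.0
     4 8.3 4.0
     ;
     run;

     proc reg data=wls;
        model y = x;
        weight w;
     run;
     quit;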

ID Statement

   ID variable;

The ID statement specifies one variable that identifies observations in the output produced by the MODEL options P, R, CLM, CLI, and INFLUENCE.
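
For example, assuming the SASHELP.CLASS sample data set, the NAME variable can label each row of the observation-level listings:

     proc reg data=sashelp.class;
        id name;                               /* label rows by student name */
        model weight = height / p r cli clm influence;
     run;
     quit;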

OUTPUT Statement

The OUTPUT statement specifies an output data set to contain statistics calculated for each observation. For each statistic, specify the keyword, an equal sign, and a variable name for the statistic on the output data set. If the MODEL has several dependent variables, then a list of output variable names can be specified after each keyword to correspond to the list of dependent variables.

  OUTPUT OUT=SASdataset
         PREDICTED=names or P=names
         RESIDUAL=names or R=names
         L95M=names
         U95M=names
         L95=names
         U95=names
         STDP=names
         STDR=names
         STUDENT=names
         COOKD=names
         H=names
         PRESS=names
         RSTUDENT=names
         DFFITS=names
         COVRATIO=names;

The output data set named with OUT= contains all the variables for which the analysis was performed, including any BY variables, any ID variables, and variables named in the OUTPUT statement that contain statistics.

These statistics may be output to the new data set:

PREDICTED=
P=
predicted values.
RESIDUAL=
R=
residuals, calculated as ACTUAL minus PREDICTED.
L95M=
lower bound of a 95% confidence interval for the expected value (mean) of the dependent variable.
U95M=
upper bound of a 95% confidence interval for the expected value (mean) of the dependent variable.
L95=
lower bound of a 95% confidence interval for an individual prediction. This includes the variance of the error as well as the variance of the parameter estimates.
U95=
upper bound of a 95% confidence interval for an individual prediction.
STDP=
standard error of the mean predicted value.
STDR=
standard error of the residual.
STUDENT=
studentized residuals, the residual divided by its standard error.
COOKD=
Cook's D influence statistic.
H=
leverage.
PRESS=
residual for the prediction obtained when this observation is dropped from the fit; it equals the ordinary residual divided by (1-h), where h is the leverage (H= above).
RSTUDENT=
studentized residual in which the error variance is estimated with the current observation deleted, unlike STUDENT= above.
DFFITS=
standard influence of observation on predicted value.
COVRATIO=
standard influence of observation on covariance of betas, as discussed with INFLUENCE option.
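
The keywords above can be combined in one OUTPUT statement. The sketch below assumes the SASHELP.CLASS sample data set; the output data set name DIAG and all new variable names are arbitrary choices.

     proc reg data=sashelp.class noprint;
        model weight = height age;
        output out=diag
               p=yhat          r=resid
               l95m=lo_mean    u95m=hi_mean
               l95=lo_pred     u95=hi_pred
               stdp=se_mean    stdr=se_resid
               student=stud    rstudent=rstud
               h=lev           cookd=cooks
               press=press_res dffits=dff
               covratio=covr;
     run;
     quit;

     /* Recompute PRESS by hand as residual / (1 - leverage) */
     /* so it can be compared with the PRESS= variable.      */
     data check;
        set diag;
        press_by_hand = resid / (1 - lev);
     run;

     proc print data=check;
        var name weight yhat resid lev press_res press_by_hand;
     run;

In the printed listing, PRESS_BY_HAND should agree with PRESS_RES up to rounding, which illustrates the definition given for PRESS=.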

PLOT Statement

     PLOT yvariable*xvariable=symbol / options;

The PLOT statement prints scatter plots of the yvariables on the vertical axis and xvariables on the horizontal axis. It uses the symbol specified to mark the points. The yvariables and xvariables may be any variables in the data set or any of the calculated statistics available in the OUTPUT statement.
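
A short sketch of two plot requests, assuming the SASHELP.CLASS sample data set. When a calculated statistic is plotted, SAS expects the OUTPUT keyword followed by a period (for example R. and P.); the kind of plot produced may depend on the SAS release and graphics settings.

     proc reg data=sashelp.class;
        model weight = height;
        plot weight*height = '*';   /* data scatter plot, marked with asterisks */
        plot r.*p.;                 /* residuals against predicted values       */
     run;
     quit;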

TEST Statement

     label:  TEST equation1,
                  equation2,
                     .
                     .
                     .
                  equationk;
     label:  TEST equation1,..., equationk / options;

The TEST statement, which has the same syntax as the RESTRICT statement except for options, tests hypotheses about the parameters estimated in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested.

One option may be specified in the TEST statement after a slash (/):

PRINT
prints intermediate calculations.
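
The following sketch shows a few labeled TEST statements after a MODEL statement. The data set (SASHELP.CLASS) and the labels are assumptions for illustration.

     proc reg data=sashelp.class;
        model weight = height age;
        /* Joint test that both slopes are zero */
        both_zero: test height = 0, age = 0;
        /* Test that the two slopes are equal, printing intermediate calculations */
        equal:     test height = age / print;
        /* Test that the intercept is zero */
        no_int:    test intercept = 0;
     run;
     quit;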

BY Statement

   BY variables;

A BY statement may be used with PROC REG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use the SORT procedure with a similar BY statement to sort the data, or, if appropriate, use the BY statement options NOTSORTED or DESCENDING.
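
For instance, to fit a separate regression for each value of SEX in the SASHELP.CLASS sample data set (an assumed example; CLASS_SORTED is an arbitrary name), sort first and then repeat the BY statement in PROC REG:

     proc sort data=sashelp.class out=class_sorted;
        by sex;
     run;

     proc reg data=class_sorted;
        by sex;                     /* one analysis per BY group */
        model weight = height;
     run;
     quit;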

