This page was adapted from a page titled PROC REG Summary created by Professor Michael Friendly of York University. We thank Professor Friendly for permission to adapt and distribute this page via our web site.
The REG procedure fits linear regression models by least squares. The following statements are used with the REG procedure:
PROC REG options;
MODEL dependents=regressors / options;
VAR variables;
FREQ variable;
WEIGHT variable;
ID variable;
OUTPUT OUT=SASdataset keyword=names...;
PLOT yvariable*xvariable = symbol ...;
RESTRICT linear_equation,...;
TEST linear_equation,...;
MTEST linear_equation,...;
BY variables;
The PROC REG statement is always accompanied by one or more MODEL statements to specify regression models. One OUTPUT statement may follow each MODEL statement. Several RESTRICT, TEST, and MTEST statements may follow each MODEL. WEIGHT, FREQ, and ID statements are optionally specified once for the entire PROC step. The purposes of the statements are:
PROC REG options;
These options may be specified on the PROC REG statement:
label: MODEL dependents = regressors / options;
After the keyword MODEL, the dependent (response) variables are specified, followed by an equal sign and the regressor variables. Variables specified in the MODEL statement must be variables in the data set being analyzed. The label is optional.
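As a minimal sketch of the MODEL statement, the step below regresses one dependent variable on two regressors. It assumes the SASHELP.CLASS sample data set (variables Weight, Height, Age) is available in your installation; the label fit1 is an arbitrary name chosen for illustration.

```sas
/* Illustrative sketch: SASHELP.CLASS is an assumed sample data set, */
/* and "fit1" is a made-up label.                                     */
proc reg data=sashelp.class;
   fit1: model weight = height age;   /* dependent = regressors */
run;
quit;                                 /* PROC REG is interactive; QUIT ends the step */
```

Note that Weight here is an ordinary variable in the data set and is unrelated to the WEIGHT statement.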
FREQ variable;
If a variable in your data set represents the frequency of occurrence for the other values in the observation, include the variable's name in a FREQ statement. The procedure then treats the data set as if each observation appears n times, where n is the value of the FREQ variable for the observation. The total number of observations will be considered equal to the sum of the FREQ variable when the procedure determines degrees of freedom for significance probabilities.
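A small sketch of the FREQ statement follows. The data set grouped, its variable names, and its values are invented purely for illustration. Because the count values sum to 14, the procedure should treat the four records as 14 observations when it computes degrees of freedom.

```sas
/* Hypothetical data: each record stands for "count" identical observations */
data grouped;
   input x y count;
   datalines;
1 2.1 3
2 3.9 5
3 6.2 2
4 7.8 4
;
run;

proc reg data=grouped;
   model y = x;
   freq count;      /* each record is counted "count" times */
run;
quit;
```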
WEIGHT variable;
A WEIGHT statement names a variable in the input data set whose values are relative weights for a weighted least-squares fit. If each observation's weight is proportional to the reciprocal of its variance, then the weighted estimates are the best linear unbiased estimates (BLUE).
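The following sketch shows one way to build reciprocal-variance weights. It assumes that a variance estimate for each observation is already known and stored in a variable; the data set weighted, the variable names, and all values are invented for illustration.

```sas
/* Hypothetical data: var_y holds an assumed known variance for each y */
data weighted;
   input x y var_y;
   w = 1 / var_y;   /* weight proportional to the reciprocal of the variance */
   datalines;
1 2.0 0.5
2 4.1 1.0
3 5.9 2.0
4 8.3 4.0
5 9.8 8.0
;
run;

proc reg data=weighted;
   model y = x;
   weight w;        /* weighted least-squares fit */
run;
quit;
```

Only the relative sizes of the weights matter for the coefficient estimates, so multiplying every weight by the same constant leaves them unchanged.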
ID variable;
The ID statement specifies one variable that identifies observations in the output produced by the MODEL options P, R, CLM, CLI, and INFLUENCE.
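As a brief sketch, the step below requests the P and R listings and labels each row with an identifying variable. It assumes the SASHELP.CLASS sample data set, in which Name identifies each student.

```sas
/* Illustrative sketch using the assumed SASHELP.CLASS sample data */
proc reg data=sashelp.class;
   model weight = height / p r;   /* print predicted values and residual analysis */
   id name;                       /* label each listed observation by Name */
run;
quit;
```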
The OUTPUT statement specifies an output data set to contain statistics calculated for each observation. For each statistic, specify the keyword, an equal sign, and a variable name for the statistic on the output data set. If the MODEL has several dependent variables, then a list of output variable names can be specified after each keyword to correspond to the list of dependent variables.
OUTPUT OUT=SASdataset
PREDICTED=names or P=names
RESIDUAL=names or R=names
L95M=names
U95M=names
L95=names
U95=names
STDP=names
STDR=names
STUDENT=names
COOKD=names
H=names
PRESS=names
RSTUDENT=names
DFFITS=names
COVRATIO=names;
The output data set named with OUT= contains all the variables for which the analysis was performed, including any BY variables, any ID variables, and variables named in the OUTPUT statement that contain statistics.
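A minimal sketch of the OUTPUT statement follows, again assuming the SASHELP.CLASS sample data set. The data set name diag and the output variable names (yhat, resid, stud, cooksd, lev) are arbitrary choices for illustration.

```sas
/* Illustrative sketch: output names are made up for this example */
proc reg data=sashelp.class;
   model weight = height;
   id name;
   output out=diag
          p=yhat          /* predicted values      */
          r=resid         /* residuals             */
          student=stud    /* studentized residuals */
          cookd=cooksd    /* Cook's D              */
          h=lev;          /* leverage              */
run;
quit;

/* Inspect the first few rows of the new data set */
proc print data=diag(obs=5);
   var name weight height yhat resid stud cooksd lev;
run;
```

With several dependent variables, each keyword would take one name per dependent variable, for example p=yhat1 yhat2.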
These statistics may be output to the new data set:
PLOT yvariable*xvariable=symbol / options;
The PLOT statement prints scatter plots of the yvariables on the vertical axis and xvariables on the horizontal axis. It uses the symbol specified to mark the points. The yvariables and xvariables may be any variables in the data set or any of the calculated statistics available in the OUTPUT statement.
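The sketch below plots the raw data and then residuals against predicted values. In the PLOT statement, a calculated statistic is written as its keyword followed by a period (for example R. and P.). The example assumes the SASHELP.CLASS sample data set. How the PLOT statement interacts with ODS Graphics may differ between SAS releases, so ODS Graphics is switched off here as a precaution; check the documentation for your release.

```sas
/* Illustrative sketch; turning ODS Graphics off is a precaution, not a requirement */
ods graphics off;

proc reg data=sashelp.class;
   model weight = height;
   plot weight*height;      /* data set variables: response against regressor */
   plot r.*p. = '*';        /* residuals against predicted values, marked with '*' */
run;
quit;
```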
label: TEST equation1,
            equation2,
            ...
            equationk;
label: TEST equation1,..., equationk / options;
The TEST statement, which has the same syntax as the RESTRICT statement except for options, tests hypotheses about the parameters estimated in the preceding MODEL statement. Each equation specifies a linear hypothesis to be tested.
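A few hedged examples of linear hypotheses are sketched below, assuming the SASHELP.CLASS sample data set. The labels both_zero and equal_eff are invented names. Each regressor name in an equation stands for its coefficient, and INTERCEPT refers to the intercept term.

```sas
/* Illustrative sketch: labels are made up for this example */
proc reg data=sashelp.class;
   model weight = height age;
   both_zero: test height = 0, age = 0;   /* joint test: both coefficients are zero */
   equal_eff: test height = age;          /* the two coefficients are equal */
   test intercept = 0;                    /* unlabeled test on the intercept */
run;
quit;
```

Equations separated by commas within one TEST statement are tested jointly, while separate TEST statements produce separate tests.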
One option may be specified in the TEST statement after a slash (/):
BY Statement
BY variables;
A BY statement may be used with PROC REG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use the SORT procedure with a similar BY statement to sort the data, or, if appropriate, use the BY statement options NOTSORTED or DESCENDING.
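The sort-then-analyze pattern described above can be sketched as follows, assuming the SASHELP.CLASS sample data set. The output data set name class_sorted is an arbitrary choice.

```sas
/* Sort a copy of the data by the BY variable first */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

/* One regression is fitted for each value of Sex */
proc reg data=class_sorted;
   model weight = height;
   by sex;
run;
quit;
```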