UCLA Academic Technology Services

SAS Textbook Examples
Regression Analysis by Example by Chatterjee, Hadi and Price
Chapter 2: Simple Linear Regression

This page shows how to obtain the results from Chatterjee, Hadi and Price's Chapter 2 using SAS.

Use the data in file p025a. Note that the semicolon on the line following the data marks the end of the data.
options nocenter;

data p025a;
input y x;
datalines;
 1      -7
14      -6
25      -5
34      -4
41      -3
46      -2
49      -1
50       0
49       1
46       2
41       3
34       4
25       5
14       6
 1       7
;
run;
Table 2.3, page 25.
proc print data=p025a;
run;

Obs     y     x
  1     1    -7
  2    14    -6
  3    25    -5
  4    34    -4
  5    41    -3
  6    46    -2
  7    49    -1
  8    50     0
  9    49     1
 10    46     2
 11    41     3
 12    34     4
 13    25     5
 14    14     6
 15     1     7
Figure 2.2, page 25.

Note: The symbol statement before proc gplot sets the plotting symbol for the scatter plot to a circle.
symbol1 v=circle;
proc gplot data=p025a;
  plot y*x;
run;
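
As a quick numerical companion to the scatter plot, the sketch below (an addition, not part of the original page) requests the Pearson correlation for the same dataset. In these data y equals 50 - x**2 in every row, so the relationship is exact but nonlinear, and the linear correlation between y and x works out to zero.

```sas
* Pearson correlation between y and x for the p025a data;
proc corr data=p025a;
  var y x;
run;
```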

Use data in file p025b.
data p025b;
input y1 x1 y2 x2 y3 x3 y4 x4;
datalines;
8.04    10      9.14    10      7.46    10      6.58    8
6.95    8       8.14    8       6.77    8       5.76    8
7.58    13      8.74    13      12.74   13      7.71    8
8.81    9       8.77    9       7.11    9       8.84    8
8.33    11      9.26    11      7.81    11      8.47    8
9.96    14      8.1     14      8.84    14      7.04    8
7.24    6       6.13    6       6.08    6       5.25    8
4.26    4       3.1     4       5.39    4       12.5    19
10.84   12      9.13    12      8.15    12      5.56    8
4.82    7       7.26    7       6.42    7       7.91    8
5.68    5       4.74    5       5.73    5       6.89    8
;
run;
Part of Table 2.4, page 25.
proc print data=p025b; 
run;

Obs      y1     x1     y2     x2      y3     x3      y4     x4
  1     8.04    10    9.14    10     7.46    10     6.58     8
  2     6.95     8    8.14     8     6.77     8     5.76     8
  3     7.58    13    8.74    13    12.74    13     7.71     8
  4     8.81     9    8.77     9     7.11     9     8.84     8
  5     8.33    11    9.26    11     7.81    11     8.47     8
  6     9.96    14    8.10    14     8.84    14     7.04     8
  7     7.24     6    6.13     6     6.08     6     5.25     8
  8     4.26     4    3.10     4     5.39     4    12.50    19
  9    10.84    12    9.13    12     8.15    12     5.56     8
 10     4.82     7    7.26     7     6.42     7     7.91     8
 11     5.68     5    4.74     5     5.73     5     6.89     8
Fig. 2.3(a), page 26.

Note: The i=r in the symbol statement includes the regression line in the scatter plot.
symbol1 v=circle i=r;

proc gplot data=p025b;
  plot y1*x1;
  plot y2*x2;
  plot y3*x3;
  plot y4*x4;
run;
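
The regression lines drawn by i=r can also be inspected numerically. The following is a minimal sketch, not taken from the original page, that fits the four lines with proc reg by placing four model statements in one step. These columns match the well-known Anscombe quartet, so each fit should come out close to an intercept of 3 and a slope of 0.5, even though the four scatter plots look very different.

```sas
* Fit one simple regression per (y, x) pair in p025b;
proc reg data=p025b;
  model y1 = x1;
  model y2 = x2;
  model y3 = x3;
  model y4 = x4;
run;
quit;
```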




Use data in file p027.
data p027;
input y x;
datalines;
23      1 
29      2 
49      3 
64      4 
74      4 
87      5 
96      6 
97      6  
109     7  
119     8  
149     9  
145     9  
154     10  
166     10
;
run;
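Commands to create Table 2.6, page 28.

Note: proc means reports the means of y and x. The data step then creates the new variables by subtracting those means (the mean of y is rounded to 97.21). The set statement starts the data step with the observations in the SAS dataset p027.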
proc means data=p027; 
run;

The MEANS Procedure

Variable     N            Mean         Std Dev         Minimum         Maximum
------------------------------------------------------------------------------
y           14      97.2142857      46.2171772      23.0000000     166.0000000
x           14       6.0000000       2.9612887       1.0000000      10.0000000
------------------------------------------------------------------------------

data p027a;
  set p027;
  dy = y - 97.21;
  dx = x - 6;
  dy2 = dy**2;
  dx2 = dx**2;
  dxy = dx*dy;
run;
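
The data step above types the means in by hand. As an alternative sketch (an assumption about how one might do it, not the original page's method), the means can be saved to a dataset and read back in, so nothing is rounded or retyped. The dataset names p027m and p027c and the variable names ybar and xbar are made up for this illustration. Because the full-precision mean of y is used, dy and dy2 will differ slightly from the values printed in Table 2.6.

```sas
* Save the means of y and x in a one-row dataset (p027m is a hypothetical name);
proc means data=p027 noprint;
  var y x;
  output out=p027m mean=ybar xbar;
run;

* Read the means once, then compute deviations for every observation;
data p027c;
  if _n_ = 1 then set p027m(keep=ybar xbar);
  set p027;
  dy = y - ybar;
  dx = x - xbar;
  dy2 = dy**2;
  dx2 = dx**2;
  dxy = dx*dy;
run;

* Column sums of the squares and cross-products;
proc means data=p027c sum;
  var dy2 dx2 dxy;
run;
```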
Table 2.6, page 28.
proc print data=p027a; 
run;

Obs     y      x        dy    dx        dy2    dx2       dxy
  1     23     1    -74.21    -5    5507.12     25    371.05
  2     29     2    -68.21    -4    4652.60     16    272.84
  3     49     3    -48.21    -3    2324.20      9    144.63
  4     64     4    -33.21    -2    1102.90      4     66.42
  5     74     4    -23.21    -2     538.70      4     46.42
  6     87     5    -10.21    -1     104.24      1     10.21
  7     96     6     -1.21     0       1.46      0      0.00
  8     97     6     -0.21     0       0.04      0      0.00
  9    109     7     11.79     1     139.00      1     11.79
 10    119     8     21.79     2     474.80      4     43.58
 11    149     9     51.79     3    2682.20      9    155.37
 12    145     9     47.79     3    2283.88      9    143.37
 13    154    10     56.79     4    3225.10     16    227.16
 14    166    10     68.79     4    4732.06     16    275.16
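
The columns of Table 2.6 are the ingredients of the least squares estimates. As a worked check (added here, using the column sums of dxy and dx2 from the listing above, which are 1768.00 and 114), the slope and intercept can be computed by hand and compared with the proc reg estimates reported on this page:

```latex
\[
\hat\beta_1 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2}
            = \frac{1768.00}{114} \approx 15.509
\]
\[
\hat\beta_0 = \bar y - \hat\beta_1 \bar x
            \approx 97.2143 - 15.5088 \times 6 \approx 4.162
\]
```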
Fig. 2.4, page 28.

Note: The i=none turns off the regression line option.
symbol1 v=circle i=none;

proc gplot data=p027;
  plot y*x;
run;

Table 2.9, page 36.

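Note: In this example, the output statement adds four variables to the original observations in a new SAS dataset named p027b: the predicted values (yhat), the residuals (e), the standard error of an individual prediction (stdi, saved as seyhat) and the standard error of the mean prediction (stdp, saved as semu).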
proc reg data=p027;
  model y = x;
  output out=p027b predicted=yhat residual=e stdi=seyhat stdp=semu;
run;

The REG Procedure
Model: MODEL1
Dependent Variable: y

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1          27420          27420     943.20    <.0001
Error                    12      348.84837       29.07070
Corrected Total          13          27768


Root MSE              5.39172    R-Square     0.9874
Dependent Mean       97.21429    Adj R-Sq     0.9864
Coeff Var             5.54623

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1        4.16165        3.35510       1.24      0.2385
x             1       15.50877        0.50498      30.71      <.0001
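
Each t value in the Parameter Estimates table is the estimate divided by its standard error (for x, 15.50877 / 0.50498 is about 30.71). If confidence limits for the coefficients are also wanted, one possible way, sketched here as an addition to the original page, is the clb option on the model statement. The alpha=0.05 setting is written out only for illustration.

```sas
* Same model as above, with confidence limits for the parameter estimates;
proc reg data=p027;
  model y = x / clb alpha=0.05;
run;
quit;
```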
Table 2.7, page 32.
proc print data=p027b;
  var yhat e;
run;

Obs      yhat         e
  1     19.670     3.32957
  2     35.179    -6.17920
  3     50.688    -1.68797
  4     66.197    -2.19674
  5     66.197     7.80326
  6     81.706     5.29449
  7     97.214    -1.21429
  8     97.214    -0.21429
  9    112.723    -3.72306
 10    128.232    -9.23183
 11    143.741     5.25940
 12    143.741     1.25940
 13    159.249    -5.24937
 14    159.249     6.75063
Fig. 2.5, page 32.
symbol1 v=circle i=r;

proc gplot data=p027;
  plot y*x;
run;

Standard error for a predicted score, page 39.
proc print data=p027b;
  var seyhat;
run;

Obs     seyhat
  1    6.12555
  2    5.93526
  3    5.78293
  4    5.67161
  5    5.67161
  6    5.60376
  7    5.58097
  8    5.58097
  9    5.60376
 10    5.67161
 11    5.78293
 12    5.78293
 13    5.93526
 14    5.93526
Standard error for mean prediction, page 39.
proc print data=p027b;
  var y x semu;
run;

Obs     y      x      semu
  1     23     1    2.90717
  2     29     2    2.48124
  3     49     3    2.09082
  4     64     4    1.75969
  5     74     4    1.75969
  6     87     5    1.52692
  7     96     6    1.44100
  8     97     6    1.44100
  9    109     7    1.52692
 10    119     8    1.75969
 11    149     9    2.09082
 12    145     9    2.09082
 13    154    10    2.48124
 14    166    10    2.48124
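
The two standard errors saved by proc reg can be reproduced by hand. The formulas below are the standard ones for simple linear regression, added here as a check. The symbols are assumptions of this sketch: s is the Root MSE (5.39172), n = 14, and the sum of squared deviations of x is 114. The numerical check uses x_0 = 6, which is the mean of x.

```latex
\[
\text{s.e.}(\hat\mu_0) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_i (x_i - \bar x)^2}},
\qquad
\text{s.e.}(\hat y_0) = s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar x)^2}{\sum_i (x_i - \bar x)^2}}
\]
\[
x_0 = 6:\quad
\text{s.e.}(\hat\mu_0) = 5.39172 \sqrt{\tfrac{1}{14}} \approx 1.441,
\qquad
\text{s.e.}(\hat y_0) = 5.39172 \sqrt{1 + \tfrac{1}{14}} \approx 5.581
\]
```

These agree with the semu and seyhat values listed for observations 7 and 8.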
Correlations, page 43.
proc corr data=p027b;
  var y x yhat;
run;

The CORR Procedure

          Pearson Correlation Coefficients, N = 14
                  Prob > |r| under H0: Rho=0

                                 y             x          yhat
y                          1.00000       0.99370       0.99370
                                          <.0001        <.0001

x                          0.99370       1.00000       1.00000
                            <.0001                      <.0001

yhat                       0.99370       1.00000       1.00000
Predicted Value of y        <.0001        <.0001

These pages are Copyrighted (c) by UCLA Academic Technology Services