SAS Textbook Examples
Regression Analysis by Example by Chatterjee, Hadi and Price
Chapter 5: Qualitative Variables as Predictors 

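Inputting the Salary Survey data, table 5.1, p. 124. S is annual salary, X is years of experience, E is education level (1 = high school, 2 = bachelor's degree, 3 = advanced degree), and M indicates a management position (1 = yes, 0 = no).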
data p124;
  input S X E M;
cards;
13876 1 1 1 
11608 1 3 0 
18701 1 3 1 
11283 1 2 0 
11767 1 3 0 
20872 2 2 1 
11772 2 2 0 
10535 2 1 0 
12195 2 3 0 
12313 3 2 0 
14975 3 1 1 
21371 3 2 1 
19800 3 3 1 
11417 4 1 0 
20263 4 3 1 
13231 4 3 0 
12884 4 2 0 
13245 5 2 0 
13677 5 3 0 
15965 5 1 1 
12336 6 1 0 
21352 6 3 1 
13839 6 2 0 
22884 6 2 1 
16978 7 1 1 
14803 8 2 0 
17404 8 1 1 
22184 8 3 1 
13548 8 1 0 
14467 10 1 0 
15942 10 2 0 
23174 10 3 1 
23780 10 2 1 
25410 11 2 1 
14861 11 1 0 
16882 12 2 0 
24170 12 3 1 
15990 13 1 0 
26330 13 2 1 
17949 14 2 0 
25685 15 3 1 
27837 16 2 1 
18838 16 2 0 
17483 16 1 0 
19207 17 2 0 
19346 20 1 0 
;
run;
Creating the dummy coding for the variable e.
data p124;
  set p124;
  e1 = .;
  if e = 1 then e1 = 1;
  else e1 = 0;
  e2 = .;
  if e = 2 then e2 = 1;
  else e2 = 0;
run;
proc freq data = p124;
  tables e e1 e2;
run;
The FREQ Procedure
                              Cumulative    Cumulative
E    Frequency     Percent     Frequency      Percent
-----------------------------------------------------
1          14       30.43            14        30.43
2          19       41.30            33        71.74
3          13       28.26            46       100.00

                               Cumulative    Cumulative
e1    Frequency     Percent     Frequency      Percent
-------------------------------------------------------
 0          32       69.57            32        69.57
 1          14       30.43            46       100.00

                               Cumulative    Cumulative
e2    Frequency     Percent     Frequency      Percent
-------------------------------------------------------
 0          27       58.70            27        58.70
 1          19       41.30            46       100.00
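The same dummy coding can be written more compactly, because a SAS comparison evaluates to 1 when true and 0 when false. The following is a minimal sketch, not the page's original method; the data set name p124_alt is hypothetical and is used only so that p124 is left unchanged. As in the if/else version above, a missing value of E would be coded 0 in both dummies.

```sas
data p124_alt;            /* hypothetical name; p124 itself is not modified */
  set p124;
  e1 = (e = 1);           /* 1 when E is 1, otherwise 0 */
  e2 = (e = 2);           /* 1 when E is 2, otherwise 0 */
run;

/* cross-check: each level of E should map to exactly one (e1, e2) pair */
proc freq data = p124_alt;
  tables e*e1*e2 / list;
run;
```

E = 3 is the reference level here, so it is the row with e1 = 0 and e2 = 0.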
Creating the category variable used in table 5.2, p. 126.
data p124;
  set p124;
  category = .;
  if e = 1 and m = 0 then category = 1;
  if e = 1 and m = 1 then category = 2;
  if e = 2 and m = 0 then category = 3;
  if e = 2 and m = 1 then category = 4;
  if e = 3 and m = 0 then category = 5;
  if e = 3 and m = 1 then category = 6;
run;
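Because E takes the values 1 to 3 and M the values 0 and 1, the six categories above can also be computed arithmetically. This is only a sketch under that assumption; the names category_check and category2 are hypothetical. If E or M were missing, the expression would yield a missing value.

```sas
data category_check;      /* hypothetical name */
  set p124;
  /* (e=1,m=0) -> 1, (e=1,m=1) -> 2, ..., (e=3,m=1) -> 6 */
  category2 = 2*(e - 1) + m + 1;
run;

/* category and category2 should agree on every row */
proc freq data = category_check;
  tables category*category2 / list;
run;
```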
Table 5.3, p. 126, fig. 5.1, p. 127 and fig. 5.2, p. 128.
proc reg data = p124;
  var category;
  model s = x e1 e2 m;  
  plot student.*x student.*category;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: S

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     4      957816858      239454214     226.84    <.0001
Error                    41       43280719        1055627
Corrected Total          45     1001097577

Root MSE           1027.43725    R-Square     0.9568
Dependent Mean          17270    Adj R-Sq     0.9525
Coeff Var             5.94919

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1          11032      383.21713      28.79      <.0001
X             1      546.18402       30.51919      17.90      <.0001
e1            1    -2996.21026      411.75271      -7.28      <.0001
e2            1      147.82495      387.65932       0.38      0.7049
M             1     6883.53101      313.91898      21.93      <.0001

Creating the interaction variables, p. 128.
data p124;
  set p124;
  me1= m*e1;
  me2 = m*e2;
run;
Table 5.4 and fig. 5.3, p. 129.
symbol v=dot h=.8 c=blue;
proc reg data = p124;
  model s = x e1 e2 m me1 me2;
  plot student.*x;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: S

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     6      999919409      166653235    5516.60    <.0001
Error                    39        1178168          30209
Corrected Total          45     1001097577

Root MSE            173.80861    R-Square     0.9988
Dependent Mean          17270    Adj R-Sq     0.9986
Coeff Var             1.00641

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1          11203       79.06545     141.70      <.0001
X             1      496.98701        5.56642      89.28      <.0001
e1            1    -1730.74832      105.33389     -16.43      <.0001
e2            1     -349.07769       97.56790      -3.58      0.0009
M             1     7047.41202      102.58919      68.70      <.0001
me1           1    -3066.03512      149.33044     -20.53      <.0001
me2           1     1836.48795      131.16736      14.00      <.0001
Deleting observation 33 and repeating the regression with interactions, table 5.5 and fig. 5.4-5.5, p. 129-130.
data missing33;
  set p124;
  id = _N_; /* creates the id variable */
  if id = 33 then delete;
run;
symbol1 c=blue v=dot;
proc reg data = missing33;
  var category;
  model s = x e1 e2 m me1 me2;  
  plot student.*x student.*category;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: S

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     6      957607113      159601186    35428.0    <.0001
Error                    38         171188     4504.95052
Corrected Total          44      957778301

Root MSE             67.11893    R-Square     0.9998
Dependent Mean          17126    Adj R-Sq     0.9998
Coeff Var             0.39192

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1          11200       30.53338     366.80      <.0001
X             1      498.41777        2.15169     231.64      <.0001
e1            1    -1741.33595       40.68250     -42.80      <.0001
e2            1     -357.04226       37.68114      -9.48      <.0001
M             1     7040.58014       39.61907     177.71      <.0001
me1           1    -3051.76329       57.67420     -52.91      <.0001
me2           1     1997.53060       51.78498      38.57      <.0001

Table 5.6, Estimates of the Base Salary, p. 131.
proc glm data = missing33;
  class e m ;
  model s = x e e*m;
  lsmeans e*m / at x=0 stderr cl;
run;
quit;
The GLM Procedure

   Class Level Information

Class         Levels    Values
E                  3    1 2 3
M                  2    0 1

Number of observations    45

The GLM Procedure
Dependent Variable: S

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        6     957607113.1     159601185.5    35428.0    <.0001
Error                       38        171188.1          4505.0
Corrected Total             44     957778301.2
R-Square     Coeff Var      Root MSE        S Mean
0.999821      0.391923      67.11893      17125.53
Source                      DF       Type I SS     Mean Square    F Value    Pr > F
X                            1     276059254.3     276059254.3    61279.1    <.0001
E                            2     153242718.2      76621359.1    17008.3    <.0001
E*M                          3     528305140.6     176101713.5    39090.7    <.0001
Source                      DF     Type III SS     Mean Square    F Value    Pr > F
X                            1     241723277.6     241723277.6    53657.3    <.0001
E                            2     119359886.9      59679943.4    13247.6    <.0001
E*M                          3     528305140.6     176101713.5    39090.7    <.0001

The GLM Procedure
Least Squares Means at X=0

                              Standard
E    M        S LSMEAN           Error    Pr > |t|
1    0       9458.3778         31.0407      <.0001
1    1      13447.1947         31.7437      <.0001
2    0      10842.6715         26.1571      <.0001
2    1      19880.7823         32.9443      <.0001
3    0      11199.7138         30.5334      <.0001
3    1      18240.2939         28.5471      <.0001

E    M        S LSMEAN      95% Confidence Limits
1    0     9458.377848     9395.539200  9521.216497
1    1           13447           13383        13511
2    0           10843           10790        10896
2    1           19881           19814        19947
3    0           11200           11138        11262
3    1           18240           18183        18298
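The least squares means at X=0 are the base salaries implied by the table 5.5 coefficients: the intercept plus whichever of the e1, e2, M, me1 and me2 terms apply to a given cell. The data step below is a sketch of that arithmetic. The data set name base_check and the variable names b0, g1, g2, d1, a1 and a2 are hypothetical, and the intercept is taken from the E=3, M=0 least squares mean because the REG output prints it rounded to 11200.

```sas
data base_check;          /* hypothetical name */
  /* coefficients copied from the regression output without observation 33 */
  b0 = 11199.7138;        /* intercept (E=3, M=0 cell)    */
  g1 = -1741.33595;       /* e1                           */
  g2 = -357.04226;        /* e2                           */
  d1 = 7040.58014;        /* M                            */
  a1 = -3051.76329;       /* me1                          */
  a2 = 1997.53060;        /* me2                          */
  do e = 1 to 3;
    do m = 0 to 1;
      e1 = (e = 1);
      e2 = (e = 2);
      base = b0 + g1*e1 + g2*e2 + d1*m + a1*m*e1 + a2*m*e2;
      output;             /* one row per E by M cell */
    end;
  end;
  keep e m base;
run;

proc print data = base_check noobs;
run;
```

The printed values of base should agree with the S LSMEAN column above up to rounding.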
Table 5.7, the Pre-employment Testing Program data, p. 134.
data p134;
  input TEST RACE JPERF;
cards;
0.28 1 1.83 
0.97 1 4.59 
1.25 1 2.97 
2.46 1 8.14 
2.51 1 8.00 
1.17 1 3.30 
1.78 1 7.53 
1.21 1 2.03 
1.63 1 5.00 
1.98 1 8.04 
2.36 0 3.25 
2.11 0 5.30 
0.45 0 1.39 
1.76 0 4.69 
2.09 0 6.56 
1.50 0 3.00 
1.25 0 5.85 
0.72 0 1.90 
0.42 0 3.85 
1.53 0 2.95 
;
run;
Table 5.8 and fig. 5.7, p. 135.
symbol v=dot h=.8 c=blue;
proc reg data = p134;
  model jperf = test;
  plot student.*test;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: JPERF

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1       48.72296       48.72296      19.25    0.0004
Error                    18       45.56830        2.53157
Corrected Total          19       94.29125

Root MSE              1.59109    R-Square     0.5167
Dependent Mean        4.50850    Adj R-Sq     0.4899
Coeff Var            35.29093

                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1        1.03497        0.86803       1.19      0.2486
TEST          1        2.36053        0.53807       4.39      0.0004

Creating the interaction variable racetest.
data temp;
  set p134;
  racetest = race*test;
run;
Table 5.9 and fig. 5.8, p. 135.
symbol v=dot h=.8 c=blue;
proc reg data = temp;
  model jperf = test race racetest;
  plot student.*test;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: JPERF

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3       62.63578       20.87859      10.55    0.0005
Error                    16       31.65547        1.97847
Corrected Total          19       94.29125

Root MSE              1.40658    R-Square     0.6643
Dependent Mean        4.50850    Adj R-Sq     0.6013
Coeff Var            31.19840
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1        2.01028        1.05011       1.91      0.0736
TEST          1        1.31340        0.67037       1.96      0.0677
RACE          1       -1.91317        1.54032      -1.24      0.2321
racetest      1        1.99755        0.95444       2.09      0.0527
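Whether a single regression line is adequate for both groups can be checked by testing the RACE and racetest coefficients jointly. One way to sketch this, which is not part of the original page, is the TEST statement in proc reg; the label samefit is hypothetical.

```sas
proc reg data = temp;
  model jperf = test race racetest;
  /* joint test that the intercept shift and the slope shift are both zero */
  samefit: test race = 0, racetest = 0;
run;
quit;
```

The same F statistic can be computed by hand from the two error sums of squares shown above: ((45.56830 - 31.65547) / 2) / 1.97847, which is about 3.52 on 2 and 16 degrees of freedom.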
Table 5.10 and fig. 5.10-5.11, p. 136-137.
proc sort data = p134;
 by race;
run;
proc reg data = p134;
 by race;
 model jperf = test;
 plot student.*test;
run;
quit;
RACE=0
The REG Procedure
Model: MODEL1
Dependent Variable: JPERF

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1        7.59441        7.59441       3.32    0.1059
Error                     8       18.29863        2.28733
Corrected Total           9       25.89304

Root MSE              1.51239    R-Square     0.2933
Dependent Mean        3.87400    Adj R-Sq     0.2050
Coeff Var            39.03954
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1        2.01028        1.12911       1.78      0.1129
TEST          1        1.31340        0.72080       1.82      0.1059
RACE=1

The REG Procedure
Model: MODEL1
Dependent Variable: JPERF

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     1       46.98957       46.98957      28.14    0.0007
Error                     8       13.35684        1.66960
Corrected Total           9       60.34641

Root MSE              1.29213    R-Square     0.7787
Dependent Mean        5.14300    Adj R-Sq     0.7510
Coeff Var            25.12409
                        Parameter Estimates

                     Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|
Intercept     1        0.09712        1.03519       0.09      0.9276
TEST          1        3.31095        0.62411       5.31      0.0007

Fig. 5.9, p. 136.
proc reg data = p134 noprint;
 var race;
 model jperf = test;
 plot student.*race;
run;
quit;


These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California