UCLA Academic Technology Services

SAS Code Fragments
PROC SurveyReg Examples

/* Suppose that, in a junior high school, there are a total of 4,000 students
   in grades 7, 8, and 9. You want to know how household income and the number
   of children in a household affect students' average weekly spending for ice
   cream. In order to answer this question, you draw a sample using simple
   random sampling from the student population in the junior high school. You
   randomly select 40 students and ask them their average weekly expenditure
   for ice cream, their household income, and the number of children in their
   household. The answers from the 40 students are saved as a SAS data set. */;

    data IceCream;
       input Grade Spending Income Kids @@;
       datalines;
     7   7  39  2   7   7  38  1   8  12  47  1
     9  10  47  4   7   1  34  4   7  10  43  2
     7   3  44  4   8  20  60  3   8  19  57  4
     7   2  35  2   7   2  36  1   9  15  51  1
     8  16  53  1   7   6  37  4   7   6  41  2
     7   6  39  2   9  15  50  4   8  17  57  3
     8  14  46  2   9   8  41  2   9   8  41  1
     9   7  47  3   7   3  39  3   7  12  50  2
     7   4  43  4   9  14  46  3   8  18  58  4
     9   9  44  3   7   2  37  1   7   1  37  2
     7   4  44  2   7  11  42  2   9   8  41  2
     8  10  42  2   8  13  46  1   7   2  40  3
     9   6  45  1   9  11  45  4   7   2  36  1
     7   9  46  1
;
run;

/* In the data set IceCream, the variable Grade indicates a student's grade.
   The variable Spending contains the dollar amount of each student's average
   weekly spending for ice cream. The variable Income specifies the household
   income, in thousands of dollars. The variable Kids indicates how many
   children are in a student's family. */;

/* First let's try OLS regression, which does not account for the sampling
    scheme, and let's use dummy coding for Kids (Kids=4 is the reference
    level) */;

    title1 'Ice Cream Spending Analysis';
    title2 'OLS Regression estimates';

  data reg;
   set IceCream;
    if kids=1 then do;  k1= 1;k2= 0;k3= 0; end;
    if kids=2 then do;  k1= 0;k2= 1;k3= 0; end;
    if kids=3 then do;  k1= 0;k2= 0;k3= 1; end;
    if kids=4 then do;  k1= 0;k2= 0;k3= 0; end;   /*DUMMY CODING*/;
  run;

proc reg data=reg;
    model Spending = Income k1--k3;
run;
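
/* As a cross-check (a sketch, not part of the original program), PROC GLM
   with a CLASS statement should reproduce the OLS fit without hand-made
   dummies. GLM treats the last level, Kids=4, as the reference, which is
   the same coding as k1, k2, and k3 above. */;

proc glm data=IceCream;
    class Kids;
    model Spending = Income Kids / solution;
run;
quit;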

/* Now let's analyze the same data assuming SRS from a population of 4000
    students */;

   title2 'Simple Random Sampling Design';

    proc surveyreg data=IceCream total=4000;
       class Kids;
       model Spending = Income Kids / solution;
    run;
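
/* An equivalent way to supply the design information (a sketch): the RATE=
   option gives the sampling fraction directly. Here 40/4000 = 0.01, so this
   call should give the same standard errors as TOTAL=4000. */;

    proc surveyreg data=IceCream rate=0.01;
       class Kids;
       model Spending = Income Kids / solution;
    run;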

/* Now let's suppose that the previous student sample was actually drawn by
    stratified sampling (STRS). The strata are grades in the junior high
    school: the 7th grade, the 8th grade, and the 9th grade. Within strata,
    simple random samples are selected. The StudentTotal data set provides the
    number of students in each grade. */;

data StudentTotal;
     input Grade _TOTAL_;
datalines;
    7 1824
    8 1025
    9 1151
;
run;
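
/* A quick sanity check (a sketch): the stratum totals should add up to the
   4,000 students in the school (1824 + 1025 + 1151 = 4000). */;

proc means data=StudentTotal sum;
     var _TOTAL_;
run;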

/* The variable Grade is the stratification variable, and the variable _TOTAL_
   contains the total numbers of students in the strata in the survey
   population. PROC SURVEYREG requires you to use the keyword _TOTAL_ as the
   name of the variable that contains the population total information. The
   following statements demonstrate how you can fit the linear model while
   incorporating the sample design information (stratification). */;

   title2 'Stratified Simple Random Sampling Design';

    proc surveyreg data=IceCream total=StudentTotal;
       strata Grade /list;
       class Kids;
       model Spending = Income Kids / solution;
    run;

/* Compared with the statements in the "Simple Random Sampling" step above,
   the TOTAL=StudentTotal option replaces the previous TOTAL=4000 option.
   When the population totals and sample sizes differ among strata, the
   population totals must be provided by a data set. The STRATA statement
   specifies the stratification variable Grade. The LIST option in the STRATA
   statement requests that the stratification information be included in the
   output. */;
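
/* To see the sampling rate in each stratum without rounding, the sample
   counts can be merged with the population totals. This is a sketch; the
   data set names SampleCount and StratumRate and the variable Rate are made
   up for illustration. */;

proc freq data=IceCream noprint;
     tables Grade / out=SampleCount(keep=Grade Count);
run;

data StratumRate;
     merge SampleCount StudentTotal;
     by Grade;
     Rate = Count / _TOTAL_;   /* sample size / population total */
run;

proc print data=StratumRate noobs;
run;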

---------------------------------------------

Ice Cream Spending Analysis                   11:19 Wednesday, July 19, 2000   1
OLS Regression estimates

The REG Procedure
Model: MODEL1
Dependent Variable: Spending

                              Analysis of Variance

                                     Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F

Model                     4      915.30965      228.82741      38.10    <.0001
Error                    35      210.19035        6.00544
Corrected Total          39     1125.50000


Root MSE              2.45060    R-Square     0.8132
Dependent Mean        8.75000    Adj R-Sq     0.7919
Coeff Var            28.00685


                         Parameter Estimates

                      Parameter       Standard
Variable     DF       Estimate          Error    t Value    Pr > |t|

Intercept     1      -26.08468        3.07591      -8.48      <.0001
Income        1        0.77533        0.06431      12.06      <.0001
k1            1        0.89765        1.11649       0.80      0.4268
k2            1        1.49403        1.10259       1.36      0.1841
k3            1       -0.51318        1.23855      -0.41      0.6812

Ice Cream Spending Analysis                   11:19 Wednesday, July 19, 2000   2
Simple Random Sampling Design

The SURVEYREG Procedure

Regression Analysis for Dependent Variable Spending

             Data Summary

Number of Observations            40
Mean of Spending             8.75000
Sum of Spending            350.00000


       Fit Statistics

R-square            0.8132
Root MSE            2.4506
Denominator DF          39


     Class Level Information

Class
Variable      Levels    Values

Kids               4    1 2 3 4


                  ANOVA for Dependent Variable Spending

                                  Sum of        Mean
Source                   DF     Squares      Square    F Value    Pr > F

Model                     4     915.310    228.8274      38.10    <.0001
Error                    35     210.190      6.0054
Corrected Total          39    1125.500


          Tests of Model Effects

Effect       Num DF    F Value    Pr > F

Model             4     119.15    <.0001
Intercept         1     153.32    <.0001
Income            1     324.45    <.0001
Kids              3       0.92    0.4385

NOTE: The denominator degrees of freedom for the F tests is 39.


              Estimated Regression Coefficients

                              Standard
Parameter      Estimate         Error    t Value    Pr > |t|

Intercept    -26.084677    2.46720403     -10.57      <.0001
Income         0.775330    0.04304415      18.01      <.0001
Kids 1         0.897655    1.12352876       0.80      0.4292
Kids 2         1.494032    1.24705263       1.20      0.2381
Kids 3        -0.513181    1.33454891      -0.38      0.7027
Kids 4         0.000000    0.00000000        .         .

NOTE: The denominator degrees of freedom for the t tests is 39.
       Matrix X'X is singular and a generalized inverse was used to solve the
       normal equations.  Estimates are not unique.

Ice Cream Spending Analysis                   11:19 Wednesday, July 19, 2000   4
Stratified Simple Random Sampling Design

The SURVEYREG Procedure

Regression Analysis for Dependent Variable Spending

             Data Summary

Number of Observations            40
Mean of Spending             8.75000
Sum of Spending            350.00000


         Design Summary

Number of Strata             3


       Fit Statistics

R-square            0.8132
Root MSE            2.4506
Denominator DF          37


                   Stratum Information

Stratum                           Population    Sampling
  Index     Grade       N Obs           Total        Rate

    1         7            20            1824        0.01
    2         8             9            1025        0.01
    3         9            11            1151        0.01


     Class Level Information

Class
Variable      Levels    Values

Kids               4    1 2 3 4


                  ANOVA for Dependent Variable Spending

                                  Sum of        Mean
Source                   DF     Squares      Square    F Value    Pr > F

Model                     4     915.310    228.8274      38.10    <.0001
Error                    35     210.190      6.0054
Corrected Total          39    1125.500


          Tests of Model Effects

Effect       Num DF    F Value    Pr > F

Model             4     114.60    <.0001
Intercept         1     150.05    <.0001
Income            1     317.63    <.0001
Kids              3       0.93    0.4355

NOTE: The denominator degrees of freedom for the F tests is 37.


              Estimated Regression Coefficients

                              Standard
Parameter      Estimate         Error    t Value    Pr > |t|

Intercept    -26.084677    2.48241893     -10.51      <.0001
Income         0.775330    0.04350401      17.82      <.0001
Kids 1         0.897655    1.11778377       0.80      0.4271
Kids 2         1.494032    1.25209199       1.19      0.2404
Kids 3        -0.513181    1.36853454      -0.37      0.7098
Kids 4         0.000000    0.00000000        .         .

NOTE: The denominator degrees of freedom for the t tests is 37.
       Matrix X'X is singular and a generalized inverse was used to solve the
       normal equations.  Estimates are not unique.

 


These pages are Copyrighted (c) by UCLA Academic Technology Services

