UCLA Academic Technology Services HomeServicesClassesContactJobs
Search

SAS Learning Module
Collapsing across observations using proc sql

1. Creating a new variable of grand mean

Let's say that we have a data set containing three families with kids and we want to create a new variable in the data set that is the grand mean of age across the entire data set. This can be accomplished by using SAS proc sql as shown below. We also print out the new data set with a new variable of grand mean using proc print.

data kids;
  length kidname $ 4 sex $ 1;
  input famid kidname birth age wt sex ;
cards;
1 Beth 1  9  60  f
1 Bob  2  6  40  m
1 Barb 3  3  20  f
2 Andy 1  8  80  m
2 Al   2  6  50  m
2 Ann  3  2  20  f
3 Pete 1  6  60  m
3 Pam  2  4  40  f
3 Phil 3  2  20  m
;
run;
proc sql; 
  create table kids1 as
  select *,  mean(age) as mean_age 
  from kids;
quit;

proc print data=kids1 noobs;
run;

 kidname    sex    famid    birth    age    wt    mean_age

 Beth       f       1        1       9     60     5.11111
 Bob        m       1        2       6     40     5.11111
 Barb       f       1        3       3     20     5.11111
 Andy       m       2        1       8     80     5.11111
 Al         m       2        2       6     50     5.11111
 Ann        f       2        3       2     20     5.11111
 Pete       m       3        1       6     60     5.11111
 Pam        f       3        2       4     40     5.11111
 Phil       m       3        3       2     20     5.11111

2. Creating a new variable of group mean

We will continue to use the data set in previous example. Now we want to use the variable famid as a group variable and create a new variable that is the group mean of the variable age.

proc sql; 
  create table kids2 as
  select *, mean(age)  label="group average" as mean_age 
  from kids
  group by famid;
quit;

title 'New Variable of Group Mean';
proc print data=kids2 noobs;
run;

title 'Label at Work';
proc freq data=kids2;
  table mean_age;
run;

Now we see that in the following output of proc print the new variable of group mean we just created. We also see the label created for the variable in the output of proc freq.

New Variable of Group Mean           
 
kidname    sex    famid    birth    age    wt    mean_age

Barb       f       1        3       3     20     6.00000
Bob        m       1        2       6     40     6.00000
Beth       f       1        1       9     60     6.00000
Ann        f       2        3       2     20     5.33333
Al         m       2        2       6     50     5.33333
Andy       m       2        1       8     80     5.33333
Pete       m       3        1       6     60     4.00000
Phil       m       3        3       2     20     4.00000
Pam        f       3        2       4     40     4.00000
 
Label at Work                                               
                        The FREQ Procedure

                          group average

                                         Cumulative    Cumulative
    mean_age    Frequency     Percent     Frequency      Percent
-------------------------------------------------------------------
           4           3       33.33             3        33.33
5.3333333333           3       33.33             6        66.67
           6           3       33.33             9       100.00

3. Creating multiple variables of summary statistics at once

Sometimes we only need summary statistics based on a group variable similar to the output of  proc means. This can also be done in proc sql as shown in our next example.

proc sql;
  create table kids3 as
  select famid, mean(age) as mean_age , std(age) as std_age, 
         mean(wt) as mean_wt, std(wt) as std_wt from kids
  group by famid;
quit;
proc print data=kids3 noobs;
run;

famid    mean_age    std_age    mean_wt    std_wt

  1       6.00000    3.00000       40        20
  2       5.33333    3.05505       50        30
  3       4.00000    2.00000       40        20
If you only want the output statistics instead of creating a new data set, you can omit the create table statement and simply run the proc sql part. The result will be shown in the output window.
proc sql;
  select famid, mean(age) as mean_age, std(age) as std_age, 
         mean(wt) as mean_wt, std(wt) as std_wt from kids
  group by famid;
quit;
From the Output Window:
famid  mean_age   std_age   mean_wt    std_wt
------------------------------------------------
    1         6         3        40        20
    2  5.333333   3.05505        50        30
    3         4         2        40        20

4. Creating multiple summary statistics variables in the original data set

proc sql;
  create table fam5 as
  select *, mean(age) as mean_age, std(age) as std_age, 
         mean(wt) as mean_wt, std(wt) as std_wt from kids
  group by famid
  order by famid, kidname desc;
quit;
proc print data=fam5;
run;

From the Output Window:
Obs kidname sex famid birth age wt mean_age std_age mean_wt std_wt

 1   Bob     m    1     2    6  40  6.00000 3.00000    40     20
 2   Beth    f    1     1    9  60  6.00000 3.00000    40     20
 3   Barb    f    1     3    3  20  6.00000 3.00000    40     20
 4   Ann     f    2     3    2  20  5.33333 3.05505    50     30
 5   Andy    m    2     1    8  80  5.33333 3.05505    50     30
 6   Al      m    2     2    6  50  5.33333 3.05505    50     30
 7   Phil    m    3     3    2  20  4.00000 2.00000    40     20
 8   Pete    m    3     1    6  60  4.00000 2.00000    40     20
 9   Pam     f    3     2    4  40  4.00000 2.00000    40     20

5. Creating variables and their summary statistics on-the-fly

Let's say that we want to know the number of boys and girls in each family. We can use variable sex to figure it out in one step using proc sql as shown below.

proc sql;
  create table my_count as
  select famid, sum(boy) as num_boy, sum(girl) as num_girl from
  (select famid, (sex='m') as boy, (sex='f') as girl from kids)
  group by famid;
quit;
proc print data=my_count noobs;
run;

From the Output Window
famid    num_boy    num_girl

  1         1           2
  2         2           1
  3         2           1

6. Creating grand mean and save it into a SAS macro variable

Sometimes, we want to get a summary statistic for a variable and use it later for other purposes. We can save the summary statistic in a macro variable and then it can be accessed throughout the entire SAS session. proc sql is very handy as shown in the following example where we save the grand mean of variable age into macro variable meanage.

proc sql noprint; 
   select mean(age) into :meanage from kids;
quit;
%put &meanage;

From Log Window:

3027  proc sql noprint;
3028     select mean(age) into :meanage from kids;
3029  quit;
NOTE: PROCEDURE SQL used:
      real time           0.00 seconds
      cpu time            0.00 seconds

3030  %put &meanage;
5.111111

7. Creating group means and save them into a sequence of SAS macro variables

proc sql noprint;  
   select mean(age) into :meanage1 - :meanage3 from kids
   group by famid;
quit;
%put _user_; 

How to cite this page

Report an error on this page

UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services


The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.