This module shows how to obtain basic descriptive statistics using SAS. We use a data file about 26 automobiles with their make, price, mpg, repair record, and whether the car was foreign or domestic. The data file is shown below.
MAKE    PRICE  MPG  REP78  FOREIGN
AMC      4099   22      3        0
AMC      4749   17      3        0
AMC      3799   22      3        0
Audi     9690   17      5        1
Audi     6295   23      3        1
BMW      9735   25      4        1
Buick    4816   20      3        0
Buick    7827   15      4        0
Buick    5788   18      3        0
Buick    4453   26      3        0
Buick    5189   20      3        0
Buick   10372   16      3        0
Buick    4082   19      3        0
Cad.    11385   14      3        0
Cad.    14500   14      2        0
Cad.    15906   21      3        0
Chev.    3299   29      3        0
Chev.    5705   16      4        0
Chev.    4504   22      3        0
Chev.    5104   22      2        0
Chev.    3667   24      2        0
Chev.    3955   19      3        0
Datsun   6229   23      4        1
Datsun   4589   35      5        1
Datsun   5079   24      4        1
Datsun   8129   21      4        1
The program below reads the data and creates a temporary data file called auto. The descriptive statistics shown in this module are all performed on this data file called auto.
DATA auto ;
  INPUT MAKE $ PRICE MPG REP78 FOREIGN ;
DATALINES;
AMC 4099 22 3 0
AMC 4749 17 3 0
AMC 3799 22 3 0
Audi 9690 17 5 1
Audi 6295 23 3 1
BMW 9735 25 4 1
Buick 4816 20 3 0
Buick 7827 15 4 0
Buick 5788 18 3 0
Buick 4453 26 3 0
Buick 5189 20 3 0
Buick 10372 16 3 0
Buick 4082 19 3 0
Cad. 11385 14 3 0
Cad. 14500 14 2 0
Cad. 15906 21 3 0
Chev. 3299 29 3 0
Chev. 5705 16 4 0
Chev. 4504 22 3 0
Chev. 5104 22 2 0
Chev. 3667 24 2 0
Chev. 3955 19 3 0
Datsun 6229 23 4 1
Datsun 4589 35 5 1
Datsun 5079 24 4 1
Datsun 8129 21 4 1
;
RUN;

PROC PRINT DATA=auto(obs=10);
RUN;
The output of the proc print is shown below. Because of the obs=10 option, only the first 10 observations are printed.
OBS  MAKE   PRICE  MPG  REP78  FOREIGN
  1  AMC     4099   22      3        0
  2  AMC     4749   17      3        0
  3  AMC     3799   22      3        0
  4  Audi    9690   17      5        1
  5  Audi    6295   23      3        1
  6  BMW     9735   25      4        1
  7  Buick   4816   20      3        0
  8  Buick   7827   15      4        0
  9  Buick   5788   18      3        0
 10  Buick   4453   26      3        0
We can use proc freq to produce frequency tables. Below, we use it to make frequency tables for make, rep78 and foreign.
PROC FREQ DATA=auto;
  TABLES make ;
RUN;

PROC FREQ DATA=auto;
  TABLES rep78 ;
RUN;

PROC FREQ DATA=auto;
  TABLES foreign ;
RUN;
Here is the output produced by the proc freq statements above.
                               Cumulative  Cumulative
MAKE      Frequency   Percent   Frequency     Percent
-----------------------------------------------------
AMC               3      11.5           3        11.5
Audi              2       7.7           5        19.2
BMW               1       3.8           6        23.1
Buick             7      26.9          13        50.0
Cad.              3      11.5          16        61.5
Chev.             6      23.1          22        84.6
Datsun            4      15.4          26       100.0

                               Cumulative  Cumulative
REP78     Frequency   Percent   Frequency     Percent
-----------------------------------------------------
2                 3      11.5           3        11.5
3                15      57.7          18        69.2
4                 6      23.1          24        92.3
5                 2       7.7          26       100.0

                               Cumulative  Cumulative
FOREIGN   Frequency   Percent   Frequency     Percent
-----------------------------------------------------
0                19      73.1          19        73.1
1                 7      26.9          26       100.0
Instead of having three separate proc freqs, we could have done this all in one proc freq step as illustrated below.
PROC FREQ DATA=auto;
  TABLES make rep78 foreign ;
RUN;
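Two other proc freq options can be handy with one-way tables. The sketch below is not part of the original program; it assumes the auto data file created above. ORDER=FREQ on the proc freq statement lists the values from most to least frequent, and NOCUM on the tables statement drops the cumulative columns.

```sas
/* One-way table of make, most frequent value first, */
/* without the cumulative frequency and percent columns */
PROC FREQ DATA=auto ORDER=FREQ;
  TABLES make / NOCUM;
RUN;
```

With this data, Buick (7 cars) should be listed first and BMW (1 car) last.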
Let's use proc freq to look at a cross tabulation of the repair history of the cars (rep78) for foreign and domestic cars (foreign). The proc freq statements for this are shown below.
PROC FREQ DATA=auto;
  TABLES rep78*foreign ;
RUN;
This is the output produced.
TABLE OF REP78 BY FOREIGN
REP78 FOREIGN
Frequency|
Percent |
Row Pct |
Col Pct | 0| 1| Total
---------+--------+--------+
2 | 3 | 0 | 3
| 11.54 | 0.00 | 11.54
| 100.00 | 0.00 |
| 15.79 | 0.00 |
---------+--------+--------+
3 | 14 | 1 | 15
| 53.85 | 3.85 | 57.69
| 93.33 | 6.67 |
| 73.68 | 14.29 |
---------+--------+--------+
4 | 2 | 4 | 6
| 7.69 | 15.38 | 23.08
| 33.33 | 66.67 |
| 10.53 | 57.14 |
---------+--------+--------+
5 | 0 | 2 | 2
| 0.00 | 7.69 | 7.69
| 0.00 | 100.00 |
| 0.00 | 28.57 |
---------+--------+--------+
Total 19 7 26
73.08 26.92 100.00
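Each cell of the table holds four numbers, in the order given by the legend in the upper left corner: frequency, percent, row percent and column percent. As a check on how they relate, the data step below (a sketch with hypothetical variable names, not part of the original program) recomputes the three percentages for the cell where rep78 is 3 and foreign is 0, using the counts printed in the table.

```sas
/* Recompute the percentages for the cell rep78=3, foreign=0 */
DATA _NULL_;
  cell      = 14;   /* cars with rep78=3 and foreign=0 */
  row_total = 15;   /* all cars with rep78=3 */
  col_total = 19;   /* all cars with foreign=0 */
  n         = 26;   /* all cars in the table */
  percent = 100 * cell / n;           /* share of all cars */
  row_pct = 100 * cell / row_total;   /* share of the rep78=3 row */
  col_pct = 100 * cell / col_total;   /* share of the foreign=0 column */
  PUT percent= 6.2 row_pct= 6.2 col_pct= 6.2;   /* written to the SAS log */
RUN;
```

The values written to the log should match the 53.85, 93.33 and 73.68 shown in that cell.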
We can show just the cell percentages to make the table easier to read by using the norow, nocol and nofreq options on the tables statement to suppress the printing of the row percentages, column percentages and frequencies (leaving just the cell percentages). Note that the options come after the / on the tables statement.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOROW NOCOL NOFREQ ;
RUN;
The output is shown below.
TABLE OF REP78 BY FOREIGN
REP78 FOREIGN
Percent | 0| 1| Total
--------+--------+--------+
2 | 11.54 | 0.00 | 11.54
--------+--------+--------+
3 | 53.85 | 3.85 | 57.69
--------+--------+--------+
4 | 7.69 | 15.38 | 23.08
--------+--------+--------+
5 | 0.00 | 7.69 | 7.69
--------+--------+--------+
Total 19 7 26
73.08 26.92 100.00
The order of the options does not matter. We would have gotten the same output had we written the command like this.
PROC FREQ DATA=auto;
  TABLES rep78*foreign / NOFREQ NOROW NOCOL ;
RUN;
To produce summary statistics, proc means can be used. Below, proc means is used to get descriptive statistics for the variable mpg.
PROC MEANS DATA=auto;
  VAR mpg;
RUN;
The results of the proc means are shown below.
Analysis Variable : MPG

 N          Mean       Std Dev       Minimum       Maximum
----------------------------------------------------------
26    20.9230769     4.7575042    14.0000000    35.0000000
----------------------------------------------------------
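By default proc means reports N, mean, standard deviation, minimum and maximum. If you want a different set of statistics, you can list statistic keywords on the proc means statement. The sketch below is not part of the original program; it assumes the auto data file and a SAS release in which proc means accepts the MEDIAN keyword. It requests the median as well, analyzes both price and mpg, and uses MAXDEC=2 to limit the output to two decimal places.

```sas
/* Chosen statistics for two variables, shown with two decimals */
PROC MEANS DATA=auto N MEAN MEDIAN STD MIN MAX MAXDEC=2;
  VAR price mpg;
RUN;
```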
Suppose we would like to get the summary statistics separately for foreign and domestic cars (indicated by the variable foreign). We can use the class statement as shown below to get separate results for the different values of foreign.
PROC MEANS DATA=auto;
  CLASS foreign ;
  VAR mpg;
RUN;
As you see below, the results are presented separately for the seven foreign cars (foreign equals 1) and the 19 domestic cars (when foreign is 0).
Analysis Variable : MPG
FOREIGN N Obs N Mean Std Dev Minimum Maximum
-------------------------------------------------------------
0 19 19 19.78 4.0356598 14.0000 29.00
1 7 7 24.00 5.5075705 17.0000 35.00
--------------------------------------------------------------
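If you need these group statistics for later processing rather than just for display, proc means can write them to a data file with the output statement. The sketch below is an illustration only; the data file name mpg_by_origin and the variable names n_mpg, mean_mpg and sd_mpg are hypothetical.

```sas
/* Save N, mean and standard deviation of mpg for each value of foreign */
PROC MEANS DATA=auto NOPRINT;
  CLASS foreign;
  VAR mpg;
  OUTPUT OUT=mpg_by_origin N=n_mpg MEAN=mean_mpg STD=sd_mpg;
RUN;

/* Look at the saved statistics */
PROC PRINT DATA=mpg_by_origin;
RUN;
```

When a class statement is used, the output data file also contains a row for all cars combined (where the automatic variable _TYPE_ is 0) in addition to one row for each value of foreign (where _TYPE_ is 1).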
You can use proc univariate to get more detailed summary statistics, as shown below.
PROC UNIVARIATE DATA=auto;
  VAR mpg;
RUN;
And here are the results of the proc univariate.
Univariate Procedure
Variable=MPG
Moments
N 26 Sum Wgts 26
Mean 20.92308 Sum 544
Std Dev 4.757504 Variance 22.63385
Skewness 0.935473 Kurtosis 1.7927
USS 11948 CSS 565.8462
CV 22.73807 Std Mean 0.933023
T:Mean=0 22.42503 Pr>|T| 0.0001
Num ^= 0 26 Num > 0 26
M(Sign) 13 Pr>=|M| 0.0001
Sgn Rank 175.5 Pr>=|S| 0.0001
Quantiles(Def=5)
100% Max 35 99% 35
75% Q3 23 95% 29
50% Med 21 90% 26
25% Q1 17 10% 15
0% Min 14 5% 14
1% 14
Range 21
Q3-Q1 6
Mode 22
Extremes
Lowest Obs Highest Obs
14( 15) 24( 25)
14( 14) 25( 6)
15( 8) 26( 10)
16( 18) 29( 17)
16( 12) 35( 24)
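Several entries in the Moments section follow directly from N, the mean and the standard deviation. As a check, the data step below (a sketch with hypothetical variable names, not part of the original program) recomputes the variance, CV, Std Mean and T:Mean=0 from the rounded values printed above.

```sas
/* Recompute derived moments from the printed N, Mean and Std Dev */
DATA _NULL_;
  n    = 26;
  xbar = 20.92308;   /* Mean */
  sd   = 4.757504;   /* Std Dev */
  variance = sd**2;            /* Variance */
  cv       = 100 * sd / xbar;  /* CV: standard deviation as a percent of the mean */
  std_mean = sd / SQRT(n);     /* Std Mean: standard error of the mean */
  t        = xbar / std_mean;  /* T:Mean=0 */
  PUT variance= 8.4 cv= 8.4 std_mean= 8.4 t= 8.3;   /* written to the SAS log */
RUN;
```

Because the inputs are rounded, the results should agree with the printed 22.63385, 22.73807, 0.933023 and 22.42503 only to about four significant digits.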
To obtain separate univariate results for foreign and domestic cars, you would naturally think about the class statement that we used with proc means. Many SAS procedures permit the class statement, but older releases of proc univariate do not (more recent releases do accept it). An approach that works in either case is to use proc sort to sort the data by foreign and then use the by statement with proc univariate, as illustrated below.
PROC SORT DATA=auto;
  BY foreign;
RUN;

PROC UNIVARIATE DATA=auto;
  BY foreign;
  VAR mpg;
RUN;
As you see in the output below, you get a complete set of output for the case where foreign is 0 and then another set of output when foreign is 1.
FOREIGN=0
Univariate Procedure
Variable=MPG
Moments
N 19 Sum Wgts 19
Mean 19.78947 Sum 376
Std Dev 4.03566 Variance 16.28655
Skewness 0.477379 Kurtosis 0.041198
USS 7734 CSS 293.1579
CV 20.39296 Std Mean 0.925844
T:Mean=0 21.37453 Pr>|T| 0.0001
Num ^= 0 19 Num > 0 19
M(Sign) 9.5 Pr>=|M| 0.0001
Sgn Rank 95 Pr>=|S| 0.0001
Quantiles(Def=5)
100% Max 29 99% 29
75% Q3 22 95% 29
50% Med 20 90% 26
25% Q1 16 10% 14
0% Min 14 5% 14
1% 14
Range 15
Q3-Q1 6
Mode 22
Extremes
Lowest Obs Highest Obs
14( 12) 22( 16)
14( 11) 22( 17)
15( 5) 24( 18)
16( 15) 26( 7)
16( 9) 29( 14)
FOREIGN=1
Univariate Procedure
Variable=MPG
Moments
N 7 Sum Wgts 7
Mean 24 Sum 168
Std Dev 5.507571 Variance 30.33333
Skewness 1.340812 Kurtosis 3.286052
USS 4214 CSS 182
CV 22.94821 Std Mean 2.081666
T:Mean=0 11.52923 Pr>|T| 0.0001
Num ^= 0 7 Num > 0 7
M(Sign) 3.5 Pr>=|M| 0.0156
Sgn Rank 14 Pr>=|S| 0.0156
Quantiles(Def=5)
100% Max 35 99% 35
75% Q3 25 95% 35
50% Med 23 90% 35
25% Q1 21 10% 17
0% Min 17 5% 17
1% 17
Range 18
Q3-Q1 4
Mode 23
Extremes
Lowest Obs Highest Obs
17( 1) 23( 2)
21( 7) 23( 4)
23( 4) 24( 6)
23( 2) 25( 3)
24( 6) 35( 5)
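More recent SAS releases accept a class statement in proc univariate, which gives a separate set of results for each group without sorting the data first. The sketch below assumes such a release; on a release that does not support it, the statement will be rejected and the proc sort and by statement approach is needed.

```sas
/* Separate univariate results for each value of foreign, no sorting needed */
PROC UNIVARIATE DATA=auto;
  CLASS foreign;
  VAR mpg;
RUN;
```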
If you make a crosstab with proc freq and one of the variables has a large number of values (say 10 or more), the crosstab table could be very hard to read. In such cases, try using the list option on the tables statement, e.g.,
TABLES rep78*foreign / LIST ;
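As a complete step, this might look like the sketch below, which crosses make (seven values) with rep78 using the auto data file. It is an illustration, not part of the original program.

```sas
/* Crosstab of make by rep78 printed as a list instead of a grid */
PROC FREQ DATA=auto;
  TABLES make*rep78 / LIST;
RUN;
```

With the list option, each combination of make and rep78 that occurs in the data gets one line showing its frequency and percent, along with the cumulative frequency and cumulative percent.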
When using the by statement in proc univariate, if you choose a by variable with a large number of values (say 5, 10, or more), it will produce a very large amount of output. In such cases, you may want to use proc means with a class statement instead of proc univariate.
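For example, make has seven values in the auto data file, so a by-group proc univariate would print seven full sets of output. The sketch below, an illustration that is not part of the original program, uses proc means with a class statement to summarize mpg for each make in a single compact table.

```sas
/* One line of summary statistics of mpg for each make */
PROC MEANS DATA=auto;
  CLASS make;
  VAR mpg;
RUN;
```

Since there is only one BMW in the data, its standard deviation cannot be computed and will be shown as missing.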
For information on Statistical Tests in SAS, see the SAS Learning Module An Overview of Statistical Tests in SAS.
You can view the SAS program associated with this module by clicking descript.sas. While viewing the file, you can save it by choosing File then Save As from the pull-down menu of your web browser. In the Save As dialog box, change the file name to descript.sas, choose the directory where you want to save the file, and then click Save.
These pages are Copyrighted (c) by UCLA Academic Technology Services