R FAQ
How can I get a table of basic descriptive statistics for my variables?

Among the many user-written packages, pastecs has an easy-to-use function called stat.desc that displays a table of descriptive statistics for a set of variables. Assuming your computer is connected to the internet, you can download the package and load it into memory as shown below. We will illustrate this using the hs0 data file.

install.packages("pastecs")
trying URL `http://cran.r-project.org/bin/windows/contrib/2.0/PACKAGES'
Content type `text/plain; charset=iso-8859-1' length 27130 bytes
opened URL
downloaded 26Kb
trying URL `http://cran.r-project.org/bin/windows/contrib/2.0/pastecs_1.2-0.zip'
Content type `application/zip' length 1824980 bytes
opened URL
downloaded 1782Kb
package 'pastecs' successfully unpacked and MD5 sums checked
Delete downloaded files (y/N)? y
updating HTML package descriptions
library(pastecs)
hs0 <- read.table("http://www.ats.ucla.edu/stat/data/hs0.csv", sep=",", header=TRUE)
head(hs0)

   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61
6 113   male white middle public academic   44    52   51      63    61

Let's say we want a table of descriptive statistics for test scores. 

scores <- hs0[, c("read", "write", "math", "science", "socst")]
stat.desc(scores)
                     read        write         math      science        socst
nbr.val      2.000000e+02 2.000000e+02 2.000000e+02 1.950000e+02 2.000000e+02
nbr.null     0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
nbr.na       0.000000e+00 0.000000e+00 0.000000e+00 5.000000e+00 0.000000e+00
min          2.800000e+01 3.100000e+01 3.300000e+01 2.600000e+01 2.600000e+01
max          7.600000e+01 6.700000e+01 7.500000e+01 7.400000e+01 7.100000e+01
range        4.800000e+01 3.600000e+01 4.200000e+01 4.800000e+01 4.500000e+01
sum          1.044600e+04 1.055500e+04 1.052900e+04 1.007400e+04 1.048100e+04
median       5.000000e+01 5.400000e+01 5.200000e+01 5.300000e+01 5.200000e+01
mean         5.223000e+01 5.277500e+01 5.264500e+01 5.166154e+01 5.240500e+01
SE.mean      7.249921e-01 6.702372e-01 6.624493e-01 7.065208e-01 7.591352e-01
CI.mean.0.95 1.429653e+00 1.321679e+00 1.306321e+00 1.393448e+00 1.496982e+00
var          1.051227e+02 8.984359e+01 8.776781e+01 9.733846e+01 1.152573e+02
std.dev      1.025294e+01 9.478586e+00 9.368448e+00 9.866026e+00 1.073579e+01
coef.var     1.963036e-01 1.796037e-01 1.779551e-01 1.909743e-01 2.048620e-01
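
To see what the less familiar rows mean, the sketch below recomputes them in base R for a single numeric vector. It assumes the usual textbook definitions, which reproduce the read column above (for example, 10.25294 / sqrt(200) is about 0.72499, the SE.mean value). SE.mean is the standard deviation divided by the square root of the number of non-missing values. CI.mean.0.95 is the half-width of a 95% t-based confidence interval for the mean. coef.var is the standard deviation divided by the mean. The function name describe_vec is hypothetical and not part of pastecs.

```r
# Hypothetical helper (not part of pastecs): recompute stat.desc-style rows
# for one numeric vector, counting and then dropping missing values.
describe_vec <- function(x, p = 0.95) {
  n_na <- sum(is.na(x))
  x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  se <- s / sqrt(n)                        # standard error of the mean
  ci <- qt((1 + p) / 2, df = n - 1) * se   # half-width of the t interval
  c(nbr.val = n, nbr.null = sum(x == 0), nbr.na = n_na,
    min = min(x), max = max(x), range = max(x) - min(x),
    sum = sum(x), median = median(x), mean = mean(x),
    SE.mean = se,
    CI.mean = ci,                          # stat.desc labels this CI.mean.0.95 when p = 0.95
    var = var(x), std.dev = s, coef.var = s / mean(x))
}

res <- describe_vec(c(2, 4, 4, 6, NA))
print(round(res, 4))
```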

The scientific notation can be hard to read. You can change the display format by setting the scipen option (a penalty that makes R prefer fixed over scientific notation) and the digits option:

options(scipen=100)
options(digits=2)
stat.desc(scores)
                 read    write     math  science    socst
nbr.val        200.00   200.00   200.00   195.00   200.00
nbr.null         0.00     0.00     0.00     0.00     0.00
nbr.na           0.00     0.00     0.00     5.00     0.00
min             28.00    31.00    33.00    26.00    26.00
max             76.00    67.00    75.00    74.00    71.00
range           48.00    36.00    42.00    48.00    45.00
sum          10446.00 10555.00 10529.00 10074.00 10481.00
median          50.00    54.00    52.00    53.00    52.00
mean            52.23    52.77    52.65    51.66    52.41
SE.mean          0.72     0.67     0.66     0.71     0.76
CI.mean.0.95     1.43     1.32     1.31     1.39     1.50
var            105.12    89.84    87.77    97.34   115.26
std.dev         10.25     9.48     9.37     9.87    10.74
coef.var         0.20     0.18     0.18     0.19     0.20
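
Note that options() changes printing for every later command in the session. The sketch below shows two ways to limit that effect. First, options() returns the previous values, so you can save them and restore them afterwards. Second, round() applied to the data frame that stat.desc() returns (for example round(stat.desc(scores), 2)) affects only that one table. To keep the sketch self-contained, a small hand-made data frame holding a few values copied from the table above stands in for the stat.desc() output.

```r
# Stand-in for stat.desc() output: a few rows copied from the table above
stats <- data.frame(
  read  = c(200, 10446, 0.7249921),
  write = c(200, 10555, 0.6702372),
  row.names = c("nbr.val", "sum", "SE.mean")
)

# Option 1: change display options temporarily, then restore the old values
old_digits <- getOption("digits")
op <- options(scipen = 100, digits = 2)  # options() returns the previous settings
print(stats)
options(op)                              # put the previous settings back

# Option 2: round a copy of the table; global options are left untouched
rounded <- round(stats, 2)
print(rounded)
```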

What if we only want the descriptive statistics, such as the median, mean, and standard deviation, without the counts, min, max, range, and sum? Setting basic=FALSE drops those basic statistics:
stat.desc(scores, basic=FALSE)
               read write  math science  socst
median        50.00 54.00 52.00   53.00  52.00
mean          52.23 52.77 52.65   51.66  52.41
SE.mean        0.72  0.67  0.66    0.71   0.76
CI.mean.0.95   1.43  1.32  1.31    1.39   1.50
var          105.12 89.84 87.77   97.34 115.26
std.dev       10.25  9.48  9.37    9.87  10.74
coef.var       0.20  0.18  0.18    0.19   0.20
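
Setting basic=FALSE still returns a fixed block of rows. If you want only a few named statistics, such as min, max, and std.dev, one option is to index the rows of the returned data frame by name, e.g. stat.desc(scores)[c("min", "max", "std.dev"), ]. The sketch below builds a comparable table directly in base R without pastecs. The small data frame df is made up for illustration and stands in for scores.

```r
# Made-up data standing in for the test scores; NA marks a missing value
df <- data.frame(read    = c(57, 68, 44, 63),
                 science = c(47, 63, NA, 53))

# One row per statistic, one column per variable, ignoring missing values
picked <- sapply(df, function(x) c(min     = min(x, na.rm = TRUE),
                                   max     = max(x, na.rm = TRUE),
                                   std.dev = sd(x, na.rm = TRUE)))
print(round(picked, 2))
```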

In the same fashion, we can display only the basic statistics, such as the number of non-missing values (nbr.val) and the number of missing values (nbr.na), by setting desc=FALSE:
stat.desc(scores, desc=FALSE)
          read write  math science socst
nbr.val    200   200   200     195   200
nbr.null     0     0     0       0     0
nbr.na       0     0     0       5     0
min         28    31    33      26    26
max         76    67    75      74    71
range       48    36    42      48    45
sum      10446 10555 10529   10074 10481
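
In this output, nbr.val counts the non-missing values, nbr.null the values equal to zero, and nbr.na the missing values. For science, 195 + 5 = 200 observations. The sketch below computes the same three counts per column in base R on a made-up data frame. It can serve as a quick cross-check of stat.desc(scores, desc=FALSE).

```r
# Made-up data: one zero in math and one missing value in science
df <- data.frame(math    = c(41, 0, 54, 47),
                 science = c(47, 63, NA, 53))

counts <- rbind(
  nbr.val  = colSums(!is.na(df)),             # non-missing values
  nbr.null = colSums(df == 0, na.rm = TRUE),  # values equal to zero
  nbr.na   = colSums(is.na(df))               # missing values
)
print(counts)
```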

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.