|
|
|
||||
|
Help the Stat Consulting Group by
giving a gift
| |||||
|
Loading
|
|||||
Users of Stata are likely familiar with the concept of collapsing data: reducing the number of observations in your data, keeping one observation for each value or combination of values from one or more variables and calculating some summary of the other variables at this level. For examples of this in Stata, see our Stata Learning Module on collapse.
This data management step can also be done in R using the summaryBy command in the doBy package. Let's consider the dataset hsb2. We can start by collapsing the data by prog and calculating the mean of socst. The default summary calculation is the mean, as indicated by the output.
library(doBy) summaryBy(socst ~ prog, data=hsb2) prog socst.mean 1 1 50.60000 2 2 56.69524 3 3 45.02000
We can easily make our collapse more complex, creating one observation for each combination of prog and female and ses, calculating both the mean and standard deviation of several variables, and saving this as a new object.
collapse1 <- summaryBy(socst + math ~ prog + ses + female, FUN=c(mean,sd), data=hsb2) collapse1 prog ses female socst.mean math.mean socst.sd math.sd 1 1 1 0 47.57143 46.71429 6.502747 8.118175 2 1 1 1 49.00000 48.22222 5.894913 7.546154 3 1 2 0 50.50000 51.10000 8.959787 9.267026 4 1 2 1 52.10000 51.10000 9.938142 6.026792 5 1 3 0 57.25000 54.00000 17.500000 4.320494 6 1 3 1 49.60000 50.40000 10.714476 7.700649 7 2 1 0 46.75000 50.00000 11.926860 6.683313 8 2 1 1 55.13333 55.00000 10.098562 10.281745 9 2 2 0 55.54545 58.13636 8.985318 10.086745 10 2 2 1 55.40909 54.50000 8.511388 7.255540 11 2 3 0 59.33333 57.42857 7.799573 7.743200 12 2 3 1 59.61905 59.42857 9.058014 8.152125 13 3 1 0 32.50000 46.75000 4.725816 5.251984 14 3 1 1 38.25000 42.25000 8.189715 3.918819 15 3 2 0 44.00000 48.20000 11.464230 9.540889 16 3 2 1 51.00000 46.06250 8.755950 7.540723 17 3 3 0 50.25000 42.25000 6.946222 3.947573 18 3 3 1 46.00000 55.66667 10.000000 10.503968
If we wish to summarize our data in a way that does not already exist as a function, we can write the function and then pass this to summaryBy. Below, we write a quick function halfmean and then apply it to math and socst within each combination of prog, ses, and female.
halfmean <- function(x) return(mean(x)/2) summaryBy(socst + math ~ prog + ses + female, FUN=halfmean, data=hsb2sort) prog ses female socst.halfmean math.halfmean 1 1 1 0 23.78571 23.35714 2 1 1 1 24.50000 24.11111 3 1 2 0 25.25000 25.55000 4 1 2 1 26.05000 25.55000 5 1 3 0 28.62500 27.00000 6 1 3 1 24.80000 25.20000 7 2 1 0 23.37500 25.00000 8 2 1 1 27.56667 27.50000 9 2 2 0 27.77273 29.06818 10 2 2 1 27.70455 27.25000 11 2 3 0 29.66667 28.71429 12 2 3 1 29.80952 29.71429 13 3 1 0 16.25000 23.37500 14 3 1 1 19.12500 21.12500 15 3 2 0 22.00000 24.10000 16 3 2 1 25.50000 23.03125 17 3 3 0 25.12500 21.12500 18 3 3 1 23.00000 27.83333
UCLA Researchers are invited to our Statistical Consulting Services
We recommend others to our list of Other Resources for Statistical Computing Help
These pages are Copyrighted (c) by UCLA Academic Technology Services