
R Class Notes
Managing Data


1.0 R functions used in this unit and the syntax file

mean calculates the mean
names lists the variable names of a data frame
table creates a frequency table
rbind combines rows of data
order returns the row order used to sort data frames
merge match-merges two data frames
cbind combines columns of data

Here is the link to the syntax file used for this section.

2.0 Keeping and dropping a subset of variables or observations

Read in the hs1 data via the internet using the read.table function.

hs1 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hs1.csv", header=TRUE, sep=",")

Keeping only the observations where the reading score is 60 or higher.

hs1.read.well <- hs1[hs1$read >= 60, ]
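
The same row selection can be sketched on a small made-up data frame (toy and its values are hypothetical, not part of hs1). A logical condition inside the brackets keeps the rows where it is TRUE. If the variable has missing values, the comparison returns NA and plain bracket indexing returns an all-NA row for it; wrapping the condition in which(), or using subset(), leaves those rows out instead.

```r
# Hypothetical toy data, not the hs1 file.
toy <- data.frame(id = 1:5, read = c(47, 63, NA, 60, 71))

# Plain logical indexing: the NA comparison yields an all-NA row.
keep.logical <- toy[toy$read >= 60, ]

# which() returns only the positions where the condition is TRUE.
keep.which <- toy[which(toy$read >= 60), ]

# subset() also leaves out rows where the condition is NA.
keep.subset <- subset(toy, read >= 60)

nrow(keep.logical)
nrow(keep.which)
nrow(keep.subset)
```

When the variable has no missing values, all three forms return the same rows.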

Comparing the mean of read in the original hs1 data frame and in the new, smaller hs1.read.well data frame. To avoid confusion, we refer to each variable as data frame name, dollar sign, variable name. For example, hs1$read is the read variable from the hs1 data frame.

mean(hs1.read.well$read)
mean(hs1$read)

Keeping only the variables read and write from the hs1 data frame.

hs2<-hs1[, c("read", "write")]
# another way of doing the same thing, using column positions
# (read and write are columns 7 and 8 of hs1)
hs3<-hs1[, c(7, 8)]
names(hs3)

Dropping the variables read and write from the hs1 data frame by using the column indices corresponding to these two variables with a negative sign.

hs2.drop<-hs1[, -c(7, 8)]
names(hs2.drop)
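
Column positions such as 7 and 8 stop pointing at the right variables if the column order of the file changes. A minimal sketch of dropping by name instead, on a made-up data frame (toy, drop.vars and the values are hypothetical, not taken from hs1):

```r
# Hypothetical toy data, not the hs1 file.
toy <- data.frame(id = 1:3,
                  read = c(57, 68, 44),
                  write = c(52, 59, 33),
                  math = c(41, 53, 54))

drop.vars <- c("read", "write")

# Keep every column whose name is NOT in drop.vars.
# drop = FALSE keeps the result a data frame even if one column remains.
toy.drop <- toy[, !(names(toy) %in% drop.vars), drop = FALSE]
names(toy.drop)

# The same selection using setdiff() on the column names.
toy.drop2 <- toy[, setdiff(names(toy), drop.vars), drop = FALSE]
names(toy.drop2)
```

Negative indices only work with positions, not with names, which is why the name-based version uses %in% or setdiff().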

3.0 Append files

We will split hs1 into two data frames, one for females and one for males, and then put them back together.

hsfemale<-hs1[hs1$female==1, ]
hsmale<-hs1[hs1$female==0, ]

dim(hsfemale)
dim(hsmale)

hs.all<-rbind(hsfemale, hsmale)
dim(hs.all)
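
The split-and-append steps can be checked on a small made-up data frame (toy and its values are hypothetical). For data frames, rbind matches columns by name, so both pieces need the same variable names. The sketch also uses table to confirm the group sizes and order to restore the original row order, since the appended data are grouped by female.

```r
# Hypothetical toy data, not the hs1 file.
toy <- data.frame(id = 1:6,
                  female = c(1, 0, 1, 1, 0, 0),
                  read = c(57, 68, 44, 63, 47, 50))

toy.female <- toy[toy$female == 1, ]
toy.male <- toy[toy$female == 0, ]

# Frequency table of female; the counts match the rows in each piece.
counts <- table(toy$female)
counts

# Append the two pieces; the row count is the sum of the pieces.
toy.all <- rbind(toy.female, toy.male)
nrow(toy.all) == nrow(toy.female) + nrow(toy.male)

# Rows are now grouped by female; order() on id restores the original order.
toy.sorted <- toy.all[order(toy.all$id), ]
toy.sorted$id
```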

4.0 Merging Files

We will create two data frames from hs1: one containing demographic variables and the other containing test scores. We then merge the two by the id variable.

hs.demo<-hs1[, c("id", "ses", "female", "race")]
hs.scores<-hs1[, c("id", "read", "write", "math", "science")]
dim(hs.demo)
dim(hs.scores)

hs.merge <- merge(hs.demo, hs.scores, by="id", all=TRUE)
head(hs.merge)
dim(hs.merge)
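
In the merge above both data frames contain the same ids, so the all argument makes no visible difference. A minimal sketch of what it does when the ids only partly overlap, using made-up data (demo, scores and their values are hypothetical):

```r
# Hypothetical toy data with partly overlapping ids.
demo <- data.frame(id = c(1, 2, 3), ses = c(1, 2, 3))
scores <- data.frame(id = c(2, 3, 4), read = c(57, 68, 44))

# Default: keep only the ids found in both data frames.
inner <- merge(demo, scores, by = "id")

# all = TRUE: keep every id from either data frame, filling gaps with NA.
outer <- merge(demo, scores, by = "id", all = TRUE)

# all.x = TRUE: keep every id from the first data frame only.
left <- merge(demo, scores, by = "id", all.x = TRUE)

inner
outer
left
```

Here id 1 has no score and id 4 has no demographic record, so those cells are NA in the all = TRUE result.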

If the variable we were merging on had a different name in each data frame, we could use the by.x and by.y arguments. The by.x argument names the variable(s) in the data frame listed first in the merge function (here hs.demo), and the by.y argument names the variable(s) in the data frame listed second (here hs.scores).

hs.merge1 <- merge(hs.demo, hs.scores, by.x="id", by.y="id", all=TRUE)
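
Because both data frames here call the key id, the call above behaves like by="id". A sketch with keys that really are named differently, using made-up data (demo, scores and the student column are hypothetical):

```r
# Hypothetical toy data: the key is "id" in one data frame, "student" in the other.
demo <- data.frame(id = c(1, 2, 3), female = c(1, 0, 1))
scores <- data.frame(student = c(3, 1, 2), read = c(44, 57, 68))

# Match demo$id against scores$student.
merged <- merge(demo, scores, by.x = "id", by.y = "student", all = TRUE)

# The key column in the result takes its name from the first data frame.
names(merged)
merged
```

The rows of scores are in a different order than those of demo; merge matches on the key values, not on row position.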

These pages are Copyrighted (c) by UCLA Academic Technology Services

