| mean | calculates the mean |
| names | lists the variable names of a data frame |
| table | creates a frequency table |
| rbind | combines rows of data |
| order | returns the ordering of its arguments; used to sort data frames |
| merge | match merges two data frames |
| cbind | combines columns of data |
Read in the hs1 data via the internet using the read.table function.
hs1 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hs1.csv", header=TRUE, sep=",")
Keeping only the observations where the reading score is 60 or higher.
hs1.read.well <- hs1[hs1$read >= 60, ]
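As a minimal sketch of the same row filter on a small made-up data frame (toy and its values are illustrative only, not part of hs1): a logical condition in the row position keeps the rows where the condition is TRUE. The subset function is shown as an alternative that selects the same rows here.

```r
# toy is a hypothetical data frame used only for illustration
toy <- data.frame(id = 1:5, read = c(57, 68, 44, 63, 60))

# logical indexing: keep the rows where read is 60 or higher
toy.read.well <- toy[toy$read >= 60, ]

# subset() evaluates the condition inside the data frame
toy.read.well2 <- subset(toy, read >= 60)

# both approaches keep the same ids
toy.read.well$id
toy.read.well2$id
```

If read contained missing values, the bracket form would return rows of NA for them, while subset would drop those rows.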
Comparing the mean of read in the original hs1 data frame with its mean in the new, smaller hs1.read.well data frame. To avoid confusion, we refer to each variable as data frame name, dollar sign, variable name. For example, hs1$read is the read variable from the hs1 data frame.
mean(hs1.read.well$read)
mean(hs1$read)
Keeping only the variables read and write from the hs1 data frame.
hs2<-hs1[, c("read", "write")]
# another way of doing the same thing
hs3<-hs1[, c(7, 8)]
names(hs3)
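Selecting by position only works if the column numbers are known. A minimal sketch, on a made-up data frame (toy is illustrative only), of how the positions could be looked up from the names with match rather than counted by hand:

```r
# toy is a hypothetical data frame used only for illustration
toy <- data.frame(id = 1:3, read = c(57, 68, 44), write = c(52, 59, 33))

# positions of the named columns, in the order requested
pos <- match(c("read", "write"), names(toy))
pos

# selecting by position and by name gives the same columns
by.pos <- toy[, pos]
by.name <- toy[, c("read", "write")]
identical(by.pos, by.name)
```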
Dropping the variables read and write from the hs1 data frame by using the column indices corresponding to these two variables with a negative sign.
hs2.drop<-hs1[, -c(7, 8)]
names(hs2.drop)
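A negative sign cannot be applied to column names directly. One possible way to drop columns by name, sketched here on a made-up data frame (toy and drop.vars are illustrative names), is to build a logical vector with %in% and negate it:

```r
# toy is a hypothetical data frame used only for illustration
toy <- data.frame(id = 1:3, female = c(0, 1, 1),
                  read = c(57, 68, 44), write = c(52, 59, 33))

# names of the columns to drop
drop.vars <- c("read", "write")

# keep every column whose name is not in drop.vars
toy.drop <- toy[, !(names(toy) %in% drop.vars), drop = FALSE]
names(toy.drop)
```

The drop = FALSE argument keeps the result a data frame even when only one column is left.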
We will split hs1 into two data frames, one for females and one for males, and then stack them back together with rbind.
hsfemale<-hs1[hs1$female==1, ]
hsmale<-hs1[hs1$female==0, ]
dim(hsfemale)
dim(hsmale)
hs.all<-rbind(hsfemale, hsmale)
dim(hs.all)
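Stacking with rbind puts all rows of the first data frame ahead of the second, so the combined rows are no longer in their original order. A minimal sketch on a made-up data frame (toy and its values are illustrative only) of how order, listed in the function table at the top of this section, could be used to sort the result by id:

```r
# toy is a hypothetical data frame used only for illustration
toy <- data.frame(id = 1:6,
                  female = c(0, 1, 1, 0, 1, 0),
                  read = c(57, 68, 44, 63, 60, 47))

# split by female, then stack the two pieces
toy.female <- toy[toy$female == 1, ]
toy.male <- toy[toy$female == 0, ]
toy.all <- rbind(toy.female, toy.male)

# the stacked data has the same number of rows as the original
nrow(toy.all) == nrow(toy)

# order() returns the row positions that sort id ascending
toy.sorted <- toy.all[order(toy.all$id), ]
toy.sorted$id
```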
We will create two data frames from hs1: one contains the demographic variables and the other contains the test scores. We then merge the two by the id variable.
hs.demo<-hs1[, c("id", "ses", "female", "race")]
hs.scores<-hs1[, c("id", "read", "write", "math", "science")]
dim(hs.demo)
dim(hs.scores)
hs.merge <- merge(hs.demo, hs.scores, by="id", all=TRUE)
head(hs.merge)
dim(hs.merge)
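Because hs.demo and hs.scores both come from hs1, every id has a match and the all argument does not change the result. A minimal sketch with two made-up data frames (toy.demo and toy.scores are illustrative only) in which some ids appear in only one of them shows what all does:

```r
# hypothetical data frames used only for illustration
toy.demo <- data.frame(id = c(1, 2, 3), female = c(0, 1, 1))
toy.scores <- data.frame(id = c(2, 3, 4), read = c(68, 44, 63))

# default: keep only the ids found in both data frames
toy.inner <- merge(toy.demo, toy.scores, by = "id")

# all = TRUE: keep every id, filling unmatched values with NA
toy.outer <- merge(toy.demo, toy.scores, by = "id", all = TRUE)

dim(toy.inner)
dim(toy.outer)
toy.outer
```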
If the variable we are merging on has a different name in each data frame, we can use the by.x and by.y arguments. The by.x argument names the variable(s) in the data frame listed first in the merge function (in this case hs.demo), and the by.y argument names the variable(s) in the data frame listed second (in this case hs.scores). Here both names are id, so the result is the same as with by="id".
hs.merge1 <- merge(hs.demo, hs.scores, by.x="id", by.y="id", all=TRUE)
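A minimal sketch of the case where the key really is named differently in each data frame. The data frames and the column name student_id are made up for illustration and are not part of hs1.

```r
# hypothetical data frames used only for illustration;
# the key is called id in one and student_id in the other
toy.demo <- data.frame(id = c(1, 2, 3), female = c(0, 1, 1))
toy.scores <- data.frame(student_id = c(1, 2, 3), read = c(57, 68, 44))

# by.x names the key in the first data frame, by.y in the second
toy.merge1 <- merge(toy.demo, toy.scores,
                    by.x = "id", by.y = "student_id", all = TRUE)

# the key column in the result takes its name from the first data frame
names(toy.merge1)
```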
These pages are Copyrighted (c) by UCLA Academic Technology Services