| comment | add comment to an object |
| sapply | apply a function over a list or vector |
| is.factor | check if a variable is a factor variable |
| factor | create a categorical variable, with value labels if desired |
| table | create a frequency table |
It is good practice to label the data sets and variables we work with. The comment function does this by attaching a text label to an object.
# cleaning up
rm(list=ls())
# reading in data
hs0 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hs0.csv", header=TRUE, sep=",")
# commenting the data set
comment(hs0)<-"High school and beyond data"
# checking
comment(hs0)
# variable labels using comment
comment(hs0$write)<-"writing score"
comment(hs0$read) <-"reading score"
# more checking to make sure that our comments stay with the data frame
save(hs0,file="hs0.rda")
rm(list=ls())
load(file="hs0.rda")
comment(hs0)
comment(hs0$write)
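The save-and-reload check can also be tried without downloading anything. The following is a minimal sketch on a made-up data frame (the name toy, its values, and the temporary file are assumptions for illustration, not part of hs0):

```r
# Toy data frame standing in for hs0 (values are made up)
toy <- data.frame(write = c(52, 59, 33), read = c(57, 68, 44))
comment(toy) <- "toy data set"
comment(toy$write) <- "writing score"

# Save to a temporary file, remove the object, and load it back
tmp <- tempfile(fileext = ".rda")
save(toy, file = tmp)
rm(toy)
load(tmp)

# The labels are stored as attributes, so they come back with the object
comment(toy)
comment(toy$write)
```

Note that printing toy does not show the comment; it has to be requested with comment().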
For the rest of this section, we are going to attach hs0 so our syntax will look cleaner. The search() function displays what is currently on the search path.
search()
attach(hs0)
search()
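One thing to keep in mind with attach is that it places a copy of the columns on the search path. Later changes to the data frame are not reflected in the attached copy. A small sketch with a made-up data frame (toy and x are hypothetical names):

```r
# Toy data frame; attach() copies its columns onto the search path
toy <- data.frame(x = c(1, 5, 3))
attach(toy)

# Change the data frame after attaching
toy$x[toy$x == 5] <- NA

attached_x <- x   # the attached copy still holds the old values
current_x <- toy$x  # the data frame holds the new values

detach(toy)

attached_x
current_x
```

This is why a column that has been recoded is best referred to through the data frame (as in toy$x) rather than by its bare name.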
We use the sapply function with the is.factor function to check if any of the variables in the hs0 data frame are factor variables.
sapply(hs0, is.factor)
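Since R 4.0.0, read.table and data.frame no longer convert character columns to factors by default, so on a recent version this check may return FALSE for every column. A minimal sketch on made-up data (the data frame toy and its values are assumptions for illustration):

```r
# Toy data frame with one character column, kept as character
toy <- data.frame(
  id      = 1:3,
  prgtype = c("general", "academic", "general"),
  stringsAsFactors = FALSE
)

# One logical value per column
before <- sapply(toy, is.factor)
before

# Convert the character column explicitly, then check again
toy$prgtype <- factor(toy$prgtype)
after <- sapply(toy, is.factor)
after
```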
Creating a factor (categorical) variable called schtyp.f from schtyp, and a factor variable called female from gender, both with value labels.
schtyp.f <- factor(schtyp, levels=c(1, 2), labels=c("public", "private"))
female <- factor(gender, levels=c(0, 1), labels=c("male", "female"))
table(schtyp.f)
table(female)
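A detail worth knowing about the levels argument: any value that is not listed there becomes NA in the resulting factor. A short sketch with made-up codes (the vector codes and the stray value 3 are assumptions for illustration):

```r
# Made-up school type codes; 3 is not one of the declared levels
codes <- c(1, 2, 1, 1, 3)
codes.f <- factor(codes, levels = c(1, 2), labels = c("public", "private"))

codes.f                           # the fifth element is NA
table(codes.f)                    # NA is left out by default
table(codes.f, useNA = "ifany")   # NA is counted as well
```

Comparing the two tables is a quick way to spot codes that were not covered by levels.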
Recoding race = 5 to NA (missing).
table(hs0$race)
hs0$race[hs0$race==5] <-NA
table(hs0$race)
# displaying the missings as well
table(hs0$race, useNA="ifany")
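The same recode can be followed step by step on a small made-up vector (race_toy and its values are assumptions for illustration):

```r
# Made-up race codes, with 5 to be treated as missing
race_toy <- c(1, 5, 3, 4, 5, 2)
race_toy[race_toy == 5] <- NA

n_missing <- sum(is.na(race_toy))        # number of values recoded to NA
default_tab <- table(race_toy)           # NA is dropped by default
full_tab <- table(race_toy, useNA = "ifany")

n_missing
default_tab
full_tab
```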
Creating a variable called total = read + write + math + science.
total<-read+write+math+science
# noticing the missing values generated
summary(total)
Creating a variable called grade based on total.
# initializing grade as all missing, so that rows with a missing total stay NA
grade<-rep(NA, length(total))
grade[total <=140]<-0
grade[total > 140 & total <= 180] <-1
grade[total > 180 & total <= 210] <-2
grade[total > 210 & total <= 234] <-3
grade[total > 234] <-4
grade<-factor(grade, levels=c(0, 1, 2, 3, 4), labels=c("F", "D", "C", "B", "A"))
# commenting after the conversion, since factor() does not carry the comment over
comment(grade)<-"combined grades of read, write, math, science"
table(grade)
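As an alternative to assigning each range by hand, the base R function cut can bin a numeric variable in one call. The sketch below uses the same cut points on a made-up vector (total_toy is a hypothetical name); it is a different technique from the one used above, shown for comparison:

```r
# Made-up totals, including boundary values and a missing value
total_toy <- c(120, 140, 141, 180, 200, 210, 234, 235, NA)

# Intervals are closed on the right by default: (-Inf,140], (140,180], ...
grade_toy <- cut(total_toy,
                 breaks = c(-Inf, 140, 180, 210, 234, Inf),
                 labels = c("F", "D", "C", "B", "A"))

table(grade_toy, useNA = "ifany")
```

With cut, a missing total stays missing in the result without any special handling.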
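Creating mean scores in two ways, which treat missing values differently.
# any missing score makes the mean missing
m1<-(read+write+math+science)/4
# same result as m1: rowMeans returns NA when a score is missing
m2<-rowMeans(cbind(read, write, math, science))
# overwriting m2 with the mean of the non-missing scores only
m2<-rowMeans(cbind(read, write, math, science), na.rm=TRUE)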
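The difference between the two approaches is easiest to see on a few rows. A minimal sketch with made-up scores (the data frame toy and its values are assumptions for illustration):

```r
# Made-up scores; the second row has a missing science score
toy <- data.frame(
  read    = c(57, 68, 44),
  write   = c(52, 59, 33),
  math    = c(41, 53, 54),
  science = c(47, NA, 58)
)

# Dividing the sum by 4: a single missing score makes the mean NA
mean_strict <- with(toy, (read + write + math + science) / 4)

# rowMeans with na.rm=TRUE: the second row is averaged over 3 scores
mean_avail <- rowMeans(toy, na.rm = TRUE)

mean_strict
mean_avail
```

For the second row, the available scores sum to 180, so the na.rm=TRUE version gives 180/3 = 60 while the first version gives NA.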
At this point, we might want to combine the new variables we have created with the original data set. We can use the cbind function for this.
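# passing the data frame first keeps schtyp.f, female and grade as factors;
# a nested cbind of the new variables alone would build a numeric matrix and drop the labels
hs1<-cbind(hs0, schtyp.f, female, total, grade)
table(hs1$race)
is.data.frame(hs1)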
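When combining factors with cbind, the result depends on the arguments. If all arguments are plain vectors, cbind builds a matrix and factors are stored as their integer codes. If a data frame is among the arguments, the data frame method is used and factors are kept. A small sketch on made-up data (toy, sex, and score are hypothetical names):

```r
# Made-up data frame and two new variables
toy <- data.frame(id = 1:3)
sex <- factor(c("male", "female", "male"))
score <- c(10, 20, 30)

as_matrix <- cbind(sex, score)             # numeric matrix, labels lost
nested <- cbind(toy, cbind(sex, score))    # sex arrives as integer codes
direct <- cbind(toy, sex, score)           # sex stays a factor

is.matrix(as_matrix)
is.factor(nested$sex)
is.factor(direct$sex)
```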
These pages are Copyrighted (c) by UCLA Academic Technology Services