This section will require a little more work than the sections that follow because we need to create a directory on your hard drive.
First, create a directory called mydata in your home directory or wherever you want it to be. Next, note the path to this directory. On a Windows machine it might be "C:/mydata"; on a Mac or Unix machine it might be "~/mydata".
Finally, place the following data files into the directory mydata: hs0.csv, hs0_1.csv, schdat_fix.txt, hsb2.dta and hsb2.sav.
Now we are ready to begin.
| read.table | read text files |
| read.fwf | read fixed format text files |
| read.dta | read Stata (.dta) data files |
| read.spss | read SPSS (.sav) data files |
| save | save data in an R data file |
| load | read data in an R data file |
| names | list or modify the variable names of a data frame |
The setwd() function (set working directory) works like the cd command at the Windows command prompt. The getwd() function shows the name of your current working directory. Be sure to use the path that you noted above.
setwd("C:/mydata") # set to wherever your data directory is located
getwd() # check that you are in the correct directory
[1] "C:/mydata"
One of the most commonly used ASCII data formats is the comma-separated values (csv) format. Files of this type can be created using a spreadsheet program, such as Excel, or by many database programs. We will now read the csv file hs0.csv from the mydata directory using the read.table() function. Here is a look at the first five lines of the hs0.csv file. Notice that the first line is a list of variable names.
gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56
data1 <- read.table("hs0.csv", header=TRUE, sep=",") # header=TRUE: the first line holds the variable names
attach(data1)
names(data1)
[1] "gender" "id" "race" "ses" "schtyp" "prgtype" "read"
[8] "write" "math" "science" "socst"
data1[1:5, ]
gender id race ses schtyp prgtype read write math science socst
1 0 70 4 1 1 general 57 52 41 47 57
2 1 121 4 2 1 vocati 68 59 53 63 61
3 0 86 4 3 1 general 44 33 54 58 31
4 0 141 4 3 1 vocati 63 44 47 53 56
5 0 172 4 2 1 academic 47 52 57 53 61
table(prgtype)
prgtype
academic general vocati
105 45 50
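The same kind of read can be tried without any file on disk. The sketch below is only an illustration: it passes the first five lines of hs0.csv as inline text, uses read.csv() (which is read.table() with header = TRUE and sep = "," as defaults), and refers to the column with $ rather than attach(). The names csv_text and demo are made up for this example.

```r
# First five lines of hs0.csv, supplied inline so that no file is needed
csv_text <- "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56"

# read.csv() is read.table() with header = TRUE and sep = "," as defaults
demo <- read.csv(text = csv_text)

str(demo)             # variable names and types
table(demo$prgtype)   # refer to a column with $ instead of attach()
```

Using demo$prgtype avoids having to remember a matching detach() later.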
The save() and load() functions can be used to save and read data from R data files.
save(data1,file="data1.rda") # saves as an R object
detach(data1)
rm(list=ls()) # clear everything out of memory
table(prgtype) # check that the data are gone
Error in table(prgtype) : Object "prgtype" not found
load("data1.rda") # load the R data into memory
attach(data1) # attach dataframe
data1[1:5, ]
table(prgtype)
prgtype
academic general vocati
105 45 50
detach(data1)
rm(list=ls()) # clear everything out of memory
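As a small self-contained check of the save() and load() round trip, the sketch below writes a toy data frame to a temporary file and reads it back. The data frame scores and the temporary file names are assumptions made for illustration. It also shows saveRDS() and readRDS(), a base R alternative that stores one object without its name.

```r
# A toy data frame standing in for data1
scores <- data.frame(id = c(70, 121, 86), read = c(57, 68, 44))

rda_file <- tempfile(fileext = ".rda")
save(scores, file = rda_file)     # stores the object together with its name
original <- scores
rm(scores)                        # remove it from memory

loaded_names <- load(rda_file)    # load() returns the names it restored
print(loaded_names)
same_after_load <- identical(scores, original)

# saveRDS()/readRDS() store a single object without its name,
# so the result can be assigned to any variable
rds_file <- tempfile(fileext = ".rds")
saveRDS(original, rds_file)
copy <- readRDS(rds_file)
```

Because load() restores objects under their saved names, it can silently overwrite an existing object of the same name; readRDS() leaves the choice of name to you.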
The following segment is the beginning of the hs0_1.csv file. This data file does not have variable names on its first line. Also notice that the ninth line (id 84) has two consecutive commas near the end, which means that the value between them is missing.
0,70,4,1,1,"general",57,52,41,47,57
1,121,4,2,1,"vocati",68,59,53,63,61
0,86,4,3,1,"general",44,33,54,58,31
0,141,4,3,1,"vocati",63,44,47,53,56
0,172,4,2,1,"academic",47,52,57,53,61
0,113,4,2,1,"academic",44,52,51,63,61
0,50,3,2,1,"general",50,59,42,53,61
0,11,1,2,1,"academic",34,46,45,39,36
0,84,4,2,1,"general",63,57,54,,51
0,48,3,2,1,"academic",57,55,52,50,51
0,75,4,2,1,"vocati",60,46,51,53,61
0,60,5,2,1,"academic",57,65,51,63,61
0,95,4,3,1,"academic",73,60,71,61,71
The read.table() function will read the data file hs0_1.csv into a data frame called temp. Because the file has no header line, we assign the variable names with names(). We will also print observations 5 through 10 to check that the data input was successful.
temp <- read.table('hs0_1.csv', sep=",") #reading in hs0_1.csv (no column names)
names(temp) <- c("gender","id","race","ses","schtyp","prgtype","read","write","math","science","socst")
temp[5:10, ] # list observations 5 through 10 to check the data
gender id race ses schtyp prgtype read write math science socst
5 0 172 4 2 1 academic 47 52 57 53 61
6 0 113 4 2 1 academic 44 52 51 63 61
7 0 50 3 2 1 general 50 59 42 53 61
8 0 11 1 2 1 academic 34 46 45 39 36
9 0 84 4 2 1 general 63 57 54 NA 51
10 0 48 3 2 1 academic 57 55 52 50 51
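The handling of the empty field can be checked on a few lines of the file. The sketch below is an illustration only: it supplies three of the lines shown above as inline text, sets the variable names with the col.names argument in the same call, and then locates the missing value. The names csv_text, vars, temp_demo and missing_rows are made up for this example.

```r
# Three lines from hs0_1.csv; the second has an empty science field
csv_text <- '0,11,1,2,1,"academic",34,46,45,39,36
0,84,4,2,1,"general",63,57,54,,51
0,48,3,2,1,"academic",57,55,52,50,51'

vars <- c("gender", "id", "race", "ses", "schtyp", "prgtype",
          "read", "write", "math", "science", "socst")

# col.names sets the variable names while reading (no header line in the data)
temp_demo <- read.table(text = csv_text, sep = ",", col.names = vars)

colSums(is.na(temp_demo))                          # missing values per variable
missing_rows <- which(!complete.cases(temp_demo))  # rows with any missing value
print(missing_rows)
```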
The read.table() function can also be used to read a data file over the internet.
hsb2 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=",", header=TRUE)
hsb2[1:5,]
id female race ses schtyp prog read write math science socst
1 70 male white low public general 57 52 41 47 57
2 121 female white middle public vocation 68 59 53 63 61
3 86 male white high public general 44 33 54 58 31
4 141 male white high public vocation 63 44 47 53 56
5 172 male white middle public academic 47 52 57 53 61
Another commonly used ASCII data format is fixed format. In this format the data for each variable are placed in fixed columns for every observation. It requires a codebook to specify which columns correspond to which variable. Here is a small example of this type of data from the file called schdat_fix.txt, along with its codebook. The column numbers from the codebook determine the width of each variable.
 195  094951
 26386161941
 38780081841
 479700  870
 56878163690
 66487182960
 786  069  0
 88194193921
 98979090781
107868180801
| variable name | column number |
| id | 1-2 |
| a1 | 3-4 |
| t1 | 5-6 |
| gender | 7 |
| a2 | 8-9 |
| t2 | 10-11 |
| tgender | 12 |
To read these data we use the read.fwf() function instead of the read.table() function. One of the main differences between these two functions is that read.fwf() takes a widths argument, which gives the width of each variable, instead of a sep argument, which gives the character that separates the variables. Since the variable id is two digits wide, the first number in the vector given to widths is 2.
fixed <- read.fwf("schdat_fix.txt", widths = c(2, 2, 2, 1, 2, 2, 1)) # one width per variable, in codebook order
names(fixed) <- c("id", "a1", "t1", "gender", "a2", "t2", "tgender")
fixed # check the data
id a1 t1 gender a2 t2 tgender
1 1 95 NA 0 94 95 1
2 2 63 86 1 61 94 1
3 3 87 80 0 81 84 1
4 4 79 70 0 NA 87 0
5 5 68 78 1 63 69 0
6 6 64 87 1 82 96 0
7 7 86 NA 0 69 NA 0
8 8 81 94 1 93 92 1
9 9 89 79 0 90 78 1
10 10 78 68 1 80 80 1
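The widths can also be computed from the codebook's column numbers rather than counted by hand. The sketch below is one way to do it, under the assumption that blank columns in the file are spaces: it writes three records to a temporary file (the spacing is reconstructed from the codebook), derives the widths from the start and end columns, and sets the names with col.names. The names fwf_lines, fwf_file, first, last and fixed_demo are made up for this example.

```r
# Records 1, 2 and 4 of the example data; blanks are assumed to be spaces
fwf_lines <- c(" 195  094951",
               " 26386161941",
               " 479700  870")
fwf_file <- tempfile(fileext = ".txt")
writeLines(fwf_lines, fwf_file)

# Codebook: first and last column of id, a1, t1, gender, a2, t2, tgender
first <- c(1, 3, 5, 7, 8, 10, 12)
last  <- c(2, 4, 6, 7, 9, 11, 12)
widths <- last - first + 1      # width of each variable

fixed_demo <- read.fwf(fwf_file, widths = widths,
                       col.names = c("id", "a1", "t1", "gender",
                                     "a2", "t2", "tgender"))
print(fixed_demo)               # blank fields are read as NA
```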
Last but not least, sometimes we may want to read data from other statistical packages, such as Stata or SPSS.
rm(list=ls()) # clear everything out of memory (nothing is attached at this point, so no detach() is needed)
library(foreign) # library to read foreign datasets
hstata <- read.dta(file="hsb2.dta") # read stata data file
attach(hstata)
table(female)
female
male female
91 109
detach(hstata) # detach by name so that nothing else is removed from the search path
rm(list=ls()) # clear everything out of memory
hspss <- read.spss(file="hsb2.sav", to.data.frame=TRUE) # read spss data file as a data frame (the default returns a list)
attach(hspss)
table(PROG)
PROG
vocation academic general
50 105 45
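If hsb2.dta is not at hand, the Stata reader can still be tried on a file created within R. The sketch below is an illustration only and assumes the foreign package is installed: it writes a toy data frame with write.dta() and reads it back with read.dta(). The data frame hs_demo and the temporary file name are made up for this example.

```r
library(foreign)   # provides read.dta() and write.dta()

# A toy data frame with a labelled variable, standing in for hsb2
hs_demo <- data.frame(id = c(70, 121, 86),
                      female = factor(c("male", "female", "male"),
                                      levels = c("male", "female")),
                      read = c(57, 68, 44))

dta_file <- tempfile(fileext = ".dta")
write.dta(hs_demo, dta_file)     # write a Stata file that we can read back
hs_back <- read.dta(dta_file)    # value labels come back as factor levels

table(hs_back$female)            # tabulate without attach()
```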
These pages are Copyrighted (c) by UCLA Academic Technology Services