| read.table | read text files |
| read.csv | read comma separated files |
| read.fwf | read fixed format text files |
| read.dta | read Stata (.dta) data files |
| save | save data in an R data file |
| load | read data in an R data file |
| names | list or modify the variable names of a data frame |
| head | return the first part of an object |
| dim | display the dimensions of an object |
| list | create a list object |
| rm | remove objects |
| library | load an installed package |
Here are the four data sets used for this section: hs0.csv, hs0_1.csv, schdat_fix.txt, and hsb2.dta. These data sets are used only in this section; the remaining sections read their data over the internet.
Here is the link to the syntax file used for this section.
One of the most commonly used ASCII data formats is the comma-separated values (csv) format. Files of this type can be created by a spreadsheet program, such as Excel, or by many database programs. We will now read the csv file hs0.csv from the R_data directory using the read.table function. Here is a look at the first five lines of hs0.csv; notice that the first line is a list of variable names.
gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56
data1 <- read.table("hs0.csv", header=TRUE, sep=",")
names(data1)
[1] "gender" "id" "race" "ses" "schtyp" "prgtype" "read"
[8] "write" "math" "science" "socst"
head(data1)
gender id race ses schtyp prgtype read write math science socst
1 0 70 4 1 1 general 57 52 41 47 57
2 1 121 4 2 1 vocati 68 59 53 63 61
3 0 86 4 3 1 general 44 33 54 58 31
4 0 141 4 3 1 vocati 63 44 47 53 56
5 0 172 4 2 1 academic 47 52 57 53 61
6 0 113 4 2 1 academic 44 52 51 63 61
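Because csv files are so common, R also provides read.csv, which presets header = TRUE and sep = "," (among other defaults). The sketch below is self-contained: it writes the five lines shown above to a temporary file, an assumption made only so that the example runs without hs0.csv, and then compares the two ways of reading it.

```r
# Write the first five lines of hs0.csv to a temporary file so the
# example runs without the original data file (illustration only).
csv_lines <- c(
  "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst",
  "0,70,4,1,1,general,57,52,41,47,57",
  "1,121,4,2,1,vocati,68,59,53,63,61",
  "0,86,4,3,1,general,44,33,54,58,31",
  "0,141,4,3,1,vocati,63,44,47,53,56"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# read.table needs header and sep spelled out ...
d1 <- read.table(tmp, header = TRUE, sep = ",")
# ... while read.csv uses header = TRUE and sep = "," by default
d2 <- read.csv(tmp)

all.equal(d1, d2)  # compares the two data frames
dim(d2)            # number of rows and columns
```

Either call is fine for a file like this; read.csv simply saves typing the two arguments.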
The save() and load() functions can be used to save and read data from R data files.
# saves as an R object
save(data1,file="data1.rda")
# checking to see if data1.rda has been created
dir()
# clear everything out of memory
rm(list=ls())
# check that everything is gone
ls()
# load the R data into memory
load("data1.rda")
tail(data1)
rm(list=ls()) # clear everything out of memory
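The save and load cycle can be tried without touching the working directory. The following is a minimal sketch: the small data frame stands in for data1, and the temporary file name is an assumption of the example rather than part of the original session.

```r
# A small data frame standing in for data1 (values from the first rows above)
data1 <- data.frame(id = c(70, 121, 86), read = c(57, 68, 44))

# Save to a temporary file rather than the working directory
rda_file <- tempfile(fileext = ".rda")
save(data1, file = rda_file)

# Remove the object, then confirm it is gone from the current environment
rm(data1)
exists("data1", inherits = FALSE)   # FALSE

# load() recreates the object under its original name and
# (invisibly) returns the names of the objects it restored
restored <- load(rda_file)
restored                            # "data1"
head(data1)
```

Note that load() puts the object back under the name it was saved with, so there is no need to assign its result to get the data.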
The following segment is the beginning of the hs0_1.csv file. This data file does not have variable names on its first line. Also notice that the ninth line (id 84) has two consecutive commas near the end, which means that the value between them is missing.
0,70,4,1,1,"general",57,52,41,47,57
1,121,4,2,1,"vocati",68,59,53,63,61
0,86,4,3,1,"general",44,33,54,58,31
0,141,4,3,1,"vocati",63,44,47,53,56
0,172,4,2,1,"academic",47,52,57,53,61
0,113,4,2,1,"academic",44,52,51,63,61
0,50,3,2,1,"general",50,59,42,53,61
0,11,1,2,1,"academic",34,46,45,39,36
0,84,4,2,1,"general",63,57,54,,51
0,48,3,2,1,"academic",57,55,52,50,51
0,75,4,2,1,"vocati",60,46,51,53,61
0,60,5,2,1,"academic",57,65,51,63,61
0,95,4,3,1,"academic",73,60,71,61,71
The read.table() function will read the data file hs0_1.csv into a data frame called temp. We will also print observations 5 through 10 to check that the data input was successful.
#reading in hs0_1.csv (no column names)
temp <- read.table('hs0_1.csv', sep=",")
names(temp) <- c("gender","id","race","ses","schtyp","prgtype","read","write","math","science","socst")
# list observations 5 through 10 to check the data
temp[5:10, ]
gender id race ses schtyp prgtype read write math science socst
5 0 172 4 2 1 academic 47 52 57 53 61
6 0 113 4 2 1 academic 44 52 51 63 61
7 0 50 3 2 1 general 50 59 42 53 61
8 0 11 1 2 1 academic 34 46 45 39 36
9 0 84 4 2 1 general 63 57 54 NA 51
10 0 48 3 2 1 academic 57 55 52 50 51
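The NA in row 9 is how R represents the empty field between the two commas. As a small sketch of how such values could be located, the code below reads three of the lines shown above through the text argument (used here only so the example runs without hs0_1.csv) and supplies the variable names with col.names instead of a separate names() call.

```r
# Three lines of hs0_1.csv around the record with the empty field
raw_lines <- c(
  '0,11,1,2,1,"academic",34,46,45,39,36',
  '0,84,4,2,1,"general",63,57,54,,51',
  '0,48,3,2,1,"academic",57,55,52,50,51'
)
var_names <- c("gender", "id", "race", "ses", "schtyp", "prgtype",
               "read", "write", "math", "science", "socst")

# col.names assigns the variable names while reading
temp <- read.table(text = raw_lines, sep = ",", col.names = var_names)

# count missing values per variable; only science should be non-zero
colSums(is.na(temp))

# show the rows that contain at least one missing value
temp[!complete.cases(temp), ]
```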
The read.table() function can also be used to read a data file over the internet.
hsb2 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=",", header=TRUE)
hsb2[1:5,]
id female race ses schtyp prog read write math science socst
1 70 male white low public general 57 52 41 47 57
2 121 female white middle public vocation 68 59 53 63 61
3 86 male white high public general 44 33 54 58 31
4 141 male white high public vocation 63 44 47 53 56
5 172 male white middle public academic 47 52 57 53 61
We can also use the script editor to input data. Here is an example.
a<-read.csv(stdin())
gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56

# checking the dimension of a
dim(a)
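Reading from stdin() depends on the data lines being submitted to R right after the call. As an alternative sketch, the same inline data can be passed as a string through the text argument of read.csv; the object name a is kept from the example above.

```r
# Inline data passed as a single string; each line is one record
a <- read.csv(text = "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56")

# checking the dimension of a: 4 observations, 11 variables
dim(a)
names(a)
```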
Another commonly used ASCII data format is fixed format. In this format the data for each variable are placed in fixed columns of every line. It requires a codebook to specify which columns correspond to which variable. Here is a small example of this type of data from the file schdat_fix.txt, together with its codebook. The column numbers from the codebook determine the values passed to the widths argument.
 195  094951
 26386161941
 38780081841
 479700  870
 56878163690
 66487182960
 786  069  0
 88194193921
 98979090781
107868180801
| variable name | column number |
| id | 1-2 |
| a1 | 3-4 |
| t1 | 5-6 |
| gender | 7 |
| a2 | 8-9 |
| t2 | 10-11 |
| tgender | 12 |
To read these data we use the read.fwf() function instead of the read.table() function. One of the main differences between these two functions is that read.fwf() takes a widths argument, which gives the width of each variable, instead of a sep argument, which marks the boundary between variables. Since the variable id is two digits wide, the first number in the vector supplied to widths is 2.
fixed <- read.fwf("schdat_fix.txt", widths = c(2, 2, 2, 1, 2, 2, 1))
names(fixed) <- c("id", "a1", "t1", "gender", "a2", "t2", "tgender")
# check the data
fixed
id a1 t1 gender a2 t2 tgender
1 1 95 NA 0 94 95 1
2 2 63 86 1 61 94 1
3 3 87 80 0 81 84 1
4 4 79 70 0 NA 87 0
5 5 68 78 1 63 69 0
6 6 64 87 1 82 96 0
7 7 86 NA 0 69 NA 0
8 8 81 94 1 93 92 1
9 9 89 79 0 90 78 1
10 10 78 68 1 80 80 1
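The field widths follow directly from the codebook: each width is the last column minus the first column plus one. The sketch below illustrates this under two assumptions: the first four data lines are re-typed with their blank padding inferred from the codebook and the output above, and they are written to a temporary file so the example runs without schdat_fix.txt.

```r
# First four lines of the fixed format data, 12 characters each
# (blank padding inferred from the codebook; illustration only)
fwf_lines <- c(
  " 195  094951",
  " 26386161941",
  " 38780081841",
  " 479700  870"
)
tmp <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp)

# Codebook: first and last column of each variable
start <- c(1, 3, 5, 7, 8, 10, 12)
end   <- c(2, 4, 6, 7, 9, 11, 12)
widths <- end - start + 1
widths        # 2 2 2 1 2 2 1
sum(widths)   # 12, the length of each line

# col.names sets the variable names while reading
fixed <- read.fwf(tmp, widths = widths,
                  col.names = c("id", "a1", "t1", "gender", "a2", "t2", "tgender"))
fixed
```

Checking that the widths add up to the line length is a quick way to catch a codebook that has been transcribed incorrectly.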
Last but not least, sometimes we may want to read data from other statistical packages, such as Stata. To do this, we load a package called foreign using the library function. If a package has not been installed yet, we first install it with the install.packages function and then load it.
# load library foreign for reading foreign datasets
library(foreign)
# read stata data file
hstata <- read.dta(file="hsb2.dta")
head(hstata)
id female race ses schtyp prog read write math science socst
1 70 male white low public general 57 52 41 47 57
2 121 female white middle public vocation 68 59 53 63 61
3 86 male white high public general 44 33 54 58 31
4 141 male white high public vocation 63 44 47 53 56
5 172 male white middle public academic 47 52 57 53 61
6 113 male white middle public academic 44 52 51 63 61
# read stata data file from a web server
hstata1 <- read.dta(file="http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta")
head(hstata1)
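The install-then-load step can be written so that the package is installed only when it is missing. The following is a sketch under stated assumptions: the three-row data frame is a hypothetical stand-in built from values shown above, and it is written to a temporary .dta file with write.dta (also from foreign) so that read.dta can be tried without hsb2.dta or a network connection.

```r
# Install foreign only if it is not already available, then load it.
# (foreign normally ships with R as a recommended package.)
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}
library(foreign)

# Hypothetical stand-in data: three rows, three variables
small <- data.frame(id   = c(70L, 121L, 86L),
                    read = c(57L, 68L, 44L),
                    prog = c("general", "vocation", "general"))
dta_file <- tempfile(fileext = ".dta")
write.dta(small, dta_file)

# Read the Stata file back into a data frame
back <- read.dta(dta_file)
dim(back)
names(back)
```

Keep in mind that read.dta handles Stata files in the version 5 to 12 binary formats, so files saved by newer Stata releases may need to be saved in an older format first.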
These pages are Copyrighted (c) by UCLA Academic Technology Services