UCLA Academic Technology Services

R Class Notes
Entering Data


This section will require a little more work than the sections that follow because we need to create a directory on your hard drive.

First, create a directory called mydata in your home directory or wherever you want it to be. Next, note the path to this directory. On a Windows machine it might be "C:/mydata"; on a Mac or Unix machine it might be "~/mydata".

Finally, place the following data files into the directory mydata: hs0.csv, hs0_1.csv, schdat_fix.txt, hsb2.dta and hsb2.sav.
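
The directory can also be created and checked from within R. The following is only a sketch: data_dir points at a scratch location under tempdir() as a stand-in for your own path, and the names data_dir, needed and present are ours, not part of the class files.

```r
# Hypothetical location: a scratch directory under tempdir(); substitute your own path.
data_dir <- file.path(tempdir(), "mydata")

# Create the directory if it does not already exist.
if (!dir.exists(data_dir)) {
  dir.create(data_dir, recursive = TRUE)
}

# The data files this unit expects to find in the directory.
needed <- c("hs0.csv", "hs0_1.csv", "schdat_fix.txt", "hsb2.dta", "hsb2.sav")

# TRUE for each file already in place, FALSE for each one still missing.
present <- file.exists(file.path(data_dir, needed))
names(present) <- needed
print(present)
```

Any file reported as FALSE still needs to be copied into the directory before the matching example will run.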

Now we are ready to begin.

1.0 R functions used in this unit

read.table    read text files
read.fwf      read fixed format text files
read.dta      read Stata (.dta) data files
read.spss     read SPSS (.sav) data files
save          save data in an R data file
load          read data in an R data file
names         list or modify the variable names of a data frame

The setwd() function (set working directory) works like the cd command in Windows. The getwd() function shows the name of your current working directory. Be sure to use the path that you noted above.

setwd("C:/mydata")  # set to wherever your data directory is located 
getwd()  # check that you are in the correct directory

[1] "C:/mydata"
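
One detail that is sometimes handy: setwd() invisibly returns the directory you were in before the change, so you can switch back later. A minimal sketch, using tempdir() as a stand-in for your data directory (the names target, old_wd and in_target are ours):

```r
# Stand-in for your own data directory.
target <- tempdir()

# setwd() returns the previous working directory, so we can keep it.
old_wd <- setwd(target)

# Compare normalized paths, since getwd() may spell the same path differently.
in_target <- normalizePath(getwd()) == normalizePath(target)
print(in_target)

# Return to where we started.
setwd(old_wd)
```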

One of the most commonly used ASCII data formats is the comma-separated values (csv) format. Files of this type can be created with a spreadsheet program, such as Excel, or with many database programs. We will now read the csv file hs0.csv from the mydata directory using the read.table() function. Here are the first five lines of hs0.csv; notice that the first line is a list of variable names.

gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56

data1 <- read.table("hs0.csv", header=TRUE, sep=",")  # first line holds the variable names
attach(data1)  # make the variables accessible by name
names(data1)
 [1] "gender"  "id"      "race"    "ses"     "schtyp"  "prgtype" "read"   
 [8] "write"   "math"    "science" "socst" 
 
data1[1:5, ]
  gender  id race ses schtyp  prgtype read write math science socst
1      0  70    4   1      1  general   57    52   41      47    57
2      1 121    4   2      1   vocati   68    59   53      63    61
3      0  86    4   3      1  general   44    33   54      58    31
4      0 141    4   3      1   vocati   63    44   47      53    56
5      0 172    4   2      1 academic   47    52   57      53    61

table(prgtype)
prgtype
academic  general   vocati 
     105       45       50
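
For comma-separated files, base R also offers read.csv(), which is read.table() with defaults suited to csv files (header=TRUE and sep=","). The sketch below writes the five lines shown above to a temporary file as a stand-in for hs0.csv and reads it both ways; the names csv_lines, tmp_csv, a, b, same and prg_counts are ours.

```r
# Write the first five lines of hs0.csv to a temporary file (stand-in for the real file).
csv_lines <- c(
  "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst",
  "0,70,4,1,1,general,57,52,41,47,57",
  "1,121,4,2,1,vocati,68,59,53,63,61",
  "0,86,4,3,1,general,44,33,54,58,31",
  "0,141,4,3,1,vocati,63,44,47,53,56"
)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp_csv)

# Read the same file with read.table() and with read.csv().
a <- read.table(tmp_csv, header = TRUE, sep = ",")
b <- read.csv(tmp_csv)

# Check that both calls produced the same data frame.
same <- isTRUE(all.equal(a, b))
print(same)

# A column can be reached through the data frame, without attach().
prg_counts <- table(a$prgtype)
print(prg_counts)
```

Referring to a$prgtype rather than attaching the data frame avoids name clashes when several data frames share variable names.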

The save() and load() functions can be used to save and read data from R data files.

save(data1,file="data1.rda")  # saves as an R object

detach(data1)
rm(list=ls())  # clear everything out of memory

table(prgtype)  # check that the data are gone
Error in table(prgtype) : Object "prgtype" not found

load("data1.rda")  # load the R data into memory
attach(data1)  # attach dataframe
data1[1:5, ]

table(prgtype)
prgtype
academic  general   vocati 
     105       45       50
     
detach(data1)
rm(list=ls())  # clear everything out of memory
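
As a side note, load() restores objects under the names they were saved with and returns those names. Base R also has saveRDS() and readRDS(), which store a single object without its name so that it can be read back under any name. A small sketch with a made-up data frame called scores (values taken from the listing above; all object names here are ours):

```r
# A small made-up data frame standing in for data1.
scores <- data.frame(id = c(70, 121, 86), read = c(57, 68, 44))

# save()/load(): the object comes back under its original name.
rda_file <- tempfile(fileext = ".rda")
save(scores, file = rda_file)
rm(scores)
restored_names <- load(rda_file)  # load() returns the names of the restored objects
print(restored_names)

# saveRDS()/readRDS(): we choose the name when reading the object back.
rds_file <- tempfile(fileext = ".rds")
saveRDS(scores, rds_file)
scores_copy <- readRDS(rds_file)
print(all.equal(scores, scores_copy))
```
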

The following segment is the beginning of the hs0_1.csv file. This data file does not have variable names on its first line. Also notice that the ninth line (id 84) has two consecutive commas near the end. This means that the value between them is missing.

0,70,4,1,1,"general",57,52,41,47,57
1,121,4,2,1,"vocati",68,59,53,63,61
0,86,4,3,1,"general",44,33,54,58,31
0,141,4,3,1,"vocati",63,44,47,53,56
0,172,4,2,1,"academic",47,52,57,53,61
0,113,4,2,1,"academic",44,52,51,63,61
0,50,3,2,1,"general",50,59,42,53,61
0,11,1,2,1,"academic",34,46,45,39,36
0,84,4,2,1,"general",63,57,54,,51
0,48,3,2,1,"academic",57,55,52,50,51
0,75,4,2,1,"vocati",60,46,51,53,61
0,60,5,2,1,"academic",57,65,51,63,61
0,95,4,3,1,"academic",73,60,71,61,71

The read.table() function will read the data file hs0_1.csv into a data frame called temp. Because the file has no header line, we assign the variable names afterwards with names(). We will also print observations 5 through 10 to check that the data input was successful.

temp <- read.table('hs0_1.csv', sep=",") #reading in hs0_1.csv (no column names)
names(temp) <- c("gender","id","race","ses","schtyp","prgtype","read","write","math","science","socst") 

temp[5:10, ]  # list observations 5 through 10 to check the data
   gender  id race ses schtyp  prgtype read write math science socst
5       0 172    4   2      1 academic   47    52   57      53    61
6       0 113    4   2      1 academic   44    52   51      63    61
7       0  50    3   2      1  general   50    59   42      53    61
8       0  11    1   2      1 academic   34    46   45      39    36
9       0  84    4   2      1  general   63    57   54      NA    51
10      0  48    3   2      1 academic   57    55   52      50    51
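
The variable names can also be supplied while reading, through the col.names argument of read.table(), and is.na() can then be used to locate the missing value. This is a sketch on three lines copied from the listing above and written to a temporary file; the names raw_lines, tmp_file, var_names, temp2, missing_per_column and ids_with_missing are ours.

```r
# Three lines from hs0_1.csv, including the one with the empty science field.
raw_lines <- c(
  '0,11,1,2,1,"academic",34,46,45,39,36',
  '0,84,4,2,1,"general",63,57,54,,51',
  '0,48,3,2,1,"academic",57,55,52,50,51'
)
tmp_file <- tempfile(fileext = ".csv")
writeLines(raw_lines, tmp_file)

var_names <- c("gender", "id", "race", "ses", "schtyp", "prgtype",
               "read", "write", "math", "science", "socst")

# col.names assigns the variable names while the file is being read.
temp2 <- read.table(tmp_file, sep = ",", col.names = var_names)

# Count missing values per variable; the empty field is read as NA.
missing_per_column <- colSums(is.na(temp2))
print(missing_per_column)

# ids of the observations that have at least one missing value.
ids_with_missing <- temp2$id[!complete.cases(temp2)]
print(ids_with_missing)
```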

The read.table() function can also be used to read a data file over the internet.

hsb2 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=",", header=TRUE)
hsb2[1:5,]
   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61

Another commonly used ASCII data format is fixed format. In this format the data for each variable occupy the same columns in every observation, so a codebook is required to specify which columns correspond to which variable. Here is a small example of this type of data from the file called schdat_fix.txt, together with its codebook. The column numbers from the codebook determine the values passed to the widths argument of read.fwf().

        195  094951
        26386161941
        38780081841
        479700  870
        56878163690
        66487182960
        786  069  0
        88194193921
        98979090781
       107868180801

variable name   column number
id              1-2
a1              3-4
t1              5-6
gender          7
a2              8-9
t2              10-11
tgender         12

To read these data we use the read.fwf() function instead of the read.table() function. One of the main differences between these two functions is that read.fwf() takes a widths argument, which gives the width of each variable in columns, rather than relying on the sep argument to mark where one value ends and the next begins. Since the variable id is two digits wide, the first number in the vector supplied to widths is 2.

fixed <- read.fwf("schdat_fix.txt", widths = c(2, 2, 2, 1, 2, 2, 1))
names(fixed) <- c("id", "a1", "t1", "gender", "a2", "t2", "tgender")

fixed  #  check the data
   id a1 t1 gender a2 t2 tgender
1   1 95 NA      0 94 95       1
2   2 63 86      1 61 94       1
3   3 87 80      0 81 84       1
4   4 79 70      0 NA 87       0
5   5 68 78      1 63 69       0
6   6 64 87      1 82 96       0
7   7 86 NA      0 69 NA       0
8   8 81 94      1 93 92       1
9   9 89 79      0 90 78       1
10 10 78 68      1 80 80       1
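
The widths do not have to be counted by hand: each one is the end column minus the start column plus one. The sketch below derives them from the codebook and reads four of the lines shown above from a temporary file standing in for schdat_fix.txt. The names fwf_lines, tmp_fwf, codebook, field_widths and fixed2 are ours.

```r
# Four lines from schdat_fix.txt (observations 1, 4, 7 and 10), 12 characters each.
fwf_lines <- c(
  " 195  094951",
  " 479700  870",
  " 786  069  0",
  "107868180801"
)
tmp_fwf <- tempfile(fileext = ".txt")
writeLines(fwf_lines, tmp_fwf)

# The codebook as start and end columns for each variable.
codebook <- data.frame(
  name  = c("id", "a1", "t1", "gender", "a2", "t2", "tgender"),
  start = c(1, 3, 5, 7, 8, 10, 12),
  end   = c(2, 4, 6, 7, 9, 11, 12),
  stringsAsFactors = FALSE
)

# Width of each field: end column - start column + 1.
field_widths <- codebook$end - codebook$start + 1
print(field_widths)

# col.names assigns the variable names while reading.
fixed2 <- read.fwf(tmp_fwf, widths = field_widths, col.names = codebook$name)
print(fixed2)
```

Blank fields, such as t1 in the first line, come back as NA, matching the listing above.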

Last but not least, sometimes we may want to read data from other statistical packages, such as Stata or SPSS.

rm(list=ls())  # clear everything out of memory (no data frame is attached at this point)

library(foreign)  # library to read foreign datasets

hstata <- read.dta(file="hsb2.dta")  # read stata data file
attach(hstata)
table(female)
female
  male female 
    91    109 
    
detach(hstata)  # detach the Stata data by name
rm(list=ls())  # clear everything out of memory

hspss <- read.spss(file="hsb2.sav")  # read spss data file
attach(hspss)

table(PROG)
PROG
vocation academic  general 
      50      105       45
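
If you do not have the data files at hand, the Stata round trip can be tried with write.dta(), which is also part of the foreign package. This is only a sketch: demo is a made-up data frame with a few values from the hsb2 listing above, and the names demo, dta_file, back and female_counts are ours.

```r
library(foreign)  # provides read.dta() and write.dta()

# A small made-up data frame with a labelled (factor) variable.
demo <- data.frame(
  id     = c(70, 121, 86, 141),
  female = factor(c("male", "female", "male", "male"),
                  levels = c("male", "female")),
  read   = c(57, 68, 44, 63)
)

# Write a Stata file to a temporary location, then read it back.
dta_file <- tempfile(fileext = ".dta")
write.dta(demo, dta_file)
back <- read.dta(dta_file)

# Tabulate through the data frame instead of using attach().
female_counts <- table(back$female)
print(female_counts)
```

For SPSS files, note that read.spss() returns a list by default; passing to.data.frame=TRUE asks for a data frame instead.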

These pages are Copyrighted (c) by UCLA Academic Technology Services