UCLA Academic Technology Services

R Class Notes
Entering Data


1.0 R functions used in this unit, data sets and the script file

read.table read text files
read.csv read comma separated files
read.fwf read fixed format text files
read.dta read Stata (.dta) data files
save save data in an R data file
load read data in an R data file
names list or modify the variable names of a data frame
head return the first part of an object
dim display the dimensions of an object
list create a list object
rm remove objects
library load an installed package

Here are the four data sets used for this section: hs0.csv, hs0_1.csv, schdat_fix.txt, and hsb2.dta. These data sets are used only in this section. The remaining sections read data sets over the internet.

Here is the link to the syntax file used for this section.

2.0 Comma-separated files

One of the most commonly used ASCII data formats is the comma-separated-values (csv) format. Files of this type can be created by a spreadsheet program, such as Excel, or by many database programs. We will now read the csv file hs0.csv from the R_data directory using the read.table function. Here are the first five lines of the hs0.csv file. Notice that the first line is a list of variable names, which is why we specify header=TRUE below.

gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56

# read hs0.csv; the first line holds the variable names
data1 <- read.table("hs0.csv", header=TRUE, sep=",")
names(data1)
 [1] "gender"  "id"      "race"    "ses"     "schtyp"  "prgtype" "read"   
 [8] "write"   "math"    "science" "socst" 
head(data1)
  gender  id race ses schtyp  prgtype read write math science socst
1      0  70    4   1      1  general   57    52   41      47    57
2      1 121    4   2      1   vocati   68    59   53      63    61
3      0  86    4   3      1  general   44    33   54      58    31
4      0 141    4   3      1   vocati   63    44   47      53    56
5      0 172    4   2      1 academic   47    52   57      53    61
6      0 113    4   2      1 academic   44    52   51      63    61
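
As a cross-check on the call above, the following sketch reads the same five lines with read.table() and with read.csv(), which uses header=TRUE and sep="," by default. The temporary file and the object names csv_file, via_table and via_csv are made up for this illustration so that the example runs without hs0.csv on disk.

```r
# Write the first five lines of hs0.csv (as listed above) to a temporary file.
# csv_file, via_table and via_csv are illustrative names, not part of the notes.
csv_file <- tempfile(fileext = ".csv")
writeLines(c(
  "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst",
  "0,70,4,1,1,general,57,52,41,47,57",
  "1,121,4,2,1,vocati,68,59,53,63,61",
  "0,86,4,3,1,general,44,33,54,58,31",
  "0,141,4,3,1,vocati,63,44,47,53,56"
), csv_file)

# read.table needs header and sep spelled out
via_table <- read.table(csv_file, header = TRUE, sep = ",")

# read.csv uses header = TRUE and sep = "," by default
via_csv <- read.csv(csv_file)

# Compare the two results and check the dimensions
all.equal(via_table, via_csv)
dim(via_csv)
```

For a plain comma-separated file like this one, either function should work; read.csv() simply saves typing the two arguments.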

The save() and load() functions can be used to save and read data from R data files.

# saves as an R object
save(data1,file="data1.rda")  
# checking to see if data1.rda has been created
dir()
# clear everything out of memory
rm(list=ls())  
# check that everything is gone
ls()           

# load the R data into memory
load("data1.rda")  
tail(data1)

rm(list=ls())  # clear everything out of memory
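
The round trip above can also be checked without clearing the workspace. The sketch below is one possible way to do it, assuming a small stand-in data frame: it loads the file into a separate environment and compares the restored copy with the original. The names scores, rda_file and restored are hypothetical.

```r
# A small data frame standing in for data1 (illustrative values)
scores <- data.frame(id = c(70, 121, 86), read = c(57, 68, 44))

# Save it to a temporary R data file
rda_file <- tempfile(fileext = ".rda")
save(scores, file = rda_file)

# Load into a fresh environment instead of the workspace;
# load() returns the names of the objects it restored
restored <- new.env()
loaded_names <- load(rda_file, envir = restored)
loaded_names

# The restored copy should match the original
identical(restored$scores, scores)
```

Note that load() restores objects under the names they were saved with, so the name of the file does not have to match the name of the object.
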
The following segment is the beginning of the hs0_1.csv file. This data file doesn't have variable names on its first line. Also notice that the ninth line (id 84) has two consecutive commas near the end. This means that the value between them, the science score, is missing.
0,70,4,1,1,"general",57,52,41,47,57
1,121,4,2,1,"vocati",68,59,53,63,61
0,86,4,3,1,"general",44,33,54,58,31
0,141,4,3,1,"vocati",63,44,47,53,56
0,172,4,2,1,"academic",47,52,57,53,61
0,113,4,2,1,"academic",44,52,51,63,61
0,50,3,2,1,"general",50,59,42,53,61
0,11,1,2,1,"academic",34,46,45,39,36
0,84,4,2,1,"general",63,57,54,,51
0,48,3,2,1,"academic",57,55,52,50,51
0,75,4,2,1,"vocati",60,46,51,53,61
0,60,5,2,1,"academic",57,65,51,63,61
0,95,4,3,1,"academic",73,60,71,61,71
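The read.table() function will read the data file hs0_1.csv into a data frame called temp. Because the file has no header line, we assign the variable names afterwards with names(). We will also print observations 5 through 10 to check that the data input was successful. Notice that the missing science score in observation 9 is read as NA.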

#reading in hs0_1.csv (no column names)
temp <- read.table('hs0_1.csv', sep=",") 
names(temp) <- c("gender","id","race","ses","schtyp","prgtype","read","write","math","science","socst") 
# list observations 5 through 10 to check the data
temp[5:10, ]  

   gender  id race ses schtyp  prgtype read write math science socst
5       0 172    4   2      1 academic   47    52   57      53    61
6       0 113    4   2      1 academic   44    52   51      63    61
7       0  50    3   2      1  general   50    59   42      53    61
8       0  11    1   2      1 academic   34    46   45      39    36
9       0  84    4   2      1  general   63    57   54      NA    51
10      0  48    3   2      1 academic   57    55   52      50    51
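
The two steps above (reading, then naming) can also be combined. This is a minimal sketch, assuming the col.names argument of read.table() is used to supply the names while reading, with a quick check for missing values afterwards. It uses three of the lines listed above through the text argument, so it runs without the file. The object names raw_lines, vars and temp2 are made up for this example.

```r
# Three lines from hs0_1.csv as listed above; the second has an empty science field
raw_lines <- c(
  '0,11,1,2,1,"academic",34,46,45,39,36',
  '0,84,4,2,1,"general",63,57,54,,51',
  '0,48,3,2,1,"academic",57,55,52,50,51'
)
vars <- c("gender", "id", "race", "ses", "schtyp", "prgtype",
          "read", "write", "math", "science", "socst")

# col.names assigns the variable names while reading
temp2 <- read.table(text = raw_lines, sep = ",", col.names = vars)

# The empty field is read as NA
is.na(temp2$science)

# Number of missing values in each variable
colSums(is.na(temp2))
```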

The read.table() function can also be used to read a data file over the internet.

hsb2 <- read.table("http://www.ats.ucla.edu/stat/R/notes/hsb2.csv", sep=',', header=TRUE)
hsb2[1:5,]

   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61

We can also use the script editor to input data. Here is an example.

a<-read.csv(stdin())
gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56


# checking the dimension of a
dim(a)
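
When the lines are not being sent to the console one at a time, an alternative is to pass the data as a character string through the text argument, which read.csv() forwards to read.table(). This is only a sketch of that alternative, and the object name a2 is hypothetical.

```r
# Inline data supplied through the text argument instead of stdin()
a2 <- read.csv(text = "gender,id,race,ses,schtyp,prgtype,read,write,math,science,socst
0,70,4,1,1,general,57,52,41,47,57
1,121,4,2,1,vocati,68,59,53,63,61
0,86,4,3,1,general,44,33,54,58,31
0,141,4,3,1,vocati,63,44,47,53,56")

# checking the dimension of a2: 4 observations, 11 variables
dim(a2)
```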

3.0 Fixed format files

Another commonly used ASCII data format is fixed format. In this format each variable occupies the same columns in every observation, so a codebook is required to specify which columns correspond to which variable. Here is a small example of this type of data from the file called schdat_fix.txt, followed by its codebook. The column numbers from the codebook determine the values supplied to the widths argument of read.fwf().

        195  094951
        26386161941
        38780081841
        479700  870
        56878163690
        66487182960
        786  069  0
        88194193921
        98979090781
       107868180801

variable name    column number
id               1-2
a1               3-4
t1               5-6
gender           7
a2               8-9
t2               10-11
tgender          12

To read these data we use the read.fwf() function instead of the read.table() function. One of the main differences between these two functions is that read.fwf() takes a widths argument, which gives the width of each variable, instead of a sep argument, which gives the character separating the variables. Since the variable id is two digits wide, the first number in the vector supplied to widths is 2.

fixed <- read.fwf("schdat_fix.txt", widths = c(2, 2, 2, 1, 2, 2, 1))
names(fixed) <- c("id", "a1", "t1", "gender", "a2", "t2", "tgender")

#  check the data
fixed  
   id a1 t1 gender a2 t2 tgender
1   1 95 NA      0 94 95       1
2   2 63 86      1 61 94       1
3   3 87 80      0 81 84       1
4   4 79 70      0 NA 87       0
5   5 68 78      1 63 69       0
6   6 64 87      1 82 96       0
7   7 86 NA      0 69 NA       0
8   8 81 94      1 93 92       1
9   9 89 79      0 90 78       1
10 10 78 68      1 80 80       1
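
The widths do not have to be worked out by hand. The sketch below derives them from the first and last column of each variable in the codebook (width = last column - first column + 1) and passes the names through col.names. It assumes that the indentation in the listing above is for display only, so that each record is 12 characters long with id right-aligned in columns 1-2. The names vars, first_col, last_col, fwf_file and fixed2 are made up for this illustration.

```r
# Codebook: variable names with their first and last columns
vars      <- c("id", "a1", "t1", "gender", "a2", "t2", "tgender")
first_col <- c(1, 3, 5, 7, 8, 10, 12)
last_col  <- c(2, 4, 6, 7, 9, 11, 12)

# Width of each field = last column - first column + 1
widths <- last_col - first_col + 1
widths

# First four records of schdat_fix.txt, written to a temporary file
# (assumes 12-character records, as described in the text above)
fwf_file <- tempfile(fileext = ".txt")
writeLines(c(
  " 195  094951",
  " 26386161941",
  " 38780081841",
  " 479700  870"
), fwf_file)

# Read the file and name the variables in one step
fixed2 <- read.fwf(fwf_file, widths = widths, col.names = vars)
fixed2
```

Blank fields, such as t1 in the first record and a2 in the fourth, should come back as NA, matching the output shown above.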

4.0 Foreign data types

Last but not least, sometimes we may want to read data from other statistical packages, such as Stata. To this end, we have to load a package called foreign using the library function. If a package has not been installed yet, we first have to install it with the install.packages function, and then we can load it.

# load library foreign for reading foreign datasets
library(foreign)  

# read stata data file
hstata <- read.dta(file="hsb2.dta")  
head(hstata)

   id female  race    ses schtyp     prog read write math science socst
1  70   male white    low public  general   57    52   41      47    57
2 121 female white middle public vocation   68    59   53      63    61
3  86   male white   high public  general   44    33   54      58    31
4 141   male white   high public vocation   63    44   47      53    56
5 172   male white middle public academic   47    52   57      53    61
6 113   male white middle public academic   44    52   51      63    61
# read stata data file from a web server
hstata1 <- read.dta(file="http://www.ats.ucla.edu/stat/stata/notes/hsb2.dta")  
head(hstata1)
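
The install-then-load step described at the start of this section can be sketched as follows. The check with requireNamespace() is one common way to install a package only when it is missing. The round trip through write.dta() (also in the foreign package) is there only so that the example runs without hsb2.dta, and the names hs_small, dta_file and back are hypothetical.

```r
# Install foreign only if it is not already available, then load it
if (!requireNamespace("foreign", quietly = TRUE)) {
  install.packages("foreign")
}
library(foreign)

# A small stand-in data frame (illustrative values)
hs_small <- data.frame(
  id     = c(70L, 121L, 86L),
  female = factor(c("male", "female", "male")),
  read   = c(57, 68, 44)
)

# Write a Stata file to a temporary location, then read it back
dta_file <- tempfile(fileext = ".dta")
write.dta(hs_small, dta_file)
back <- read.dta(dta_file)
back
```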
