
How can I read binary data into R?

The code needed to read binary data into R is relatively easy. However, reading the data in correctly requires that you are either already familiar with your data or have a comprehensive description of its structure. In a binary data file, information is stored in groups of binary digits (bits). Each bit is a zero or a one, and eight bits grouped together form a byte. To read binary data successfully, you must know how each piece of information was encoded into bytes. For example, if your data consists of integers, how many bytes should you interpret as one integer? If your data contains both positive and negative numbers, how are the two distinguished? How many pieces of information do you expect to find in the file?
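To make these questions concrete, the sketch below uses R's writeBin() to encode a few integers into a raw vector so you can see their bytes. This example is not part of the original FAQ. It assumes 4-byte integers in little-endian order (R's native integer size on common platforms) and two's complement for negative values.

```r
# Encode integers into a raw vector to inspect their bytes
# (assumption: 4-byte integers, little-endian byte order)
one_bytes <- writeBin(1L, raw(), size = 4, endian = "little")
print(one_bytes)        # 01 00 00 00: least significant byte first

# Negative integers use two's complement, so -1 is stored as all ones
minus_one_bytes <- writeBin(-1L, raw(), size = 4, endian = "little")
print(minus_one_bytes)  # ff ff ff ff

# Three 4-byte integers occupy 12 bytes
n_bytes <- length(writeBin(c(1L, 2L, 3L), raw(), size = 4, endian = "little"))
print(n_bytes)          # 12

# Decoding with the same size and byte order recovers the original value
decoded <- readBin(minus_one_bytes, integer(), size = 4, endian = "little")
print(decoded)          # -1
```

If you decode the same bytes with a different size or byte order, you get different numbers. This is why the description of the file matters.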

Ideally, you know the answers to these questions before you start reading the binary file. If you do not, you can experiment with R's reading options. To get started, we open a connection to the file and indicate that we will use it to read binary data. We do this with the file command, giving first the pathname and then the mode "rb", for "read binary". For more details, see help(file) in R.

to.read = file("http://www.ats.ucla.edu/stat/r/faq/bintest.dat", "rb")
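If you want to follow along without the remote file, you can create a local stand-in and open it the same way. This is a hypothetical sketch. It assumes the file holds the integers 1 through 7, each stored on 4 bytes in little-endian order, which is consistent with the values read later on this page. The temporary file is not the original bintest.dat.

```r
# Hypothetical stand-in for bintest.dat: the integers 1 to 7,
# each stored on 4 bytes in little-endian order
path <- tempfile(fileext = ".dat")
writeBin(1:7, path, size = 4, endian = "little")
print(file.size(path))   # 28 bytes = 7 integers * 4 bytes

# Open the local file in "rb" (read binary) mode, read one integer, close
to.read <- file(path, "rb")
first <- readBin(to.read, integer(), endian = "little")
print(first)             # 1
close(to.read)
```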

Next, we use the readBin command. If we think the file contains integers, we can start by reading the first integer and hoping that its size does not require further specification. Different platforms store binary data in different ways. In particular, they differ in whether the most significant or the least significant byte of a multi-byte value comes first. This byte order is called the "endianness" of the data, and reading the same bytes with the wrong byte order gives very different results. The binary files in the examples on this page were written on a PC, which suggests they are little-endian. When you read binary data that may have been written on a different platform, specifying the endianness can be crucial. For example, without endian = "little" in the command below, R running on a big-endian machine (such as an older PowerPC Mac) reads the first integer as 16777216.

readBin(to.read, integer(), endian = "little")
[1] 1
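The figure 16777216 is simply the integer 1 read in the opposite byte order. The sketch below decodes the same four bytes both ways so you can see the difference. The byte values are an illustrative assumption: they are what a little-endian machine writes for the integer 1.

```r
# Four bytes a little-endian machine writes for the integer 1
bytes <- as.raw(c(0x01, 0x00, 0x00, 0x00))

as_little <- readBin(bytes, integer(), size = 4, endian = "little")
as_big    <- readBin(bytes, integer(), size = 4, endian = "big")
print(as_little)         # 1
print(as_big)            # 16777216, i.e. 0x01000000

# The byte order of the machine currently running R
print(.Platform$endian)
```

Checking .Platform$endian tells you what R assumes when you leave endian at its default.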

The output shows that the first integer in the file appears to be 1. Each readBin command picks up where the previous one stopped, so repeated calls work through the binary file until they reach its end. We can read several integers at once by adding the n= option to our command. If the n you specify is greater than the number of integers remaining in the file, readBin reads and returns as many as are available, so there is no danger in guessing too large an n. Since we have already read the first integer, this command begins at the second.

readBin(to.read, integer(), n = 4, endian = "little")
[1] 2 3 4 5
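The sketch below shows how successive readBin calls advance through a file and what happens when n is larger than what remains. The file is a hypothetical local stand-in holding the integers 1 through 7 as 4-byte little-endian values.

```r
# Hypothetical file contents: integers 1 to 7, 4 bytes each, little-endian
path <- tempfile(fileext = ".dat")
writeBin(1:7, path, size = 4, endian = "little")

con <- file(path, "rb")
a    <- readBin(con, integer(), endian = "little")          # 1
b    <- readBin(con, integer(), n = 4, endian = "little")   # 2 3 4 5
# Ask for far more than remains: only the last two values come back
rest <- readBin(con, integer(), n = 100, endian = "little") # 6 7
# At the end of the file, readBin returns a zero-length vector
done <- readBin(con, integer(), n = 1, endian = "little")   # integer(0)
close(con)
```

A zero-length result is a convenient signal that you have reached the end of the file.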

If you have additional information about what is in your file, you should incorporate it into the readBin command. For example, if you know that the integers are stored on 4 bytes each, you can indicate this with the size option:

readBin(to.read, integer(), n = 2, size = 4, endian = "little")
[1] 6 7
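The size option matters most when values were stored in fewer bytes than R's native 4. The sketch below writes two values as 2-byte integers and reads them back in three ways. This example is not from the original page. For integer sizes 1 and 2, readBin's signed argument controls whether the high bit is treated as a sign bit.

```r
# Two values stored as 2-byte (16-bit) little-endian integers: 4 bytes total
bytes <- writeBin(c(6L, -7L), raw(), size = 2, endian = "little")
print(length(bytes))   # 4

# Correct size: each value is read from 2 bytes
signed_vals <- readBin(bytes, integer(), n = 2, size = 2, endian = "little")
print(signed_vals)     # 6 -7

# Same bytes treated as unsigned: -7 becomes 65536 - 7 = 65529
unsigned_vals <- readBin(bytes, integer(), n = 2, size = 2,
                         signed = FALSE, endian = "little")
print(unsigned_vals)   # 6 65529

# Wrong size: all 4 bytes are consumed as a single 4-byte integer
wrong <- readBin(bytes, integer(), n = 2, size = 4, endian = "little")
print(length(wrong))   # 1
```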

Similarly, if you know that your file contains character strings, complex numbers, or some other type of information, you would adjust the readBin command accordingly, changing integer() to character() or complex(). See help(readBin) in R for more details.

Since you will likely want to do more than just look at what is contained in the binary file, you will need some strategies for formatting data as you read it in. For example, suppose you are given a binary file with the following description: three numeric variables were collected from 200 subjects; the three variable names appear first in the file; the numeric values are integers stored on four bytes each; and all of the values for the first variable are followed by all of the values for the second and then all of the values for the third (that is, the data are stored column by column rather than row by row). First, open a connection to the data.

newdata = file("http://www.ats.ucla.edu/stat/r/faq/bindata.dat", "rb")
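If you want to experiment without the remote file, you can write a local file with the layout described above. This is a hypothetical stand-in. The 600 values are random integers generated for illustration, not the real data. The layout is three NUL-terminated names followed by column-ordered integers stored as 4-byte little-endian values, matching the size = 4 used in the code on this page.

```r
set.seed(1)
# Hypothetical values: 200 subjects x 3 variables, stored column by column
vals <- matrix(sample(30:75, 600, replace = TRUE), ncol = 3)

path <- tempfile(fileext = ".dat")
out <- file(path, "wb")
writeBin(c("read", "write", "math"), out)                   # NUL-terminated strings
writeBin(as.vector(vals), out, size = 4, endian = "little") # column-major order
close(out)

# 5 + 6 + 5 bytes of names plus 600 * 4 bytes of integers
print(file.size(path))   # 2416
```

You can then open this file with file(path, "rb") in place of the URL and run the remaining commands unchanged.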

Next, let's read in the variable names and save them to a vector in R.

varnames = readBin(newdata, character(), n=3)
varnames
[1] "read" "write" "math"
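With character(), readBin reads strings that end in a NUL (zero) byte. This is also how writeBin stores character vectors. The sketch below shows those bytes directly. It assumes the names in the file were written the same way, which is an assumption about how the file was created.

```r
# Encode the three names the way writeBin stores strings
name_bytes <- writeBin(c("read", "write", "math"), raw())
print(length(name_bytes))  # 16: 4 + 5 + 4 letters plus one NUL after each
print(name_bytes[5])       # 00, the terminator after "read"

# readBin stops each string at its NUL byte
varnames <- readBin(name_bytes, character(), n = 3)
print(varnames)            # "read" "write" "math"
```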

To read in the integer values, we can read all 600 into one vector and then split it into the three variables.

datavals = readBin(newdata, integer(), size = 4, n = 600, endian = "little")
readvals = datavals[1:200]
writevals = datavals[201:400]
mathvals = datavals[401:600]
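Because the values are stored column by column, the manual slicing matches the way R fills a matrix (column-major order). The sketch below shows the equivalent one-step reshape. It uses the made-up values 1:600 in place of the real file contents.

```r
# Stand-in for the 600 values returned by readBin (hypothetical data)
datavals <- 1:600

# Manual slicing into three variables
readvals  <- datavals[1:200]
writevals <- datavals[201:400]
mathvals  <- datavals[401:600]

# matrix() fills columns first, so each column holds one variable
rdata <- matrix(datavals, ncol = 3,
                dimnames = list(NULL, c("read", "write", "math")))
print(dim(rdata))  # 200 3
```

This avoids hard-coding the slice boundaries, which is convenient when the number of subjects changes.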

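Alternatively, we can read each variable's values with a separate readBin command. Use one approach or the other: each readBin call advances through the file, so after the 600-value read above, these commands would find no data left.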

readvals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")
writevals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")
mathvals = readBin(newdata, integer(), size = 4, n = 200, endian = "little")

Then, we can combine our three vectors into one matrix, using the variable names as the column names. If you need a data frame instead, wrap the result in as.data.frame().

rdata = cbind(readvals, writevals, mathvals)
colnames(rdata) = varnames
rdata[1:5,]
     read write math
[1,]   57    52   41
[2,]   68    59   53
[3,]   44    33   54
[4,]   63    44   47
[5,]   47    52   57

Lastly, since we have finished reading data from the binary file, we can close the connection.

close(newdata)
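The steps on this page can be wrapped in a small function. This is a hypothetical helper (read_column_binary is not part of R), written for files laid out as described above: a fixed number of NUL-terminated names followed by column-ordered integers. It uses on.exit() so the connection is closed even if a read fails. It also checks that the expected number of values was read and returns a data frame rather than a matrix. The usage example writes a tiny file containing the first four rows shown above.

```r
# Hypothetical helper: read nvars names, then nvars * nobs column-ordered integers
read_column_binary <- function(path, nvars, nobs, size = 4, endian = "little") {
  con <- file(path, "rb")
  on.exit(close(con))          # release the connection however we exit
  varnames <- readBin(con, character(), n = nvars)
  vals <- readBin(con, integer(), n = nvars * nobs, size = size, endian = endian)
  if (length(vals) != nvars * nobs)
    stop("expected ", nvars * nobs, " values but read ", length(vals))
  df <- as.data.frame(matrix(vals, ncol = nvars))
  names(df) <- varnames
  df
}

# Usage with a small file: 3 variables, 4 subjects, stored column by column
path <- tempfile(fileext = ".dat")
out <- file(path, "wb")
writeBin(c("read", "write", "math"), out)
writeBin(c(57L, 68L, 44L, 63L,  52L, 59L, 33L, 44L,  41L, 53L, 54L, 47L),
         out, size = 4, endian = "little")
close(out)

rdata <- read_column_binary(path, nvars = 3, nobs = 4)
print(rdata)
```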

If you wish to write a binary file from R, see R FAQ: How can I write a binary data file in R?

The content of this web site should not be construed as an endorsement of any particular web site, book, or software product by the University of California.