UCLA Academic Technology Services

R Code Fragments
How can I read binary data in R?

In this example, we show how to read a binary file into R. The original binary file and its codebook are each available for download. In case those links are unavailable, a zip file contains the data file, a PDF of the codebook, and a Stata code example.

The data file begins with 3520 bytes of header information in ASCII. Here is the first part of it.

CCSD3ZF0000100000001CCSD3VS00006PRODUCER
Product_File_Name =                      JA1_IGD_2PcP243_093;
Producer_Agency_Name = CNES;
Processing_Center = SSALTO;
File_Data_Type = IGDR;
Reference_Document =                     SMM-ST-M-EA-10879-CN Issue 4.0;
Reference_Software =     CMAV9.2_01/G5OS5;
Operating_System =            SunOS 5.9;
Product_Creation_Time = 2008-08-20T13:57:26.000000;
CCSD$$MARKERPRODUCERCCSD3KS00006PASSFILE
Mission_Name = Jason-1;
Altimeter_Sensor_Name = POSEIDON-2;
Radiometer_Sensor_Name = JMR;
DORIS_Sensor_Name = DORIS-2 GM;
Acquisition_Station_Name =      JTCCS          ;
Cycle_Number =   243;
Absolute_Revolution_Number = 30781;
Pass_Number =  93;
Absolute_Pass_Number = 61561;
Equator_Time = 2008-08-14T09:54:37.743000;
Equator_Longitude = +235.99<deg>;
First_Measurement_Time = 2008-08-14T10:00:01.008141;
Last_Measurement_Time = 2008-08-14T10:22:43.766073;
First_Measurement_Latitude = +15.81<deg>;
Last_Measurement_Latitude = +66.15<deg>;
First_Measurement_Longitude = +241.82<deg>;
Last_Measurement_Longitude = +318.85<deg>;
Pass_Data_Count =   765;
Ocean_Pass_Data_Count =   483;
Ocean_PCD =   0<%>;
Time_Epoch = 1958-01-01T00:00:00.000000;
The header records the operating system used, the number of observations, and the time of the first measurement. The SunOS operating system suggests that the file was written on a big-endian machine (SunOS traditionally ran on SPARC hardware), so we will read the data with endian = "big". Because the first 3520 bytes of the file hold the header, we begin reading the actual data (variable values) after it. To tell R where to start, we use the seek command. Its where argument is a byte offset counted from zero, so where = 3520 places the connection on the first byte after the header. With rw = "r" we indicate that it is the read position we are moving.
# Open the file in binary read mode, then skip the 3520-byte header
t <- file("C:/test1.dat", "rb")
seek(t, where = 3520, rw = "r")
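Before skipping the header, it can be handy to read it into R, for instance to pick up Pass_Data_Count rather than typing 765 by hand. The following is a minimal sketch and not part of the original example. It builds a small stand-in file with a made-up 3520-byte header (the temporary file and the `header_value` helper are hypothetical, and newline-separated header lines are an assumption), reads the header back as text, and extracts one field.

```r
# Build a stand-in file: a short ASCII header padded with spaces to 3520
# bytes, followed by a few 4-byte integers. This only imitates the layout.
header_bytes <- 3520
header_text <- paste0(
  "Cycle_Number =   243;\n",
  "Pass_Data_Count =   765;\n",
  "Time_Epoch = 1958-01-01T00:00:00.000000;\n"
)
padding <- strrep(" ", header_bytes - nchar(header_text))
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")
writeBin(charToRaw(paste0(header_text, padding)), con)
writeBin(1:3, con, size = 4, endian = "big")
close(con)

# Hypothetical helper: return the text between "name =" and the ";".
header_value <- function(lines, name) {
  hit <- grep(paste0("^", name, " *="), lines, value = TRUE)[1]
  trimws(sub(";.*$", "", sub("^[^=]*=", "", hit)))
}

con <- file(path, "rb")
header <- rawToChar(readBin(con, "raw", n = header_bytes))
lines <- strsplit(header, "\n", fixed = TRUE)[[1]]
n_obs <- as.integer(header_value(lines, "Pass_Data_Count"))
# Having read exactly 3520 bytes, the connection now sits at the first
# data byte, so no seek is needed before the next read.
first_value <- readBin(con, integer(), n = 1, size = 4, endian = "big")
close(con)

n_obs        # 765
first_value  # 1
```

Reading the header with readBin(con, "raw", ...) keeps the byte count exact, which matters because the data section must start at a known offset.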

There are many variables in the data set. In this example, we only show how to read the first eight. We know from the header (Pass_Data_Count) that there are 765 observations in the file. Here is the code for reading the data. We first create a matrix to hold our dataset. Then we loop over the observations, reading the first eight variables one at a time from t with readBin and then skipping to the beginning of the next record.

# Note: readBin() honors signed = FALSE only for 1- and 2-byte integers, so
# the 4-byte fields are read as ordinary (signed) integers. The values shown
# in the output below are all far smaller than 2^31 - 1, so nothing is lost.
mymat <- matrix(0, nrow = 765, ncol = 8)
for (i in 1:765) {

  # time tag
  # time_day
  mymat[i, 1] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # time_sec
  mymat[i, 2] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # time_ms
  mymat[i, 3] <- readBin(t, integer(), size = 4, n = 1, endian = "big")

  # location and surface type
  # latitude
  mymat[i, 4] <- readBin(t, integer(), size = 4, n = 1, signed = TRUE,
                         endian = "big")
  # longitude
  mymat[i, 5] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # surface_type
  mymat[i, 6] <- readBin(t, integer(), size = 1, n = 1, signed = FALSE,
                         endian = "big")
  # alt_echo_type
  mymat[i, 7] <- readBin(t, logical(), size = 1, n = 1, endian = "big")
  # rad_surf_type
  mymat[i, 8] <- readBin(t, logical(), size = 1, n = 1, endian = "big")

  # move to the first byte of the next 440-byte record
  a <- 440 * i + 3520
  seek(t, where = a, rw = "r")
}
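The loop above is easy to follow, but it calls readBin eight times and seek once for every record. As an alternative sketch (not the approach used in this example), the whole data block can be read once as raw bytes, reshaped so that each column is one record, and decoded field by field. The file here is synthetic, with a shortened 16-byte header and 12-byte records, and the names `rec` and `take_int32` are illustrative only.

```r
# Synthetic file: 16-byte header, then 3 records of 12 bytes each
# (two 4-byte big-endian integers, one 1-byte code, 3 bytes of filler).
header_bytes <- 16
record_bytes <- 12
n_records <- 3
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")
writeBin(raw(header_bytes), con)
for (i in seq_len(n_records)) {
  writeBin(c(18488L, 36000L + i), con, size = 4, endian = "big")
  writeBin(as.raw(c(i - 1, 0, 0, 0)), con)
}
close(con)

# Read the whole data block at once; each column of 'rec' is one record.
con <- file(path, "rb")
seek(con, where = header_bytes, rw = "r")
bytes <- readBin(con, "raw", n = record_bytes * n_records)
close(con)
rec <- matrix(bytes, nrow = record_bytes)

# Decode the 4-byte big-endian integer that starts at byte 'start'
# (1-based) of every record. readBin() also accepts a raw vector.
take_int32 <- function(rec, start) {
  readBin(as.vector(rec[start:(start + 3), ]), integer(),
          n = ncol(rec), size = 4, endian = "big")
}

result <- cbind(
  time_day     = take_int32(rec, 1),
  time_sec     = take_int32(rec, 5),
  surface_type = as.integer(rec[9, ])  # single unsigned byte, 0..255
)
result
```

For the real file, the same idea would use a 3520-byte header and 440-byte records, with the field offsets taken from the codebook.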

Where is that 440 coming from?  We know from looking at the file properties that the file has 340,120 bytes.  We also know that the header takes up 3,520 of those.  That leaves (340,120 - 3,520) = 336,600 bytes of data.  From the header, we know we have 765 observations, so the number of bytes per observation is 336,600/765 = 440.  Thus, after reading the 8 variables of interest from observation i, we set a to 3,520 + 440*i, the offset of the first byte of the next observation.
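The same arithmetic can be written out in R as a quick sanity check. This is only a sketch using the figures quoted above; with the actual file on disk, file.size("C:/test1.dat") could supply the total instead of the literal. The function name `record_start` is made up for illustration.

```r
# Figures quoted in the text above.
file_bytes   <- 340120
header_bytes <- 3520
n_records    <- 765

data_bytes <- file_bytes - header_bytes
# A non-zero remainder would mean the header size or record count is wrong.
stopifnot(data_bytes %% n_records == 0)
record_bytes <- data_bytes / n_records
record_bytes

# Byte offset at which record i starts (i = 1 is the first record).
record_start <- function(i) header_bytes + record_bytes * (i - 1)
record_start(1:3)
```

Note that record_start(i + 1) equals 440*i + 3520, the value assigned to a inside the loop.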

Now that we have read in our variables of interest, we can close our file t and look at the data matrix. 

close(t)

mymat[1:5,]

      [,1]  [,2]  [,3]     [,4]      [,5] [,6] [,7] [,8]
[1,] 18488 36001  8141 15806591 241819737    0    1    0
[2,] 18488 36002 27717 15856153 241839314    0    1    0
[3,] 18488 36003 47293 15905712 241858902    0    1    0
[4,] 18488 36004 66870 15955268 241878502    0    1    0
[5,] 18488 36005 86444 16004821 241898113    0    1    0
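Columns 4 and 5 hold latitude and longitude as integers. Comparing the first row with First_Measurement_Latitude (+15.81) and First_Measurement_Longitude (+241.82) in the header suggests that they are stored in millionths of a degree. That scale factor is an inference from the header, and the codebook should be treated as the authority. Under that assumption, a conversion might look like this:

```r
# First rows of columns 4 and 5 from the output above.
lat_raw <- c(15806591, 15856153, 15905712)
lon_raw <- c(241819737, 241839314, 241858902)

# Assumed scale factor: one unit = 1e-6 degree (inferred from the header).
lat_deg <- lat_raw / 1e6
lon_deg <- lon_raw / 1e6

round(lat_deg[1], 2)  # 15.81, agrees with First_Measurement_Latitude
round(lon_deg[1], 2)  # 241.82, agrees with First_Measurement_Longitude

# The longitudes run from 0 to 360; shift to -180..180 if that is preferred.
lon_180 <- ifelse(lon_deg > 180, lon_deg - 360, lon_deg)
lon_180
```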

We might wish to convert the first column in our matrix to a more readable date form. Knowing that the origin for these dates is January 1, 1958 (from the Time_Epoch listed in the header), we can make the conversion. 

var1.d<- as.Date(mymat[,1], origin="1958-01-01")
var1.d[1:5]

[1] "2008-08-14" "2008-08-14" "2008-08-14" "2008-08-14" "2008-08-14"
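The second and third columns can refine the date into a full timestamp. In the first row, 36001 seconds into the day is 10:00:01, and the third value, 8141, matches the fractional part of First_Measurement_Time (10:00:01.008141) only if it counts microseconds, despite the time_ms label used in the code above. That reading is inferred from the header rather than taken from the codebook, and the sketch below also assumes the times are UTC with no leap-second adjustment.

```r
# First three rows of columns 1 to 3 from the output above.
days   <- c(18488, 18488, 18488)
secs   <- c(36001, 36002, 36003)
micros <- c(8141, 27717, 47293)   # assumed to be microseconds

# Time_Epoch from the header.
epoch <- as.POSIXct("1958-01-01 00:00:00", tz = "UTC")

# Adding a number to a POSIXct value adds that many seconds.
stamp <- epoch + days * 86400 + secs + micros / 1e6
format(stamp, "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

The first timestamp comes out as 2008-08-14 10:00:01, which agrees with First_Measurement_Time in the header to the second.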

These pages are Copyrighted (c) by UCLA Academic Technology Services

