In this example, we show how to read a binary file into R. The original binary file can be downloaded by following the link, and its codebook can be downloaded here. The zip file here contains the data file, a PDF of the codebook, and a Stata code example, in case the links above are not available.
The data file begins with 3520 bytes of header information in ASCII. Here is the beginning of it.
CCSD3ZF0000100000001CCSD3VS00006PRODUCER
Product_File_Name = JA1_IGD_2PcP243_093;
Producer_Agency_Name = CNES;
Processing_Center = SSALTO;
File_Data_Type = IGDR;
Reference_Document = SMM-ST-M-EA-10879-CN Issue 4.0;
Reference_Software = CMAV9.2_01/G5OS5;
Operating_System = SunOS 5.9;
Product_Creation_Time = 2008-08-20T13:57:26.000000;
CCSD$$MARKERPRODUCERCCSD3KS00006PASSFILE
Mission_Name = Jason-1;
Altimeter_Sensor_Name = POSEIDON-2;
Radiometer_Sensor_Name = JMR;
DORIS_Sensor_Name = DORIS-2 GM;
Acquisition_Station_Name = JTCCS ;
Cycle_Number = 243;
Absolute_Revolution_Number = 30781;
Pass_Number = 93;
Absolute_Pass_Number = 61561;
Equator_Time = 2008-08-14T09:54:37.743000;
Equator_Longitude = +235.99<deg>;
First_Measurement_Time = 2008-08-14T10:00:01.008141;
Last_Measurement_Time = 2008-08-14T10:22:43.766073;
First_Measurement_Latitude = +15.81<deg>;
Last_Measurement_Latitude = +66.15<deg>;
First_Measurement_Longitude = +241.82<deg>;
Last_Measurement_Longitude = +318.85<deg>;
Pass_Data_Count = 765;
Ocean_Pass_Data_Count = 483;
Ocean_PCD = 0<%>;
Time_Epoch = 1958-01-01T00:00:00.000000;
The header includes the operating system used, the number of observations (Pass_Data_Count), and the time the first measurement was taken. The SunOS operating system suggests that the data are stored in big-endian byte order, so we will read them with endian = "big". Since the file begins with the header shown above (a total of 3520 bytes), we start reading the actual data (variable values) after the header. To tell R to do this, we use the seek function, which moves the file position to a given byte offset. The argument rw = "r" indicates that we are setting the position for reading.
t<- file("C:/test1.dat", "rb")
seek(t, where = 3520, rw="r")
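The observation count used later (765) comes from the Pass_Data_Count field of the header. Rather than copying it by hand, one could read the header as text and extract the value. The following is a minimal sketch run against a small stand-in file built on the fly. The names con, hdr_len, hdr_txt and n_obs are illustrative and not part of the original example, and the 64-byte header is a shortened stand-in for the real 3520 bytes.

```r
# Build a tiny stand-in file: a space-padded ASCII header followed by a few data bytes.
hdr_len <- 64                      # hypothetical; the real header is 3520 bytes
hdr_txt <- "Pass_Data_Count = 765; Ocean_Pass_Data_Count = 483;"
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")
writeChar(formatC(hdr_txt, width = hdr_len, flag = "-"), con, eos = NULL)
writeBin(as.raw(1:8), con)
close(con)

# Read the header as one string and pull out the record count.
con <- file(path, "rb")
hdr <- readChar(con, nchars = hdr_len, useBytes = TRUE)
close(con)

# \\b keeps the pattern from matching inside Ocean_Pass_Data_Count.
hit <- regmatches(hdr, regexec("\\bPass_Data_Count = ([0-9]+)", hdr, perl = TRUE))
n_obs <- as.integer(hit[[1]][2])
n_obs
```

With the real file, the same idea would use nchars = 3520, and n_obs could then replace the hard-coded 765.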
There are many variables in the data set. In this example, we only show how to read the first eight variables. We know from the header information that there are 765 observations in the file. Here is the code for reading the data. We will first create a matrix that will hold our dataset. Then we will look, observation by observation, at t, reading in the first 8 variables one at a time with readBin and then skipping to the beginning of the next line.
mymat <- matrix(0, nrow = 765, ncol = 8)
for (i in 1:765) {
  # time tag
  # Note: readBin uses 'signed' only for integers of size 1 or 2,
  # so it is omitted for the 4-byte reads below.
  # time_day
  mymat[i, 1] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # time_sec
  mymat[i, 2] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # time_ms
  mymat[i, 3] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # location and surface type
  # latitude
  mymat[i, 4] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # longitude
  mymat[i, 5] <- readBin(t, integer(), size = 4, n = 1, endian = "big")
  # surface_type (one unsigned byte)
  mymat[i, 6] <- readBin(t, integer(), size = 1, n = 1, signed = FALSE,
                         endian = "big")
  # alt_echo_type (one byte, read as logical)
  mymat[i, 7] <- readBin(t, logical(), size = 1, n = 1, endian = "big")
  # rad_surf_type (one byte, read as logical)
  mymat[i, 8] <- readBin(t, logical(), size = 1, n = 1, endian = "big")
  # jump to the first byte of the next observation
  a <- 440 * i + 3520
  seek(t, where = a, rw = "r")
}
Where is that 440 coming from? We know from looking at the file properties that it has 340,120 bytes. We also know that the header contains 3,520 of those. That leaves us with (340,120 - 3,520) = 336,600 bytes. From the header, we know we have 765 lines of observations, so the number of bytes per observation is 336,600/765 = 440. Thus, we are assigning a to be the first byte in the next line after reading in the 8 variables of interest from the given line.
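The arithmetic above can be checked directly in R. This is a small sketch using the figures quoted in the text. The helper rec_start is a hypothetical name introduced here, and file.size("C:/test1.dat") could supply the total size instead of typing it in.

```r
file_bytes   <- 340120   # total file size, from the file properties
header_bytes <- 3520     # ASCII header
n_obs        <- 765      # Pass_Data_Count from the header

rec_len <- (file_bytes - header_bytes) / n_obs
rec_len

# A fractional record length would point to a wrong header size or count.
stopifnot(rec_len == round(rec_len))

# Byte offset of the start of observation i (1-based).
rec_start <- function(i) header_bytes + rec_len * (i - 1)
rec_start(1)   # first observation starts right after the header
rec_start(2)   # the value of 'a' computed in the loop when i = 1
```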
Now that we have read in our variables of interest, we can close our file t and look at the data matrix.
close(t)
mymat[1:5,]
      [,1]  [,2]  [,3]     [,4]      [,5] [,6] [,7] [,8]
[1,] 18488 36001  8141 15806591 241819737    0    1    0
[2,] 18488 36002 27717 15856153 241839314    0    1    0
[3,] 18488 36003 47293 15905712 241858902    0    1    0
[4,] 18488 36004 66870 15955268 241878502    0    1    0
[5,] 18488 36005 86444 16004821 241898113    0    1    0
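As an alternative to the observation-by-observation loop, all records can be read in one call as raw bytes and then decoded field by field, since readBin also accepts a raw vector as its input. This is a different technique from the one used above, shown here as a sketch on a hypothetical miniature file (16-byte header, 3 records of 32 bytes). The names con, recs, int4 and flag are illustrative.

```r
# Build a miniature stand-in file with the same kind of layout.
header_bytes <- 16L; rec_len <- 32L; n_obs <- 3L
path <- tempfile(fileext = ".dat")
con <- file(path, "wb")
writeBin(raw(header_bytes), con)                       # dummy header
for (i in seq_len(n_obs)) {
  writeBin(c(18488L, 36000L + i), con, size = 4, endian = "big")  # two 4-byte fields
  writeBin(as.raw(i), con)                             # one 1-byte field
  writeBin(raw(rec_len - 9L), con)                     # rest of the record (unused)
}
close(con)

# Read every record in one go.
con <- file(path, "rb")
seek(con, where = header_bytes, rw = "r")
bytes <- readBin(con, what = "raw", n = rec_len * n_obs)
close(con)

recs <- matrix(bytes, nrow = rec_len)   # one column per record

# Decode the big-endian 4-byte integer starting at byte 'first' of every record.
int4 <- function(first) {
  readBin(as.vector(recs[first:(first + 3L), ]), what = "integer",
          n = ncol(recs), size = 4, endian = "big")
}
time_day <- int4(1L)
time_sec <- int4(5L)
flag     <- as.integer(recs[9L, ])      # unsigned single byte
```

For the file in this example, the sequential reads in the loop imply that the eight variables occupy bytes 1-4, 5-8, 9-12, 13-16, 17-20, 21, 22 and 23 of each 440-byte record, so the same approach would use those offsets with header_bytes = 3520 and rec_len = 440.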
We might wish to convert the first column in our matrix to a more readable date form. Knowing that the origin for these dates is January 1, 1958 (from the Time_Epoch listed in the header), we can make the conversion.
var1.d <- as.Date(mymat[,1], origin="1958-01-01")
var1.d[1:5]
[1] "2008-08-14" "2008-08-14" "2008-08-14" "2008-08-14" "2008-08-14"
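The first three columns can also be combined into a full timestamp, and the coordinates rescaled to degrees. The sketch below rests on assumptions inferred from the header rather than from the codebook: the third column (labelled time_ms above) appears to hold microseconds, since 8141 in the first row matches the .008141 fractional second of First_Measurement_Time, and latitude and longitude appear to be stored in millionths of a degree, since 15806591 and 241819737 match the header's +15.81 and +241.82. The values are copied from the first two rows of the printed matrix, and the variable names are illustrative.

```r
# First two rows of mymat, as printed above.
time_day <- c(18488, 18488)
time_sec <- c(36001, 36002)
time_us  <- c(8141, 27717)            # assumed to be microseconds
lat_raw  <- c(15806591, 15856153)     # assumed to be 1e-6 degrees
lon_raw  <- c(241819737, 241839314)   # assumed to be 1e-6 degrees

# Seconds elapsed since the Time_Epoch given in the header.
secs  <- time_day * 86400 + time_sec + time_us / 1e6
stamp <- as.POSIXct(secs, origin = "1958-01-01", tz = "UTC")
format(stamp, "%Y-%m-%d %H:%M:%S")    # first element: "2008-08-14 10:00:01"

lat <- lat_raw * 1e-6
lon <- lon_raw * 1e-6
round(lat[1], 2)                      # 15.81, as in First_Measurement_Latitude
round(lon[1], 2)                      # 241.82, as in First_Measurement_Longitude
```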
These pages are Copyrighted (c) by UCLA Academic Technology Services