[R] readBin into a data frame

Duncan Murdoch murdoch.duncan at gmail.com
Thu Aug 1 13:41:26 CEST 2013


On 13-08-01 4:36 AM, Zhang Weiwu wrote:
> Hello. readBin is designed to read a batch of data with the same spec, e.g.
> read 10000 floats into a vector. In practice I read into a data frame, not
> a vector.  For each row of the data frame, I need to read an integer and a
> float.
>
> for (i in 1:1000) {
>   	dataframe$int[i]   <- readBin(con, integer(), size=2)
>   	dataframe$float[i] <- readBin(con, numeric(), size=4)
> }
>
> And I need to read 100 such data files, ending up with a for loop in a for
> loop. Something feels wrong here, since it is said that if you use a
> double for loop you are not speaking R.
>
> What is the R way of doing this? I can think of writing the content of the
> loop into a function and vectorizing it. But the result would be a list of
> lists, not exactly a data frame, and the list grows incrementally, which is
> inefficient, since I know the size of my data frame at the outset. I am a
> new learner, not speaking half of the R vocabulary, so kindly provide some
> hint please:)

I don't think there are any functions to do this directly.  I'd probably
use the loop (since the time to read 1000 entries would be small).  If
the files were much larger, what I might do is read each file as raw
bytes, then read the integer and float vectors from subsets of the bytes.

For example, the following untested code:

# n is a maximum count; 6 * 1000 covers 1000 records of 6 bytes each
rawvec <- readBin(con, "raw", n = 6 * 1000)
n <- length(rawvec) %/% 6
i <- 0:(n-1)
# Using sort here is inefficient, but I'm lazy...
indices <- sort( c(6*i + 1, 6*i + 2) )
# Use a separate name so the file connection in con is not overwritten
rcon <- rawConnection(rawvec[indices])
int <- readBin(rcon, "integer", n = n, size = 2)
close(rcon)

indices <- sort( c(6*i + 3, 6*i + 4, 6*i + 5, 6*i + 6) )
rcon <- rawConnection(rawvec[indices])
float <- readBin(rcon, "numeric", n = n, size = 4)
close(rcon)

dataframe <- data.frame(int=int, float=float)
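
A variation on the same idea, offered as a sketch rather than a definitive
implementation: reshaping the raw bytes into a 6-row matrix (one column per
record) avoids the sort, and readBin accepts a raw vector directly in place
of a connection.  The names read_records and write_records are hypothetical
helpers made up for this example, and it assumes each record is a 2-byte
signed integer followed by a 4-byte float in the platform's byte order (pass
endian= if the files differ).  The 100 files can then be handled with lapply
instead of an outer for loop.

```r
# Hypothetical helper: read a whole file of 6-byte records
# (2-byte signed integer followed by a 4-byte float).
read_records <- function(path, endian = .Platform$endian) {
  rawvec <- readBin(path, "raw", n = file.size(path))
  n <- length(rawvec) %/% 6
  # One column per record, one row per byte position within the record
  m <- matrix(rawvec[seq_len(6 * n)], nrow = 6)
  data.frame(
    int   = readBin(as.vector(m[1:2, ]), "integer", n = n, size = 2,
                    endian = endian),
    float = readBin(as.vector(m[3:6, ]), "numeric", n = n, size = 4,
                    endian = endian)
  )
}

# Hypothetical helper used only to create test files for the demo.
write_records <- function(path, ints, floats) {
  con <- file(path, "wb")
  on.exit(close(con))
  for (k in seq_along(ints)) {
    writeBin(as.integer(ints[k]), con, size = 2)
    writeBin(as.numeric(floats[k]), con, size = 4)
  }
}

# Two small temporary files stand in for the 100 real data files.
paths <- c(tempfile(), tempfile())
write_records(paths[1], 1:3, c(0.5, 1.5, 2.5))
write_records(paths[2], 4:5, c(-1.25, 8))

# One data frame per file, stacked into a single data frame.
dataframe <- do.call(rbind, lapply(paths, read_records))
print(dataframe)
```

Each file is read in one readBin call and parsed with two more, so the cost
per record no longer depends on a loop in R code.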

Another way to do this is to read the data in a C function, using
.Call or .C to get it into R.

Duncan Murdoch


