[R] readBin is much slower for raw input than for a file

jim holtman jholtman at gmail.com
Wed Jan 31 14:12:09 CET 2007


I think your problem is the way you subset the raw vector:
bytes[-(1:loc)] deletes from the head, so every read copies the whole
remainder of the buffer. Instead, extract only the bytes of interest:

> # within a custom read function:
> if (loc == 0)
>    data <- readBin(bytes, what, n, size, ...)
> else if (loc > 0)
>    # take only the n * size bytes needed (n items of size bytes each)
>    data <- readBin(bytes[(1:(n * size)) + loc], what, n, size, ...)
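
As a fuller illustration of the same idea, here is a minimal sketch of a
buffered read helper. The function name readFromRaw, its argument order, and
the example buffer are my own assumptions for illustration, not part of the
original reader; it also assumes that size is always given explicitly, so
the number of bytes to extract is n * size.

```r
# Hypothetical helper: read `n` items of `size` bytes each from a raw
# vector, starting `loc` bytes in (loc is 0-based, like a file pointer).
readFromRaw <- function(bytes, loc, what, n = 1L, size, ...) {
    nBytes <- n * size
    if (loc + nBytes > length(bytes))
        stop("read would run past the end of the buffer")
    # Copy only the bytes needed, not the whole tail of the buffer
    chunk <- bytes[loc + seq_len(nBytes)]
    # Return the decoded values together with the advanced pointer
    list(data = readBin(chunk, what, n = n, size = size, ...),
         loc = loc + nBytes)
}

# Example buffer: two 4-byte integers followed by one 8-byte double
bytes <- c(writeBin(c(7L, 42L), raw(), size = 4L),
           writeBin(3.5, raw(), size = 8L))

first <- readFromRaw(bytes, 0, "integer", n = 2L, size = 4L)
second <- readFromRaw(bytes, first$loc, "double", n = 1L, size = 8L)
```

Each call copies only n * size bytes, so the cost of a read no longer depends
on how much of the buffer is left. Returning the new loc alongside the data
avoids having to update a global pointer from inside the function.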



On 1/31/07, Jon Clayden <jon.clayden at gmail.com> wrote:
> This hasn't generated any feedback after a few days on R-devel, so I'm
> forwarding it to R-help in case anyone here has any ideas...
>
> Thanks,
> Jon
>
> ---------- Forwarded message ----------
> From: Jon Clayden <jon.clayden at gmail.com>
> Date: 26-Jan-2007 11:25
> Subject: readBin is much slower for raw input than for a file
> To: r-devel at r-project.org
>
>
> Dear all,
>
> I'm trying to write an efficient binary file reader for a file type
> that is made up of several fields of variable length, and so requires
> many small reads. Doing this on the file directly using a sequence of
> readBin() calls is a bit too slow for my needs, so I tried buffering
> the file into a raw vector and reading from that ("loc" is the
> equivalent of the file pointer):
>
> fileSize <- file.info(fileName)$size
> connection <- file(fileName, "rb")
> bytes <- readBin(connection, "raw", n=fileSize)
> loc <- 0
> close(connection)
>
> --
>
> # within a custom read function:
> if (loc == 0)
>    data <- readBin(bytes, what, n, size, ...)
> else if (loc > 0)
>    data <- readBin(bytes[-(1:loc)], what, n, size, ...)
>
> However, this method runs almost 10 times slower for me than the
> sequence of file reads did. The initial call to readBin(), which
> reads in the file, is very quick, but Rprof shows that the vast
> majority of the run time for the full parse is spent in readBin, so
> that does seem to be what is slowing things down. Can anyone shed
> any light on why this is?
>
> I'm not expecting miracles here - and I realise that writing the whole
> read routine in C would be much quicker - but surely reading from a
> raw vector should work out faster than reading from a file? The system
> is R-2.4.1/Linux, Xeon 3.2 GHz, 2 GiB RAM; typical file size is 44
> KiB.
>
> Thanks in advance,
> Jon
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


