[R] Accelerating binRead

Philippe de Rochambeau phiroc at free.fr
Sun Sep 18 09:35:55 CEST 2016


The only difference between the code below and my program is that the former assumes the file contains exactly one row of 10 ints followed by 10 floats, whereas my program does not know in advance how many rows the file contains, unless it downloads the file first and computes the number of rows from its size.
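
For the unknown-row-count case, here is a minimal sketch in base R. It is not the code from my program. It assumes the 26-byte little-endian record layout implied by the readBin calls in my original function (int32, int32, 8-byte timestamp in milliseconds, int16, float32, float32). The names read_all_raw, decode_plt, le_bytes, make_record and RECORD_SIZE are made up for illustration. The stream is read in chunks until it is exhausted, the row count is derived from the number of bytes received, and each field is decoded for all rows in a single readBin call on a raw vector. The timestamp is assembled from four unsigned 16-bit words, a substitute for the two 4-byte reads in the original, because readBin honours signed = FALSE only for sizes 1 and 2.

```r
RECORD_SIZE <- 26L  # assumed: 4 + 4 + 8 + 2 + 4 + 4 bytes per row

# Read every byte from an open binary connection of unknown length.
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    buf <- readBin(con, what = "raw", n = chunk_size)
    if (length(buf) == 0L) break            # nothing left to read
    chunks[[length(chunks) + 1L]] <- buf
  }
  if (length(chunks) == 0L) raw(0) else do.call(c, chunks)
}

# Decode all rows at once; a trailing partial record is ignored.
decode_plt <- function(bytes) {
  n_rows <- length(bytes) %/% RECORD_SIZE
  if (n_rows == 0L) return(matrix(numeric(0), nrow = 0, ncol = 6))
  # One record per column, so m[rows, ] is the same byte range of every record.
  m <- matrix(bytes[seq_len(n_rows * RECORD_SIZE)], nrow = RECORD_SIZE)
  field <- function(rows, what, size, n = n_rows, signed = TRUE) {
    readBin(as.vector(m[rows, , drop = FALSE]), what = what, n = n,
            size = size, signed = signed, endian = "little")
  }
  # Four unsigned 16-bit words per timestamp, least significant word first.
  u16 <- matrix(field(9:16, "integer", 2L, n = 4L * n_rows, signed = FALSE),
                nrow = 4L)
  millis <- colSums(u16 * 65536^(0:3))
  cbind(periodIndex = field(1:4, "integer", 4L),
        eventId     = field(5:8, "integer", 4L),
        eventDate   = millis / 1000,
        repNum      = field(17:18, "integer", 2L),
        exp         = field(19:22, "numeric", 4L),
        loss        = field(23:26, "numeric", 4L))
}

# Demo on synthetic data: build two records in memory and decode them.
le_bytes <- function(x, n) as.raw((x %/% 256^(0:(n - 1L))) %% 256)
make_record <- function(period, event, millis, rep, exposure, loss) {
  c(writeBin(as.integer(period), raw(), size = 4L, endian = "little"),
    writeBin(as.integer(event), raw(), size = 4L, endian = "little"),
    le_bytes(millis, 8L),
    writeBin(as.integer(rep), raw(), size = 2L, endian = "little"),
    writeBin(exposure, raw(), size = 4L, endian = "little"),
    writeBin(loss, raw(), size = 4L, endian = "little"))
}
demo_bytes <- c(make_record(1, 10, 1474184155000, 3, 1.5, 2.25),
                make_record(2, 11, 1474184156500, 4, 0.5, 100))
con <- rawConnection(demo_bytes, "rb")
plt <- decode_plt(read_all_raw(con))
close(con)
print(plt)
```

With a real file, the raw connection in the demo would be replaced by something like file(inputPath, "rb"). Reading everything first costs memory proportional to the file size, but it replaces one readBin call per field per row, plus an rbind per row, with six vectorised calls in total.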

> On 17 Sept 2016 at 20:45, Philippe de Rochambeau <phiroc at free.fr> wrote:
> 
> Hi Jim,
> this is exactly the answer I was looking for. Many thanks. I didn’t know R had a pack function, as in Perl.
> To answer your earlier question, I am trying to update legacy code to read a binary file of unknown size over a network, slice it up into rows each containing an integer, an integer, a long, a short, a float and a float, and stuff the rows into a matrix.
> Best regards,
> Philippe
> 
>> On 17 Sept 2016 at 20:38, jim holtman <jholtman at gmail.com> wrote:
>> 
>> Here is an example of how to do it:
>> 
>> x <- 1:10  # integer values
>> xf <- seq(1.0, by = 0.1, length.out = 10)  # 10 floating point values (written as 8-byte doubles)
>> 
>> setwd("d:/temp")
>> 
>> # create file to write to
>> output <- file('integer.bin', 'wb')
>> writeBin(x, output)  # write integer
>> writeBin(xf, output)  # write reals
>> close(output)
>> 
>> 
>> library(pack)
>> library(readr)
>> 
>> # read all the data at once
>> allbin <- read_file_raw('integer.bin')
>> 
>> # decode the data into a list
>> (result <- unpack("V V V V V V V V V V d d d d d d d d d d", allbin))
>> 
>> 
>> 
>> 
>> Jim Holtman
>> Data Munger Guru
>>  
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>> 
>> On Sat, Sep 17, 2016 at 11:04 AM, Ismail SEZEN <sezenismail at gmail.com> wrote:
>> I noticed the same issue but didn't care much :)
>> 
>> On Sat, Sep 17, 2016, 18:01 jim holtman <jholtman at gmail.com> wrote:
>> Your example was not reproducible.  Also how do you "break" out of the
>> "while" loop?
>> 
>> 
>> Jim Holtman
>> Data Munger Guru
>> 
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
>> 
>> On Sat, Sep 17, 2016 at 8:05 AM, Philippe de Rochambeau <phiroc at free.fr> wrote:
>> 
>> > Hello,
>> > the following function, which stores numeric values extracted from a
>> > binary file in an R matrix, is very slow, especially when the file
>> > is several MB in size.
>> > Should I rewrite the function in inline C or in C/C++ using Rcpp? If the
>> > latter, how do you call "readBin" from Rcpp (I’m a total Rcpp
>> > newbie)?
>> > Many thanks.
>> > Best regards,
>> > phiroc
>> >
>> >
>> > -------------
>> >
>> > # inputPath is something like http://myintranet/getData?pathToFile=/usr/lib/xxx/yyy/data.bin
>> >
>> > PLTreader <- function(inputPath) {
>> >         URL <- file(inputPath, "rb")
>> >         PLT <- matrix(nrow = 0, ncol = 6)
>> >         compteurDePrints <- 0
>> >         compteurDeLignes <- 0
>> >         maxiPrints <- 5
>> >         displayData <- FALSE
>> >         while (TRUE) {
>> >                 periodIndex <- readBin(URL, integer(), size = 4, n = 1, endian = "little") # int (4 bytes)
>> >                 eventId <- readBin(URL, integer(), size = 4, n = 1, endian = "little") # int (4 bytes)
>> >                 # readBin honours signed = FALSE only for sizes 1 and 2, so the two
>> >                 # halves of the 8-byte date are read as signed 4-byte ints
>> >                 dword1 <- readBin(URL, integer(), size = 4, n = 1, endian = "little") # low 4 bytes
>> >                 dword2 <- readBin(URL, integer(), size = 4, n = 1, endian = "little") # high 4 bytes
>> >                 if (dword1 < 0) {
>> >                         dword1 <- dword1 + 2^32 # undo the two's-complement wrap-around
>> >                 }
>> >                 eventDate <- (dword2 * 2^32 + dword1) / 1000
>> >                 repNum <- readBin(URL, integer(), size = 2, n = 1, endian = "little") # short (2 bytes)
>> >                 exp <- readBin(URL, numeric(), size = 4, n = 1, endian = "little") # float (4 bytes, strangely enough, would expect 8)
>> >                 loss <- readBin(URL, numeric(), size = 4, n = 1, endian = "little") # float (4 bytes)
>> >                 PLT <- rbind(PLT, c(periodIndex, eventId, eventDate, repNum, exp, loss))
>> >         } # end while
>> >         close(URL) # close the connection before returning
>> >         return(PLT)
>> > }
>> >
>> > ----------------
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>> 
> 

