[R] Importing binary data

Uwe Ligges ligges at statistik.uni-dortmund.de
Tue Jun 1 13:49:54 CEST 2004


Uli Tuerk wrote:

> Hi everybody!
> 
> I have a large dataset, about 2 million entries, which I would like
> to import into a data frame. Each entry has the format:
> <integer><integer><float><string><float><string><string>
> 
> Because of the huge amount of data, I chose a binary format instead
> of a text format when exporting from Matlab.
> My import function is attached below. It works fine for a few entries
> but is extremely slow when trying to read the complete set.
> 
> Does anybody have some pointers for improving the import or for handling
> such large data sets?

Suggestions:

a) Use a database!!!



And only if there are very strong reasons against a):

b) Rewrite your import code in C.

c) Optimize the code below by initializing the objects at full length
(e.g. imp.v <- numeric(n)) rather than growing them inside the loop
(maybe you can read n from the header or derive it from the size of
the file ...).
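
A minimal sketch of what c) could look like, assuming the record layout
of the quoted function (two 1-byte unsigned integers, a 4-byte float, a
string, a 4-byte float, two strings). The name read.DET.prealloc and the
argument n (an initial guess at the number of records) are hypothetical
and not part of the original code. Because the strings have variable
length, the record count cannot be computed exactly from the file size,
so this version doubles the vectors whenever the guess turns out too
small and trims them at the end:

```r
# Hypothetical variant of read.DET.data that preallocates its vectors.
# 'n' is an assumed initial guess of the record count, not a known value.
read.DET.prealloc <- function(f, n = 1e6) {
    n <- max(as.integer(n), 1L)
    spk.v   <- integer(n)
    imp.v   <- integer(n)
    score.v <- numeric(n)
    th.v    <- numeric(n)
    type.v  <- character(n)
    ses.v   <- character(n)
    rec.v   <- character(n)

    fid <- file(f, "rb")
    on.exit(close(fid))          # close the connection even on error

    i <- 0L
    repeat {
        tempi <- readBin(fid, integer(), size = 1, signed = FALSE)
        if (length(tempi) == 0) break    # end of file
        i <- i + 1L
        if (i > n) {
            # Guess was too small: double all vectors in one step
            n <- 2L * n
            length(spk.v)   <- n
            length(imp.v)   <- n
            length(score.v) <- n
            length(th.v)    <- n
            length(type.v)  <- n
            length(ses.v)   <- n
            length(rec.v)   <- n
        }
        spk.v[i]   <- tempi
        imp.v[i]   <- readBin(fid, integer(), size = 1, signed = FALSE)
        score.v[i] <- readBin(fid, numeric(), size = 4)
        type.v[i]  <- readBin(fid, character())
        th.v[i]    <- readBin(fid, numeric(), size = 4)
        ses.v[i]   <- readBin(fid, character())
        rec.v[i]   <- readBin(fid, character())
    }

    keep <- seq_len(i)           # drop the unused tail
    data.frame(spk = factor(spk.v[keep]), imp = factor(imp.v[keep]),
               score = score.v[keep], th = th.v[keep],
               ses = ses.v[keep], rec = rec.v[keep], type = type.v[keep])
}
```

Called as read.DET.prealloc("det.bin", n = 2e6) (file name made up), this
avoids reallocating seven vectors on every record. The per-record readBin
calls remain, so it will likely still be slower than a) or b).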


Uwe Ligges



> Thanks in advance!
> 
> Uli
> 
> 
> 
> read.DET.data <- function ( f ) {
> 	counter <- 1
> 	spk.v <- c()
> 	imp.v <- c()
> 	score.v <- c()
> 	th.v <- c()
> 	ses.v <- c()
> 	rec.v <- c()
> 	type.v <- c()
> 	fid <- file( f ,"rb")
> 	tempi <- readBin(fid , integer(), size=1, signed=FALSE)
> 	while ( length(tempi) != 0) {
> 		spk.v[ counter ] <- tempi
> 		imp.v[ counter ] <- readBin(fid, integer(), size=1, signed=FALSE)
> 		score.v[ counter  ] <- readBin(fid, numeric(), size=4)
> 		type.v[ counter ] <- readBin(fid, character())
> 		th.v[ counter ] <- readBin(fid, numeric(), size=4)
> 		ses.v[ counter ] <- readBin(fid, character())
> 		rec.v[ counter ] <- readBin(fid, character())
> 		counter <- counter + 1
> 		tempi <- readBin(fid, integer(), size=1, signed=FALSE)
> 	}
> 	close( fid )
> 	spkf <- factor ( spk.v )
> 	impf <- factor ( imp.v )
> 	
> 	det.f <- data.frame( spk=spkf, imp=impf, score=score.v, th=th.v, ses=ses.v, rec=rec.v, type=type.v)
> 
> 	det.f
> }
> 



