[R] cbind alternate

Rui Barradas ruipbarradas at sapo.pt
Fri Jan 6 19:57:04 CET 2012


Hello,

I believe this function can handle a problem of that size, or bigger.

It does NOT create the full matrix, just writes it to a file, a certain
number of lines at a time.
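
To make the "a certain number of lines at a time" idea concrete, here is a small sketch of the chunk bookkeeping on its own. The length 2500 and the names 'end' and 'rows' are only assumptions for illustration; the point is that start and end indices computed this way visit every row once, even when the length is not a multiple of the chunk size.

```r
n    <- 2500                        # hypothetical length, not a multiple of 'step'
step <- 1000                        # rows per chunk
inx  <- seq(1, n, by=step)          # first row of each chunk
end  <- pmin(inx + step - 1, n)     # last row of each chunk, capped at 'n'

# all row indices, in the order the chunks would visit them
rows <- unlist(Map(seq, inx, end))

length(rows) == n                   # no row is missing or duplicated
all(rows == seq_len(n))             # and the rows come out in order
```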


write.big.matrix <- function(x, y, outfile, nmax=1000){

	if(file.exists(outfile)) unlink(outfile)
	testf <- file(outfile, "at")   # or "wt" - "write text"
	on.exit(close(testf))

	step <- nmax                         # how many rows at a time
	n    <- length(x)                    # number of rows to write
	stopifnot(length(y) == n, n > 0)     # 'x' and 'y' must match in length
	inx  <- seq(1, n, by=step)           # first row of each chunk

	# write at most 'nmax' rows per iteration; the last chunk may be
	# shorter, so no separate step is needed for the remainder
	for(i in inx){
		j   <- min(i + step - 1, n)      # last row of this chunk
		mat <- cbind(x[i:j], y[i:j])
		write.table(mat, file=testf, quote=FALSE, row.names=FALSE,
			col.names=FALSE)
	}

	# return the output filename
	outfile
}

x <- 1:1e6                              # a numeric vector
y <- sample(letters, 1e6, replace=TRUE) # and a character vector
length(x);length(y)                     # of the same length
fl <- "test.txt"                        # output file

system.time(write.big.matrix(x, y, outfile=fl))


On my system it takes (sample output)

   user  system elapsed 
   1.59    0.04    1.65

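and it can handle different types of data: in the example, a numeric and a
character vector. (cbind converts both columns to character, which makes no
difference once they are written as text.)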
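
As a small illustration of what happens to the types, here is a sketch with made-up three-element vectors (the names 'm' and 'd' are mine, not part of the function above). cbind on a numeric and a character vector gives a character matrix; if the columns must keep their own types in memory, a data.frame is the usual alternative.

```r
m <- cbind(1:3, c("a", "b", "c"))   # mixed input is coerced to one type
is.character(m)                      # the whole matrix is character
dim(m)

# a data.frame keeps one type per column instead
d <- data.frame(x=1:3, y=c("a", "b", "c"), stringsAsFactors=FALSE)
sapply(d, class)
```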

If you also need the matrix, try to use 'cbind' first, without writing to a
file.
If it's still slow, adapt the code above to keep inserting chunks in an
output matrix.
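
A rough sketch of that adaptation, under the assumption that the result fits in memory: allocate the output matrix once and fill it one chunk at a time. The sizes and the name 'out' are only for illustration, and I have not timed this against a plain cbind.

```r
x <- 1:2500                                    # small example vectors
y <- sample(letters, 2500, replace=TRUE)

n    <- length(x)
step <- 1000                                   # rows per chunk
out  <- matrix(NA_character_, nrow=n, ncol=2)  # allocate once, character type

for(i in seq(1, n, by=step)){
	j <- min(i + step - 1, n)                  # last row of this chunk
	out[i:j, ] <- cbind(x[i:j], y[i:j])        # insert the chunk in place
}

# same contents as a single cbind, apart from the column names
all(out == unname(cbind(x, y)))
```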

Rui Barradas



