[R] how to work with big matrices and the ff-package?

Jens Oehlschlägel jens.oehlschlaegel at truecluster.com
Thu Apr 15 23:26:40 CEST 2010


Anne,

 

> After the above step I need to convert my ff_matrix to a data.frame to discretize the whole matrix and calculate the mutual information.

> The calculated result should be saved as an ffdf-object or something similar.
> disc <- as.ffdf(discretize(as.data.frame(as.ffdf(ffmat)), disc="equalwidth", nbins=5))

 

ffdf are ff's equivalent to data.frames: they handle many rows (up to 2^31-1) and a limited number of columns (with potentially different
column types). Like data.frames, they are not suitable for millions of columns. You probably want to store your data in one big ff matrix.



If you use ff objects because you don't have the RAM for standard R objects, converting ff to a data.frame is not an option because it will require too much RAM.

If 'discretize' expects a data.frame, you cannot call it on an ff matrix either. But if 'discretize' works on single columns, you can call discretize on chunks of columns that you coerce to data.frames.

 

something like

for (i in chunk(from=1, to=ncol(ffmat), by=10)) {
  # discretize ten columns at a time; only this chunk is held in RAM
  ffmat[,i] <- as.matrix(discretize(as.data.frame(ffmat[,i]), disc="equalwidth", nbins=5))
}

 

If discretize returns integers, you might prefer to write the results to a separate integer ff matrix rather than back into ffmat, because this saves disk space and improves caching.
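
A minimal sketch of the integer-target idea, under some assumptions: the small random ffmat below only stands in for the real data, the helper equalwidth() is hypothetical and uses base R's cut() in place of discretize() so that the example needs no package besides ff, and the chunk size of 10 columns is arbitrary. Plain integer column ranges are used for indexing.

```r
library(ff)

# Small stand-in for the real data: 1000 rows x 50 columns of doubles on disk.
ffmat <- ff(rnorm(1000 * 50), vmode = "double", dim = c(1000, 50))

# Integer ff matrix of the same shape that receives the bin codes.
disc <- ff(vmode = "integer", dim = dim(ffmat))

# Hypothetical helper (not part of ff or infotheo): equal-width binning of
# one numeric vector into integer codes 1..nbins.
equalwidth <- function(x, nbins = 5) cut(x, breaks = nbins, labels = FALSE)

by <- 10  # number of columns processed per chunk (assumed value)
for (from in seq(1, ncol(ffmat), by = by)) {
  cols  <- from:min(from + by - 1, ncol(ffmat))
  block <- ffmat[, cols, drop = FALSE]         # only this block is read into RAM
  disc[, cols] <- apply(block, 2, equalwidth)  # integer codes written to disk
}
```

With the real data you would replace equalwidth() by the call to discretize() on the chunk coerced to a data.frame, and pick the chunk size so that one block of columns fits comfortably in RAM.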

 

HTH

Jens Oehlschlägel


