[R] bigmemory - extracting submatrix from big.matrix object

Jay Emerson jayemerson at gmail.com
Tue Jun 2 20:08:44 CEST 2009


Thanks for trying this out.

Problem 1. We'll check this. A descriptorfile option should certainly be available in read.big.matrix(). Thanks!
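For reference, later bigmemory releases do accept a descriptorfile argument in read.big.matrix(). The sketch below assumes such a release. The tiny CSV it writes is a hypothetical stand-in for the 1 GB file, and the file names are illustrative only.

```r
library(bigmemory)

# Hypothetical tiny CSV standing in for /home/utkarsh.s/data.csv.
csv <- file.path(tempdir(), "data.csv")
writeLines(c("a,b", "1,4", "2,5", "3,6"), csv)

# Assumes a bigmemory release whose read.big.matrix() has a
# descriptorfile argument; the descriptor name is chosen explicitly
# instead of being derived from the backing file name.
x <- read.big.matrix(csv, header = TRUE, type = "double",
                     backingfile = "backup.bin",
                     descriptorfile = "backup.desc",
                     backingpath = tempdir())
```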

Problem 2. Fascinating. Just yesterday we implemented a sub.big.matrix()
function that does exactly this: it creates something that is a big.matrix
but simply references a contiguous subset of the original matrix. This
will be available in an upcoming version (hopefully within the next week).
A more specialized function would create an entirely new big.matrix from a
subset of an existing big.matrix, making an actual copy, but that is
something else altogether. By the way, you could do this entirely within R
without much work, and with only 2x memory overhead.
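A minimal sketch of that within-R copy, assuming a later bigmemory release in which as.big.matrix() takes backingfile, descriptorfile and backingpath arguments. The small matrix and the column range are hypothetical stand-ins for the real data and x[, 1:100]. Peak usage is roughly the subset held in R plus the new matrix being filled, which is presumably where the 2x figure comes from.

```r
library(bigmemory)

# Hypothetical small stand-in for the large big.matrix "x".
x <- as.big.matrix(matrix(as.double(1:50), nrow = 5, ncol = 10),
                   type = "double")

cols <- 2:4  # hypothetical subset; x[, 1:100] in the question

# Step 1: pull the selected columns into ordinary R memory.
tmp <- x[, cols]

# Step 2: write them into a new, independent filebacked big.matrix.
sub <- as.big.matrix(tmp, type = "double",
                     backingfile = "sub.bin",
                     descriptorfile = "sub.desc",
                     backingpath = tempdir())

rm(tmp)  # drop the in-R copy once the filebacked copy exists
```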

Problem 3. You can count missing values using mwhich(). For other
exploration (e.g. skewness), for now you should just extract a single
column (variable) at a time into R, study it, then get the next column,
and so on. We will not be implementing all of R's functions directly for
big.matrix objects. We will be creating a new package, "bigmemoryAnalytics",
and would welcome contributions to it.
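A minimal sketch of both suggestions, assuming a later bigmemory release: mwhich() with NA and the 'eq' comparison returns the rows where a column is missing, and skewness is computed by copying one column at a time into R. The skewness() helper is hypothetical (a simple moment estimator), not part of bigmemory, and the data are made up.

```r
library(bigmemory)

# Hypothetical small stand-in for a large big.matrix with missing values.
m <- matrix(c(1, 2, NA, 4, 10,
              5, NA, NA, 8, 9), ncol = 2)
x <- as.big.matrix(m, type = "double")

# Missing values per column: mwhich() returns the row indices whose
# value in column j compares 'eq' to NA.
n_missing <- sapply(seq_len(ncol(x)), function(j)
  length(mwhich(x, j, NA, "eq")))

# Hypothetical helper: moment-based sample skewness of non-missing values.
skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Only one column is held in R memory at a time.
skew <- sapply(seq_len(ncol(x)), function(j) skewness(x[, j]))
```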

Feel free to email us directly with bugs, questions, etc.

Cheers,

Jay


----------------------------------------------------------

From: utkarshsinghal <utkarsh.singhal at global-analytics.com>
Date: Tue, Jun 2, 2009 at 8:25 AM
Subject: [R] bigmemory - extracting submatrix from big.matrix object
To: r help <r-help at r-project.org>
I am using library(bigmemory) to handle large datasets (say 1 GB) and am
facing the following problems. Any hints from anybody would be helpful.
Problem-1:
I am using the read.big.matrix function to create a filebacked big
matrix of my data and get the following warning:
> x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile = "backup", backingpath = "/home/utkarsh.s")
Warning message:
In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type,  :
 A descriptor file has not been specified.  A descriptor named
backup.desc will be created.
However, read.big.matrix has no such argument. The function
as.big.matrix does have a "descriptorfile" argument, but if I try to use
it in read.big.matrix I get an "unused argument" error (as expected).
Problem-2:
I want to get a filebacked *sub*matrix of "x", say only selected
columns: x[, 1:100]. Is there any way of doing that without actually
loading the data into R memory?
Problem-3:
There are functions such as summary, colmean, colsd, ... for standard
summary statistics. But is there any way to calculate other summaries,
say the number of missing values or the skewness of each variable,
without loading the whole data set into R memory?
Regards
Utkarsh

--
John W. Emerson (Jay)
Assistant Professor of Statistics
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay



