[R] bigmemory - extracting submatrix from big.matrix object

Wed Jun 3 09:16:02 CEST 2009

Thanks for the really valuable input, and for developing the package and 
updating it regularly. I will be glad if I can contribute in any way.

For problem three, however, I am interested in a generic way to 
apply any function to the columns of a big.matrix object (obviously without 
loading the data into R). Maybe the source code of the function 
"colmean" would help, if that is not too much to ask for. Alternatively, we 
could develop a function similar to "apply" in base R.
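
To make the idea concrete, here is a rough sketch of the kind of helper I have in mind. It is only an illustration: the name col_apply is made up and is not part of bigmemory, and it simply follows the one-column-at-a-time approach suggested in the reply quoted below, so each column is still pulled into R briefly. I am assuming that x[, j] on a big.matrix returns an ordinary R vector, as it does for a regular matrix; the demo uses a plain matrix so that it runs without the package.

```r
# Hypothetical helper (not part of bigmemory): apply FUN to each column,
# holding only one column in R memory at a time.
col_apply <- function(x, FUN, ...) {
  FUN <- match.fun(FUN)
  res <- lapply(seq_len(ncol(x)), function(j) FUN(x[, j], ...))
  names(res) <- colnames(x)  # stays unnamed if x has no column names
  res                        # a list with one result per column
}

# Moment-based sample skewness, ignoring missing values.
skewness <- function(v) {
  v <- v[!is.na(v)]
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

# Demo on an ordinary matrix; the assumption is that a big.matrix
# can be passed in the same way.
m <- matrix(c(1, 2, NA, 4, 5, 6, 7, 20), ncol = 2,
            dimnames = list(NULL, c("a", "b")))
na_counts <- unlist(col_apply(m, function(v) sum(is.na(v))))
skews     <- unlist(col_apply(m, skewness))
```

This returns a list rather than simplifying the way "apply" does, so that FUN may return results of different lengths for different columns.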

Regards
Utkarsh

Jay Emerson wrote:
> We also have ColCountNA(), which is not currently exposed to the user
> but will be in the next version.
>
> Jay
>
> On Tue, Jun 2, 2009 at 2:08 PM, Jay Emerson <jayemerson at gmail.com> wrote:
>   
>> Thanks for trying this out.
>>
>> Problem 1.  We'll check this.  Options should certainly be available.  Thanks!
>>
>> Problem 2. Fascinating.  We just (yesterday) implemented a
>> sub.big.matrix() function doing exactly
>> this, creating something that is a big matrix but which just
>> references a contiguous subset of the
>> original matrix.  This will be available in an upcoming version
>> (hopefully in the next week).  A more
>> specialized function would create an entirely new big.matrix from a
>> subset of a first big.matrix,
>> making an actual copy, but this is something else altogether. You
>> could do this entirely within R
>> without much work, by the way, and only 2* memory overhead.
>>
>> Problem 3. You can count missing values using mwhich().  For other
>> exploration (e.g. skewness)
>> at the moment you should just extract a single column (variable) at a
>> time into R, study it, then get the
>> next column, etc.  We will not be implementing all of R's
>> functions directly with big.matrix objects.
>> We will be creating a new package "bigmemoryAnalytics" and would
>> welcome contributions to the
>> package.
>>
>> Feel free to email us directly with bugs, questions, etc...
>>
>> Cheers,
>>
>> Jay
>>
>>
>> ----------------------------------------------------------
>>
>> From: utkarshsinghal <utkarsh.singhal at global-analytics.com>
>> Date: Tue, Jun 2, 2009 at 8:25 AM
>> Subject: [R] bigmemory - extracting submatrix from big.matrix object
>> To: r help <r-help at r-project.org>
>> I am using library(bigmemory) to handle large datasets, say 1 GB,
>> and am facing the following problems. Any hints would be helpful.
>> _Problem-1:_
>> I am using the "read.big.matrix" function to create a filebacked big
>> matrix of my data and get the following warning:
>>     
>>> x = read.big.matrix("/home/utkarsh.s/data.csv",header=T,type="double",shared=T,backingfile = "backup", backingpath = "/home/utkarsh.s")
>>>       
>> Warning message:
>> In filebacked.big.matrix(nrow = numRows, ncol = numCols, type = type,  :
>>  A descriptor file has not been specified.  A descriptor named
>> backup.desc will be created.
>> However, there is no such argument in "read.big.matrix". There
>> is an argument "descriptorfile" in the function "as.big.matrix", but if
>> I try to use it in "read.big.matrix", I get an error reporting it as an
>> unused argument (as expected).
>> _Problem-2:_
>> I want to get a filebacked *sub*matrix of "x", say only selected
>> columns: x[, 1:100]. Is there any way of doing that without actually
>> loading the data into R memory?
>> _Problem-3:_
>> There are functions available like summary, colmean, colsd, ... for
>> standard summary statistics. But is there any way to calculate other
>> summaries, say the number of missing values or the skewness of each
>> variable, without loading the whole data into R memory?
>> Regards
>> Utkarsh
>>
>> --
>> John W. Emerson (Jay)
>> Assistant Professor of Statistics
>> Department of Statistics
>> Yale University
>> http://www.stat.yale.edu/~jay
>>