[Rd] idea for "virtual matrix/array" class

Tony Plate tplate at blackmesacapital.com
Mon Aug 23 21:16:18 CEST 2004

I've been wondering how to work with more data than can fit in memory, in a 
way that allows it to be worked with conveniently and quickly.  Of course, 
a database can be used for this purpose, but extracting data from a 
database is much slower and somewhat less convenient than extracting data 
from a native object (at least in our setup).

One idea I was thinking about was to have a new class of object that 
referred to data in a file on disk, and which had all the standard methods 
of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc.  The 
object in memory would only store the array attributes, while the actual 
array data (the elements) would reside in a file.  When some extraction 
method was called, it would access data in the file and return the 
appropriate data.  With sensible use of seek operations, the data access 
could probably be quite fast.  The file format of the object on disk could 
possibly be the standard serialized binary format as used in .RData 
files.  Of course, if the object was larger than would fit in memory, then 
trying to extract too large a subarray would exhaust memory, but it should 
be possible to efficiently extract reasonably sized subarrays.  To be more 
useful, one would want apply() to work with such arrays.  That would 
be doable, either by creating a new method for apply, or possibly just for 
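
As a rough illustration of the seek-based extraction described above, here 
is a minimal sketch.  Everything in it is an assumption made for the sake of 
the example: the class name "vmatrix", the helpers as_vmatrix() and 
col_apply(), and the file layout.  It uses a flat binary file of 
column-major doubles rather than the serialized .RData format, and only 
handles positive indices in the x[i, j] form.

```r
# Hypothetical constructor: write a numeric matrix to a flat binary file
# (column-major doubles) and keep only the attributes in memory.
as_vmatrix <- function(x, path) {
  stopifnot(is.matrix(x), is.numeric(x))
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(as.double(x), con, size = 8L)
  structure(list(path = path, dim = dim(x), dimnames = dimnames(x)),
            class = "vmatrix")
}

dim.vmatrix <- function(x) unclass(x)$dim
dimnames.vmatrix <- function(x) unclass(x)$dimnames

# Extraction: seek to each requested column and read only that column.
"[.vmatrix" <- function(x, i, j, drop = TRUE) {
  h <- unclass(x)
  nr <- h$dim[1L]
  nc <- h$dim[2L]
  if (missing(i)) i <- seq_len(nr)
  if (missing(j)) j <- seq_len(nc)
  # This sketch supports positive numeric indices only.
  stopifnot(is.numeric(i), is.numeric(j),
            all(i >= 1, i <= nr), all(j >= 1, j <= nc))
  con <- file(h$path, "rb")
  on.exit(close(con))
  out <- matrix(NA_real_, length(i), length(j))
  for (k in seq_along(j)) {
    # Each column is contiguous on disk: 8 bytes per element.
    seek(con, where = (j[k] - 1) * nr * 8, origin = "start")
    col <- readBin(con, what = "double", n = nr, size = 8L)
    out[, k] <- col[i]
  }
  if (!is.null(h$dimnames)) {
    dimnames(out) <- list(h$dimnames[[1L]][i], h$dimnames[[2L]][j])
  }
  out[, , drop = drop]
}

# Hypothetical column-wise stand-in for apply(x, 2, f): only one column
# is held in memory at a time.
col_apply <- function(x, f) {
  vapply(seq_len(dim(x)[2L]), function(k) f(x[, k]), numeric(1))
}

# Usage
m <- matrix(as.double(1:12), nrow = 3,
            dimnames = list(letters[1:3], LETTERS[1:4]))
v <- as_vmatrix(m, tempfile(fileext = ".bin"))
dim(v)
v[2:3, c(1, 4)]
col_apply(v, sum)
```

Reading whole columns keeps the seek arithmetic simple.  A real 
implementation would presumably read only the requested rows and would 
handle negative, logical and character indices as well.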

Some difficulties that might arise could have to do with functions like 
"typeof" and "mode" and "storage.mode" -- what should these return for such 
an object?  Such a class would probably break the common relationships such 
as x[1] having the same storage mode as x.  I don't know if difficulties 
like these would ultimately make such a "virtual array" class unworkable.
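
To make the difficulty concrete, here is a small sketch.  The "vmatrix" 
handle and its fields, including the file name, are hypothetical.  Because 
typeof(), mode() and storage.mode() are not generic, they describe the 
in-memory handle rather than the elements that live in the file.

```r
# Hypothetical in-memory handle for an on-disk array of doubles.
# The file is never opened here; only the handle is inspected.
v <- structure(list(path = "hypothetical.bin", dim = c(3L, 4L)),
               class = "vmatrix")

handle_type <- typeof(v)         # type of the handle, not of the data
handle_mode <- storage.mode(v)   # likewise describes the handle

# An element extracted from the file would be a double, so the usual
# relationship between x[1] and x would not hold.
element_type <- typeof(numeric(1))

print(c(handle = handle_type, element = element_type))
```

One possible workaround, again only an assumption, would be to record the 
element type in the handle and expose it through a separate accessor 
instead of trying to make typeof() itself report it.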

Does anyone have any opinions as to the merits of this idea?  Would there 
be any interest in seeing such a class in R?

-- Tony Plate
