[Rd] "Default" accessor in S4 classes

Simon Urbanek simon.urbanek at r-project.org
Tue Jan 8 01:50:59 CET 2013


Chris,

On Jan 7, 2013, at 6:23 PM, Chris Jewell wrote:

> Hi All,
> 
> I'm currently trying to write an S4 class that mimics a data.frame, but stores data on disc in HDF5 format.  The idea is that the dataset is likely to be too large to fit into memory on a standard desktop machine, and by using subscripts, the user may load bits of the dataset at a time, e.g.:
> 
>> myLargeData <- LargeData("/path/to/file")
>> mySubSet <- myLargeData[1:10, seq(1,15,by=3)]
> 
> I've therefore defined my LargeData class thus
> 
>> LargeData <- setClass("LargeData", representation(filename="character"))
>> setMethod("initialize","LargeData", function(.Object,filename) .Object@filename <- filename)
> 
> I've then defined the "[" method to call a C++ function (Rcpp), opening the HDF5 file, and returning the required rows/cols as a data.frame.
> 
> However, what if the user wants to load the entire dataset into memory?  Which method do I overload to achieve the following?
> 
>> fullData <- myLargeData
>> class(fullData)
> [1] "data.frame"
> 

That makes no sense, since a <- b is not a transformation: "a" will have the same value as "b" by definition, and thus the same class. If you really meant

fullData <- as.data.frame(myLargeData)

then you just need to implement the as.data.frame() method for your class.
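
A minimal sketch of such a method follows. It is not tied to your actual code: the RDS file and the loadAll() helper are hypothetical stand-ins for your HDF5 reader, chosen only so the example runs without any HDF5 library.

```r
library(methods)

# Stand-in for the HDF5-backed class: it only stores a file name.
setClass("LargeData", representation(filename = "character"))

# Hypothetical loader; a real implementation would call the HDF5 reader here.
loadAll <- function(x) readRDS(x@filename)

# S4 method for as.data.frame(); the formals mirror those of base::as.data.frame.
setMethod("as.data.frame", "LargeData",
          function(x, row.names = NULL, optional = FALSE, ...) loadAll(x))

# Assumed test data written to a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(a = 1:3, b = c(2.5, 3.5, 4.5)), path)

myLargeData <- new("LargeData", filename = path)
fullData <- as.data.frame(myLargeData)
class(fullData)   # "data.frame"
```

The conversion is explicit: myLargeData itself stays a LargeData object, and only the result of as.data.frame() is a data.frame.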

Note, however, that a more common way to convert a big data reference to native format in its entirety is simply myLargeData[] -- you may want to have a look at the (many) existing big data packages (AFAIR bigmemory uses a C++ back-end as well). Also note that indexing is tricky in R and easy to get wrong (remember: negative indices, indexing by name, etc.).
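
As a rough sketch of how the empty-subscript case could be handled in a "[" method (again with an RDS file as a hypothetical stand-in for the HDF5 back-end, so here the whole file is read and the indexing is delegated to data.frame; a real reader would fetch only the requested block):

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

setMethod("[", "LargeData", function(x, i, j, ..., drop = FALSE) {
  # Stand-in for the real reader: load everything, then subset in memory.
  full <- readRDS(x@filename)
  if (missing(i) && missing(j)) return(full)        # x[]    : whole dataset
  if (missing(i)) return(full[, j, drop = drop])    # x[, j] : all rows
  if (missing(j)) return(full[i, , drop = drop])    # x[i, ] : all columns
  full[i, j, drop = drop]
})

# Assumed test data written to a temporary file.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(a = 1:4, b = c(0.5, 1.5, 2.5, 3.5), c = letters[1:4]), path)
myLargeData <- new("LargeData", filename = path)

everything <- myLargeData[]           # full data.frame
part <- myLargeData[-1, "b"]          # negative row index, column by name
```

This sketch does not distinguish x[i] from x[i, ], which a data.frame treats differently (one needs nargs() for that), and it gets negative and named indices right only because it hands them to data.frame. A back-end that reads blocks from disc has to translate them itself.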


> or apply transformations:
> 
>> myEigen <- eigen(myLargeData)
> 
> In C++ I would normally overload the "double" or "float" operator to achieve this -- can I do the same thing in R?
> 

Again, there is no implicit coercion in R (you cannot declare a variable's type in advance), so this doesn't make sense in the context you have in mind from C++ -- in R the equivalent is simply implementing an as.double() method, but I suspect that's not what you had in mind. For generics you can simply implement a method for your class (one that does the coercion, for example, or uses a more efficient approach). If the function is not generic and you cannot make it one, or you don't want to write your own methods, then it's a problem: the only theoretical way is to subclass the numeric vector class, but that is not possible in R if you want to change the representation, because the code falls through to the more efficient internal implementation too quickly (without extra dispatch) for you to intercept it.
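
To illustrate the "implement a method" route for your eigen() example, here is a sketch that assumes you are willing to turn eigen() into an S4 generic in your own code. The RDS file is a hypothetical stand-in for the HDF5 back-end, and the method formals are copied from base::eigen, so they must match whatever your R version defines.

```r
library(methods)

setClass("LargeData", representation(filename = "character"))

# Make eigen() an S4 generic; base::eigen becomes the default method.
setGeneric("eigen")

# Method that does the coercion explicitly, then calls the base implementation.
setMethod("eigen", "LargeData",
          function(x, symmetric, only.values = FALSE, EISPACK = FALSE) {
            m <- unname(as.matrix(readRDS(x@filename)))   # stand-in loader
            base::eigen(m, only.values = only.values)
          })

# Assumed test data: a 2 x 2 diagonal matrix stored as a data.frame.
path <- tempfile(fileext = ".rds")
saveRDS(data.frame(x = c(2, 0), y = c(0, 3)), path)
myLargeData <- new("LargeData", filename = path)

myEigen <- eigen(myLargeData)
myEigen$values
```

Ordinary matrices still go to base::eigen through the default method, so existing calls are unaffected.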

Cheers.
Simon


> Thanks,
> 
> Chris
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 


