[R] data storage/cubes and pointers in R

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 9 23:26:24 CET 2006


In case the other replies aren't to your liking, and you want to write
something yourself...

Piet van Remortel <piet.vanremortel at gmail.com> writes:

[snip]

> Also considering implementing a similar setup myself, I started  
> wondering about the possibility of use references (or "pointers"  
> aargh) to dataframes and store them in a list etc.   Separate lists

My own experimentation with this is to create an S4 'View' class that
indexes / subsets / accesses small parts of the 'big' data, with the
actual data treated essentially as 'read-only' or otherwise abstracted
out of memory. Something along the lines of

setClass("ViewSet",
         representation=representation(
           data="environment", # environments are reference-like
           idx="list" # 1 element per dimension, or something more clever
           ))

setMethod("initialize",
          signature(.Object="ViewSet"),
          function(.Object, ...) {
              env <- new.env()
              ## get the big data: arguments to "new" / SQL query / ???
              ## assign big data to env (e.g., see below) then
              .Object@data <- env
              ## set up idx
              ## ...
              .Object
          })

setMethod("[",
          signature(x="ViewSet"),
          function (x, i, j, ..., drop = TRUE) {
              ## adjust x@idx, maybe querying x@data for help
              x
          })

setReplaceMethod("[",
                 signature(x="ViewSet"),
                 function (x, i, j, ..., value) {
                     ## adjust x@idx[i, j, ...
                     ## return x, i.e., a ViewSet -- bigData not changed / copied
                     x
                 })
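
To make the skeleton above concrete, here is a minimal runnable sketch. The constructor newViewSet, the generic getData and the name 'big' inside the environment are illustrative assumptions, not part of any package, and the big data is assumed to be a single matrix or data frame.

```r
library(methods)

setClass("ViewSet",
         representation(data = "environment", idx = "list"))

## Hypothetical constructor: 'big' is assumed to be a matrix or data.frame
newViewSet <- function(big) {
    env <- new.env()
    assign("big", big, envir = env)
    new("ViewSet", data = env,
        idx = list(rows = seq_len(nrow(big)), cols = seq_len(ncol(big))))
}

setMethod("[", signature(x = "ViewSet"),
          function(x, i, j, ..., drop = TRUE) {
              ## only the index vectors change; the environment is shared
              idx <- x@idx
              if (!missing(i)) idx$rows <- idx$rows[i]
              if (!missing(j)) idx$cols <- idx$cols[j]
              x@idx <- idx
              x
          })

## Hypothetical accessor that materialises only the selected part
setGeneric("getData", function(x, ...) standardGeneric("getData"))
setMethod("getData", "ViewSet",
          function(x, ...) x@data$big[x@idx$rows, x@idx$cols, drop = FALSE])

big <- newViewSet(matrix(1:200, ncol = 2))
small <- big[1:5, ]
identical(big@data, small@data)  # both views point at the same environment
dim(getData(small))              # only the selected rows are extracted
```

Subsetting a view only shortens the index vectors, so views of views compose without touching the underlying data.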

> can then represent different 'views' on the shared instance  
> dataframes etc.   I have no knowledge if that is even possible in R,  
> and if that is even the smart way to do it.  If someone could provide  
> some help, that would be great.
>
> Other option is of course to link to MySQL and do all data handling  
> in that way.  Also considering that.

or do both, i.e., write ViewSqlSet to 'contain' ViewSet, etc.
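
Reading 'contain' as the contains= argument of setClass, a sketch might look like the following. The 'query' slot and the table name in the query string are purely hypothetical, and no database connection is made here.

```r
library(methods)

setClass("ViewSet",
         representation(data = "environment", idx = "list"))

## Hypothetical subclass: 'query' would hold the SQL used to fill 'data'
setClass("ViewSqlSet",
         contains = "ViewSet",
         representation(query = "character"))

v <- new("ViewSqlSet",
         data = new.env(),
         idx = list(rows = integer(0)),
         query = "SELECT * FROM measurements")  # hypothetical table name

is(v, "ViewSet")  # methods written for ViewSet also apply to v
```

The subclass inherits the 'data' and 'idx' slots, so any "[" method defined for ViewSet would be reused unchanged.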

> Any thoughts/hints would be appreciated !

Probably you could implement the same ideas in the less intimidating
S3 way, using e.g., a list with

makeView <- function(data) {
    ## e.g., 'data' a named list of commonly-sized elements, in or out
    ## of memory -- details depend on needs
    env <- new.env()
    for (elt in names(data)) env[[elt]] <- data[[elt]]
    ## initialize index
    idx <- list(rows=1:nrow(data[[1]]), cols=1:ncol(data[[1]]))
    lst <- list(env=env, idx=idx)
    class(lst) <- "View"
    lst
}

"[.View" <- function (x, i, j, ..., drop = TRUE) {
    ## x will be like lst from above; use i, j, etc. to subset.
    ## Adjust the index and return the view, e.g.,
    if (!missing(i)) x$idx$rows <- x$idx$rows[i]
    if (!missing(j)) x$idx$cols <- x$idx$cols[j]
    x
}

getData <- function(x, ...) UseMethod("getData")

getData.View <- function(x, ...) {
    ## return list of subsetted elements
    res <- with(x,
                lapply(ls(env), function(elt) env[[elt]][idx$rows, idx$cols]))
    names(res) <- ls(x$env)
    res
}

and then...

> bigView <- makeView(list(df=data.frame(x=1:100, y=100:1),
+     m=matrix(1:200, ncol=2)))
> smallView <- bigView[1:5,]
> getData(smallView) ## copies, but only the 'small' data
$df
  x   y
1 1 100
2 2  99
3 3  98
4 4  97
5 5  96

$m
     [,1] [,2]
[1,]    1  101
[2,]    2  102
[3,]    3  103
[4,]    4  104
[5,]    5  105

Obviously a hack, but perhaps it gets you going...
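
Both sketches rely on environments behaving like references while lists are copied on modification. A small illustration of that difference, using only base R (the variable names are arbitrary):

```r
e1 <- new.env()
e1$big <- 1:10
e2 <- e1            # no copy: e2 refers to the same environment
e2$big[1] <- 99L
e1$big[1]           # the change is visible through e1 as well

l1 <- list(big = 1:10)
l2 <- l1            # lists have copy semantics
l2$big[1] <- 99L
l1$big[1]           # l1 is unchanged

identical(e1, e2)   # TRUE: one environment, two names
```

This is why every view can carry the environment around cheaply, and also why the big data is best treated as read-only: a change made through one view would show up in all of them.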

> thanks,
>
> Piet
>
>
>
> --
> Dr. P. van Remortel
> Intelligent Systems Lab
> Dept. of Mathematics and Computer Science
> University of Antwerp
> Belgium
> http://www.islab.ua.ac.be
> +32 3 265 33 57 (secr.)

-- 
Martin T. Morgan
Bioconductor / Computational Biology
http://bioconductor.org


