[BioC] matrix transformation

Mon Oct 18 12:10:16 CEST 2010

This question has no direct relationship to Bioconductor.  You should
look at the archives of the SIG-DB mailing list for R

https://stat.ethz.ch/mailman/listinfo/r-sig-db

and perhaps pose the question there if it has not been previously handled.

DBI's dbWriteTable is surely a relevant function to understand.
Various Bioconductor facilities use DBI to create and manage large
tables in SQLite -- AnnotationDbi and Genominator are two packages
that come to mind.  Other schemes for handling large data resources
alongside R are described in the CRAN task view for high performance
computing.
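
As a rough illustration of the dbWriteTable route, here is a minimal
sketch, not a tested solution for the full 7000 x 10000 case.  It assumes
the matrix has row and column names, as in the test set quoted below.
The helper name matrix2tuple.vec and the in-memory SQLite connection are
assumptions made for the example; only the table name matrixdbtb and the
column names come from the original post.  The idea is to build all
tuples at once with rep() and as.vector() instead of growing a matrix
with rbind(), and then to hand the whole data frame to the database in
one call instead of issuing one INSERT per tuple.

```r
# Hypothetical helper (not from the original post): flatten a named matrix
# into one row per cell. as.vector() reads the matrix column by column, so
# each column name is repeated in a block and the row names are cycled.
matrix2tuple.vec <- function(xy.matrix.m) {
    data.frame(
        xaxis = rep(colnames(xy.matrix.m), each = nrow(xy.matrix.m)),
        yaxis = rep(rownames(xy.matrix.m), times = ncol(xy.matrix.m)),
        xy    = as.vector(xy.matrix.m),
        stringsAsFactors = FALSE
    )
}

# Same 4 x 4 test matrix as in the quoted listing
matrix.m <- matrix(as.character(1:16), nrow = 4,
                   dimnames = list(c("a", "b", "c", "d"),
                                   c("A", "B", "C", "D")))
tuple.df <- matrix2tuple.vec(matrix.m)

# Bulk load with a single dbWriteTable call. This part assumes the DBI and
# RSQLite packages are installed and is skipped otherwise; the in-memory
# database is only a stand-in for the real target database.
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
    conn <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    DBI::dbWriteTable(conn, "matrixdbtb", tuple.df)
    print(DBI::dbGetQuery(conn, "SELECT COUNT(*) AS n FROM matrixdbtb"))
    DBI::dbDisconnect(conn)
}
```

The rows come out in the same order as the nested loops in the quoted
function (all rows of column A first, then column B, and so on).  For a
matrix with 70 million cells the data frame may still be too large to
hold comfortably in memory, in which case converting and writing a block
of columns at a time with append = TRUE would be one option to try.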

On Mon, Oct 18, 2010 at 5:41 AM, Bucher Elmar <Ext-Elmar.Bucher at vtt.fi> wrote:
> Dear Mailing List,
>
> I wrote the following function "matrix2tuple.sf" to translate a "cartesian xy matrix" into a "tuple matrix", so that I can store it in a relational database.
> The code works fine for a test set. My problem is that my real matrix is 7000 x 10000, which ends up as 70'000'000 tuples.
> The transformation takes days X(...
> Does anyone have an idea how I can optimize the described functions for speed?
>
> Best Wishes, Elmar Bucher
>
>
>
> ##### BEGIN CODE LISTING ####
>
> matrix2tuple.sf <- function(xy.matrix.m = NULL) {
>    tuple.m <- NULL
>    #x.length.v <- dim(xy.matrix.m)[2]
>    #y.length.v <- dim(xy.matrix.m)[1]
>    #for (x.v in (1:x.length.v)) {
>    #for (y.v in (1:y.length.v)) {
>    x.axis.v <- colnames(xy.matrix.m)
>    y.axis.v <- rownames(xy.matrix.m)
>    for (x.v in x.axis.v) {
>        for (y.v in y.axis.v) {
>            #cat(x.v, y.v, xy.matrix.m[y.v,x.v],"\n")
>            tuple.v <- c(x.v, y.v, xy.matrix.m[y.v,x.v])
>            if (is.null(tuple.m)) {
>                tuple.m <- tuple.v
>            } else {
>                tuple.m <- rbind(tuple.m, tuple.v)
>            }
>        }
>    }
>    return(tuple.m)
> }
>
>
> put.db.sf <- function(conn.s4=NULL, x.v=NULL, y.v=NULL,  xy.v=NULL) {
>    query.v <- paste("INSERT INTO matrixdbtb ('xaxis','yaxis','xy') VALUES('",x.v,"','",y.v,"','",xy.v,"');", sep ="")
>    #catch.df <- dbGetQuery(conn.s4, query.v)
>    cat(query.v, "\n")
> }
>
> ## main ##
> matrix.m <- c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15","16")
> dim(matrix.m) <- c(4,4)
> colnames(matrix.m) <- c("A","B","C","D")
> rownames(matrix.m) <- c("a","b","c","d")
> tuple.m <- matrix2tuple.sf (matrix.m)
> for (i in 1: dim(tuple.m)[1]) {
>    put.db.sf(x.v=tuple.m[i,1],y.v=tuple.m[i,2],xy.v=tuple.m[i,3])
> }
>
> #### END CODE LISTING ########