[R] Cube of Matrices or list of Matrices

Jeff Newmiller jdnewmil at dcn.davis.CA.us
Tue Jan 20 04:13:31 CET 2015


I use plyr and am learning dplyr and magrittr, but those are just syntactic sugar. What I have been having difficulty with in this thread is the idea that it somehow makes sense to pad vectors with NA values... because I really don't think it does. It seems more like a hammer looking for a nail because that is what it knows how to deal with.
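
To make the alternative concrete, here is a rough sketch in base R that keeps one gene's values as a list of unequal-length vectors instead of padding them into a matrix. The toy list `x`, the names `D1` to `D3`, and `gene1` are made up for illustration and are not Karim's actual objects.

```r
# Toy stand-in for the real data: three "diseases" with different numbers
# of samples (rows) and the same two "genes" (columns). All values are fake.
set.seed(42)
x <- list(
  D1 = matrix(rnorm(3 * 2), nrow = 3, ncol = 2),
  D2 = matrix(rnorm(5 * 2), nrow = 5, ncol = 2),
  D3 = matrix(rnorm(4 * 2), nrow = 4, ncol = 2)
)

# Column 1 from every matrix, kept as a list of vectors of unequal length.
gene1 <- lapply(x, function(m) m[, 1])

# Each element keeps its true length; nothing is padded with NA.
sapply(gene1, length)

# Per-disease summaries work directly on the ragged list.
sapply(gene1, mean)
```

Whether this shape suits the downstream correlation step depends on what that function expects as input, which is not clear from the thread.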

You have a list of matrices with data in them, and switching from for loops to lapply is not in itself going to fix a memory or speed problem; normally the big improvements come from the way you allocate and use your data. Burns talks about pre-allocating the result to speed things up, but I don't understand the problem well enough to suggest an efficient data structure to pre-allocate.
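
For readers unfamiliar with the idea, pre-allocation looks roughly like the sketch below: the result container is created once at its final size and then filled in place, rather than grown on every iteration. The sizes and the names `expr`, `n_samples`, and `by_gene` are assumptions for illustration only; this is not a recommendation of a data structure for the actual problem.

```r
# Hypothetical sizes, far smaller than the real 80 diseases x 1000 genes.
n_genes <- 4
n_samples <- c(D1 = 3, D2 = 5, D3 = 4)

# Toy input: one matrix per disease, samples in rows, genes in columns.
set.seed(1)
expr <- lapply(n_samples, function(n) matrix(rnorm(n * n_genes), nrow = n))

# Pre-allocate the outer list once, at its final length ...
by_gene <- vector("list", n_genes)

# ... then fill each slot in place instead of growing the list on each pass.
for (k in seq_len(n_genes)) {
  by_gene[[k]] <- lapply(expr, function(m) m[, k])
}

length(by_gene)                # one element per gene
sapply(by_gene[[1]], length)   # per-disease sample counts for gene 1
```

The loop itself is not the expensive part here; the point is only that `by_gene` never has to be reallocated as it fills.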

I suggest that Karim read and adhere to the Posting Guide (particularly the bits about giving a reproducible example and posting in plain text so it doesn't get scrambled) if help with optimizing is desired. The discussion at [1] might clarify what "reproducible" means.

I will also mention that efficient algorithms for gene expression analysis are frequently available in the Bioconductor project, so I hope you are not re-inventing the wheel and have already reviewed their tools.

[1] http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

On January 19, 2015 6:11:38 PM PST, Ben Tupper <btupper at bigelow.org> wrote:
>Hi,
>
>On Jan 19, 2015, at 5:17 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>
>> Thanks Ben.
>> I need to learn more about apply. Do you have a link to a tutorial about
>> apply? The R documentation is very short.
>> 
>> How can I obtain:
>> z <- list(Col1, Col2, Col3, Col4, ...)?
>> 
>
>This may not be the most efficient way and there certainly is no error
>checking, but you can wrap one lapply within another as shown below. 
>The innermost iterates over your list of input matrices, extracting one
>column specified per list element.  The outer lapply iterates over the
>various column numbers you want to extract.
>
>
>getMatrices <- function(colNums, dataList = x){
>   # the number of rows required
>   n <- max(sapply(dataList, nrow))
>   # outer lapply: iterate along requested columns
>   lapply(colNums, function(x, dat, n) {
>      # inner lapply: iterate along input data list
>      do.call(cbind, lapply(dat, getColumn, x, len = n))
>   }, dataList, n)
>}
>   
>getMatrices(c(1,3), dataList = x)  
>
>If we are lucky, one of the plyr package users might show us how to do
>the same with a one-liner. 
>
>
>There are endless resources online, here are some gems. 
> 
>http://www.r-project.org/doc/bib/R-books.html 
>http://www.rseek.org/
>http://www.burns-stat.com/documents/
>http://www.r-bloggers.com/
>
>Also, I found "Data Manipulation with R" (
>http://www.r-project.org/doc/bib/R-books_bib.html#R:Spector:2008 )
>helpful.  
>
>Ben
>
>> Thanks
>> 
>>   Ô__
>>  c/ /'_;~~~~kmezhoud
>> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
>> http://bioinformatics.tn/
>> 
>> 
>> 
>> On Mon, Jan 19, 2015 at 8:22 PM, Ben Tupper <btupper at bigelow.org> wrote:
>> Hi again,
>> 
>> On Jan 19, 2015, at 1:53 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>> 
>>> Yes, many thanks.
>>> That is what I was asking for, using lapply.
>>> 
>>> do.call(cbind, col1)
>>> 
>>> converts col1 to a matrix but does not fill the empty values with NA.
>>> 
>>> The same goes for
>>> 
>>> matrix(unlist(col1), ncol = 5, byrow = FALSE)
>>> 
>>> 
>>> How can I get col1 as a matrix, with the empty values filled with NA?
>>> 
>> 
>> Perhaps it is best to determine the maximum number of rows required
>> first, then force each subset to have that length.
>> 
>> # make a list of matrices, each with nCol columns and differing
>> # number of rows
>> nCol <- 3
>> nRow <- sample(3:10, 5)
>> x <- lapply(nRow, function(x, nc) {
>>    matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
>> }, nCol)
>> x
>> 
>> # make a simple function to get a single column from a matrix
>> getColumn <- function(x, colNum, len = nrow(x)) {
>>    y <- x[,colNum]
>>    length(y) <- len   # extending the length pads the tail with NA
>>    y
>> }
>> 
>> # what is the maximum number of rows
>> n <- max(sapply(x, nrow))
>> 
>> # use the function to get the column from each matrix
>> col1 <- lapply(x, getColumn, 1, len = n)
>> col1
>> 
>> do.call(cbind, col1)
>>       [,1] [,2] [,3] [,4] [,5]
>>  [1,]    3    8    5    7    9
>>  [2,]    4    9    6    8   10
>>  [3,]    5   10    7    9   11
>>  [4,]   NA   11    8   10   12
>>  [5,]   NA   12    9   11   13
>>  [6,]   NA   13   NA   12   14
>>  [7,]   NA   14   NA   13   15
>>  [8,]   NA   15   NA   NA   16
>>  [9,]   NA   NA   NA   NA   17
>> 
>> Ben
>> 
>>> Thanks
>>> Karim
>>> 
>>> 
>>>   Ô__
>>>  c/ /'_;~~~~kmezhoud
>>> (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
>>> http://bioinformatics.tn/
>>> 
>>> 
>>> 
>>> On Mon, Jan 19, 2015 at 4:36 PM, Ben Tupper <ben.bighair at gmail.com> wrote:
>>> Hi,
>>> 
>>> On Jan 18, 2015, at 4:36 PM, Karim Mezhoud <kmezhoud at gmail.com> wrote:
>>> 
>>> > Dear All,
>>> > I am trying to get the correlation between diseases (80) in columns and
>>> > samples in rows (unequal numbers) using gene expression (at least 1000
>>> > genes, numeric). For this I can use the CORREP package with the
>>> > cor.unbalanced function.
>>> >
>>> > But before I get this final matrix I need to load and store the
>>> > expression of 1000 genes for every disease (80). Every disease has a
>>> > different number of samples (between 50 and 500).
>>> >
>>> > Is it possible to get a cube of matrices with equal columns but unequal
>>> > rows? I think not, so I can't use the array function.
>>> >
>>> > I am trying to get a list of matrices having the same number of columns
>>> > but different numbers of rows, as
>>> >
>>> > Cubist <- vector("list", 1)
>>> > Cubist$Expression <- vector("list", 1)
>>> >
>>> >
>>> > for (i in 1:80){
>>> >
>>> > matrix <- function(getGeneExpression[i])
>>> > Cubist$Expression[[Disease[i]]] <- matrix
>>> >
>>> > }
>>> >
>>> > At this step I have:
>>> > length(Cubist$Expression)
>>> > #80
>>> > dim(Cubist$Expression$Disease1)
>>> > #526 1000
>>> > dim(Cubist$Expression$Disease2)
>>> > #106  1000
>>> >
>>> > names(Cubist$Expression$Disease1[4])
>>> > #ABD
>>> >
>>> > names(Cubist$Expression$Disease2[4])
>>> > #ABD
>>> >
>>> > Now I need to build the final matrices for every gene (1000) that I will
>>> > use with the CORREP function.
>>> >
>>> > Is there a way to directly extract the first column (first gene) for all
>>> > diseases (80) from Cubist$Expression? or
>>> >
>>> >
>>> 
>>> I don't understand most of your question, but the above seems to be
>>> straightforward.  Here's a toy example:
>>> 
>>> # make a list of matrices, each with nCol columns and differing
>>> # number of rows, nRow
>>> nCol <- 3
>>> nRow <- sample(3:10, 5)
>>> x <- lapply(nRow, function(x, nc) {
>>>    matrix(x:(x + nc*x - 1), ncol = nc, nrow = x)
>>> }, nCol)
>>> x
>>> 
>>> # make a simple function to get a single column from a matrix
>>> getColumn <- function(x, colNum) {
>>>    return(x[,colNum])
>>> }
>>> 
>>> # use the function to get the column from each matrix
>>> col1 <- lapply(x, getColumn, 1)
>>> col1
>>> 
>>> Does that help answer this part of your question?  If not, you may
>>> need to create a very small example of your data and post it here using
>>> the head() and dput() functions.
>>> 
>>> Ben
>>> 
>>> 
>>> 
>>> > do I need to build 1000 matrices with 80 columns and unequal rows?
>>> >
>>> > Cublist$Diseases <- vector("list", 1)
>>> >
>>> > for (k in 1:1000){
>>> > for (i in 1:80){
>>> >
>>> > Cublist$Diseases[[gene[k] ]] <- Cubist$Expression[[Diseases[i] ]][k]
>>> > }
>>> >
>>> > }
>>> >
>>> > This double loop is time consuming. Is there a way to do this faster?
>>> >
>>> > Thanks,
>>> > karim
>>> >  Ô__
>>> > c/ /'_;~~~~kmezhoud
>>> > (*) \(*)   ⴽⴰⵔⵉⵎ  ⵎⴻⵣⵀⵓⴷ
>>> > http://bioinformatics.tn/
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>> 
>>> 
>> 
>> Ben Tupper
>> Bigelow Laboratory for Ocean Sciences
>> 60 Bigelow Drive, P.O. Box 380
>> East Boothbay, Maine 04544
>> http://www.bigelow.org


