[BioC] combining/merging several ExpressionSet objects from SCAN

Martin Morgan mtmorgan at fhcrc.org
Wed Aug 14 07:30:24 CEST 2013


On 08/13/2013 09:47 PM, Kasper Daniel Hansen wrote:
> Btw, I have started to use combineList as an alias for a method that works on a
> list in a faster way.  Depending on context, this may be much faster than
> Reduce().  For example with SummarizedExperiments where the rowData are all equal.
>
> Martin: it seems like we have started to move away from combine() as the
> general way to combine two datasets and are instead using cbind/rbind for
> SummarizedExperiment, or am I missing something?  (I probably am.)

combine's semantics are more complicated than those of cbind or rbind:

 >      m <- matrix(1:20, nrow=5, dimnames=list(LETTERS[1:5], letters[1:4]))
 >      combine(m[1:3, 1:3], m[3:5, 3:4]) # overlap
    a  b  c  d
A  1  6 11 NA
B  2  7 12 NA
C  3  8 13 18
D NA NA 14 19
E NA NA 15 20

In the end I'm always amazed that the implementations do close enough to the 
right thing that people either don't complain or don't notice that it's not 
exactly what they want. Personally I think it's better to leave all this 
cleverness in the hands of the user.
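
The overlap behaviour in the matrix example can be sketched in base R. This is
only an illustration: union_merge() is a made-up helper, not the Biobase
implementation, and it assumes both inputs carry row and column names and agree
on any cells they share.

```r
# Hypothetical helper (not the Biobase code): union-merge two matrices by
# their dimnames, leaving cells covered by neither input as NA.
union_merge <- function(x, y) {
    rows <- union(rownames(x), rownames(y))
    cols <- union(colnames(x), colnames(y))
    out <- matrix(NA, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
    # Cells present in both inputs must agree, otherwise the merge is ambiguous.
    shared_r <- intersect(rownames(x), rownames(y))
    shared_c <- intersect(colnames(x), colnames(y))
    stopifnot(identical(x[shared_r, shared_c, drop = FALSE],
                        y[shared_r, shared_c, drop = FALSE]))
    out[rownames(x), colnames(x)] <- x
    out[rownames(y), colnames(y)] <- y
    out
}

m <- matrix(1:20, nrow = 5, dimnames = list(LETTERS[1:5], letters[1:4]))
merged <- union_merge(m[1:3, 1:3], m[3:5, 3:4])
```

With cbind or rbind the caller has to decide explicitly what happens to the
shared row C and the shared column c; here that decision is buried in the helper.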

Martin

>
> Kasper
>
>
> On Tue, Aug 13, 2013 at 7:33 PM, Tim Triche, Jr. <tim.triche at gmail.com
> <mailto:tim.triche at gmail.com>> wrote:
>
>     Now I remember where I'd seen it before.  Thanks!
>
>
>
>     On Tue, Aug 13, 2013 at 4:26 PM, Martin Morgan <mtmorgan at fhcrc.org
>     <mailto:mtmorgan at fhcrc.org>> wrote:
>
>         On 08/13/2013 04:24 PM, Tim Triche, Jr. wrote:
>
>                 Tim: I thought combine() was binary.
>
>
>             I was curious about this, so I went and looked.  It's binary in
>             methods-eSet.R, or so it appears, but:
>
>             library(GEOquery)
>             gset <- getGEO('GSE40279')
>             length(gset)
>             ## [1] 3
>             sapply(gset, ncol)
>             ## GSE40279_series_matrix-1.txt.gz.Samples
>             ##                                     255
>             ## GSE40279_series_matrix-2.txt.gz.Samples
>             ##                                     255
>             ## GSE40279_series_matrix-3.txt.gz.Samples
>             ##                                     146
>             gset <- do.call(combine, gset)
>             ## Error in as.vector(x, "character") :
>             ##   cannot coerce type 'closure' to vector of type 'character'
>             gset <- combine(gset[[1]], gset[[2]], gset[[3]])
>             ## no problem
>             ncol(gset)
>             ## Samples
>             ##    656
>
>             The part that baffles me is that I don't see anywhere that combine()
>             would recurse in the source!
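
One possible reading of the do.call() failure, offered as an assumption rather
than a confirmed diagnosis: getGEO() returned a named list, and do.call() turns
list names into argument names, so nothing is matched to x or y. The base-R
sketch below uses a made-up function probe() with the same (x, y, ...) formals
to show the effect.

```r
# probe() is a hypothetical stand-in with the same formals as the generic.
# It reports whether x was supplied and how many arguments landed in '...'.
probe <- function(x, y, ...) {
    list(x_missing = missing(x), n_dots = length(list(...)))
}

named <- list(first = 1, second = 2, third = 3)

# List names become argument names; none of them match x or y.
with_names <- do.call(probe, named)

# Dropping the names restores positional matching to x, y, then '...'.
without_names <- do.call(probe, unname(named))
```

If this is what happened, do.call(combine, unname(gset)) or Reduce(combine, gset)
might avoid the error, though neither was tried in the thread.
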
>
>
>         It's the generic that does the magic
>
>          > getGeneric("combine")
>         nonstandardGenericFunction for "combine" defined from package "BiocGenerics"
>
>         function (x, y, ...)
>         {
>              if (length(list(...)) > 0L) {
>                  callGeneric(x, do.call(callGeneric, list(y, ...)))
>              }
>              else {
>                  standardGeneric("combine")
>              }
>         }
>         <environment: 0x646ceb0>
>         Methods may be defined for arguments: x, y
>         Use  showMethods("combine")  for currently available ones.
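
The recursion in the generic amounts to a right fold over the arguments. A
minimal base-R sketch of the same shape follows; combine2() and combine_all()
are made-up names standing in for a binary method and the variadic generic, and
union() stands in for whatever the real method does.

```r
# Hypothetical binary "method": merges two character vectors.
combine2 <- function(x, y) union(x, y)

# Variadic wrapper with the same shape as the generic:
# combine_all(x, y, z) becomes combine2(x, combine_all(y, z)).
combine_all <- function(x, y, ...) {
    if (length(list(...)) > 0L) {
        combine2(x, do.call(combine_all, list(y, ...)))
    } else {
        combine2(x, y)
    }
}

parts <- list(c("a", "b"), c("b", "c"), c("c", "d"))

nested   <- combine_all(parts[[1]], parts[[2]], parts[[3]])
pairwise <- Reduce(combine2, parts)   # left fold over the same list
```

For an associative operation such as union() the nested call and Reduce() give
the same answer; they differ only in the order the pairs are merged.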
>
>
>
>
>
>             On Sat, Aug 10, 2013 at 5:43 PM, Kasper Daniel Hansen <
>             kasperdanielhansen at gmail.com <mailto:kasperdanielhansen at gmail.com>>
>             wrote:
>
>                 Tim: I thought combine() was binary.
>
>                 Juliet: this often comes up.  You can do
>                     assign("NAME", object)
>                 which is essentially equivalent to
>                     NAME <- object
>                 Then you can also get the object by
>                     get("NAME")
>
>                 Kasper
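
A minimal base-R sketch of the assign()/get() idea applied to saving and
reloading several objects. The small matrices are placeholders for SCAN()
results, and the names cel1 and cel2, the temporary directory, and the
environments are all assumptions made for illustration.

```r
# Placeholders for per-file results; in the thread these would be esets.
results <- list(cel1 = matrix(1:4, 2), cel2 = matrix(5:8, 2))

out_dir <- tempfile("esets")
dir.create(out_dir)

# Save each object under its own name, one .Rdata file per object.
store <- new.env()
for (nm in names(results)) {
    assign(nm, results[[nm]], envir = store)
    save(list = nm, file = file.path(out_dir, paste0(nm, ".Rdata")),
         envir = store)
}

# Load everything back into a fresh environment and gather it by name.
loaded <- new.env()
for (f in list.files(out_dir, pattern = "\\.Rdata$", full.names = TRUE)) {
    load(f, envir = loaded)
}
restored <- mget(sort(ls(loaded)), envir = loaded)
```

saveRDS() and readRDS() are a different technique that avoids the naming
question altogether, because readRDS() returns the object rather than restoring
it under its saved name.
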
>
>
>
>                 On Fri, Aug 9, 2013 at 3:47 PM, Juliet Hannah
>                 <juliet.hannah at gmail.com <mailto:juliet.hannah at gmail.com>> wrote:
>
>                     Thanks Tim.
>
>                     I'm trying this out, and it looks like I did not think this
>                     through. So let's say I have several CEL files.
>
>                     I am spreading this out over a cluster. So I have
>
>                     myFiles = c("cel1.CEL", "cel2.CEL")
>
>                     # pick one file
>
>                     fileToNormalize = myFiles[fileIndex]
>
>                     # normalize this to create eset
>
>                     normalized = SCAN(fileToNormalize)
>
>                     # create unique output name to save
>
>                     outFileName <- paste(fileToNormalize, ".Rdata", sep="")
>
>                     save(normalized, file=outFileName)
>
>                     My problem is now an R one, but related to this problem.
>                     "normalized" must be given a unique name so that when I load
>                     all the esets back in they have different names.
>
>                     Any suggestions?
>
>
>
>
>                     On Fri, Aug 9, 2013 at 3:27 PM, Tim Triche, Jr.
>                     <tim.triche at gmail.com <mailto:tim.triche at gmail.com>> wrote:
>
>
>                         qux <- combine(foo, bar, baz)
>
>                         if foo, bar, and baz are all ExpressionSets (or other
>                         eSets of suitable
>                         mien) with the same fData rows and pData columns.
>
>                         I did this the other day for a huge GEO dataset that
>                         came as 3 separate
>                         Esets off of getGEO.  (Then I turned it into a
>                         SummarizedExperiment, of
>                         course)
>
>
>
>                         On Fri, Aug 9, 2013 at 12:22 PM, Juliet Hannah
>                         <juliet.hannah at gmail.com <mailto:juliet.hannah at gmail.com>> wrote:
>
>                             Thanks Ryan. I'll try this out. What if the esets
>                             are already generated? I have a bunch lying
>                             around. I'll look into your answer more and see if I
>                             can apply it to this situation.
>
>
>                             On Fri, Aug 9, 2013 at 3:14 PM, Ryan C. Thompson
>                             <rct at thompsonclan.org <mailto:rct at thompsonclan.org>> wrote:
>
>
>                                      celfiles <- list.files(pattern=".*\\.CEL$")
>
>
>                                      scan.esets <- lapply(celfiles, SCAN)
>                                      scan.eset <- Reduce(combine, scan.esets)
>
>                                 I forget what package the "combine" function is
>                                 from, but I assume it's the same package that
>                                 provides the ExpressionSet class.
>
>
>
>                                 On Fri 09 Aug 2013 12:11:05 PM PDT, Juliet
>                                 Hannah wrote:
>
>                                     All,
>
>                                     SCAN outputs an ExpressionSet object for
>                                     each CEL file. What is a nicer way to put
>                                     them all into a matrix other than converting
>                                     each one to a vector and then cbinding after
>                                     each conversion?
>
>                                     Thanks,
>
>                                     Juliet
>
>                                               [[alternative HTML version deleted]]
>
>                                     _______________________________________________
>                                     Bioconductor mailing list
>                                     Bioconductor at r-project.org
>                                     https://stat.ethz.ch/mailman/listinfo/bioconductor
>                                     Search the archives:
>                                     http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
>
>
>
>                         --
>                         He that would live in peace and at ease,
>                         Must not speak all he knows, nor judge all he sees.
>
>                         Benjamin Franklin, Poor Richard's Almanack
>                         <http://archive.org/details/poorrichardsalma00franrich>
>
>
>
>
>
>
>
>
>
>
>         --
>         Computational Biology / Fred Hutchinson Cancer Research Center
>         1100 Fairview Ave. N.
>         PO Box 19024 Seattle, WA 98109
>
>         Location: Arnold Building M1 B861
>         Phone: (206) 667-2793
>
>
>
>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


