[BioC] RE : designing an eSet derived object

Wolfgang RAFFELSBERGER wraff at igbmc.fr
Mon Nov 22 12:44:37 CET 2010


Dear Martin,

Thank you very much for your helpful input. I'm sorry to bug you again.
I was almost there, but at the recent Bioconductor Developer Meeting I got another interesting suggestion, which I haven't succeeded in implementing.
Briefly (if I understood correctly), the idea was to make a modified SimpleList class in which I could check that each element is an ExpressionSet (instead of using the SimpleList class as is). From there one might even go one step further and check that all dimensions are identical, too.
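Both checks described above (element class, identical dimensions) can be sketched as a plain helper function that a validity method could later call. This assumes Biobase is installed; the name checkGxElements is hypothetical. It follows the convention of validity methods: return TRUE, or a character vector describing the problem.

```r
library(Biobase)

## Hypothetical helper: TRUE if every element of 'x' is an ExpressionSet
## and all elements have the same dimensions; otherwise a character
## string describing the problem (the convention used by setValidity()).
checkGxElements <- function(x) {
    x <- as.list(x)
    if (length(x) == 0L)
        return(TRUE)
    isES <- vapply(x, is, logical(1), "ExpressionSet")
    if (!all(isES))
        return(paste("non-ExpressionSet element(s) at position(s):",
                     paste(which(!isES), collapse = ", ")))
    ## dim() on an ExpressionSet returns c(Features = ..., Samples = ...)
    dims <- lapply(x, dim)
    if (!all(vapply(dims, identical, logical(1), dims[[1L]])))
        return("all ExpressionSets must have identical dimensions")
    TRUE
}

eset1 <- new("ExpressionSet", exprs = matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs = matrix(3, 10, 4))
eset3 <- new("ExpressionSet", exprs = matrix(3, 5, 4))

checkGxElements(list(a = eset1, b = eset2))             # TRUE
checkGxElements(list(a = eset1, b = eset3))             # dimension message
checkGxElements(list(a = eset1, b = matrix(1, 10, 4)))  # class message
```

Keeping the check in a standalone function makes it easy to test on ordinary lists before wiring it into a class.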

For making the modified SimpleList I went back to the help provided in the Bioconductor vignette (PDF) "Biobase development and the new eSet". But it seems I'm not getting the initialization right.
My 'problem' is that I don't want to fix in advance how many ExpressionSets will be put in the list (SimpleList), nor what their names will be. This way I hope the object will be sufficiently general to hold results from normalization methods that might become available in the future. This is quite different from the example provided in "Biobase development and the new eSet".
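A minimal sketch of a container that does not fix the number or names of its elements in advance, assuming a current Bioconductor installation where SimpleList is provided by S4Vectors (in 2010 it lived in IRanges). The class name follows the post; the constructor function GxSimpleList() and the method names 'rma'/'mas5' are hypothetical. No initialize method is needed: a plain constructor collects whatever is passed through '...' and hands it to new() as the 'listData' slot.

```r
library(Biobase)
library(S4Vectors)   # provides SimpleList in current Bioconductor

## SimpleList subclass whose declared element type is ExpressionSet.
setClass("GxSimpleList", contains = "SimpleList",
         prototype = prototype(elementType = "ExpressionSet"))

## Explicit check; it does not rely on SimpleList enforcing elementType.
setValidity("GxSimpleList", function(object) {
    ok <- vapply(as.list(object), is, logical(1), "ExpressionSet")
    if (all(ok)) TRUE else "all elements of a 'GxSimpleList' must be ExpressionSets"
})

## Hypothetical constructor: any number of (named) ExpressionSets.
## new("GxSimpleList", a = eset1) would treat 'a' as a slot name, so the
## elements are gathered with list(...) and passed as 'listData'.
GxSimpleList <- function(...)
    new("GxSimpleList", listData = list(...))

eset1 <- new("ExpressionSet", exprs = matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs = matrix(3, 10, 4))

lst <- GxSimpleList(rma = eset1, mas5 = eset2)
names(lst)     # "rma" "mas5"
lst3 <- GxSimpleList(a = eset1, b = eset2, c = eset1)
length(lst3)   # 3
```

Because new() validates the object whenever slots are supplied, passing a non-ExpressionSet element to GxSimpleList() fails with an error.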

To link to my previous post: this (modified) SimpleList will then be used as one slot (storing data normalized by multiple methods) of another new class (the "GxSet"), alongside other slots for data-derived values (averages, etc.) and further documentation/notes.
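One way the pieces could fit together, as a sketch only: it assumes Biobase and S4Vectors are installed, the slot names 'normalized', 'derived' and 'notes' are hypothetical, and derived values are kept in a plain SimpleList for brevity.

```r
library(Biobase)
library(S4Vectors)

## Minimal container for the per-method ExpressionSets.
setClass("GxSimpleList", contains = "SimpleList",
         prototype = prototype(elementType = "ExpressionSet"))

## Hypothetical top-level class: normalized data in one slot,
## data-derived values and free-form notes in others.
setClass("GxSet",
         slots = c(normalized = "GxSimpleList",
                   derived    = "SimpleList",   # e.g. averages, SEMs
                   notes      = "list"))

eset1 <- new("ExpressionSet", exprs = matrix(1, 10, 4))
eset2 <- new("ExpressionSet", exprs = matrix(3, 10, 4))

gx <- new("GxSet",
          normalized = new("GxSimpleList",
                           listData = list(rma = eset1, mas5 = eset2)),
          derived = SimpleList(avg = rowMeans(exprs(eset1))),
          notes   = list(created = "sketch"))

names(gx@normalized)   # "rma" "mas5"
```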

Thanks in advance for any hints,
Wolfgang


> 
>  require(Biobase); require(IRanges); require(affy)
> # the toy data
>  eset1 <- new("ExpressionSet", exprs=matrix(1,10,4))
>  pData(eset1) <- data.frame("class"=c(1,2,2,2))
> 
>  eset2 <- new("ExpressionSet", exprs=matrix(3,10,4))
>  pData(eset2) <- data.frame("class"=c(1,2,2,2))
> 
> # making the modified class
>  setClass("GxSimpleList",contains="SimpleList")
[1] "GxSimpleList"
>  getClass("GxSimpleList")
Class "GxSimpleList" [in ".GlobalEnv"]

Slots:
                                                                      
Name:         listData elementMetadata     elementType        metadata
Class:            list             ANY       character            list

Extends: 
Class "SimpleList", directly
Class "Sequence", by class "SimpleList", distance 2
Class "Annotated", by class "SimpleList", distance 3
> 
> # for "initialize" I didn't understand how to formulate it in my case (as I know neither how many elements there will be nor their names)
>  setMethod("initialize","GxSimpleList", function(.object,...) listData = listDataNew(lapply(list(.object,...) == "ExpressionSet") ))
Error in conformMethod(signature, mnames, fnames, f, fdef, definition) : 
  in method for ‘initialize’ with signature ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList", ... = "GxSimpleList") omitted in the method definition cannot be in the signature
> 
>  setMethod("initialize","GxSimpleList", function(.object,...) {.object <- callNextMethod(.object,...)})
Error in conformMethod(signature, mnames, fnames, f, fdef, definition) : 
  in method for ‘initialize’ with signature ‘.Object="GxSimpleList"’: formal arguments (.Object = "GxSimpleList", ... = "GxSimpleList") omitted in the method definition cannot be in the signature
> 
> # I guess the check for ExpressionSets should go into validity
>  setValidity("GxSimpleList", function(object) {   # experimental
+    if(sum(!(unlist(lapply(object,function(x) class(x))) %in% "ExpressionSet")) >0) "A 'GxSimpleList' object should contain elements of class 'ExpressionSet' only !"
+    #same as ?#  assayDataValidMembers(class(object), rep("ExpressionSet",length(object)))
+    })
Class "GxSimpleList" [in ".GlobalEnv"]

Slots:
                                                                      
Name:         listData elementMetadata     elementType        metadata
Class:            list             ANY       character            list

Extends: 
Class "SimpleList", directly
Class "Sequence", by class "SimpleList", distance 2
Class "Annotated", by class "SimpleList", distance 3
> 
> # what happens ..
>  lst1 = SimpleList(a=eset1, b=eset2)   # OK
> 
>  lst2 = new("GxSimpleList",a=eset1, b=eset2)  # error (due to missing "initialize" ?)
Error in initialize(value, ...) : 
  invalid names for slots of class "GxSimpleList": a, b
>  lst3 = GxSimpleList(a=eset1, b=eset2)        # error (due to missing "initialize" ?)
Error: could not find function "GxSimpleList"
> 
> # for completeness ...
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    

attached base packages:
[1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base     

other attached packages:
[1] affy_1.28.0     IRanges_1.8.0   Biobase_2.10.0  svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.35-8

loaded via a namespace (and not attached):
[1] affyio_1.18.0         cluster_1.13.1        grid_2.12.0           lattice_0.19-13       preprocessCore_1.12.0 svMisc_0.9-60        
[7] tools_2.12.0         
> 
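The errors in the transcript above have ordinary S4 causes. The first formal argument of an initialize method must be spelled '.Object' exactly (the attempts used '.object', and R is case-sensitive), which is what the conformMethod() message complains about. Separately, new() treats named arguments such as 'a = eset1' as slot names, hence "invalid names for slots". A base-R sketch with a hypothetical toy class 'Box' and constructor Box():

```r
library(methods)

## Hypothetical stand-in for a list-holding class.
setClass("Box", slots = c(items = "list"))

## Correct signature: the first formal is '.Object', not '.object'.
setMethod("initialize", "Box", function(.Object, ...) {
    .Object <- callNextMethod(.Object, ...)
    ## any post-processing of .Object would go here
    .Object
})

## new() interprets named arguments as slot names, so this fails ...
res <- tryCatch(new("Box", a = 1, b = 2),
                error = function(e) conditionMessage(e))

## ... whereas a constructor can wrap arbitrary named arguments
## into the 'items' slot.
Box <- function(...) new("Box", items = list(...))
b <- Box(a = 1, b = 2)
names(b@items)   # "a" "b"
```

For a list-like class the initialize method can usually be dropped entirely; the constructor function alone handles an unknown number of named elements.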



. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
IGBMC,
1 rue Laurent Fries,  67404 Illkirch  Strasbourg,  France
Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
wolfgang.raffelsberger (at ) igbmc.fr

________________________________________
From: Martin Morgan [mtmorgan at fhcrc.org]
Sent: Friday, 5 November 2010 18:33
To: Wolfgang RAFFELSBERGER
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] designing an eSet derived object

On 11/05/2010 05:02 AM, Wolfgang RAFFELSBERGER wrote:
> Dear list,
>

> Basically, I'm trying to design an object to contain the following
> microarray-data
> 1) "gxIndData": microarray-data normalized in parallel by (an
> array-dependent) number of n methods plus the corresponding
> expression-calls (again, <= n methods),
> 2) "gxAvData": derived values (replicate-averages, SEMs, etc),
> 3) gene/spot annotation,
> 4) sample-description,
> 5) various supplementary information (parameters, notes, versions, etc.)
>
> Overall, this is a somewhat modified/extended version of the
> Biobase eSet concept, and I'm trying to figure out if there is a way to use
> the Biobase eSet. This way I hope to maintain a decent level of
> compatibility with other Bioconductor methods and allow code reuse.
>
> Now I'd like to store the various sections of 1) and 2) as separate
> lists of n matrices each, to keep things organized.
>
> According to the vignette "Biobase development and the new eSet",
> section 5 ("Extending eSet"), I defined a new class derived from 'eSet'. But
> as soon as I put anything other than matrices at the level
> of 'AssayData', I get an error message (see code below), no matter
> whether these are simply lists or custom objects. I suppose this means
> that I would have to store all matrices (up to 10 * 6 methods = 60
> matrices) without further organization at the level of 'AssayData'.

eSet requires that all AssayData elements are two-dimensional with
identical dimensions, so a list-of-matrices would not work.
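The dimension requirement can be seen directly with Biobase's assayDataNew(), assuming Biobase is installed; the element names 'rma' and 'methods' are hypothetical. Same-shaped matrices are accepted, while a list element has no matching dimensions and is rejected, as in the error messages quoted below.

```r
library(Biobase)

m <- matrix(1, 10, 4)

## Two 10 x 4 matrices: a valid AssayData object.
ad <- assayDataNew(exprs = m, rma = m * 2)
sort(assayDataElementNames(ad))   # "exprs" "rma"

## A list of matrices as an element: rejected for invalid dimensions.
bad <- tryCatch(assayDataNew(exprs = m, methods = list(rma = m)),
                error = function(e) conditionMessage(e))
bad
```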

> However, I'd like to keep at least one (in my case preferably two) additional levels
> of hierarchy to keep the data organized.
>
> So, finally I would like to integrate two new classes for 1) and 2)
> at the level of the assayData slot of my modified/new eSet.
>
> Does this mean this is not possible and that I cannot use the 'eSet'
> for my purposes ? Do I have to create a novel class somehow
> equivalent but finally incompatible to the 'eSet' ?
>
> Any suggestions/hints ?

One possibility, if this is for your own use and not as the foundation
for a package, is to use NChannelSet, where each method is a 'channel'.
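A sketch of the NChannelSet route, assuming a current Biobase release that provides the NChannelSet() constructor and the channelNames()/channel() accessors; the channel names 'rma' and 'mas5' stand in for hypothetical normalization methods.

```r
library(Biobase)

m1 <- matrix(1, 10, 4)   # e.g. values from one normalization method
m2 <- matrix(3, 10, 4)   # ... and from another

## One assayData element ("channel") per method; all must share dimensions.
ncs <- NChannelSet(rma = m1, mas5 = m2)
channelNames(ncs)                # the two method names

## Extract a single method as an ordinary ExpressionSet.
rmaSet <- channel(ncs, "rma")
```

Each method then behaves like an ExpressionSet when extracted, at the cost of forcing every method's matrices to have identical dimensions.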

Another possibility is to create a class that extends eSet with a slot
containing, e.g., an AnnotatedDataFrame with columns describing the
AssayData elements, and a method to query that slot and select the
appropriate assayData elements.

And perhaps what you really have is more like a list (or a list of lists) of
ExpressionSets, each element of the list with additional information. An
approach here would use the IRanges 'SimpleList' infrastructure, e.g.,

> lst = SimpleList(a=new("ExpressionSet"), b=new("ExpressionSet"))
> elementMetadata(lst) = DataFrame(method=c("A", "B"))
> lst[elementMetadata(lst)$method == "A"]
SimpleList of length 1
names(1): a
> lst[elementMetadata(lst)$method == "A"][[1]]
ExpressionSet (storageMode: lockedEnvironment)
assayData: 0 features, 0 samples
  element names: exprs
protocolData: none
phenoData: none
featureData: none
experimentData: use 'experimentData(object)'
Annotation:
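The same per-element metadata approach in current Bioconductor, assuming S4Vectors and Biobase are installed: the per-element metadata accessor is usually spelled mcols() today (elementMetadata() is the older name used above), and logical subsetting keeps the metadata aligned with the elements.

```r
library(Biobase)
library(S4Vectors)

lst <- SimpleList(a = new("ExpressionSet", exprs = matrix(1, 10, 4)),
                  b = new("ExpressionSet", exprs = matrix(3, 10, 4)))

## One metadata row per list element.
mcols(lst) <- DataFrame(method = c("A", "B"))

## Select elements by their metadata; the result is still a SimpleList.
sub <- lst[mcols(lst)$method == "A"]
names(sub)               # "a"
exprs(sub[[1]])[1, 1]    # 1
```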

Martin

>
> Thanks in advance,
> wolfgang
>
> ##
>
>  require(Biobase)
>  setClass("gxSet", contains = "eSet")
>  setMethod("initialize", "gxSet", function(.Object, A=new("list"),B=new("list"),...) {
>    callNextMethod(.Object, A=A,B=B,  ...) })
>  new("gxSet")
>  ## produces :
>  Error in function (storage.mode = c("lockedEnvironment", "environment",  :
>    'AssayData' elements with invalid dimensions: 'A' 'B'
>
>
>  ## ideally I'd like to use
>  setClass("gxIndData",representation(SIdata="list",SIcall="list"))
>  setClass("gxAvData",representation(avSI="list",expressed="list",SEM="list", conCall="list",
>    FC="list",FiltFin="list",FiltSI="list",FiltOther="list"))
>  setClass("gxSet", contains = "eSet")
>
>  setMethod("initialize","gxSet", function(.Object,
>    assayData=assayDataNew(IndData=IndData,AvData=AvData),
>    IndData=new("gxIndData"), AvData=new("gxAvData"),...) {
>    if(!missing(assayData) && any(!missing(IndData), !missing(AvData))) {
>      warning("using 'assayData'; ignoring 'IndData', 'AvData'") }
>    callNextMethod(.Object, assayData = assayData, ...)
>  })
>
>  new("gxSet")
>  ## produces :
>  Error in assayDataNew(IndData = IndData, AvData = AvData) :
>    'AssayData' elements with invalid dimensions: 'AvData' 'IndData'
>
>
>  ## the alternative : an eSet 'like' but independent and incompatible object ..
>  setClass("gxSet",representation(IndData="gxIndData",AvData="gxAvData",phenoData="AnnotatedDataFrame",featureData="AnnotatedDataFrame",
>   experimentData="MIAME",annotation="character",protocolData="AnnotatedDataFrame",notes="list"))
>
>
>
> ## for completeness:
> sessionInfo()
> R version 2.12.0 (2010-10-15)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252
> [4] LC_NUMERIC=C                   LC_TIME=French_France.1252
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  stats     tcltk     utils     methods   base
>
> other attached packages:
> [1] affy_1.28.0     Biobase_2.10.0  svSocket_0.9-50 TinnR_1.0.3     R2HTML_2.2      Hmisc_3.8-3     survival_2.35-8
>
> loaded via a namespace (and not attached):
> [1] affyio_1.18.0         cluster_1.13.1        grid_2.12.0           lattice_0.19-13       preprocessCore_1.12.0
> [6] svMisc_0.9-60         tools_2.12.0
>


--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


