[BioC] Annotation.db: how automatically call a mapping?

Tue Jun 30 21:54:03 CEST 2009

Hi Jim,
Since your suggestion indeed looks easy and seems to provide everything
I would like to have, I gave it a try (I'll have a further look at
Martin's comments/suggestions later).
However, it seems that 'probes2table' cannot properly handle multiple
contrasts defined in limma:

> library(affycoretools)
Loading required package: GO.db
Loading required package: DBI
Loading required package: KEGG.db
> probes2table(eset, featureNames(eset), annotation(eset),
list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = FALSE,
filename = "output")
Loading required package: moe430a.db
Error in aafTable(items = otherdata) : 
  All columns must be of equal length
> length(fit2$p.value)
[1] 68070
> 
> length(fit2$Amean)
[1] 22690
> 

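I analyzed three contrasts in limma, and 68070/3 is 22690, which is
exactly the number of probesets on the Affy MOE430A array. So
fit2$p.value holds one p-value per probeset per contrast, whereas
fit2$Amean holds one value per probeset, which explains the error.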
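
A minimal sketch of the length mismatch, using made-up stand-ins for
fit2$p.value and fit2$Amean in base R (5 probesets and 3 contrasts
instead of 22690 and 3). It assumes fit2$p.value is a probeset-by-contrast
matrix, which matches the lengths shown above. The names p_value, a_mean
and otherdata are hypothetical, and whether probes2table accepts a list
built like this has not been checked here.

```r
# Stand-in for fit2$p.value: one row per probeset, one column per contrast.
# All values are random and purely illustrative.
set.seed(1)
p_value <- matrix(runif(15), nrow = 5, ncol = 3,
                  dimnames = list(paste0("probe", 1:5), c("c1", "c2", "c3")))

# Stand-in for fit2$Amean: one value per probeset.
a_mean <- setNames(runif(5, 6, 12), rownames(p_value))

length(p_value)  # length() of a matrix counts every cell: 5 * 3
length(a_mean)   # one value per probeset

# Split the matrix into one vector per contrast, so that every list
# element has the same length as a_mean.
otherdata <- c(
  lapply(seq_len(ncol(p_value)), function(j) p_value[, j]),
  list(a_mean)
)
names(otherdata) <- c(paste("p-value", colnames(p_value)), "mean")

lengths(otherdata)
```

Splitting the matrix by column gives vectors that all have one entry per
probeset, which is the condition the error message asks for.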

Question: can this easily be solved? Can limma2annaffy handle multiple
contrasts? (at the moment I am not familiar with affycoretools at all).

Thanks,
Guido  

> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch 
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
> James W. MacDonald
> Sent: 30 June 2009 19:44
> To: Hooiveld, Guido
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
> 
> Hi Guido,
> 
> Hooiveld, Guido wrote:
> > Hi Martin,
> > 
> > Indeed, another useful, straightforward possibility for mapping.
> > However, I am now facing the problem of properly combining the
> > annotation info with the expression data. This is what I would like
> > to do:
> > 
> >> Tab_data <- exprs(eset[probeids])
> >> Tab_data <- cbind(Tab_data, fit2$Amean) # to add average expression of LIMMA output
> >> Tab_data <- cbind(Tab_data, fit2$p.value) # to add p-value of LIMMA output
> > etc.
> > 
> > This all goes fine; however, adding the annotation info 'mixes up' the
> > content of Tab_data: the annotation data replaces the first column of
> > Tab_data, and the content of all cells is replaced by 'null'. I
> > suspect it has something to do with the type of object I would like to
> > merge, but I am not sure.
> > 
> >> map.entrez <- getAnnMap("ENTREZID", annotation(eset))
> >> map.entrez <- as.list(map.entrez[probeids])
> 
> This sort of thing is going to get really difficult to do by 
> hand when you get to things that have a one-to-many 
> relationship. And you are already duplicating existing 
> efforts with what you have done so far.
> 
> If you want to combine annotation data with results data, you 
> really want to be using the annaffy package which does lots 
> of these things seamlessly. And if you want things to be a 
> bit easier, you could consider using the affycoretools 
> package as well, which for the most part uses annaffy to 
> create output.
> 
> You can do what you appear to want in one line:
> 
> probes2table(eset, featureNames(eset), annotation(eset), 
> list("p-value"= fit2$p.value, "mean" = fit2$Amean), html = 
> FALSE, filename = "output")
> 
> You might need to play around with the 'anncols' argument to 
> get what annotation data you might want.
> 
> If you want output specific to the contrasts you have fit, 
> see ?limma2annaffy.
> 
> Best,
> 
> Jim
> 
> 
> 
> > 
> > 
> >> Tab_data <- cbind(Tab_data, map.entrez)
> >   ^ in R this seems to work, but when saved as .txt the content of
> > Tab_data is completely mixed up. Before 'adding' map.entrez, Tab_data
> > is OK.
> > 
> > 
> >> write.table(cbind(rownames(Tab_data2), Tab_data2),
> > file="test_1234.txt", sep="\t", col.names=TRUE, row.names=FALSE)
> > 
> >> class(Tab_data)
> > [1] "matrix"
> >> class(map.entrez)
> > [1] "list"
> > 
> > 
> > Do you, or someone else, have a suggestion for how to properly link
> > these two types of data?
> > Thanks again,
> > Guido
> > 
> > 
> > 
> >  
> > 
> >> -----Original Message-----
> >> From: bioconductor-bounces at stat.math.ethz.ch
> >> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Martin Morgan
> >> Sent: 30 June 2009 00:00
> >> To: Hooiveld, Guido
> >> Cc: bioconductor at stat.math.ethz.ch
> >> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
> >>
> >> Hooiveld, Guido wrote:
> >>> Hi,
> >>>  
> >>> I am facing a problem I cannot solve myself, despite everything I
> >>> read/know. But I assume the solution is easy for the more
> >>> knowledgeable folks in BioC/R...
> >>>  
> >>> This does work:
> >>>> library(moe430a.db)
> >>>> xxyy <- moe430aSYMBOL
> >>>> xxyy
> >>> SYMBOL map for chip moe430a (object of class "AnnDbBimap")
> >>>  
> >>> However, for this to work you need to know the array type of the
> >>> data that is analyzed.
> >>>  
> >>>  
> >>> Now I would like to automatically extract the (e.g.) SYMBOL mapping
> >>> from an annotation.db, thus by retrieving the array type from the
> >>> eset.
> >>>  
> >>>> library(affy)
> >>>> eset <- rma(data)
> >>>> probeids <- featureNames(eset)
> >>>> annotation(eset)
> >>> [1] "moe430a"
> >>>  
> >>> But how can I use this info to properly call the SYMBOL mapping?
> >> Hi Guido --
> >>
> >> to get the appropriate map
> >>
> >>   library(annotate)
> >>   map = getAnnMap("SYMBOL", annotation(eset))
> >>
> >> to select just the relevant probes
> >>
> >>   map[probeids]
> >>
> >> toTable(map[probeids]) or as.list(map[probeids]) might be the next 
> >> step in the work flow.
> >>
> >> Martin
> >>
> >>>  
> >>> I tried this:
> >>>> arraytype <- annotation(eset)
> >>>> arraytype <- paste(arraytype, "db", sep = ".")
> >>>> arraytype
> >>> [1] "moe430a.db"
> >>>> arraytype <- paste("package", arraytype, sep = ":")
> >>>> gh <- ls(arraytype)
> >>>> gh
> >>>  [1] "moe430a"              "moe430a_dbconn"       "moe430a_dbfile"
> >>>  [4] "moe430a_dbInfo"       "moe430a_dbschema"     "moe430aACCNUM"
> >>>  [7] "moe430aALIAS2PROBE"   "moe430aCHR"           "moe430aCHRLENGTHS"
> >>> [10] "moe430aCHRLOC"        "moe430aCHRLOCEND"     "moe430aENSEMBL"
> >>> [13] "moe430aENSEMBL2PROBE" "moe430aENTREZID"      "moe430aENZYME"
> >>> [16] "moe430aENZYME2PROBE"  "moe430aGENENAME"      "moe430aGO"
> >>> [19] "moe430aGO2ALLPROBES"  "moe430aGO2PROBE"      "moe430aMAP"
> >>> [22] "moe430aMAPCOUNTS"     "moe430aMGI"           "moe430aMGI2PROBE"
> >>> [25] "moe430aORGANISM"      "moe430aPATH"          "moe430aPATH2PROBE"
> >>> [28] "moe430aPFAM"          "moe430aPMID"          "moe430aPMID2PROBE"
> >>> [31] "moe430aPROSITE"       "moe430aREFSEQ"        "moe430aSYMBOL"
> >>> [34] "moe430aUNIGENE"       "moe430aUNIPROT"
> >>>  
> >>>> gh[33]
> >>> [1] "moe430aSYMBOL"
> >>>> symbols <- mget(probeids, gh[33])
> >>> Error in mget(probeids, gh[33]) : second argument must be an 
> >>> environment
> >>>  
> >>> This also doesn't work:
> >>>> symbols <- mget(probeids, envir=gh[33])
> >>> Error in mget(probeids, envir = gh[33]) : 
> >>>   second argument must be an environment
> >>>  
> >>> My approach thus is the wrong approach to automatically extract
> >>> mappings from an annotation.db.
> >>> Since I don't know of any other possibility, I would appreciate it
> >>> if someone could point me to a working solution.
> >>>  
> >>> Thanks,
> >>> Guido
> >>>  
> >>>
> >>> ------------------------------------------------
> >>> Guido Hooiveld, PhD
> >>> Nutrition, Metabolism & Genomics Group
> >>> Division of Human Nutrition
> >>> Wageningen University
> >>> Biotechnion, Bomenweg 2
> >>> NL-6703 HD Wageningen
> >>> the Netherlands
> >>> tel: (+)31 317 485788
> >>> fax: (+)31 317 483342 
> >>> internet:   http://nutrigene.4t.com <http://nutrigene.4t.com/>  
> >>> email:      guido.hooiveld at wur.nl 
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioconductor mailing list
> >>> Bioconductor at stat.math.ethz.ch
> >>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>> Search the archives: 
> >>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >>
> > 
> 
> --
> James W. MacDonald, M.S.
> Biostatistician
> Douglas Lab
> University of Michigan
> Department of Human Genetics
> 5912 Buhl
> 1241 E. Catherine St.
> Ann Arbor MI 48109-5618
> 734-615-7826
> 
> 
>