[BioC] Annotation.db: how automatically call a mapping?

Tue Jun 30 19:44:03 CEST 2009

Hi Guido,

Hooiveld, Guido wrote:
> Hi Martin,
> 
> Indeed, another useful, straightforward possibility for mapping. 
> However, I am now facing the problem of properly combining the
> annotation info with the expression data. This is what I would like to
> do:
> 
>> Tab_data <- exprs(eset[probeids])
>> Tab_data <- cbind(Tab_data, fit2$Amean)   # add average expression from the LIMMA output
>> Tab_data <- cbind(Tab_data, fit2$p.value) # add p-value from the LIMMA output
> etc.
> 
> This all goes fine, however adding the annotation info 'mixes-up' the
> content of Tab_data; the annotation data replaces the first column of
> Tab_data, and the content of all cells is replaced by 'null'. I suspect
> it has something to do with the type of object I would like to merge,
> but I am not sure.
> 
>> map.entrez <- getAnnMap("ENTREZID", annotation(eset))
>> map.entrez <- as.list(map.entrez[probeids])

This sort of thing is going to get really difficult to do by hand when 
you get to things that have a one-to-many relationship. And you are 
already duplicating existing efforts with what you have done so far.
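
To illustrate why the cbind() route breaks down, here is a minimal base-R sketch. The probe IDs, Entrez IDs and expression values are made up, and the two objects only stand in for exprs(eset) and as.list(map.entrez[probeids]). Collapsing each probe's IDs into one string is just one possible way to handle one-to-many entries, not what annaffy does internally.

```r
# Made-up stand-ins for exprs(eset) and as.list(map.entrez[probeids]).
Tab_data <- matrix(c(7.1, 8.4, 6.3, 7.0, 8.6, 6.1), nrow = 3,
                   dimnames = list(c("p1_at", "p2_at", "p3_at"),
                                   c("s1", "s2")))
map.entrez <- list(p1_at = "101",
                   p2_at = c("202", "203"),   # one probe, two IDs
                   p3_at = NA_character_)

# cbind() of a numeric matrix and a list returns a matrix of mode "list";
# that is the object write.table() then mangles.
broken <- cbind(Tab_data, map.entrez)
typeof(broken)   # "list"

# Collapse every probe's IDs into a single string, in row order.
entrez <- vapply(map.entrez[rownames(Tab_data)],
                 function(ids) paste(ids, collapse = ";"),
                 character(1))

# A data.frame can hold numeric and character columns side by side.
fixed <- data.frame(probe = rownames(Tab_data), Tab_data,
                    ENTREZID = entrez, stringsAsFactors = FALSE)

# Write to a temporary file here; use a real file name in practice.
write.table(fixed, file = tempfile(fileext = ".txt"), sep = "\t",
            quote = FALSE, row.names = FALSE)
```

The key point is that the annotation column has to be an ordinary vector with one element per row before it is combined with the expression values.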

If you want to combine annotation data with results data, you really 
want to be using the annaffy package which does lots of these things 
seamlessly. And if you want things to be a bit easier, you could 
consider using the affycoretools package as well, which for the most 
part uses annaffy to create output.

You can do what you appear to want in one line:

probes2table(eset, featureNames(eset), annotation(eset),
             list("p-value" = fit2$p.value, "mean" = fit2$Amean),
             html = FALSE, filename = "output")

You might need to adjust the 'anncols' argument to get the 
annotation columns you want.

If you want output specific to the contrasts you have fit, see 
?limma2annaffy.

Best,

Jim

> 
> 
>> Tab_data <- cbind(Tab_data, map.entrez)
>   ^ in R this seems to work, but when saved as .txt the content of
> Tab_data is completely mixed up. Before 'adding' map.entrez, Tab_data is
> OK.
> 
> 
>> write.table(cbind(rownames(Tab_data2), Tab_data2),
>>             file="test_1234.txt", sep="\t", col.names=TRUE, row.names=FALSE)
> 
>> class(Tab_data)
> [1] "matrix"
>> class(map.entrez)
> [1] "list"
> 
> 
> Do you, or someone else, have a suggestion how to properly link these
> two types of data?
> Thanks again,
> Guido
> 
> 
> 
>  
> 
>> -----Original Message-----
>> From: bioconductor-bounces at stat.math.ethz.ch 
>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of 
>> Martin Morgan
>> Sent: 30 June 2009 00:00
>> To: Hooiveld, Guido
>> Cc: bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] Annotation.db: how automatically call a mapping?
>>
>> Hooiveld, Guido wrote:
>>> Hi,
>>>  
>>> I am facing a problem I cannot solve myself, despite everything I 
>>> read/know. But I assume the solution is easy for the more 
>>> knowledgeable folks in BioC/R...
>>>  
>>> This does work:
>>>> library(moe430a.db)
>>>> xxyy <- moe430aSYMBOL
>>>> xxyy
>>> SYMBOL map for chip moe430a (object of class "AnnDbBimap")
>>>  
>>> However, for this to work you need to know the array type of the data 
>>> that is analyzed.
>>>  
>>>  
>>> Now I would like to automatically extract the (e.g.) SYMBOL mapping 
>>> from an annotation.db, thus by retrieving the array type from the eset.
>>>  
>>>> library(affy)
>>>> eset <- rma(data)
>>>> probeids <- featureNames(eset)
>>>> annotation(eset)
>>> [1] "moe430a"
>>>  
>>> But how can I use this info to properly call the SYMBOL mapping?
>> Hi Guido --
>>
>> to get the appropriate map
>>
>>   library(annotate)
>>   map = getAnnMap("SYMBOL", annotation(eset))
>>
>> to select just the relevant probes
>>
>>   map[probeids]
>>
>> toTable(map[probeids]) or as.list(map[probeids]) might be the 
>> next step in the work flow.
>>
>> Martin
>>
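
The difference between the two suggested next steps matters once the result has to be joined to other data. toTable() gives a data frame with one row per probe/value pair, whereas as.list() gives one list element per probe. The sketch below mimics both shapes in base R only; the probe IDs, symbols and expression values are invented, and map_list is a hypothetical stand-in for as.list(map[probeids]).

```r
# Hypothetical stand-in for as.list(map[probeids]).
map_list <- list(p1_at = "GeneA",
                 p2_at = c("GeneB", "GeneC"))   # one-to-many probe

# Long form, similar in shape to what toTable() returns:
# one row per probe/symbol pair.
long <- data.frame(probe_id = rep(names(map_list), lengths(map_list)),
                   symbol   = unlist(map_list, use.names = FALSE),
                   stringsAsFactors = FALSE)

# Made-up per-probe results, e.g. an average expression value.
expr <- data.frame(probe_id  = c("p1_at", "p2_at"),
                   mean_expr = c(7.1, 8.4),
                   stringsAsFactors = FALSE)

# merge() repeats the per-probe values for every matching symbol.
merged <- merge(long, expr, by = "probe_id")
merged
```

In the long form a probe with several symbols simply occupies several rows, so nothing has to be collapsed before merging.
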
>>>  
>>> I tried this:
>>>> arraytype <- annotation(eset)
>>>> arraytype <- paste(arraytype, "db", sep = ".")
>>>> arraytype
>>> [1] "moe430a.db"
>>>> arraytype <- paste("package", arraytype, sep = ":")
>>>> gh <- ls(arraytype)
>>>> gh
>>>  [1] "moe430a"              "moe430a_dbconn"       "moe430a_dbfile"
>>> "moe430a_dbInfo"       "moe430a_dbschema"     "moe430aACCNUM"
>>> "moe430aALIAS2PROBE"   "moe430aCHR"           "moe430aCHRLENGTHS"
>>> "moe430aCHRLOC"       
>>> [11] "moe430aCHRLOCEND"     "moe430aENSEMBL"
>>> "moe430aENSEMBL2PROBE" "moe430aENTREZID"      "moe430aENZYME"
>>> "moe430aENZYME2PROBE"  "moe430aGENENAME"      "moe430aGO"
>>> "moe430aGO2ALLPROBES"  "moe430aGO2PROBE"     
>>> [21] "moe430aMAP"           "moe430aMAPCOUNTS"     "moe430aMGI"
>>> "moe430aMGI2PROBE"     "moe430aORGANISM"      "moe430aPATH"
>>> "moe430aPATH2PROBE"    "moe430aPFAM"          "moe430aPMID"
>>> "moe430aPMID2PROBE"   
>>> [31] "moe430aPROSITE"       "moe430aREFSEQ"        "moe430aSYMBOL"
>>> "moe430aUNIGENE"       "moe430aUNIPROT"
>>>  
>>>> gh[33]
>>> [1] "moe430aSYMBOL"
>>>> symbols <- mget(probeids, gh[33])
>>> Error in mget(probeids, gh[33]) : second argument must be an 
>>> environment
>>>  
>>> This also doesn't work:
>>>> symbols <- mget(probeids, envir=gh[33])
>>> Error in mget(probeids, envir = gh[33]) : 
>>>   second argument must be an environment
>>>  
>>> My approach thus is the wrong approach to automatically extract 
>>> mappings from an annotation.db.
>>> Since I don't know about any other possibility, I would appreciate it if 
>>> someone could point me to a working solution.
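
The two mget() errors above come from passing gh[33], which is only the name of the map as a character string, where mget() expects the object itself. A minimal base-R sketch of the difference follows; demoSYMBOL is a hypothetical stand-in for moe430aSYMBOL, built as a plain environment with invented probe IDs and symbols. With the real annotation packages, getAnnMap() as suggested in this thread is probably the more convenient route.

```r
# Hypothetical stand-in for the SYMBOL map: an environment probe -> symbol.
demoSYMBOL <- new.env()
assign("p1_at", "GeneA", envir = demoSYMBOL)
assign("p2_at", "GeneB", envir = demoSYMBOL)

probeids <- c("p1_at", "p2_at")
map_name <- paste0("demo", "SYMBOL")   # a character string, like gh[33]

# Passing the name fails: mget() needs the object, not its name.
failed <- tryCatch(mget(probeids, envir = map_name),
                   error = function(e) e)
inherits(failed, "error")   # TRUE

# get() turns the name into the object it refers to.
symbols <- mget(probeids, envir = get(map_name))
symbols
```
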
>>>  
>>> Thanks,
>>> Guido
>>>  
>>>
>>> ------------------------------------------------
>>> Guido Hooiveld, PhD
>>> Nutrition, Metabolism & Genomics Group Division of Human Nutrition 
>>> Wageningen University Biotechnion, Bomenweg 2
>>> NL-6703 HD Wageningen
>>> the Netherlands
>>> tel: (+)31 317 485788
>>> fax: (+)31 317 483342 
>>> internet:   http://nutrigene.4t.com <http://nutrigene.4t.com/>  
>>> email:      guido.hooiveld at wur.nl 
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: 
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826