[BioC] Fwd: GO terms: Annotation for HumanMethylation450

Marc Carlson mcarlson at fhcrc.org
Wed Apr 3 19:40:58 CEST 2013


Hi Jinyan,

The code I showed you before will get you all the GO TERMS and their 
DESCRIPTIONS into a single data frame (without using too much RAM):

library(GO.db)
k = keys(GOTERM)  ## k is now all the GOIDs that we actually have Terms for.
## If you use another source of GOIDs, you might want to call unique()
## on that first, in order to save time.
## Then just call select() like I showed you before
## (newer versions of AnnotationDbi name the 'cols' argument 'columns').
result = select(GO.db, keys = k, cols = c("DEFINITION", "TERM"))

## Then you can use merge() to attach that onto your gene IDs later on.
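As a rough sketch of that last merge() step, assuming the probe-to-GO mapping has already been flattened into a two-column probe/GOID data frame: the tiny data frames below are made-up stand-ins (hypothetical probe IDs, a placeholder GO ID), not real output of the 450k map or of select(). The point is that each distinct GO ID appears once in the term table, and the join then fans the terms out to every probe that carries that ID.

```r
## Hypothetical probe-to-GOID table (one row per probe/GO pair); the
## probe IDs and the "GO:0000000" entry are made up for illustration.
probe2go <- data.frame(
  probe = c("cg00000001", "cg00000001", "cg00000002", "cg00000003"),
  GOID  = c("GO:0005634", "GO:0003677", "GO:0005634", "GO:0000000"),
  stringsAsFactors = FALSE
)

## Stand-in for the data frame returned by select(GO.db, ...):
## one row per distinct GO ID, with its TERM.
goterms <- data.frame(
  GOID = c("GO:0005634", "GO:0003677"),
  TERM = c("nucleus", "DNA binding"),
  stringsAsFactors = FALSE
)

## Attach the terms to the probes. all.x = TRUE keeps probes whose GO ID
## has no term (TERM becomes NA) instead of silently dropping them.
annotated <- merge(probe2go, goterms, by = "GOID", all.x = TRUE)

## Tidy up: probe column first, rows ordered by probe and then GO ID.
annotated <- annotated[order(annotated$probe, annotated$GOID),
                       c("probe", "GOID", "TERM")]
rownames(annotated) <- NULL
print(annotated)
```

Using all.x = TRUE is a design choice: it makes probes with unresolved GO IDs visible as NA rather than losing them in the join.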

I hope this helps,


   Marc



On 04/03/2013 08:28 AM, Tim Triche, Jr. wrote:
> Probably so. I will look into it. Thanks for the report.
>
> --t
>
> On Apr 3, 2013, at 8:21 AM, Jinyan Huang <jhuang at hsph.harvard.edu> wrote:
>
>> Is there a more efficient way to do this? I thought there might be
>> some problem in my code.
>>
>> On Wed, Apr 3, 2013 at 11:14 AM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>> Buy more RAM :-)
>>>
>>> --t
>>>
>>> On Apr 3, 2013, at 6:59 AM, Jinyan Huang <jhuang at hsph.harvard.edu> wrote:
>>>
>>>> When I try to get all GO terms for IlluminaHumanMethylation450k, I run
>>>> into a memory problem: it uses more than 10 GB of memory.
>>>>
>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>> Error: memory exhausted (limit reached?)
>>>> Execution halted
>>>>
>>>>
>>>> --------------------------------------Get_all_GO.R----------------------------------------------
>>>>
>>>> library(IlluminaHumanMethylation450k.db)
>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>> IlluminaHumanMethylation450kGOall <- toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>> ## now let's look at the differences that result from toggleProbes()
>>>> mapped_probes_toggled <- mappedkeys(IlluminaHumanMethylation450kGOall)
>>>> res <- mget(mapped_probes_toggled, IlluminaHumanMethylation450kGOall,
>>>> ifnotfound=NA)
>>>> res2 <- lapply(res, function(x) x[sapply(x, function(y) y['Evidence']!='IEA')])
>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms for them
>>>> library(GO.db)
>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>> d<-lapply(GOterms,function(x)do.call(rbind,lapply(x,function(y)data.frame(y@Term,y@GOID,y@Ontology))))
>>>> df<-do.call(rbind,d)
>>>> len <- sapply(d,function(x)length(x[,1]))
>>>> probes <- rep(names(d),len)
>>>> df.out<-data.frame(probes=probes,df)
>>>> names(df.out)<-c("probe","GoTerm","GOID","GOCategory")
>>>> write.table(df.out,"GO_all.txt",quote=F,row.names=F,col.names=T,sep="\t")
>>>>
>>>> ----------------------------------------------------------------------------------------------------------------
>>>>
>>>> On Tue, Apr 2, 2013 at 7:29 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> Not sure how I managed not to cc: the list on this initially. Here's some GO.db code with a sort of "moral" to it ;-)
>>>>>
>>>>> --t
>>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>> library(IlluminaHumanMethylation450k.db)
>>>>>
>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>>> IlluminaHumanMethylation450kGOall <- toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>>>
>>>>> ## now let's look at the differences that result from toggleProbes()
>>>>> mapped_probes_default <- mappedkeys(IlluminaHumanMethylation450kGO)
>>>>> mapped_probes_toggled <- mappedkeys(IlluminaHumanMethylation450kGOall)
>>>>> multimapped <- setdiff( mapped_probes_toggled, mapped_probes_default )
>>>>>
>>>>> res0 <- mget(head(multimapped), IlluminaHumanMethylation450kGO, ifnotfound=NA)
>>>>> res <- mget(head(multimapped), IlluminaHumanMethylation450kGOall, ifnotfound=NA)
>>>>>
>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms for them
>>>>>
>>>>> library(GO.db)
>>>>> GOids <- lapply(res, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>> head(GOterms)
>>>>>
>>>>>
>>>>>> I'll add this to the docs (next release)
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> --t
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Mar 29, 2013 at 11:24 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
>>>>>>> Tim,
>>>>>>>
>>>>>>> Thank you very much for your reply.
>>>>>>> I have a list of probes.
>>>>>>> Do you have an example script to get the GO terms instead of the GO IDs?
>>>>>>>
>>>>>>> The documentation is not very clear on this.
>>>>>>> http://www.bioconductor.org/packages/2.11/data/annotation/html/IlluminaHumanMethylation450k.db.html
>>>>>>>
>>>>>>> On Fri, Mar 29, 2013 at 12:29 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>>>>> Oddly enough, the paper from UCSD with Illumina's folks on it (*) used the
>>>>>>>> IlluminaHumanMethylation450k.db package (which I am currently rebuilding to
>>>>>>>> have a startup message about toggleProbes()) to annotate both CpG islands
>>>>>>>> and GO terms.
>>>>>>>>
>>>>>>>> (*)
>>>>>>>> http://idekerlab.ucsd.edu/publications/Documents/Hannum_MolCell_2012.pdf
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 29, 2013 at 8:49 AM, Fabrice Tourre <fabrice.ciup at gmail.com>
>>>>>>>> wrote:
>>>>>>>>> Dear list,
>>>>>>>>>
>>>>>>>>> In the annotation file for the Infinium HumanMethylation450 BeadChip,
>>>>>>>>>
>>>>>>>>> http://support.illumina.com/documents/MyIllumina/b78d361a-def5-4adb-ab38-e8990625f053/HumanMethylation450_15017482_v.1.2.csv
>>>>>>>>>
>>>>>>>>> there is no GO term or pathway annotation for each probe, unlike the
>>>>>>>>> annotation file HG-U133_Plus_2.na32.annot.csv.
>>>>>>>>>
>>>>>>>>> Is there a Bioconductor package to annotate the Infinium
>>>>>>>>> HumanMethylation450 probes? That is, given a probe, return its GO
>>>>>>>>> terms and pathways.
>>>>>>>>>
>>>>>>>>> Thank you very much in advance.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioconductor mailing list
>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>> Search the archives:
>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> A model is a lie that helps you see the truth.
>>>>>>>>
>>>>>>>> Howard Skipper
>>>>>>
>>>>
>>>>
>>>> --
>>>> Best wishes,
>>>>
>>>> Jinyan HUANG
>>
>>
>> -- 
>> Best wishes,
>>
>> Jinyan HUANG
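Jinyan's script drops IEA (Inferred from Electronic Annotation) evidence one probe at a time with nested lapply()/sapply() calls. Once the mapping is in long format, the same filter can be a single vectorized subset. A minimal sketch, assuming a hypothetical flattened table with probe, GOID and Evidence columns (the rows are illustrative, not taken from the 450k package):

```r
## Hypothetical long-format version of the probe-to-GO map: one row per
## probe / GO ID / evidence code (values are illustrative only).
go_long <- data.frame(
  probe    = c("cg00000001", "cg00000001", "cg00000002", "cg00000002"),
  GOID     = c("GO:0005634", "GO:0003677", "GO:0005634", "GO:0003677"),
  Evidence = c("IDA", "IEA", "IEA", "TAS"),
  stringsAsFactors = FALSE
)

## Drop electronically inferred annotations in one vectorized step,
## instead of looping over probes with lapply()/sapply().
curated <- go_long[go_long$Evidence != "IEA", ]

## The GO IDs that still need a term lookup, each listed only once.
ids_to_lookup <- unique(curated$GOID)
print(curated)
print(ids_to_lookup)
```

Only the de-duplicated IDs would then need to go through select(), and the terms can be merged back onto the probes as Marc suggests.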


