[BioC] Fwd: GO terms: Annotation for HumanMethylation450

Jinyan Huang jhuang at hsph.harvard.edu
Wed Apr 3 20:36:08 CEST 2013


Thank you. It works now.

result = select(GO.db, keys =k, cols=c("DEFINITION","TERM"))

Which columns can I select here? For example, what if I want to
exclude GO terms whose Evidence is IEA?

I cannot find this information in the help page.
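
To make the question concrete, here is a minimal sketch of the kind of
filter I mean. As far as I know, cols(GO.db) (renamed columns() in later
AnnotationDbi releases) lists the selectable columns, and GO.db itself
only carries term-level fields, so the evidence code would have to be
filtered on the probe-to-GO mapping (as the y['Evidence'] test in the
quoted script does) before the terms are looked up. The probe IDs, the
term labels, and both data frames below are made-up stand-ins so the
example runs without any annotation package; the EVIDENCE column name is
an assumption.

```r
## Toy stand-in for a probe-to-GO mapping; probe IDs and rows are invented.
## The EVIDENCE column name is an assumption about how such a table is laid out.
probe2go <- data.frame(
  probe    = c("cg0001", "cg0001", "cg0002", "cg0003"),
  GOID     = c("GO:0008150", "GO:0001869", "GO:0008150", "GO:0001869"),
  EVIDENCE = c("IEA", "TAS", "IDA", "IEA"),
  stringsAsFactors = FALSE
)

## Keep only annotations whose evidence code is not IEA.
kept <- probe2go[probe2go$EVIDENCE != "IEA", ]

## Look up each distinct GOID once. With GO.db attached this could be:
##   terms <- select(GO.db, keys = unique(kept$GOID), cols = c("TERM", "DEFINITION"))
## Here a hand-written table with placeholder labels stands in for that result.
ids <- unique(kept$GOID)
terms <- data.frame(
  GOID = ids,
  TERM = paste("placeholder term for", ids),
  stringsAsFactors = FALSE
)

## Attach the term labels back onto the probes.
annotated <- merge(kept, terms, by = "GOID")
print(annotated)
```

Filtering first and looking up only the unique GOIDs keeps the term
lookup small; merge() then repeats each term for every probe that
carries it.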

On Wed, Apr 3, 2013 at 2:11 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> May need to do
>
> require(AnnotationDbi)
> require(Homo.sapiens) ## or GO.db, or whatever
>
> in order for that to work.
>
>
>
> On Wed, Apr 3, 2013 at 11:07 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
> wrote:
>>
>> Marc,
>>
>> After updating R to 2.15.2, I still get the error.
>>
>> R
>>
>> R version 2.15.2 (2012-10-26) -- "Trick or Treat"
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>> You are welcome to redistribute it under certain conditions.
>> Type 'license()' or 'licence()' for distribution details.
>>
>>   Natural language support but running in an English locale
>>
>> R is a collaborative project with many contributors.
>> Type 'contributors()' for more information and
>> 'citation()' on how to cite R or R packages in publications.
>>
>> Type 'demo()' for some demos, 'help()' for on-line help, or
>> 'help.start()' for an HTML browser interface to help.
>> Type 'q()' to quit R.
>>
>> > ids = c( "GO:0008150", "GO:0001869")
>> > result = select(GO.db, keys =ids, cols=c("DEFINITION","TERM"))
>> Error: could not find function "select"
>> > sessionInfo()
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> On Wed, Apr 3, 2013 at 1:40 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>> > Hi Jinyan,
>> >
>> > The code I showed you before will get you all the GO TERMS and their
>> > DESCRIPTIONS into a single data frame (without using too much RAM):
>> >
>> > library(GO.db)
>> > ## k is now all the GOIDs that we actually have Terms for.
>> > k = keys(GOTERM)
>> > ## If you use another source of GOIDs, you might want to call unique()
>> > ## on that 1st, in order to save time.
>> > ## Then just call select like I showed you before
>> > result = select(GO.db, keys =k, cols=c("DEFINITION","TERM"))
>> >
>> > ## Then you can use merge() to attach that onto your gene IDs later on.
>> >
>> > I hope this helps,
>> >
>> >
>> >    Marc
>> >
>> >
>> >
>> > On 04/03/2013 08:28 AM, Tim Triche, Jr. wrote:
>> >> Probably so. I will look into it. Thanks for the report
>> >>
>> >> --t
>> >>
>> >> On Apr 3, 2013, at 8:21 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
>> >> wrote:
>> >>
>> >>> Is there a more efficient way to do this? I thought there might
>> >>> be a problem in my code.
>> >>>
>> >>> On Wed, Apr 3, 2013 at 11:14 AM, Tim Triche, Jr.
>> >>> <tim.triche at gmail.com> wrote:
>> >>>> Buy more RAM :-)
>> >>>>
>> >>>> --t
>> >>>>
>> >>>> On Apr 3, 2013, at 6:59 AM, Jinyan Huang <jhuang at hsph.harvard.edu>
>> >>>> wrote:
>> >>>>
>> >>>>> When I try to get all GO terms for IlluminaHumanMethylation450k,
>> >>>>> I run into a memory problem: it uses more than 10 GB of memory.
>> >>>>>
>> >>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y)
>> >>>>> y$GOID)))
>> >>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>> >>>>> Error: memory exhausted (limit reached?)
>> >>>>> Execution halted
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>> --------------------------------------Get_all_GO.R----------------------------------------------
>> >>>>>
>> >>>>> library(IlluminaHumanMethylation450k.db)
>> >>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>> >>>>> IlluminaHumanMethylation450kGOall <-
>> >>>>>   toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>> >>>>> ## now let's look at the differences that result from toggleProbes()
>> >>>>> mapped_probes_toggled <-
>> >>>>> mappedkeys(IlluminaHumanMethylation450kGOall)
>> >>>>> res <- mget(mapped_probes_toggled,
>> >>>>> IlluminaHumanMethylation450kGOall,
>> >>>>> ifnotfound=NA)
>> >>>>> res2 <- lapply(res, function(x) x[sapply(x, function(y)
>> >>>>> y['Evidence']!='IEA')])
>> >>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms
>> >>>>> ## for them
>> >>>>> library(GO.db)
>> >>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y)
>> >>>>> y$GOID)))
>> >>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>> >>>>>
>> >>>>> d <- lapply(GOterms, function(x)
>> >>>>>   do.call(rbind, lapply(x, function(y) data.frame(y@Term, y@GOID, y@Ontology))))
>> >>>>> df<-do.call(rbind,d)
>> >>>>> len <- sapply(d,function(x)length(x[,1]))
>> >>>>> probes <- rep(names(d),len)
>> >>>>> df.out<-data.frame(probes=probes,df)
>> >>>>> names(df.out)<-c("probe","GoTerm","GOID","GOCategory")
>> >>>>>
>> >>>>> write.table(df.out,"GO_all.txt",quote=F,row.names=F,col.names=T,sep="\t")
>> >>>>>
>> >>>>>
>> >>>>> ----------------------------------------------------------------------------------------------------------------
>> >>>>>
>> >>>>> On Tue, Apr 2, 2013 at 7:29 PM, Tim Triche, Jr.
>> >>>>> <tim.triche at gmail.com> wrote:
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> Not sure how I managed not to cc: the list on this initially.
>> >>>>>> Here's some GO.db code with a sort of "moral" to it ;-)
>> >>>>>>
>> >>>>>> --t
>> >>>>>>
>> >>>>>> Begin forwarded message:
>> >>>>>>
>> >>>>>> library(IlluminaHumanMethylation450k.db)
>> >>>>>>
>> >>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>> >>>>>> IlluminaHumanMethylation450kGOall <-
>> >>>>>>   toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>> >>>>>>
>> >>>>>> ## now let's look at the differences that result from
>> >>>>>> ## toggleProbes()
>> >>>>>> mapped_probes_default <- mappedkeys(IlluminaHumanMethylation450kGO)
>> >>>>>> mapped_probes_toggled <-
>> >>>>>> mappedkeys(IlluminaHumanMethylation450kGOall)
>> >>>>>> multimapped <- setdiff( mapped_probes_toggled,
>> >>>>>> mapped_probes_default )
>> >>>>>>
>> >>>>>> res0 <- mget(head(multimapped), IlluminaHumanMethylation450kGO,
>> >>>>>> ifnotfound=NA)
>> >>>>>> res <- mget(head(multimapped), IlluminaHumanMethylation450kGOall,
>> >>>>>> ifnotfound=NA)
>> >>>>>>
>> >>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms
>> >>>>>> ## for them
>> >>>>>>
>> >>>>>> library(GO.db)
>> >>>>>> GOids <- lapply(res, function(x) unlist(lapply(x, function(y)
>> >>>>>> y$GOID)))
>> >>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM,
>> >>>>>> ifnotfound=NA))
>> >>>>>> head(GOterms)
>> >>>>>>
>> >>>>>>
>> >>>>>>> I'll add this to the docs (next release)
>> >>>>>>>
>> >>>>>>> thanks,
>> >>>>>>>
>> >>>>>>> --t
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Mar 29, 2013 at 11:24 AM, Fabrice Tourre
>> >>>>>>> <fabrice.ciup at gmail.com> wrote:
>> >>>>>>>> Tim,
>> >>>>>>>>
>> >>>>>>>> Thank you very much for your reply.
>> >>>>>>>> I have a list of probes.
>> >>>>>>>> Do you have an example script for getting the GO terms, instead of
>> >>>>>>>> the GO IDs?
>> >>>>>>>>
>> >>>>>>>> The documentation is not very clear on this.
>> >>>>>>>>
>> >>>>>>>> http://www.bioconductor.org/packages/2.11/data/annotation/html/IlluminaHumanMethylation450k.db.html
>> >>>>>>>>
>> >>>>>>>> On Fri, Mar 29, 2013 at 12:29 PM, Tim Triche, Jr.
>> >>>>>>>> <tim.triche at gmail.com> wrote:
>> >>>>>>>>> Oddly enough, the paper from UCSD with Illumina's folks on it
>> >>>>>>>>> (*) used the
>> >>>>>>>>> IlluminaHumanMethylation450k.db package (which I am currently
>> >>>>>>>>> rebuilding to
>> >>>>>>>>> have a startup message about toggleProbes()) to annotate both
>> >>>>>>>>> CpG islands
>> >>>>>>>>> and GO terms.
>> >>>>>>>>>
>> >>>>>>>>> (*)
>> >>>>>>>>>
>> >>>>>>>>> http://idekerlab.ucsd.edu/publications/Documents/Hannum_MolCell_2012.pdf
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> On Fri, Mar 29, 2013 at 8:49 AM, Fabrice Tourre
>> >>>>>>>>> <fabrice.ciup at gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>> Dear list,
>> >>>>>>>>>>
>> >>>>>>>>>> In the annotation file of Infinium HumanMethylation450
>> >>>>>>>>>> BeadChip,
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> http://support.illumina.com/documents/MyIllumina/b78d361a-def5-4adb-ab38-e8990625f053/HumanMethylation450_15017482_v.1.2.csv
>> >>>>>>>>>>
>> >>>>>>>>>> for each probe set, there is no annotation for GO terms or
>> >>>>>>>>>> pathways, unlike in the annotation file
>> >>>>>>>>>> HG-U133_Plus_2.na32.annot.csv.
>> >>>>>>>>>>
>> >>>>>>>>>> Is there a Bioconductor package to annotate the Infinium
>> >>>>>>>>>> HumanMethylation450 probes, so that given a probe it returns
>> >>>>>>>>>> the GO terms and pathways?
>> >>>>>>>>>>
>> >>>>>>>>>> Thank you very much in advance.
>> >>>>>>>>>>
>> >>>>>>>>>> _______________________________________________
>> >>>>>>>>>> Bioconductor mailing list
>> >>>>>>>>>> Bioconductor at r-project.org
>> >>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> >>>>>>>>>> Search the archives:
>> >>>>>>>>>>
>> >>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> A model is a lie that helps you see the truth.
>> >>>>>>>>>
>> >>>>>>>>> Howard Skipper
>> >>>>>
>> >>>>>
>> >>>>> --
>> >>>>> Best wishes,
>> >>>>>
>> >>>>> Jinyan HUANG



-- 
Best wishes,

Jinyan HUANG


