[BioC] Fwd: GO terms: Annotation for HumanMethylation450

Jinyan Huang jhuang at hsph.harvard.edu
Wed Apr 3 20:11:12 CEST 2013


> library(GO.db)
Loading required package: AnnotationDbi
Loading required package: BiocGenerics

Attaching package: ‘BiocGenerics’

The following object(s) are masked from ‘package:stats’:

    xtabs

The following object(s) are masked from ‘package:base’:

    anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
    get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
    pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
    rownames, sapply, setdiff, table, tapply, union, unique

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: DBI

> ids = c("GO:0008150", "GO:0001869")
> result = select(GO.db, keys =ids, cols=c("DEFINITION","TERM"))
Error in eval(expr, envir, enclos) : object 'GODEFINITION' not found
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GO.db_2.7.1          RSQLite_0.11.2       DBI_0.2-5
[4] AnnotationDbi_1.18.4 Biobase_2.16.0       BiocGenerics_0.2.0

loaded via a namespace (and not attached):
[1] IRanges_1.14.4 stats4_2.15.2
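
For reference, the unique()-then-merge() approach Marc describes in the quoted reply below can be sketched with base R alone. Everything here is an assumption for illustration: the probe names, the toy terms, and the hand-built go_lookup table, which stands in for whatever select() would return once that call works. It is a sketch of the idea, not a tested replacement for the GO.db query.

```r
## Hypothetical long-format probe -> GO ID table (toy data, not real
## annotation output); one row per probe/GO ID pair.
probe_go <- data.frame(
  probe = c("cg0001", "cg0001", "cg0002", "cg0003"),
  GOID  = c("GO:0008150", "GO:0001869", "GO:0008150", "GO:0008150"),
  stringsAsFactors = FALSE
)

## Look up each distinct GO ID only once, however many probes share it.
ids <- unique(probe_go$GOID)

## Stand-in for the result of select(GO.db, keys = ids, ...).
## The thread spells the column argument `cols`; newer AnnotationDbi
## releases name it `columns`. The TERM values below are placeholders.
go_lookup <- data.frame(
  GOID = ids,
  TERM = c("toy term A", "toy term B"),
  stringsAsFactors = FALSE
)

## Attach the terms to every probe row with a single merge().
annotated <- merge(probe_go, go_lookup, by = "GOID", all.x = TRUE)
annotated <- annotated[order(annotated$probe, annotated$GOID),
                       c("probe", "GOID", "TERM")]
print(annotated)
```

The point of the design is that the lookup cost scales with the number of distinct GO IDs rather than with the number of probe/GO pairs, and the result stays a flat data frame instead of a nested list of objects.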

On Wed, Apr 3, 2013 at 2:07 PM, Jinyan Huang <jhuang at hsph.harvard.edu> wrote:
> Marc,
>
> After updating R to 2.15.2, I still get an error.
>
> R
>
> R version 2.15.2 (2012-10-26) -- "Trick or Treat"
> Copyright (C) 2012 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
>   Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
>> ids = c( "GO:0008150", "GO:0001869")
>> result = select(GO.db, keys =ids, cols=c("DEFINITION","TERM"))
> Error: could not find function "select"
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> On Wed, Apr 3, 2013 at 1:40 PM, Marc Carlson <mcarlson at fhcrc.org> wrote:
>> Hi Jinyan,
>>
>> The code I showed you before will get all the GO terms and their
>> definitions into a single data frame (without using too much RAM):
>>
>> library(GO.db)
>> k = keys(GOTERM)  ## k is now all the GOIDs that we actually have Terms for.
>> ## If you use another source of GOIDs, you might want to call unique() on
>> ## that first, in order to save time.
>> ## Then just call select like I showed you before
>> result = select(GO.db, keys =k, cols=c("DEFINITION","TERM"))
>>
>> ## Then you can use merge() to attach that onto your gene IDs later on.
>>
>> I hope this helps,
>>
>>
>>    Marc
>>
>>
>>
>> On 04/03/2013 08:28 AM, Tim Triche, Jr. wrote:
>>> Probably so. I will look into it. Thanks for the report
>>>
>>> --t
>>>
>>> On Apr 3, 2013, at 8:21 AM, Jinyan Huang <jhuang at hsph.harvard.edu> wrote:
>>>
>>>> Is there a more efficient way to do this? I just thought there
>>>> was some problem in my code.
>>>>
>>>> On Wed, Apr 3, 2013 at 11:14 AM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>> Buy more RAM :-)
>>>>>
>>>>> --t
>>>>>
>>>>> On Apr 3, 2013, at 6:59 AM, Jinyan Huang <jhuang at hsph.harvard.edu> wrote:
>>>>>
>>>>>> When I try to get all GO terms for IlluminaHumanMethylation450k,
>>>>>> I run out of memory; the script uses more than 10 GB.
>>>>>>
>>>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>>> Error: memory exhausted (limit reached?)
>>>>>> Execution halted
>>>>>>
>>>>>>
>>>>>> --------------------------------------Get_all_GO.R----------------------------------------------
>>>>>>
>>>>>> library(IlluminaHumanMethylation450k.db)
>>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>>>> IlluminaHumanMethylation450kGOall <- toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>>>> ## now let's look at the differences that result from toggleProbes()
>>>>>> mapped_probes_toggled <- mappedkeys(IlluminaHumanMethylation450kGOall)
>>>>>> res <- mget(mapped_probes_toggled, IlluminaHumanMethylation450kGOall,
>>>>>> ifnotfound=NA)
>>>>>> res2 <- lapply(res, function(x) x[sapply(x, function(y) y['Evidence']!='IEA')])
>>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms for them
>>>>>> library(GO.db)
>>>>>> GOids <- lapply(res2, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>>> d <- lapply(GOterms, function(x) do.call(rbind, lapply(x, function(y) data.frame(y@Term, y@GOID, y@Ontology))))
>>>>>> df<-do.call(rbind,d)
>>>>>> len <- sapply(d,function(x)length(x[,1]))
>>>>>> probes <- rep(names(d),len)
>>>>>> df.out<-data.frame(probes=probes,df)
>>>>>> names(df.out)<-c("probe","GoTerm","GOID","GOCategory")
>>>>>> write.table(df.out,"GO_all.txt",quote=F,row.names=F,col.names=T,sep="\t")
>>>>>>
>>>>>> ----------------------------------------------------------------------------------------------------------------
>>>>>>
>>>>>> On Tue, Apr 2, 2013 at 7:29 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Not sure how I managed not to cc: the list on this initially. Here's some GO.db code with a sort of "moral" to it ;-)
>>>>>>>
>>>>>>> --t
>>>>>>>
>>>>>>> Begin forwarded message:
>>>>>>>
>>>>>>> library(IlluminaHumanMethylation450k.db)
>>>>>>>
>>>>>>> ## allow both singly- and multiply-mapped probes (e.g. for SYMBOL)
>>>>>>> IlluminaHumanMethylation450kGOall <- toggleProbes(IlluminaHumanMethylation450kGO, 'all')
>>>>>>>
>>>>>>> ## now let's look at the differences that result from toggleProbes()
>>>>>>> mapped_probes_default <- mappedkeys(IlluminaHumanMethylation450kGO)
>>>>>>> mapped_probes_toggled <- mappedkeys(IlluminaHumanMethylation450kGOall)
>>>>>>> multimapped <- setdiff( mapped_probes_toggled, mapped_probes_default )
>>>>>>>
>>>>>>> res0 <- mget(head(multimapped), IlluminaHumanMethylation450kGO, ifnotfound=NA)
>>>>>>> res <- mget(head(multimapped), IlluminaHumanMethylation450kGOall, ifnotfound=NA)
>>>>>>>
>>>>>>> ## fetch the GOIDs from the unencumbered toggled map, to get terms for them
>>>>>>>
>>>>>>> library(GO.db)
>>>>>>> GOids <- lapply(res, function(x) unlist(lapply(x, function(y) y$GOID)))
>>>>>>> GOterms <- lapply(GOids, function(x) mget(x, GOTERM, ifnotfound=NA))
>>>>>>> head(GOterms)
>>>>>>>
>>>>>>>
>>>>>>>> I'll add this to the docs (next release)
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> --t
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Mar 29, 2013 at 11:24 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
>>>>>>>>> Tim,
>>>>>>>>>
>>>>>>>>> Thank you very much for your reply.
>>>>>>>>> I have a list of probe lists.
>>>>>>>>> Do you have an example script for getting the GO terms, instead of the GO IDs?
>>>>>>>>>
>>>>>>>>> The documentation is not very clear on this.
>>>>>>>>> http://www.bioconductor.org/packages/2.11/data/annotation/html/IlluminaHumanMethylation450k.db.html
>>>>>>>>>
>>>>>>>>> On Fri, Mar 29, 2013 at 12:29 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
>>>>>>>>>> Oddly enough, the paper from UCSD with Illumina's folks on it (*) used the
>>>>>>>>>> IlluminaHumanMethylation450k.db package (which I am currently rebuilding to
>>>>>>>>>> have a startup message about toggleProbes()) to annotate both CpG islands
>>>>>>>>>> and GO terms.
>>>>>>>>>>
>>>>>>>>>> (*)
>>>>>>>>>> http://idekerlab.ucsd.edu/publications/Documents/Hannum_MolCell_2012.pdf
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 29, 2013 at 8:49 AM, Fabrice Tourre <fabrice.ciup at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Dear list,
>>>>>>>>>>>
>>>>>>>>>>> The annotation file for the Infinium HumanMethylation450 BeadChip,
>>>>>>>>>>>
>>>>>>>>>>> http://support.illumina.com/documents/MyIllumina/b78d361a-def5-4adb-ab38-e8990625f053/HumanMethylation450_15017482_v.1.2.csv
>>>>>>>>>>>
>>>>>>>>>>> does not annotate each probe with GO terms or pathways, as is done
>>>>>>>>>>> in the annotation file HG-U133_Plus_2.na32.annot.csv.
>>>>>>>>>>>
>>>>>>>>>>> Is there a Bioconductor package that annotates the Infinium
>>>>>>>>>>> HumanMethylation450 probes, i.e. given a probe, returns its GO terms
>>>>>>>>>>> and pathways?
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much in advance.
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioconductor mailing list
>>>>>>>>>>> Bioconductor at r-project.org
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>>>>>>>> Search the archives:
>>>>>>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> A model is a lie that helps you see the truth.
>>>>>>>>>>
>>>>>>>>>> Howard Skipper
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best wishes,
>>>>>>
>>>>>> Jinyan HUANG
>>>>
>>>>
>>>> --
>>>> Best wishes,
>>>>
>>>> Jinyan HUANG
>>
>
>
>
> --
> Best wishes,
>
> Jinyan HUANG



-- 
Best wishes,

Jinyan HUANG


