[BioC] GOs to genes

Loren Engrav engrav at u.washington.edu
Mon Mar 1 05:28:17 CET 2010


OK, thank you.
I now have:
> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] org.Hs.eg.db_2.3.6  GO.db_2.3.5         RSQLite_0.8-3       AnnotationDbi_1.8.1 DBI_0.2-5
[6] Biobase_2.6.1

loaded via a namespace (and not attached):
[1] tools_2.10.1

All commands now run without errors; however, I see:

> egids
$`GO:0010711`
   IEP 
"1471" 

$`GO:0030199`
    IEA     IEA     ISS     IEA     IMP     IMP     IMP     IMP     NAS     IMP     NAS     IMP     ISS
  "302"   "304"   "538"   "871"  "1277"  "1278"  "1280"  "1281"  "1281"  "1289"  "1289"  "1290"  "1290"
    NAS     IDA     NAS     IEA     IEA     IEA     IEA     IEA     NAS     ISS     IDA     ISS     NAS
 "1301"  "1302"  "1303"  "1805"  "2296"  "2303"  "4010"  "4015"  "4060"  "4763"  "7042"  "7046"  "7373"
    NAS     NAS 
 "9508" "50509" 

$`GO:0030574`
     IEA      IEA      IEA      IEA      IEA      IEA      IEA      IEA      IEA      IEA      IEA
  "4312"   "4313"   "4314"   "4316"   "4317"   "4318"   "4319"   "4320"   "4322"   "4325"   "4327"
     IEA      IDA      IMP      NAS      IEA      NAS      IEA      IEA      IEA      IEA
  "5184"   "5645"   "5645"   "5653"   "5657"   "9508"   "9509"  "56547"  "64066" "140766"

$`GO:0032963`
   IEA    IMP 
"3091" "7148" 

$`GO:0032964`
   IEA    IMP    IMP    TAS    IMP
 "871" "1277" "1281" "1281" "1289"

$`GO:0032966`
   IDA     IC 
"3569" "4261" 

$`GO:0032967`
   ISS    IDA    IDA     IC    IMP    TAS    IMP
 "265" "2147" "2149" "3066" "7040" "7040" "7043"

$`GO:0033342`
    IMP 
"23560"

Many GO terms containing the word "collagen" are not listed, for example
GO:0004656
GO:0005518
etc.
AmiGO reports 68 such terms, but the list above has only 8.
What did I do wrong?
I would also like to omit the IEA (inferred from electronic annotation) group.

Thank you
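Each element of `egids` printed above is a character vector of Entrez Gene IDs whose names are the GO evidence codes (IEA, IMP, NAS, ...), so the IEA annotations can be dropped by filtering on `names()`. A minimal base-R sketch, assuming that structure; `drop_evidence()` and the two-term `egids_toy` list are hypothetical, with values copied from the GO:0032964 and GO:0032963 entries above. `unique()` also collapses a gene listed under more than one evidence code, such as 1281 (IMP and TAS).

```r
# Hypothetical toy data copied from two entries of the egids output above:
# each element is a character vector of Entrez Gene IDs named by evidence code.
egids_toy <- list(
  `GO:0032964` = c(IEA = "871", IMP = "1277", IMP = "1281",
                   TAS = "1281", IMP = "1289"),
  `GO:0032963` = c(IEA = "3091", IMP = "7148")
)

# Hypothetical helper: drop annotations whose evidence code is in `drop`,
# remove duplicate gene IDs, and discard GO terms left with no genes.
drop_evidence <- function(x, drop = "IEA") {
  kept <- lapply(x, function(ids) unique(ids[!names(ids) %in% drop]))
  kept[lengths(kept) > 0]
}

no_iea <- drop_evidence(egids_toy)
no_iea
```

On the real object this should work the same way, e.g. `drop_evidence(egids)`; passing `drop = c("IEA", "NAS")` would exclude further codes.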






> From: Martin Morgan <mtmorgan at fhcrc.org>
> Date: Sun, 28 Feb 2010 19:30:34 -0800
> To: Loren Engrav <engrav at u.washington.edu>
> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: Re: [BioC] GO's to gene's
> 
> On 02/28/2010 07:17 PM, Loren Engrav wrote:
>> Thank you both.
>> Given my skills, it might be easier/quicker to do it "manually" with AmiGO,
>> but I am trying both methods.
>> 
>> For the second method, I get:
>> 
>>> library(GO.db)
>> Loading required package: AnnotationDbi
>> Loading required package: Biobase
>> 
>> Welcome to Bioconductor
>> 
>>   Vignettes contain introductory material. To view, type
>>   'openVignette()'. To cite Bioconductor, see
>>   'citation("Biobase")' and for packages 'citation(pkgname)'.
>> 
>> Loading required package: DBI
>>> terms <- Term(GOTERM)
>> Error in function (classes, fdef, mtable)  :
>>   unable to find an inherited method for function "Term", for signature "GOTermsAnnDbBimap"
>> 
>>> sessionInfo()
>> R version 2.9.2 Patched (2009-09-05 r49613)
>> i386-apple-darwin9.8.0
>> 
>> locale:
>> en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> Update to R version 2.10 and the associated Bioc packages or, for a (much)
> slower solution (you'll want to check that Term and Ontology return IDs
> in identical order), use
> 
>   terms = eapply(GOTERM, Term)
> 
> etc. I have:
> 
>> sessionInfo()
> R version 2.10.1 Patched (2010-02-23 r51168)
> x86_64-unknown-linux-gnu
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] GO.db_2.3.5         RSQLite_0.7-3       DBI_0.2-4
> [4] AnnotationDbi_1.8.1 Biobase_2.6.1
> 
> loaded via a namespace (and not attached):
> [1] tools_2.10.1
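Martin's caveat about the `eapply()` route matters because the term names and ontologies are later combined element by element. A minimal sketch, assuming both `eapply()` calls return lists named by GO ID (the two-entry lists below are hypothetical stand-ins, not real output): flatten them and index the ontologies by the GO IDs of the terms, so the original order no longer matters.

```r
# Hypothetical stand-ins for eapply(GOTERM, Term) and eapply(GOTERM, Ontology):
# lists named by GO ID, deliberately in different orders
terms <- list(`GO:0030199` = "collagen fibril organization",
              `GO:0005518` = "collagen binding")
ontologies <- list(`GO:0005518` = "MF",
                   `GO:0030199` = "BP")

# Flatten to named character vectors, then reorder the ontologies by the
# GO IDs of `terms` so element i describes the same GO term in both
terms <- unlist(terms)
ontologies <- unlist(ontologies)[names(terms)]
stopifnot(identical(names(terms), names(ontologies)))

collagen <- terms[grepl("collagen", terms) & ontologies == "MF"]
collagen
```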
> 
> 
> Martin
> 
>> 
>>> From: Martin Morgan <mtmorgan at fhcrc.org>
>>> Date: Sun, 28 Feb 2010 18:42:33 -0800
>>> To: Vincent Carey <stvjc at channing.harvard.edu>
>>> Cc: Loren Engrav <engrav at u.washington.edu>, "bioconductor at stat.math.ethz.ch"
>>> <bioconductor at stat.math.ethz.ch>
>>> Subject: Re: [BioC] GO's to gene's
>>> 
>>> On 02/28/2010 06:14 PM, Vincent Carey wrote:
>>>> Perhaps there is a package with such functionality.  However, with the
>>>> GO.db package in place, you need to do a little
>>>> programming, perhaps along the lines of
>>>> 
>>>> querGO = function(str, attr = "definition", ont = "MF") {
>>>>   require(GO.db, quietly = TRUE)
>>>>   gc = GO_dbconn()
>>>>   quer.1 = paste("select go_id, term from go_term where",
>>>>   attr, "like('%")
>>>>   quer.2 = "%') and ontology = '"
>>>>   quer.3 = "'"
>>>>   quer = paste(quer.1, str, quer.2, ont, quer.3, collapse = "",
>>>>   sep = "")
>>>>   dbGetQuery(gc, quer)
>>>> }
>>>> 
>>>> whereby
>>>> 
>>>>> querGO("collagen", "term")
>>>>        go_id                                                           term
>>>> 1 GO:0004656                     procollagen-proline 4-dioxygenase activity
>>>> 2 GO:0005518                                               collagen binding
>>>> 3 GO:0008475                      procollagen-lysine 5-dioxygenase activity
>>>> 4 GO:0019797                     procollagen-proline 3-dioxygenase activity
>>>> 5 GO:0019798                       procollagen-proline dioxygenase activity
>>>> 6 GO:0033823                       procollagen glucosyltransferase activity
>>>> 7 GO:0042329 structural constituent of collagen and cuticulin-based cuticle
>>>> 8 GO:0050211                     procollagen galactosyltransferase activity
>>>> 9 GO:0070052                                             collagen V binding
>>>>> 
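`querGO()` pastes the search string directly into the SQL text, so a string containing a single quote (for example `Alzheimer's`) would produce a malformed statement. A sketch of a hypothetical helper, `build_go_query()`, that builds the same query with `sprintf()`, doubles embedded single quotes as standard SQL requires, and only accepts the two column names used in this thread; its result would be passed to `dbGetQuery(GO_dbconn(), ...)` just as in `querGO()`.

```r
# Hypothetical helper mirroring querGO()'s SQL: builds the statement with
# sprintf() and doubles single quotes so the search string stays a literal
build_go_query <- function(str, attr = "definition", ont = "MF") {
  attr <- match.arg(attr, c("definition", "term"))  # only columns used above
  esc <- function(s) gsub("'", "''", s, fixed = TRUE)
  sprintf("select go_id, term from go_term where %s like '%%%s%%' and ontology = '%s'",
          attr, esc(str), esc(ont))
}

build_go_query("collagen", "term")
```

With a recent DBI/RSQLite, binding the values through the `params` argument of `dbGetQuery()` is another option that avoids building the string by hand.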
>>> 
>>> Also
>>> 
>>>   library(GO.db)
>>>   terms <- Term(GOTERM)  # or maybe Definition(GOTERM) ?
>>>   ontologies <- Ontology(GOTERM)
>>>   collagen <- terms[grepl("collagen", terms) & ("MF" == ontologies)]
>>> 
>>> and the next step,
>>> 
>>>   library(org.Hs.eg.db)
>>>   egids <- mget(names(collagen), org.Hs.egGO2EG, ifnotfound=NA)
>>>   egids <- egids[!is.na(egids)]
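Two steps in the code above narrow the result: `"MF" == ontologies` keeps only Molecular Function terms, and `egids[!is.na(egids)]` drops every term that has no gene in `org.Hs.egGO2EG`, which maps only direct annotations (`org.Hs.egGO2ALLEGS` also includes genes annotated to child terms). A minimal base-R sketch, with hypothetical toy inputs standing in for `Term(GOTERM)`, `Ontology(GOTERM)` and the filtered `mget()` result, that makes the ontology filter optional, matches case-insensitively, and collapses the per-term lists into one vector of unique Entrez Gene IDs. `select_terms()` is a hypothetical helper.

```r
# Hypothetical toy inputs standing in for Term(GOTERM) and Ontology(GOTERM):
# named character vectors keyed by GO ID, in the same order
terms <- c(`GO:0005518` = "collagen binding",
           `GO:0030199` = "collagen fibril organization",
           `GO:0004656` = "procollagen-proline 4-dioxygenase activity")
ontologies <- c(`GO:0005518` = "MF", `GO:0030199` = "BP", `GO:0004656` = "MF")

# Hypothetical helper: GO IDs whose term matches `pattern` (ignoring case),
# restricted to the ontologies listed in `ont` (all three by default)
select_terms <- function(pattern, terms, ontologies, ont = c("BP", "CC", "MF")) {
  hit <- grepl(pattern, terms, ignore.case = TRUE) & ontologies %in% ont
  names(terms)[hit]
}

all_ids <- select_terms("collagen", terms, ontologies)
mf_ids  <- select_terms("collagen", terms, ontologies, ont = "MF")

# Toy stand-in for the filtered mget() result (values from the egids output
# in this thread); flatten to one vector of unique Entrez Gene IDs
egids <- list(`GO:0032964` = c(IEA = "871", IMP = "1277", IMP = "1281",
                               TAS = "1281", IMP = "1289"),
              `GO:0032963` = c(IEA = "3091", IMP = "7148"))
genes <- unique(unlist(egids, use.names = FALSE))
genes
```

With the real objects this would be roughly `select_terms("collagen", Term(GOTERM), Ontology(GOTERM))` followed by the `mget()` step above, assuming, as Martin notes, that both calls return IDs in the same order.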
>>> 
>>> 
>>>> 
>>>> On Sun, Feb 28, 2010 at 8:56 PM, Loren Engrav <engrav at u.washington.edu>
>>>> wrote:
>>>>> Is there a BioC package that will find all the GO terms containing some
>>>>> word, like perhaps "collagen",
>>>>> and then find all the genes annotated to those terms?
>>>>> 
>>>>> I scanned
>>>>> goProfiles
>>>>> GOSemSim
>>>>> GOstats
>>>>> goTools and
>>>>> topGO
>>>>> 
>>>>> and could not determine that any of them would do that.
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioconductor mailing list
>>>>> Bioconductor at stat.math.ethz.ch
>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>>> Search the archives:
>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>> 
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>> 
> 
> 


