[BioC] Gene Ontology Annotations from Gene Names

James W. MacDonald jmacdon at uw.edu
Mon Feb 10 17:44:29 CET 2014


Hi Joseph,

Please don't take conversations off-list.

On Friday, February 07, 2014 9:00:06 PM, Joseph Shaw wrote:
> Hi Jim,
>
> Thanks for all your assistance. I really appreciate it!
>
> Unfortunately, when I attempt to run
>
>> install.packages("org.Cjejuni_0.0.1.tar.gz", repos = NULL, type = "source")
>
> I get the error
>
>> Error : package 'AnnotationDbi' 1.24.0 was found, but >= 1.25.2 is required by 'org.Cjejuni.eg.db'
>
> I have since attempted to reinstall and update the AnnotationDbi
> package on my system to a compatible iteration, but the process
> results in the same error.

Hmm. Weird. I seem to have a devel version of the AnnotationDbi 
package in my release BioC install, which would explain the version 
requirement in the package I built.

You could probably just untar and ungzip that file and then manually 
change the DESCRIPTION file to require AnnotationDbi >= 1.24.0 and then 
install using

install.packages("org.Cjejuni.eg.db", type = "source", repos = NULL)
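
For reference, the manual steps above could also be done from within R. 
This is only a sketch: relax_dependency() is a made-up helper (not part 
of any package), and it assumes the tarball unpacks into a directory 
called org.Cjejuni.eg.db and that the DESCRIPTION file states the 
requirement as "AnnotationDbi (>= 1.25.2)".

```r
# Hypothetical helper: rewrite the minimum version of one dependency in the
# lines of a DESCRIPTION file, leaving all other lines untouched.
relax_dependency <- function(desc_lines, pkg, new_version) {
  pattern <- paste0(pkg, "\\s*\\(>=\\s*[0-9.-]+\\)")
  replacement <- paste0(pkg, " (>= ", new_version, ")")
  sub(pattern, replacement, desc_lines)
}

tarball <- "org.Cjejuni_0.0.1.tar.gz"  # file name used earlier in the thread
pkg_dir <- "org.Cjejuni.eg.db"         # assumed name of the unpacked directory

if (file.exists(tarball)) {
  untar(tarball)                       # unpacks the gzipped tarball
  desc_file <- file.path(pkg_dir, "DESCRIPTION")
  desc <- readLines(desc_file)
  writeLines(relax_dependency(desc, "AnnotationDbi", "1.24.0"), desc_file)
  install.packages(pkg_dir, repos = NULL, type = "source")
}
```

The file.exists() check just keeps the snippet from failing when the 
tarball is not in the working directory.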


>
> On a separate but related note, is it possible to restrict the list of
> gene annotations from org.Cjejuni.eg.db used in the GO analysis (i.e.
> the GSEAGOHyperGParams()* function) to simply include the probes used
> in the experiment (i.e. create two subsets; a gene universe and a
> collection of genes identified as differentially expressed)?
>
> (*The GSEAGOHyperGParams() function is used in the unsupported model
> organisms vignette, but the author simply uses the entire gene mapping
> as the gene universe and selects the first 500 genes as differentially
> expressed; ideally, I would like to include genes in the universe
> based on gene IDs, but this might not be the most efficient way.)

You are reading the wrong vignette. While this is technically an 
'unsupported organism', since you have an org package, you can just use 
the regular infrastructure:

> library(GOstats)
> library(org.Cjejuni.eg.db)
> univ <- Lkeys(org.Cjejuni.egACCNUM)
> gns <- univ[sample(1:1670, 100)] ## here I am just selecting genes at random
> p <- new("GOHyperGParams", geneIds = gns, universeGeneIds = univ, ontology = "BP", annotation = "org.Cjejuni.eg.db", conditional = TRUE)
> hyp <- hyperGTest(p)
> summary(hyp)
      GOBPID      Pvalue OddsRatio  ExpCount Count Size                  Term
1 GO:0012501 0.003677779       Inf 0.1221239     2    2 programmed cell death
2 GO:0016265 0.003677779       Inf 0.1221239     2    2                 death

I get an infinite odds ratio here because I randomly selected the only 
two apoptosis genes on the array. Yay for me!
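
To restrict the universe to the probes used in the experiment, one 
option is to intersect the experiment's gene IDs with the IDs the org 
package knows about. The sketch below uses toy IDs, and 
make_gene_sets() is a made-up helper, not a GOstats function. It 
assumes the experiment's IDs are of the same type as the keys returned 
by Lkeys(org.Cjejuni.egACCNUM).

```r
# Hypothetical helper: build a gene universe and a selected-gene list that
# contain only IDs present in the annotation package.
make_gene_sets <- function(array_ids, de_ids, annotated_ids) {
  # Universe: genes measured in the experiment that also have annotation
  universe <- intersect(unique(array_ids), annotated_ids)
  # Selected: differentially expressed genes that are in the universe
  selected <- intersect(unique(de_ids), universe)
  list(universe = universe, selected = selected)
}

# Toy IDs for illustration only. Real values would come from the experiment
# and from Lkeys(org.Cjejuni.egACCNUM).
annotated_ids <- c("g1", "g2", "g3", "g4", "g5")
array_ids     <- c("g2", "g3", "g4", "g9")  # g9 has no annotation
de_ids        <- c("g3", "g4", "g7")        # g7 was not on the array

sets <- make_gene_sets(array_ids, de_ids, annotated_ids)
sets$universe  # "g2" "g3" "g4"
sets$selected  # "g3" "g4"
```

sets$selected and sets$universe would then be passed as geneIds and 
universeGeneIds in the GOHyperGParams call shown above.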

Best,

Jim


>
> Relevant Vignette:
> http://www.bioconductor.org/packages/devel/bioc/vignettes/GOstats/inst/doc/GOstatsForUnsupportedOrganisms.pdf
>
> Joseph
>
> On Fri, Feb 7, 2014 at 7:03 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> See attached.
>>
>>
>> On 2/6/2014 8:32 PM, Joseph Shaw wrote:
>>>
>>> Hi Jim,
>>>
>>>> You can check to see if it is a viable option by just giving it a shot.
>>>
>>> I have attempted to call the makeOrgPackageFromNCBI() as described in
>>> your previous mail (having provided my details for the author and
>>> maintainer arguments); however, the function call doesn't fully
>>> complete. In particular, the steps outlined below are completed, but it
>>> appears to make it no further.
>>>
>>>> Loading required package: GO.db
>>>>
>>>> Getting data for gene2pubmed.gz
>>>> Loading required package: RCurl
>>>> Loading required package: bitops
>>>> discarding data from other organisms
>>>> Populating gene2pubmed table:
>>>> table gene2pubmed filled
>>>> Getting data for gene2accession.gz
>>>
>>> I'm not sure if the function has failed or if the function is still in
>>> the process of completion. Could you tell me, approximately, how long
>>> the function should take to complete? For reference, I'm currently
>>> running OS X with 1.8 GHz processor and 4GB memory.
>>>
>>> Joseph
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
