[BioC] Gene Ontology Annotations from Gene Names

Wed Feb 5 21:06:47 CET 2014

Hi Joseph,

You can check whether it is a viable option by just giving it a shot. 
The author and maintainer arguments go into the DESCRIPTION file of the 
package you are building, so in general they should be you; replace my 
oh so very droll versions with your name and email. Also note that if 
you are on Windows, you need to add type = "source" to the call to 
install.packages().

> makeOrgPackageFromNCBI(version = "0.0.1", author = "me <me at mine.com>", maintainer = "me <me at mine.com>",outputDir = ".", tax_id = "192222", genus = "Campylobacter", species = "jejuni")
Loading required package: GO.db

Getting data for gene2pubmed.gz
Loading required package: RCurl
Loading required package: bitops
discarding data from other organisms
Populating gene2pubmed table:
table gene2pubmed filled
Getting data for gene2accession.gz
discarding data from other organisms
Populating gene2accession table:
table gene2accession filled
Getting data for gene2refseq.gz
discarding data from other organisms
Populating gene2refseq table:
table gene2refseq filled
Getting data for gene2unigene
discarding data from other organisms
Populating gene2unigene table:
table gene2unigene filled
Getting data for gene_info.gz
discarding data from other organisms
Populating gene_info table:
table gene_info filled
Getting data for gene2go.gz
discarding data from other organisms
Populating gene2go table:
Getting blast2GO data as a substitute for gene2go
table metadata filled
table map_metadata filled
table gene2go filled
table metadata filled
table map_metadata filled
Populating genes table:
genes table filled
Populating gene_info_temp table:
gene_info_temp table filled
Populating alias table:
alias table filled
Populating chromosomes table:
chromosomes table filled
Populating pubmed table:
pubmed table filled
Populating refseq table:
refseq table filled
Populating accessions table:
accessions table filled
Populating unigene table:
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Dropping GO IDs that are too new for the current GO.db
Populating go_bp table:
go_bp table filled
Populating go_mf table:
go_mf table filled
Populating go_cc table:
go_cc table filled
Populating go_bp_all table:
go_bp_all table filled
Populating go_mf_all table:
go_mf_all table filled
Populating go_cc_all table:
go_cc_all table filled
dropping table gene2pubmed
dropping table gene2accession
dropping table gene2refseq
dropping table gene2unigene
dropping table gene_info
dropping table gene2go
Making GO views

SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.gene_name NOT NULL
SELECT count(DISTINCT g.gene_id) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT t.symbol) FROM gene_info AS t, genes as g WHERE t._id=g._id AND t.symbol NOT NULL
SELECT count(DISTINCT g.gene_id) FROM chromosomes AS t, genes as g WHERE t._id=g._id AND t.chromosome NOT NULL
SELECT count(DISTINCT g.gene_id) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM refseq AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT t.unigene_id) FROM unigene AS t, genes as g WHERE t._id=g._id AND t.unigene_id NOT NULL
SELECT count(DISTINCT g.gene_id) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT t.accession) FROM accessions AS t, genes as g WHERE t._id=g._id AND t.accession NOT NULL
SELECT count(DISTINCT g.gene_id) FROM alias AS t, genes as g WHERE t._id=g._id AND t.alias_symbol NOT NULL
table map_counts filled
Creating package in ./org.Cjejuni.eg.db
[1] TRUE
Warning message:
In .makeSimpleTable(ug, table = "unigene", con) :
  no values found for table unigene in this data chunk.

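So that built the package source in ./org.Cjejuni.eg.db (the function 
writes the package to outputDir rather than handing it back as an R 
object), but now we need to install it: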

> install.packages("org.Cjejuni.eg.db", repos = NULL)
* installing *source* package ‘org.Cjejuni.eg.db’ ...
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (org.Cjejuni.eg.db)
> library(org.Cjejuni.eg.db)

> head(toTable(org.Cjejuni.egGO))
  gene_id      go_id Evidence Ontology
1  904332 GO:0006281      IEA       BP
2  904332 GO:0030420      IEA       BP
3  904333 GO:0006935      IEA       BP
4  904333 GO:0007165      IEA       BP
5  904334 GO:0006401      IEA       BP
6  904335 GO:0006549      IEA       BP
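Since your identifiers are gene names, one way to get from names to GO IDs 
is to join the GO table against the package's gene-symbol table on gene_id 
and then collapse the result into a named list of GO IDs per gene. Below is 
a minimal base-R sketch of that idea. So that it runs on its own, it 
hard-codes the six GO rows printed above; the symbols in sym_tab and the 
names in my_genes are hypothetical placeholders, not real C. jejuni gene 
names. With the installed package you would instead use 
toTable(org.Cjejuni.egGO) and, assuming the package provides the usual 
org.Cjejuni.egSYMBOL map, toTable(org.Cjejuni.egSYMBOL).

```r
# Six GO rows copied from the head(toTable(org.Cjejuni.egGO)) output above.
# With the installed package: go_tab <- toTable(org.Cjejuni.egGO)
go_tab <- data.frame(
  gene_id  = c("904332", "904332", "904333", "904333", "904334", "904335"),
  go_id    = c("GO:0006281", "GO:0030420", "GO:0006935",
               "GO:0007165", "GO:0006401", "GO:0006549"),
  Evidence = "IEA",
  Ontology = "BP",
  stringsAsFactors = FALSE
)

# HYPOTHETICAL gene_id -> symbol table; the symbols are placeholders only.
# With the package (assumed map): sym_tab <- toTable(org.Cjejuni.egSYMBOL)
sym_tab <- data.frame(
  gene_id = c("904332", "904333", "904334", "904335"),
  symbol  = c("geneA", "geneB", "geneC", "geneD"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL gene names from the microarray; "geneZ" has no match.
my_genes <- c("geneA", "geneC", "geneZ")

# Join symbols to GO annotations on the gene_id column both tables share.
annot <- merge(sym_tab, go_tab, by = "gene_id")

# Keep only the genes of interest and collapse to a list: symbol -> GO IDs.
hits <- annot[annot$symbol %in% my_genes, ]
gene2go <- split(hits$go_id, hits$symbol)

# Names that picked up no GO annotation, reported rather than silently dropped.
unmapped <- setdiff(my_genes, names(gene2go))

print(gene2go)
print(unmapped)
```

The join goes through gene_id because that is the key both tables carry. 
Names that fail to match end up in unmapped, so they can be checked by hand, 
for instance against the alias table the build populated.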

Best,

Jim

On Wednesday, February 05, 2014 1:31:22 PM, Joseph Shaw wrote:
> Hi Jim,
>
> Thanks for your reply!
>
> The organism is Campylobacter jejuni (strain: NCTC11168). How can I
> check if this is a viable option?
>
> According to the reference manual for AnnotationForge, the
> makeOrgPackageFromNCBI() function makes an organism package from
> annotations available from NCBI, but according to the function
> arguments, an author and maintainer are required; I'm not sure exactly
> what these apply to.
>
> Also, the function returns nothing; if this is the case, how can you
> access the created organism package?
>
> Joseph
>
> On Wed, Feb 5, 2014 at 3:51 PM, James W. MacDonald <jmacdon at uw.edu> wrote:
>> Hi Joseph,
>>
>> What's the organism? You might be able to create an org-level package using
>> makeOrgPackageFromNCBI() in the AnnotationForge package.
>>
>> Best,
>>
>> Jim
>>
>>
>>
>> On 2/4/2014 7:09 PM, Joseph Shaw [guest] wrote:
>>>
>>> I am hoping to get appropriate GO mappings for a list of genes used in a
>>> microarray experiment with a view to identifying significantly regulated
>>> processes.
>>>
>>> I was planning on using the Bioconductor package GOstats to identify these
>>> processes; however, the organism under study is not a supported organism. I
>>> have attempted to use the blast2GO software to generate the gene to GO
>>> mapping, but this approach seems to be very time consuming (after generating
>>> the corresponding .fasta files, it took over 1 hour to BLAST just 10 genes).
>>>
>>> Currently, the gene identifiers I am using are simply the gene names, but
>>> it shouldn't be too difficult to derive a list of corresponding alternative
>>> identifiers (assuming they are publicly available) should it be advantageous
>>> to the GO mapping process.
>>>
>>> Is there any faster way to achieve this gene to GO mapping (either through
>>> Bioconductor packages or otherwise)?
>>>
>>> Any assistance is appreciated.
>>>
>>> Joseph
>>>
>>> --
>>> Sent via the guest posting facility at bioconductor.org.
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
