[BioC] Genbank accession annotation?

Fri Oct 4 09:26:51 CEST 2013

Hi Ed

For this particular EST sequence, you can find the annotation in UniGene.

http://www.ncbi.nlm.nih.gov/unigene

If you have many such EST sequences, I recommend downloading the UniGene
file:

ftp://ftp.ncbi.nih.gov/repository/UniGene/Homo_sapiens/Hs.data.gz

and doing some horrible parsing (with your favorite parsing language)...

For "R28020", you will get:

ID          Hs.369017
TITLE       RAB2A, member RAS oncogene family
GENE        RAB2A
CYTOBAND    8q12.1
GENE_ID     5862
LOCUSLINK   5862

SEQUENCE    ACC=R28020.1; NID=g784155; CLONE=IMAGE:133972; END=3'; LID=271; SEQTYPE=EST
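
The parsing step could look roughly like the Python sketch below. It rests on a few assumptions: that each record in Hs.data ends with a line starting with "//", that a field name is separated from its value by whitespace (as in the excerpt above), and that the accession in a SEQUENCE line appears as "ACC=<accession>.<version>". The names parse_unigene and open_text are made up for illustration and are not part of any package.

```python
import gzip
import re


def open_text(path):
    """Open a plain or gzip-compressed file as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def parse_unigene(lines, wanted):
    """Map each accession in `wanted` to selected fields of its UniGene cluster.

    `lines` is any iterable of text lines in the Hs.data layout.
    Assumption: records are terminated by a line starting with "//".
    """
    wanted = set(wanted)
    keep = ("ID", "TITLE", "GENE", "GENE_ID")
    hits = {}
    record, found = {}, set()

    def flush():
        # Attach the cluster-level fields to every wanted accession seen in it.
        for acc in found:
            hits[acc] = dict(record)

    for line in lines:
        if line.startswith("//"):
            flush()
            record, found = {}, set()
            continue
        parts = line.split(None, 1)  # field name, then the rest of the line
        if len(parts) < 2:
            continue
        key, value = parts[0], parts[1].strip()
        if key in keep:
            record[key] = value
        elif key == "SEQUENCE":
            # Capture the accession without its version suffix (R28020.1 -> R28020).
            m = re.search(r"ACC=([^;.\s]+)", value)
            if m and m.group(1) in wanted:
                found.add(m.group(1))
    flush()  # in case the last record has no trailing "//"
    return hits
```

With the file downloaded, something like `with open_text("Hs.data.gz") as fh: hits = parse_unigene(fh, ["R28020"])` should then give a dictionary keyed by accession, from which `hits["R28020"]["GENE"]` and `hits["R28020"]["GENE_ID"]` can be read. Reading the file line by line keeps memory use low, which matters because the uncompressed file is large.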

Regards, Hans-Rudolf

On 10/03/2013 10:34 PM, James W. MacDonald wrote:
> Hi Ed,
>
> Hypothetically you would want to use the org.Hs.eg.db package. However,
> not all GenBank accession numbers will be annotated, presumably because
> they have been retired. Alternatively, you could use biomaRt.
>
> However, the example ID you give is not annotated by either source.
>
> Best,
>
> Jim
>
>
>
> On Wednesday, October 02, 2013 4:05:51 PM, Ed Siefker wrote:
>> What package would I need to transform GenBank accession numbers into
>> gene symbols or Entrez Gene IDs? E.g., if I search "R28020" on NCBI, it
>> tells me that "This EST is one of 1366 sequences matched to RAB2A: RAB2A,
>> member RAS oncogene family."
>>
>> Is there a metadata package that has this kind of information in it? I
>> have a couple hundred such identifiers that I need to map to genes. I'd
>> like to be able to run
>>
>> getSYMBOL("R28020", "some_annotation_package")
>>
>> and get a useful result.  Any ideas?
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099