[BioC] GenBank RefSeq conversion

Marc Carlson mcarlson at fhcrc.org
Fri May 30 18:27:36 CEST 2008

Sean Davis wrote:
> On Fri, May 30, 2008 at 8:53 AM, Eleni Christodoulou
> <elenichri at gmail.com> wrote:
>> Hello all!
>> I was trying to convert RefSeq accession numbers to GenBank accession numbers
>> (or the opposite). I think that there must exist a library that does this
>> job automatically...Does anyone know anything relevant to this?
> Hi, Eleni.  There is no direct relationship between RefSeq and GenBank
> numbers.  A given RefSeq may or may not be represented by exactly one
> GenBank accession.  In fact, a RefSeq may not represent any "real"
> sequence, but can be a composite of several "real" sequences.  As an
> example, see here:
> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_007294.2
> It looks like this RefSeq is actually composed of 4 different
> sequences from genbank (if I am reading the record correctly).
> The only way I know to deal with this (at least in the general case)
> is to go through Entrez Gene (or the Ensembl equivalent of a gene) to
> find those accessions in GenBank and RefSeq that share a common Gene
> ID.  You can do this using the annotation package for the organism of
> interest, I think.  Steffen or others might be able to comment on how
> to do this using biomaRt.
> Sean

What Sean mentioned should work to at least let you connect the dots.

As an example, for human you could use the package "org.Hs.eg.db" and 
then use the following mappings to get what you want:

1st use "org.Hs.egACCNUM2EG" to get Entrez Gene IDs for your GenBank
accession numbers.

And then use "org.Hs.egREFSEQ" to get RefSeq IDs for your Entrez Gene IDs.
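
A minimal sketch of those two steps, assuming "org.Hs.eg.db" is installed
and that your accessions carry no version suffix. The accession "U14680"
is only a placeholder input chosen for illustration, and the variable
names are mine:

```r
library(org.Hs.eg.db)

# Placeholder input: replace with your own GenBank accessions
genbank <- c("U14680")

# Step 1: GenBank accession -> Entrez Gene ID (NA where nothing maps)
eg <- mget(genbank, org.Hs.egACCNUM2EG, ifnotfound = NA)

# Collect the Entrez Gene IDs that were found, dropping the NAs
eg_ids <- unique(unlist(eg))
eg_ids <- eg_ids[!is.na(eg_ids)]

# Step 2: Entrez Gene ID -> RefSeq IDs (often several per gene)
refseq <- mget(eg_ids, org.Hs.egREFSEQ, ifnotfound = NA)
refseq
```

Keep in mind that this links accessions through a shared gene, so one
GenBank accession will usually come back with several RefSeq IDs rather
than a single equivalent. For the opposite direction the same pattern
should work with "org.Hs.egREFSEQ2EG" followed by "org.Hs.egACCNUM".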

