[BioC] GenBank RefSeq conversion

Sean Davis sdavis2 at mail.nih.gov
Fri May 30 15:09:42 CEST 2008


On Fri, May 30, 2008 at 8:53 AM, Eleni Christodoulou
<elenichri at gmail.com> wrote:
> Hello all!
>
> I was trying to convert RefSeq accession numbers to GenBank accession numbers
> (or the opposite). I think that there must exist a library that does this
> job automatically... Does anyone know anything relevant to this?

Hi, Eleni.  There is no direct one-to-one mapping between RefSeq and
GenBank accession numbers.  A given RefSeq record is not necessarily
derived from exactly one GenBank accession.  In fact, a RefSeq record
need not represent any single "real" sequence; it can be a composite
of several "real" sequences.  As an example, see here:

http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=NM_007294.2

It looks like this RefSeq record is actually assembled from 4 different
GenBank sequences (if I am reading the record correctly).

The only way I know to deal with this (at least in the general case)
is to go through Entrez Gene (or the Ensembl equivalent of a gene) to
find those accessions in GenBank and RefSeq that share a common Gene
ID.  You can do this using the annotation package for the organism of
interest, I think.  Steffen or others might be able to comment on how
to do this using biomaRt.
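To make the "shared Gene ID" idea concrete, here is a minimal sketch of the join in Python (not R, and not any Bioconductor API). It assumes you have already pulled (accession, database, Gene ID) rows out of some annotation source; the rows and identifiers below are hypothetical placeholders, not real records. Because the join goes through the gene, a lookup can return several accessions or none, which reflects the lack of a one-to-one mapping.

```python
# Hypothetical (accession, database, gene_id) rows; real rows would come from
# an organism annotation package, biomaRt, or similar. Not real identifiers.
ROWS = [
    ("REFSEQ_1", "RefSeq", "GENE_A"),
    ("GENBANK_1", "GenBank", "GENE_A"),
    ("GENBANK_2", "GenBank", "GENE_A"),
    ("REFSEQ_2", "RefSeq", "GENE_B"),
]


def related_accessions(accession, rows, target_db):
    """Return accessions from target_db that share a Gene ID with `accession`.

    Works in either direction (RefSeq -> GenBank or GenBank -> RefSeq).
    The result may hold several accessions or be empty.
    """
    # Step 1: every Gene ID the query accession is annotated to.
    genes = {gene for acc, _, gene in rows if acc == accession}
    # Step 2: every accession in the target database linked to those genes.
    return sorted(
        {acc for acc, db, gene in rows if db == target_db and gene in genes}
    )


print(related_accessions("REFSEQ_1", ROWS, "GenBank"))  # ['GENBANK_1', 'GENBANK_2']
print(related_accessions("GENBANK_2", ROWS, "RefSeq"))  # ['REFSEQ_1']
print(related_accessions("REFSEQ_2", ROWS, "GenBank"))  # []
```

Note that sharing a Gene ID only says two accessions belong to the same gene, not that they are the same sequence, so the result is a candidate list rather than an exact conversion.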

Sean
