[BioC] is database RefSeq achievable from any Bioconductor package

Kasper Daniel Hansen kasperdanielhansen at gmail.com
Tue Jun 1 16:56:45 CEST 2010


On Tue, Jun 1, 2010 at 10:37 AM, Steve Lianoglou
<mailinglist.honeypot at gmail.com> wrote:
> Hi,
>
> On Mon, May 31, 2010 at 8:40 AM,  <mauede at alice.it> wrote:
>> The Biologist we work with has brought my attention to some misalignment between
>>  Ensembl and RefSeq with regard to the length and position of 3UTR sequences.
>
> I'd like to comment on this, but I'm not sure I'd provide any useful
> information w/o more details from you.
>
> But just one point: the RefSeq gene annotations and the ensembl gene
> annotations are not necessarily the same, so what you say here isn't
> all that surprising.
>
> A quick example: the number (and "character") of isoforms per "gene"
> often differ between the two sources.

An additional comment: defining the UTRs and the coding region of a
transcript requires knowing which part of the transcript is actually
translated.  This is well established for the canonical transcript of
most genes in well-annotated organisms, but much less so for
alternative transcripts of the same gene, even in a well-annotated
organism such as Drosophila (I am basing this on a not-so-recent
release of FlyBase).  Note that defining coding regions and UTRs
computationally is surprisingly hard (it involves a lot of
guesswork).  For more detail on this, for Drosophila, you can read
parts of

Hansen KD, Lareau LF, Blanchette M, Green RE, Meng Q, et al. 2009
Genome-Wide Identification of Alternative Splice Forms Down-Regulated
by Nonsense-Mediated mRNA Decay in Drosophila. PLoS Genet 5(6):
e1000525. doi:10.1371/journal.pgen.1000525

http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000525

See especially the "Reannotating coding regions reveals distinct
features of NMD–target isoforms" subsection of the Results.

This proved to be essential for this particular paper.  Fixing the
mistakes in FlyBase made our results interpretable instead of just
looking like noise.
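
To make the point about UTR definitions concrete, here is a minimal
Python sketch.  It is not taken from the paper: the function name
utr_lengths and all coordinates are made up for illustration, and it
assumes 1-based inclusive genomic coordinates.  It shows that one and
the same exon structure yields different 3' UTR lengths as soon as two
annotations disagree on where the coding region ends.

```python
def utr_lengths(exons, cds_start, cds_end, strand="+"):
    """Return (5' UTR length, 3' UTR length) in bases for one transcript.

    exons     -- list of (start, end) genomic intervals, 1-based inclusive
    cds_start -- leftmost coding base in genomic coordinates
    cds_end   -- rightmost coding base in genomic coordinates
    strand    -- "+" or "-"
    """
    left = 0   # exonic bases to the left of the coding region
    right = 0  # exonic bases to the right of the coding region
    for start, end in exons:
        # Part of this exon that lies strictly before cds_start.
        left += max(0, min(end, cds_start - 1) - start + 1)
        # Part of this exon that lies strictly after cds_end.
        right += max(0, end - max(start, cds_end + 1) + 1)
    # On the minus strand the genomic left side is the 3' end.
    return (left, right) if strand == "+" else (right, left)


# Hypothetical transcript: same exons, two different coding-region calls.
exons = [(100, 200), (300, 400), (500, 650)]
annotation_a = utr_lengths(exons, cds_start=150, cds_end=520)
annotation_b = utr_lengths(exons, cds_start=150, cds_end=380)

print("annotation A (5' UTR, 3' UTR):", annotation_a)
print("annotation B (5' UTR, 3' UTR):", annotation_b)
```

Both calls share the same exons and the same start codon, so the 5'
UTR length is identical; only the 3' UTR changes, which is the kind of
disagreement one would expect to see between two annotation sources.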

Kasper


