[BioC] What populates makeTranscriptDbFromBiomart?

Mon Apr 16 15:41:46 CEST 2012

Hi,

On Sat, Apr 14, 2012 at 4:40 PM, Ravi Karra <ravi.karra at gmail.com> wrote:
> Hi,
>
> Just starting to learn how to look at RNA Seq data, so apologies in advance.  I ran my RNA-Seq experiment on a GAII and aligned to the zebrafish genome using Bowtie2/Tophat2.  I downloaded the current zebrafish genome (Zv9) and transcript gtf file from Ensembl for the reference indices.   I am trying to use edgeR to look at differential expression, but am a little hung up on getting the count data.
>
> As you can see from the code below, I input 8835090 mapped reads, but only 5380643 are overlapped with known transcripts.  It seems that I am losing reads in summarizing the count data and I can't really figure out why.   Is the transcript information that results from makeTranscriptDbFromBiomart identical to the transcript information in the gtf files that can be downloaded via Ensembl?

Assume for the moment that it is identical -- you will (for sure)
still have reads that map to regions where no transcripts are
annotated. This happens even in organisms with "better" annotations
than zebrafish, such as fruit fly, mouse, and human.
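
To make the point concrete, here is a toy sketch in Python. It is not
the Bioconductor workflow from the original post: the coordinates are
made up and the helper names (overlaps, count_annotated) are
hypothetical. It only shows that a read can be mapped and still not be
counted, simply because no annotated transcript covers its position.

```python
# Toy illustration only: made-up coordinates, not real zebrafish data.
from typing import List, Tuple

# (chromosome, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_annotated(reads: List[Interval], transcripts: List[Interval]) -> int:
    """Number of reads that overlap at least one annotated transcript."""
    return sum(any(overlaps(r, t) for t in transcripts) for r in reads)


# Hypothetical annotation: two transcripts on chr1, nothing on chr2.
transcripts = [("chr1", 100, 500), ("chr1", 900, 1200)]

# Hypothetical mapped reads: all aligned, but not all inside an annotation.
reads = [
    ("chr1", 120, 155),    # inside the first transcript
    ("chr1", 480, 515),    # hangs over the end of the first transcript
    ("chr1", 600, 635),    # between the two transcripts
    ("chr2", 100, 135),    # chromosome with no annotation at all
    ("chr1", 1190, 1225),  # hangs over the end of the second transcript
]

hit = count_annotated(reads, transcripts)
print("mapped reads:", len(reads))
print("overlapping a transcript:", hit)
print("mapped but unannotated:", len(reads) - hit)
```

A brute-force scan like this is only workable for a handful of
intervals; for real data you would use an indexed overlap search, but
the accounting is the same.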

The limits of our knowledge about what, where, and why regions of the
genome are transcribed can be as exciting as they are frustrating,
depending on which side of the fence you happen to be standing on a
particular day.

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact