[BioC] Summarization by gene or exon or transcript

Michael Stadler michael.stadler at fmi.ch
Fri Nov 1 09:32:28 CET 2013


Hi Reema,

If I understand your question correctly, I think the answer is: It depends.

Counting alignments per exon may allow you to pick up differential
splicing or differential isoform usage unrelated to splicing (e.g.
alternative promoter usage or alternative termination).

However, robust estimation of exon levels requires much greater
sequencing depth: assuming that a gene has on average about ten exons,
you would need about ten times more reads to get per-exon counts of a
magnitude similar to the per-gene counts. If you don't have that depth
or are not interested in within-gene structural differences, gene-level
estimates may be the better choice.
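
To put rough numbers on that, here is a back-of-the-envelope sketch in
base R. The library size, gene number and exons per gene are made-up,
illustrative assumptions, and real reads are of course not spread evenly
over features:

```r
# Illustrative assumptions only, not measured values
total_reads    <- 20e6    # reads assigned to annotated features
n_genes        <- 20000   # genes in the annotation
exons_per_gene <- 10      # average number of exons per gene

# Mean count per feature if reads were spread evenly
reads_per_gene <- total_reads / n_genes
reads_per_exon <- reads_per_gene / exons_per_gene

# Depth needed for the mean exon count to match the current mean gene count
reads_needed <- reads_per_gene * n_genes * exons_per_gene

cat("mean reads per gene:", reads_per_gene, "\n")
cat("mean reads per exon:", reads_per_exon, "\n")
cat("fold increase in depth needed:", reads_needed / total_reads, "\n")
```

Under these assumptions the mean exon count is a tenth of the mean gene
count, so the same statistical footing needs roughly ten times the depth.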

Of course, you could try both and compare the results. You can easily
get such counts from a BAM file using countOverlaps (see the workflow at
http://www.bioconductor.org/help/workflows/high-throughput-sequencing/).
Alternatively, with the QuasR package, getting gene and exon counts is
as simple as:

# proj: a qProject object as returned by qAlign()
# txdb: a transcript annotation (TxDb) object
gn <- qCount(proj, txdb, reportLevel="gene")
ex <- qCount(proj, txdb, reportLevel="exon")
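
For the countOverlaps route, a minimal sketch on toy data may help show
the difference between the two levels. All coordinates and names below
(chr1, exon1, exon2, geneA) are hypothetical. In a real analysis the
alignments would typically be read from the BAM file (for example with
readGAlignments from GenomicAlignments) and the features taken from the
annotation (for example with exonsBy(txdb, by="gene")), rather than
built by hand:

```r
library(GenomicRanges)

# Two hypothetical exons of one gene on chr1 (coordinates are made up)
exons <- GRanges("chr1",
                 IRanges(start = c(100, 300), end = c(200, 400)),
                 strand = "+")
names(exons) <- c("exon1", "exon2")

# Gene-level feature: the set of exons grouped into one list element,
# similar in shape to what exonsBy(txdb, by = "gene") would give
gene <- GRangesList(geneA = exons)

# Four hypothetical 50-bp alignments on the same strand
reads <- GRanges("chr1",
                 IRanges(start = c(120, 150, 180, 350), width = 50),
                 strand = "+")

# Number of alignments overlapping each feature
ex_counts <- countOverlaps(exons, reads)  # one count per exon
gn_counts <- countOverlaps(gene, reads)   # one count per gene

print(ex_counts)
print(gn_counts)
```

Here three alignments overlap exon1 and one overlaps exon2, while the
gene as a whole collects all four, which is the count dilution described
above in miniature.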

Michael



On 31.10.2013 21:19, Steve Lianoglou wrote:
> Hi,
> 
> On Thu, Oct 31, 2013 at 1:04 PM, Reema Singh <reema28sep at gmail.com> wrote:
>> Hi Steve,
>>
>> Thank you for your reply,
>>
>> I just want to know what the ideal feature is for summarizing read
>> counts after alignment. Gene, transcript, and exon features from
>> GFF/GTF files are frequently used.
> 
> If you are asking what the "ideal" format for storing summarized read
> counts is, I would have to say that in "the R world" that would be to
> use a SummarizedExperiment (it is a class defined in the GenomicRanges
> package).
> 
> The rowData() of the SummarizedExperiment would contain the GRanges
> (or GRangesList) that define where the counts in each row of your
> assay are from, and the columns would tell you the counts for a given
> experiment.
> 
> You could store your relevant sample data in `colData`, i.e. phenotypic
> data for each experiment (column), like cell type, perturbation,
> whatever. See ?SummarizedExperiment for more info.
> 
> If you were asking something else -- sorry, I'm still not getting what
> the question is and perhaps someone else can chime in.
> 
> -steve
>


