[BioC] DESeq on transcripts v/s genes

Abhishek Pratap apratap at lbl.gov
Tue Feb 7 01:25:10 CET 2012


Thanks a lot for the clarifications on a weekend. Sorry I could not
get back earlier.

It seems that what I am trying to do should work and not introduce
any significant biases.

Cheers!
-Abhi

On Sun, Feb 5, 2012 at 6:59 AM, Wolfgang Huber <whuber at embl.de> wrote:
> A clarification (after an off-list request): there are two possibilities for
> double counting, and with the post below I am referring to only one of them:
>
> 1. Creating a transcript-level count for each possible transcript of a gene,
> essentially by *treating each transcript as a separate 'gene'*, and then
> calling DESeq or an analogous tool. This is what the post below refers to.
>
> 2. Counting the reads touching each exon, and then *summing these numbers up
> over all exons of a gene* to get a per-gene (or per transcript) value. That
> would be wrong, since then those reads that touch more than one exon are
> multiply counted and mess up the statistical model.
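
The overcounting in case 2 can be illustrated with a minimal Python sketch. The read-to-exon assignments are made-up toy data and the function names are hypothetical, not part of DESeq or any counting tool. The point is only that summing per-exon counts counts junction-spanning reads more than once, whereas counting each read once per gene does not.

```python
# Toy data (hypothetical): each read is represented by the set of exons it touches.
reads = [
    {"exon1"},
    {"exon1", "exon2"},  # spliced read spanning the exon1/exon2 junction
    {"exon2"},
    {"exon2", "exon3"},  # spliced read spanning the exon2/exon3 junction
    {"exon3"},
]


def per_exon_counts(reads):
    """Count, for each exon, the reads that touch it."""
    counts = {}
    for exons in reads:
        for exon in exons:
            counts[exon] = counts.get(exon, 0) + 1
    return counts


def summed_exon_count(reads):
    """Per-gene value obtained by summing per-exon counts (the problematic way)."""
    return sum(per_exon_counts(reads).values())


def gene_count(reads):
    """Per-gene value obtained by counting each read once if it touches any exon."""
    return sum(1 for exons in reads if exons)


exon_counts = per_exon_counts(reads)
summed = summed_exon_count(reads)  # junction-spanning reads contribute twice
direct = gene_count(reads)         # every read contributes exactly once

print(exon_counts, summed, direct)
```

With this toy input the summed value exceeds the number of reads by exactly the number of junction-spanning reads.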
>
>        Best wishes
>        Wolfgang
>
> Feb/5/12 12:16 PM, Wolfgang Huber wrote:
>
>> Dear Abhishek
>>
>> There was some anxiety regarding double counting / redundancy in this
>> thread. Actually, there is very little reason to worry. DESeq tests
>> one hypothesis after another, sequentially, and it does not matter
>> whether the hypotheses are correlated or not.
>>
>> The one place where correlation / redundancy can matter is multiple
>> testing correction. As long as you go for FDR, there is again little to
>> worry about, since the redundancy shows up in both the numerator and the
>> denominator of the ratio (the "R" in FDR) and, at least to a good enough
>> approximation, cancels out.
>>
>> If you go for family-wise error rate (FWER) and, say, Bonferroni
>> correction, then the redundancy and the increase in the number of tests do
>> matter. But there seem to be few reasons to use FWER/Bonferroni in this context.
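
As a rough illustration of the FDR versus FWER point, the sketch below compares Bonferroni and Benjamini-Hochberg adjustment on a made-up list of p-values and on the same list with every test duplicated, which is an idealized extreme of redundancy. Using Benjamini-Hochberg as the FDR procedure is an assumption here, the p-values and helper names are hypothetical, and the adjustments are written out by hand rather than taken from DESeq.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (step-up, with a running minimum)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def n_rejected(adjusted, n_unique, alpha=0.05):
    """Number of the first n_unique hypotheses rejected at level alpha."""
    return sum(1 for a in adjusted[:n_unique] if a <= alpha)


pvals = [0.001, 0.008, 0.02, 0.2, 0.7]  # made-up p-values
doubled = pvals + pvals                 # every test appears twice
n = len(pvals)

bh_single = n_rejected(benjamini_hochberg(pvals), n)
bh_doubled = n_rejected(benjamini_hochberg(doubled), n)
bonf_single = n_rejected(bonferroni(pvals), n)
bonf_doubled = n_rejected(bonferroni(doubled), n)

print("BH:", bh_single, bh_doubled)
print("Bonferroni:", bonf_single, bonf_doubled)
```

In this exact-duplication case the Benjamini-Hochberg adjusted values are unchanged up to floating-point rounding, while every Bonferroni-adjusted value doubles (until capped at 1). Real transcript-level redundancy is only partial, so the cancellation is approximate, as stated above.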
>>
>> Hope this helps
>> Wolfgang
>>
>> Feb/2/12 12:46 AM, Abhishek Pratap wrote:
>>>
>>> Hi All
>>>
>>> I am wondering whether, conceptually, I can use DESeq to test for
>>> differential expression at the transcript level rather than the gene
>>> level. In our case we have generated a transcript model based on RNA-Seq,
>>> and if we collapse those transcripts to genes in order to do gene-level
>>> differential expression, many exons are merged into artificial exons.
>>>
>>> For example:
>>>
>>> Transcript 1 : ---------------------- (exon)
>>> Transcript 2 : -----------------------------(exon )
>>>
>>> Gene level : -------------------------------------------- (exon)
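
The collapse shown in the diagram amounts to taking the union of overlapping exon intervals. A minimal sketch, with hypothetical coordinates and a hypothetical helper name, assuming closed integer intervals on a single strand:

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical coordinates for the two overlapping exons in the diagram.
transcript1_exons = [(100, 320)]
transcript2_exons = [(250, 540)]

# The gene-level model gets one long exon that exists in neither transcript.
gene_exons = merge_intervals(transcript1_exons + transcript2_exons)
print(gene_exons)
```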
>>>
>>> Another thing that comes to mind is the effect of double counting, due to
>>> exon redundancy, if I take the read counts at the transcript level.
>>>
>>> I would love to hear about your experience.
>>>
>>> Thanks!
>>> -Abhi
>>>
>>
>> Best wishes
>> Wolfgang
>>
>> Wolfgang Huber
>> EMBL
>> http://www.embl.de/research/units/genome_biology/huber
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor


