[BioC] filtering before using DESeq

Akula, Nirmala (NIH/NIMH) [C] akulan at mail.nih.gov
Sat Dec 15 17:53:54 CET 2012


Hi,

What would be a reasonable/widely used cut-off for overall variance and overall sum?

Thanks for pointing out the number format. The example I gave is from the eXpress software, and I rounded the numbers to the closest integer before inputting them into DESeq.

Regards,
Nirmala 
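
The rounding step described above can be sketched as follows. This is a minimal illustration only, not the poster's actual code: it uses the GeneA values quoted later in this thread, and the helper name to_integer_counts is hypothetical.

```python
# Estimated (fractional) counts for GeneA, as quoted in this thread.
gene_a_cases = [85.78942, 19.11753, 1471.813, 61.71464]
gene_a_controls = [2088.722, 2681.746, 2413.892, 1628.187]


def to_integer_counts(values):
    """Round estimated counts to the nearest integer (hypothetical helper)."""
    rounded = [int(round(v)) for v in values]
    # A negative value cannot be a read count, so fail loudly.
    if any(c < 0 for c in rounded):
        raise ValueError("counts must not be negative")
    return rounded


cases_int = to_integer_counts(gene_a_cases)
controls_int = to_integer_counts(gene_a_controls)
print(cases_int)     # [86, 19, 1472, 62]
print(controls_int)  # [2089, 2682, 2414, 1628]
```

Note that Python's round() rounds exact halves to the nearest even integer; none of the values here sit on a half, so the result matches ordinary rounding.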
________________________________________
From: Wolfgang Huber [whuber at embl.de]
Sent: Saturday, December 15, 2012 11:05 AM
To: Davis, Sean (NIH/NCI) [E]
Cc: Akula, Nirmala (NIH/NIMH) [C]; bioconductor at r-project.org
Subject: Re: [BioC] filtering before using DESeq

Dear Akula, Sean

Besides overall variance, overall sum is also a good filter statistic.

Akula, please note that DESeq expects counts, which need to be non-negative integer values. The values you state are not integers.

        Best wishes
        Wolfgang
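
A rough sketch of the two filter statistics named in this thread, overall sum and overall variance, both computed across all samples without looking at the case/control labels. It is an illustration under stated assumptions, not DESeq code: GeneA holds the thread's values rounded to integers, GeneB and GeneC are invented rows, the function names are hypothetical, and MIN_SUM is an arbitrary placeholder, since the thread does not give a recommended cut-off.

```python
from statistics import variance

# Toy count table: transcript -> counts across ALL samples (cases, then controls).
counts = {
    "GeneA": [86, 19, 1472, 62, 2089, 2682, 2414, 1628],    # thread values, rounded
    "GeneB": [3, 0, 2, 1, 4, 2, 0, 1],                      # invented low-count row
    "GeneC": [510, 498, 505, 520, 1015, 990, 1003, 1021],   # invented row
}


def filter_statistics(row):
    """Return (overall sum, overall sample variance), ignoring group labels."""
    return sum(row), variance(row)


def filter_by_overall_sum(table, min_sum):
    """Keep transcripts whose total count over all samples is at least min_sum."""
    return {gene: row for gene, row in table.items() if sum(row) >= min_sum}


MIN_SUM = 40  # arbitrary placeholder, NOT a recommended threshold

kept = filter_by_overall_sum(counts, MIN_SUM)
for gene, row in counts.items():
    total, var = filter_statistics(row)
    status = "kept" if gene in kept else "dropped"
    print(f"{gene}: sum={total}, variance={var:.1f}, {status}")
```

Because neither statistic uses the group assignment, GeneA is kept or dropped on its totals alone; the low value among its cases plays no special role, which is the distinction drawn between overall filtering and removing within-group outliers.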


On Dec 14, 2012, at 10:45 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:

> On Fri, Dec 14, 2012 at 2:42 PM, Akula, Nirmala (NIH/NIMH) [C] <
> akulan at mail.nih.gov> wrote:
>
>> Hi,
>>
>> We counted the reads in our RNA-seq data using HTSeq and removed any
>> isoforms that have <5 reads/sample. We then used DESeq for differential
>> expression analysis.
>>
>> Here's an example of a transcript that has the following read counts:
>>
>> GeneA_cases counts:
>> 85.78942
>> 19.11753
>> 1471.813
>> 61.71464
>>
>> GeneA_control counts:
>> 2088.722
>> 2681.746
>> 2413.892
>> 1628.187
>>
>> DESeq p-value for GeneA is 10^-4. Do we have to filter out transcripts
>> (that have high variance between samples as shown in the above example)
>> before giving the data to DESeq or will DESeq take this into account while
>> calculating the normalization?
>>
>
> Hi, Nirmala.
>
> If you mean filtering out transcripts that show one or more outliers within
> a given group, then you should ABSOLUTELY NOT do that as this will bias
> your statistical results.  If you mean filtering based on overall variance
> (across groups) to find highly-variable transcripts, that is a different
> story and is acceptable.
>
> Sean
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


