[BioC] Statistics for next-generation sequencing transcriptomics
naomi at stat.psu.edu
Fri Jul 24 13:40:17 CEST 2009
This is the FDR vs. FNR trade-off (which we used to call size vs.
power). As the total sample size increases, we gain the power to
detect tiny, biologically negligible differences. All frequentist
tests suffer from this, not just Fisher's exact test.
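
To illustrate the point, here is a minimal sketch in Python (standard
library only). It is not Fisher's exact test: it uses a pooled
two-proportion z-test as a stand-in, which behaves much like the
chi-squared test on a 2x2 table when counts are large. The rates (10.0%
vs. 10.2%) and the read depths are hypothetical values chosen for
illustration, not data from this thread, and the function names are my
own.

```python
import math


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value from a pooled two-proportion z-test.

    x1, x2: reads matching the gene in each sample.
    n1, n2: total reads in each sample.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def p_values_by_depth(rate1=0.100, rate2=0.102,
                      depths=(1_000, 100_000, 10_000_000)):
    """Hold the two (hypothetical) rates fixed and vary only the depth."""
    results = []
    for n in depths:
        x1 = round(rate1 * n)
        x2 = round(rate2 * n)
        results.append((n, two_proportion_p_value(x1, n, x2, n)))
    return results


for n, p in p_values_by_depth():
    print(f"reads per sample = {n:>10,d}   p = {p:.3g}")
```

The effect size is the same on every row; only the number of reads
changes. The p-value nevertheless goes from unremarkable to vanishingly
small, so a tiny p-value on its own says little about whether the
difference is biologically meaningful.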
At 07:22 AM 7/24/2009, michael watson (IAH-C) wrote:
>I'd like to have a discussion about statistics for transcriptomics
>using next-generation sequencing (if there hasn't already been one -
>if there has, then please someone point me to it!)
>What we're seeing in the literature, and here at IAH, are datasets
>where someone has sequenced the transcriptome of two samples using
>something like Illumina. These have been mapped to known sequences
>and counts produced.
>So what we have is something like this:
>geneA: 22000 sequences from 260000 match in sample 1, 43000
>sequences from 507000 in sample 2.
>It's been suggested that one possible approach would be to construct
>2x2 contingency tables and perform Fisher's exact test or the
>Chi-squared test, as has been applied to SAGE data.
>However, I've found that when I do that, the p-values for this type
>of data are incredibly, incredibly small, such that over 90% of my
>data points are significant, even after adjusting for multiple
>testing. I assume/hope that this is because these tests were not
>designed to cope with this type of data.
>For instance, applying Fisher's test to the example above yields a
>p-value of 3.798644e-23.
>As I see it there are three possibilities:
>1) I'm doing something wrong
>2) These tests are totally inappropriate for this type of data
>3) All of my data points are highly significantly different
>I'm thinking that 2 is probably true, though I wouldn't rule out 1.
>Any thoughts and comments are very welcome,
Naomi S. Altman 814-865-3791 (voice)
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111