[BioC] "validity" of p-values

Michael Newton newton at biostat.wisc.edu
Sun Sep 28 17:32:31 MEST 2003


I recommend that everyone read Berger and Sellke (1987), J. Amer. Statist.
Assoc., 82:112-122, on the p-value story and on calculations indicating
how surprised we ought to be given the p-value.

Michael N.

On Sun, 28 Sep 2003, Rafael A. Irizarry wrote:

> Remember, a p-value means "chance of seeing something at least as extreme
> as what we saw, given the null".
> If the null isn't true then the p-value no longer means what
> we think it means. Beware that many ANOVA models make assumptions about
> normality that are hard to defend when studying microarray data. With
> so few arrays we can't rely on the central limit theorem, so we are stuck
> hoping the assumptions of normality hold, and they become part of the
> null hypothesis. I think sometimes we are overoptimistic in thinking
> the "statistical model is set up correctly".
>
> ... and then you have the multiple comparison problem!
>
> -r
>
> On Sun, 28 Sep 2003, Jenny Drnevich wrote:
>
> > See below...
> >
> > >>However, have you seen: Chu, Weir, & Wolfinger.  A systematic
> > >> statistical linear modeling approach to oligonucleotide array
> > >> experiments MATH BIOSCI 176 (1): 35-51 Sp. Iss. SI MAR 2002
> > >>They advocate using the probe-level data in a linear mixed model.
> > >> Assuming that each probe is an independent measure (which I know is not
> > >> true because many of them overlap, but I'm ignoring this for now),
> > >> using probe-level data gives 14-20 "replicates" per chip. We've based
> > >> our analysis methods on this, and with two biological replicates per
> > >> genetic line, and three genetic lines per phenotypic group, we've been
> > >> able to detect as little as a 15% difference in gene expression at
> > >> p=0.0001 (we only expect 2 FP and get 60 genes with p=0.0001).
> > >
> > > Mmmm. Getting very low p-values from just two biological replicates
> > > doesn't lead you to question the validity of the p-values?? :)
> >
> > But we don't just have two biological replicates. We're interested in
> > consistent gene expression differences between phenotype 1 and phenotype
> > 2. We looked at three different genetic lines showing phenotype 1 and
> > three other lines that had phenotype 2. We made two biological replicates
> > of each line, and the expression level of each gene was estimated by 14
> > probes. By running a mixed-model ANOVA separately for each gene with
> > phenotype, line (nested within phenotype), probe, and all second-order
> > interactions, the phenotype comparison has around 120 df (or so, off the
> > top of my head). That's how we can detect a 15% difference in gene
> > expression. As long as the statistical model is set up correctly, I never
> > "question" the validity of p-values, although I might question the
> > biological significance... :)
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
> >
>
>
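
The multiple-comparison arithmetic quoted above ("we only expect 2 FP and
get 60 genes with p=0.0001") can be sketched as follows. This is a minimal
illustration, not the posters' analysis: the function names are made up
here, and the gene count is an assumption, back-calculated from the
expected 2 false positives at a 0.0001 threshold, since the thread never
states it. The calculation also takes the p-values at face value (uniform
under the null), which is exactly what the thread disputes.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests below alpha if every null hypothesis is true
    and the p-values are valid (uniform under the null)."""
    return n_tests * alpha


def estimated_fdr(expected_fp: float, n_called: int) -> float:
    """Rough false discovery proportion: expected false positives divided by
    the number of genes called significant, capped at 1."""
    if n_called <= 0:
        raise ValueError("n_called must be positive")
    return min(1.0, expected_fp / n_called)


# Figures quoted in the thread: about 2 expected false positives, 60 genes called.
fdr = estimated_fdr(expected_fp=2, n_called=60)
print(f"estimated false discovery proportion: {fdr:.3f}")

# ASSUMPTION (not stated in the thread): roughly 20,000 tests, which is what
# 2 expected false positives at a threshold of 0.0001 would imply.
n_genes_assumed = 20_000
print(expected_false_positives(n_genes_assumed, 0.0001))
```

If the normality assumptions fail, the true false positive count can differ
from the nominal n_tests * alpha, so the ratio above is only as trustworthy
as the p-values that feed it.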


