[BioC] "validity" of p-values

Justin Borevitz borevitz at salk.edu
Sun Sep 28 20:51:35 MEST 2003

Hi Jenny, your design admits several valid permutations, which can be used
to account for both its structure and multiple testing.  You can also try to
estimate the proportion of genes that differ from the null.  The FDR q-value
might be of more interest in this case.  See John D. Storey and Robert
Tibshirani, "Statistical significance for genomewide studies", PNAS
100:9440-9445 (2003).
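
A minimal sketch of the q-value idea, in case it helps. It assumes a fixed
tuning value lambda = 0.5 for estimating the null proportion (Storey and
Tibshirani smooth over a range of lambda values instead), and the function
names are made up for illustration. On the numbers quoted below, about 2
expected false positives among 60 genes called gives a naive FDR of 2/60,
roughly 3%.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Rough estimate of the proportion of true nulls.

    Assumption: p-values above `lam` come almost entirely from null genes,
    which are uniform on [0, 1]. A fixed `lam` is a simplification.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / ((1.0 - lam) * m))


def q_values(pvalues, pi0=None):
    """q-value for each p-value, returned in the input order.

    With pi0 = 1.0 this reduces to Benjamini-Hochberg adjusted p-values.
    """
    m = len(pvalues)
    if pi0 is None:
        pi0 = estimate_pi0(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q is monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def naive_fdr(expected_false_positives, n_called):
    """Expected false positives divided by the number of genes called."""
    return expected_false_positives / n_called


if __name__ == "__main__":
    # Figures taken from the quoted message: 2 expected FP, 60 genes called.
    print(round(naive_fdr(2, 60), 3))
```

Calling q_values on the full vector of per-gene p-values gives, for each
gene, the estimated FDR incurred by calling that gene and everything more
significant.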

>-----Original Message-----
>From: bioconductor-bounces at stat.math.ethz.ch
>[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Jenny Drnevich
>Sent: Sunday, September 28, 2003 1:28 PM
>To: smyth at wehi.edu.au
>Cc: bioconductor at stat.math.ethz.ch
>Subject: [BioC] "validity" of p-values
>See below...
>>>However, have you seen: Chu, Weir, & Wolfinger.  A systematic
>>> statistical linear modeling approach to oligonucleotide array
>>> experiments MATH BIOSCI 176 (1): 35-51 Sp. Iss. SI MAR 2002
>>>They advocate using the probe-level data in a linear mixed model.
>>> Assuming that each probe is an independent measure (which I know is not
>>> true because many of them overlap, but I'm ignoring this for now),
>>> using probe-level data gives 14-20 "replicates" per chip. We've based
>>> our analysis methods on this, and with two biological replicates per
>>> genetic line, and three genetic lines per phenotypic group, we've been
>>> able to detect as little as a 15% difference in gene expression at
>>> p=0.0001 (we only expect 2 FP and get 60 genes with p=0.0001).
>> Mmmm. Getting very low p-values from just two biological replicates
>> doesn't lead you to question the validity of the p-values?? :)
>But we don't just have two biological replicates. We're interested in
>consistent gene expression differences between phenotype 1 and phenotype
>2. We looked at three different genetic lines showing phenotype 1 and
>three other lines that had phenotype 2. We made two biological replicates
>of each line, and the expression level of each gene was estimated by 14
>probes. By running a mixed-model ANOVA separately for each gene with
>phenotype, line (nested within phenotype), probe, and all second-order
>interactions, the phenotype comparison has around 120 df (or so, off the
>top of my head). That's how we can detect a 15% difference in gene
>expression. As long as the statistical model is set up correctly, I never
>"question" the validity of p-values, although I might question the
>biological significance... :)
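
For the design described above (three genetic lines per phenotype), one
possible reading of "valid permutations" is to reshuffle the phenotype labels
at the level of whole lines, keeping each line's replicates and probes
together. That reading is an assumption, not something stated in the thread.
Under it there are only C(6,3) = 20 relabelings, so a two-sided permutation
p-value cannot fall below 2/20 = 0.1. A minimal sketch, with hypothetical
per-line mean expression values and made-up function names:

```python
from itertools import combinations


def line_level_permutation_p(line_means, group1):
    """Two-sided permutation p-value using whole lines as the units.

    line_means: dict mapping line name -> mean expression for one gene
                (hypothetical values, averaged over replicates and probes).
    group1:     names of the lines observed with phenotype 1.
    """
    lines = sorted(line_means)

    def stat(g1):
        g2 = [name for name in lines if name not in g1]
        m1 = sum(line_means[name] for name in g1) / len(g1)
        m2 = sum(line_means[name] for name in g2) / len(g2)
        return abs(m1 - m2)

    observed = stat(tuple(group1))
    relabelings = list(combinations(lines, len(group1)))
    # Count relabelings at least as extreme as the observed one;
    # the small tolerance guards against floating-point ties.
    hits = sum(1 for g1 in relabelings if stat(g1) >= observed - 1e-12)
    return hits / len(relabelings), len(relabelings)


if __name__ == "__main__":
    # Made-up line means for a single gene, for illustration only.
    means = {"A1": 1.0, "A2": 1.1, "A3": 1.2,
             "B1": 2.0, "B2": 2.1, "B3": 2.2}
    p, n = line_level_permutation_p(means, ["A1", "A2", "A3"])
    print(n, p)
```

Even with perfectly separated groups, the observed labeling and its mirror
image are both counted, which is where the floor of 2/20 comes from.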
