[BioC] "validity" of p-values

Sun Sep 28 16:27:50 MEST 2003

See below...

>> However, have you seen: Chu, Weir, & Wolfinger. A systematic
>> statistical linear modeling approach to oligonucleotide array
>> experiments. MATH BIOSCI 176 (1): 35-51 Sp. Iss. SI MAR 2002
>> They advocate using the probe-level data in a linear mixed model.
>> Assuming that each probe is an independent measure (which I know is not
>> true because many of them overlap, but I'm ignoring this for now),
>> using probe-level data gives 14-20 "replicates" per chip. We've based
>> our analysis methods on this, and with two biological replicates per
>> genetic line, and three genetic lines per phenotypic group, we've been
>> able to detect as little as a 15% difference in gene expression at
>> p=0.0001 (we expect only 2 false positives and get 60 genes with
>> p=0.0001).
>
> Mmmm. Getting very low p-values from just two biological replicates
> doesn't lead you to question the validity of the p-values?? :)

But we don't just have two biological replicates. We're interested in
consistent gene expression differences between phenotype 1 and phenotype
2. We looked at three different genetic lines showing phenotype 1 and
three other lines that had phenotype 2. We made two biological replicates
of each line, and the expression level of each gene was estimated by 14
probes. When we run a mixed-model ANOVA separately for each gene, with
phenotype, line (nested within phenotype), probe, and all second-order
interactions as terms, the phenotype comparison has roughly 120 df (off
the top of my head). That's how we can detect a 15% difference in gene
expression. As long as the statistical model is set up correctly, I never
"question" the validity of p-values, although I might question the
biological significance... :)
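
For anyone who wants to check the arithmetic in this thread, here is a
minimal Python sketch. The design counts (2 phenotypes, 3 lines each, 2
replicates per line, 14 probes per gene) and the 2 expected / 60 observed
figures come from the messages above. The number of genes tested is not
stated anywhere; it is only inferred here on the assumption that
expected false positives = genes tested x p-value cutoff. All variable
names are illustrative. The sketch counts probe-level observations only;
the df for any particular test depend on the model terms and are not
computed.

```python
# Design as described in the thread.
N_PHENOTYPES = 2
LINES_PER_PHENOTYPE = 3
REPS_PER_LINE = 2
PROBES_PER_GENE = 14

# Chips in the experiment and probe-level observations available per gene.
chips = N_PHENOTYPES * LINES_PER_PHENOTYPE * REPS_PER_LINE
obs_per_gene = chips * PROBES_PER_GENE

# Figures quoted in the thread.
ALPHA = 0.0001          # p-value cutoff
EXPECTED_FP = 2         # false positives expected at that cutoff
OBSERVED_HITS = 60      # genes reported at that cutoff

# ASSUMPTION: expected false positives = genes tested * ALPHA,
# so the number of genes tested is inferred, not taken from the thread.
implied_genes = round(EXPECTED_FP / ALPHA)

# Rough share of the reported genes that could be false positives.
fp_fraction = EXPECTED_FP / OBSERVED_HITS

print(f"chips: {chips}")
print(f"probe-level observations per gene: {obs_per_gene}")
print(f"implied genes tested: {implied_genes}")
print(f"expected false-positive share of hits: {fp_fraction:.1%}")
```

Under these assumptions, about 1 in 30 of the 60 reported genes would be
expected to be a false positive.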