[BioC] Probe level analysis

Wed May 12 15:24:46 CEST 2004

Hello,

I've done some analysis of a factorial design at the probe level (Affymetrix). The data were background-corrected with RMA and quantile-normalized, but the probes were not summarized into probe sets.

This means I have about 16 normalized intensity measures per gene and condition. I include the probe as a factor. The model is similar to Wolfinger et al. (2002), except that I'm only considering fixed effects (Wolfinger uses the array as a random effect):

Y = B + D + T + P + BP + BD + BT

B = Batch or Laboratory effect (3 levels)
D = Dose (5 levels)
T = Time (2 levels)
P = Probe (~16 levels)
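
To give a feel for the size of this per-gene model, here is a minimal sketch that counts the fixed-effect coefficients under treatment contrasts. It assumes exactly 16 probes (the real number is only approximately 16) and a fully crossed design, so it ignores any empty cells, which would make some interaction coefficients inestimable. The names `levels`, `terms` and `n_coefficients` are purely illustrative.

```python
# Levels per factor as listed above; 16 probes is an assumption (the post says ~16).
levels = {"B": 3, "D": 5, "T": 2, "P": 16}

# Terms of the model Y = B + D + T + P + BP + BD + BT.
terms = ["B", "D", "T", "P", "BP", "BD", "BT"]


def n_coefficients(term, levels):
    """Columns a term adds under treatment contrasts: product of (levels - 1)."""
    n = 1
    for factor in term:
        n *= levels[factor] - 1
    return n


per_term = {term: n_coefficients(term, levels) for term in terms}
total = 1 + sum(per_term.values())  # +1 for the intercept

print(per_term)
print("total coefficients:", total)
```

Most of the coefficients come from the probe main effect and the BP interaction, which is why adding further probe interactions makes the model grow quickly.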

Within each B/T/D combination I have 2 to 4 replicates, and one dose level is missing for one batch and its time points.

I'm sure there are BD and BT interactions, but the probe may just interact with the batch. I could actually run the full model, but it takes a lot of processing time for 12,000 genes.

This model is fitted separately for each gene on the chip.

I found that the R-squared values are quite good (>0.9), but the residuals are not normally distributed. They have a sort of normal "core", but a qqnorm plot shows many extreme values, and the curve bends away from the line quite close to the middle of the plot. A Shapiro or KS test also shows that the residuals are not normal for nearly all genes.
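
The kind of check described above can be sketched as follows. This is only an illustration on simulated residuals (a normal core plus a small fraction of wide outliers), not data from the experiment. The helper names `qq_points` and `excess_kurtosis` are hypothetical, and the plotting positions (i - 0.5) / n are just one common convention.

```python
import random
from statistics import NormalDist, mean, pstdev


def qq_points(residuals):
    """Pair each sorted, standardized residual with its theoretical normal quantile."""
    n = len(residuals)
    m, s = mean(residuals), pstdev(residuals)
    sample = sorted((r - m) / s for r in residuals)
    theory = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theory, sample))


def excess_kurtosis(residuals):
    """Fourth standardized moment minus 3: near 0 for normal data, > 0 for heavy tails."""
    m, s = mean(residuals), pstdev(residuals)
    return mean([((r - m) / s) ** 4 for r in residuals]) - 3.0


# Simulated residuals only (assumption): 90% N(0, 1) "core", 10% N(0, 5) outliers.
rng = random.Random(0)
normal_res = [rng.gauss(0, 1) for _ in range(2000)]
heavy_res = [rng.gauss(0, 1) if rng.random() < 0.9 else rng.gauss(0, 5)
             for _ in range(2000)]

print("excess kurtosis, normal:", round(excess_kurtosis(normal_res), 2))
print("excess kurtosis, heavy :", round(excess_kurtosis(heavy_res), 2))

# The largest sample quantiles sit far beyond the theoretical ones for heavy tails.
for theory, sample in qq_points(heavy_res)[-3:]:
    print(round(theory, 2), round(sample, 2))
```

A normal core with a minority of wide outliers gives a Q-Q curve that leaves the line well before the tails, similar in shape to what is described above. Whether that is the actual cause here is not something the sketch can show.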

My question is whether some of you have observed this too, and what you've done about it. Does limma perform any model checking?

I've actually observed a similar non-normality for a 'by-gene' level model (a model considering only the probe set measurement). In addition, the by-gene level analysis has quite poor R-squared values (around 0.7 for most genes).

	kind regards,
	thanks for your comments,

	Arne

--
Arne Muller, Ph.D.
Toxicogenomics, Aventis Pharma
arne dot muller domain=aventis com