[BioC] technical replicates (different subset repeated on different print runs within experiment)

Thu Sep 22 08:11:31 CEST 2005

Hi,

There has been quite a bit of discussion on this list about dealing with
technical replicates in sets of arrays where the gene layout is the same
across the experiment.

We have just started analysing a published Stanford data set in which
different print runs have a different set of genes replicated (i.e. the
layouts differ between arrays).

For example, Chip 1 (treatment) may have clones A, B, X, Y; Chip 2
(treatment) may have A, A, B, X, X, Y; Chip 3 (control) may have
A, A, B, X, Y; and Chip 4 (control) may have A, B, X, Y.  For simplicity,
this example has only two chips for each condition.
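
To make the layout concrete, here is a minimal Python sketch of the four
example chips.  The names `layouts`, `replicate_counts` and
`clones_with_varying_replication` are made up for illustration and are not
from any package; the sketch only counts how often each clone is spotted on
each chip.

```python
from collections import Counter

# The four example chips: (condition, clones spotted on the chip).
layouts = {
    "chip1": ("treatment", ["A", "B", "X", "Y"]),
    "chip2": ("treatment", ["A", "A", "B", "X", "X", "Y"]),
    "chip3": ("control", ["A", "A", "B", "X", "Y"]),
    "chip4": ("control", ["A", "B", "X", "Y"]),
}


def replicate_counts(layouts):
    """Count how many spots each clone has on each chip."""
    return {chip: Counter(clones) for chip, (_, clones) in layouts.items()}


def clones_with_varying_replication(counts):
    """Return clones whose number of spots differs between chips."""
    all_clones = set().union(*counts.values())
    return sorted(
        clone for clone in all_clones
        if len({c[clone] for c in counts.values()}) > 1
    )


counts = replicate_counts(layouts)
varying = clones_with_varying_replication(counts)
```

In this example only clones A and X end up in `varying`, since B and Y have
one spot on every chip.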

Below are the three alternatives for analysis, with the pros and cons as we
see them at the moment.  I would really appreciate any comments anyone may
have on our thinking!

(1) Averaging the technical replicates within an array
Pros: (a) They are the same DNA sequence on the array, so an average is
meaningful.
(b) It would be easier for downstream analyses to have a single
representation of each clone on a chip.
Cons: The linear model would be fitted to a mix of averaged and
non-averaged values, so the spots are measured with different precision.
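
A minimal Python sketch of option (1), assuming the data are available as
(chip, clone, log-ratio) records.  The values below are invented purely for
illustration, and `average_within_array` is a hypothetical helper, not a
function from any package.  It keeps the number of spots behind each
average, since that is what makes the precision differ.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (chip, clone, log-ratio) records; the numbers are made up.
spots = [
    ("chip1", "A", 1.0), ("chip1", "B", 0.5),
    ("chip2", "A", 1.2), ("chip2", "A", 0.8), ("chip2", "B", 0.4),
]


def average_within_array(spots):
    """Average replicate spots per (chip, clone); also return the spot count."""
    grouped = defaultdict(list)
    for chip, clone, value in spots:
        grouped[(chip, clone)].append(value)
    return {key: (mean(vals), len(vals)) for key, vals in grouped.items()}


averaged = average_within_array(spots)
```

If the spot-level noise were roughly equal and independent between spots
(an assumption, and replicate spots on one array are often correlated), the
average of n spots would have variance sigma^2/n, so the stored count could
serve as a relative weight in a weighted linear model.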

(2) Analysing all combinations
E.g. for clone A: Chip 1 has A1; Chip 2 has A2, A3; Chip 3 has A4, A5;
Chip 4 has A6.  To compare treatment to control you could compare:
A1 & A2 vs A4 & A6
A1 & A3 vs A5 & A6
A1 & A2 vs A5 & A6
A1 & A3 vs A4 & A6

Pros: You are using all the data individually instead of averaging.
Cons: You can have lots of combinations (depending on the number of
replicates), and the chance of getting a low p-value in at least one of the
combinations is higher than for a gene that has no replicate probes, so you
may be biasing the results.
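
The combinations for clone A can be enumerated with a short Python sketch.
The names `replicates`, `all_combinations` and `familywise_bound` are
hypothetical.  The bound shown is the simple union (Bonferroni) bound on the
chance of at least one p-value below alpha under the null; it is only a
rough guide here, because the combinations share spots and so are not
independent tests.

```python
from itertools import product

# Spot labels for clone A on each chip, as in the example above.
replicates = {
    "chip1": ["A1"],
    "chip2": ["A2", "A3"],
    "chip3": ["A4", "A5"],
    "chip4": ["A6"],
}


def all_combinations(replicates):
    """Pick one spot per chip in every possible way."""
    chips = sorted(replicates)
    choices = product(*(replicates[chip] for chip in chips))
    return [dict(zip(chips, choice)) for choice in choices]


def familywise_bound(n_tests, alpha):
    """Union bound on P(at least one p-value < alpha) under the null."""
    return min(1.0, n_tests * alpha)


combos = all_combinations(replicates)
bound = familywise_bound(len(combos), 0.05)
```

Clone A gives four combinations, whereas a clone with a single spot on every
chip gives one, which is the imbalance described in the cons.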

(3) Randomly choose one of the technical replicates to represent a gene on
a chip, i.e. randomly choose one of the combinations above for the analysis.
Pros: Does not give some genes a higher chance of a low p-value just by
chance.
Cons: Not using all the data.
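
Option (3) could be sketched in Python as below.  `pick_one_per_chip` is a
hypothetical helper, and the fixed seed is an assumption added so that the
random choice can be reproduced when the analysis is rerun.

```python
import random

# Spot labels for clone A on each chip, as in the example under (2).
replicates = {
    "chip1": ["A1"],
    "chip2": ["A2", "A3"],
    "chip3": ["A4", "A5"],
    "chip4": ["A6"],
}


def pick_one_per_chip(replicates, seed=0):
    """Randomly keep one replicate spot per chip, reproducibly."""
    rng = random.Random(seed)  # local generator, global state untouched
    return {
        chip: rng.choice(spots)
        for chip, spots in sorted(replicates.items())
    }


pick = pick_one_per_chip(replicates, seed=1)
```

Each gene then contributes exactly one value per chip, at the cost of
discarding the spots that were not chosen.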

I was wondering if anyone had any thoughts on which of these alternatives
is the best, or whether there is another alternative we haven't considered.

Any ideas would be really appreciated!

Thanks and Regards
Marg

Dr Margaret Gardiner-Garden
Garvan Institute of Medical Research,
Sydney Australia