[BioC] dye swaps of technical replicates and variable numbers of replicate spots

Wed Aug 20 11:42:30 MEST 2003

At 02:23 AM 20/08/2003, Ramon Diaz-Uriarte wrote:
>Dear all,
>
>I am analyzing some cDNA data; in the simplest case there are a total of 6
>arrays, with three biological replicates; for each biological replicate, the
>arrays are duplicated and arrayed using dye-swap.  Of course, for some genes
>there might be missing values in some of the replicates.
>In addition, some genes are replicated within arrays 5 times, whereas other
>genes are replicated twice (or three times, or four times, or six times), and
>yet others are not replicated at all.
>
>These are the two questions:
>
>1. The limma package includes facilities for handling replicate spots within
>arrays. However, from the help pages and the Bob mutant data example in the
>limma manual, it seems to me that it expects a fairly regular structure.

Yes, that's correct. The regular structure is important. Firstly, because 
limma handles within-array replicate spots by estimating the spatial 
correlation between the replicates, and it is only reasonable to assume 
that this correlation is shared by all genes if the replicate structure is 
entirely regular. Secondly, because subsequent inference methods for 
assessing differential expression assume that all genes have been treated 
the same and can be treated as having exchangeable standard deviation 
estimators. (I understand that this might not be entirely clear; I am 
writing up the methodology now as a technical report, and the manuscript 
will explain the methodology and assumptions more thoroughly.)

So limma is designed to handle within-array replicates arising from robotic 
replication, in which multiple spots are printed by making multiple dips of 
the array printer heads into the same wells on the 384-well plates of DNA. 
It is not designed to handle replicates arising from redundancy in the DNA 
library unless this is completely regular.
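To illustrate what "regular" means here, a minimal base-R sketch with made-up values (it does not call limma): when every gene is printed the same number of times in a fixed pattern, a single reshaping rule recovers a gene-by-replicate matrix. limma's duplicate-spot functions describe such layouts by a number of duplicates and a spacing between them; the two hypothetical layouts below correspond to adjacent duplicates and to a complete second copy of the gene set.

```r
# Hypothetical layout A: each of 4 genes printed twice in adjacent spots
# (spot order g1, g1, g2, g2, g3, g3, g4, g4). Values are invented.
adjacent <- c(1.1, 0.9, -0.4, -0.6, 2.0, 2.2, 0.0, 0.2)
by_gene_a <- matrix(adjacent, ncol = 2, byrow = TRUE)

# Hypothetical layout B: the whole gene set printed twice, one copy after
# the other (spot order g1, g2, g3, g4, g1, g2, g3, g4).
blocked <- c(1.1, -0.4, 2.0, 0.0, 0.9, -0.6, 2.2, 0.2)
by_gene_b <- matrix(blocked, ncol = 2)

# Both layouts unwrap to the same 4 x 2 gene-by-replicate matrix: every gene
# has the same number of replicates at the same relative distance, which is
# what makes one shared between-replicate correlation a sensible assumption.
identical(by_gene_a, by_gene_b)
```

With an irregular layout (some genes printed five times, others once) no single rule like this exists, and each gene would need its own lookup of spot positions.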

>I understand that my two options are:
>a) take the easy way out, and compute a mean or a median of the replicates;
>b) "adapt" dupcor.series to my situation to get an estimate of the 
>correlation
>of replicates, and then "adapt" gls.series (or call gls directly);
>
>Is there any other option?

I would not recommend either of the above, at least in conjunction with 
limma. If you take means or medians of spots, and the number of spots being 
averaged differs between genes, then this will invalidate the assumption 
used by ebayes that all residual standard deviations are exchangeable 
(because different genes will be estimated with different precisions). Also 
you can't adapt dupcor.series, because dupcor.series is designed to 
estimate a common spatial correlation, and different genes will have 
different between-replicate correlations if they are irregularly spaced.
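To make the precision argument concrete, assume (notation introduced here for illustration, not limma's) that the n_g replicate spots y_g1, ..., y_gn_g of gene g share a variance sigma_g^2 and a common pairwise correlation rho. The variance of their average is then:

```latex
\bar{y}_g = \frac{1}{n_g}\sum_{k=1}^{n_g} y_{gk},
\qquad
\operatorname{Var}\bigl(\bar{y}_g\bigr)
  = \frac{1}{n_g^2}\Bigl[n_g\,\sigma_g^2 + n_g(n_g - 1)\,\rho\,\sigma_g^2\Bigr]
  = \frac{\sigma_g^2}{n_g}\,\bigl[1 + (n_g - 1)\rho\bigr].
```

With independent spots (rho = 0), a gene averaged over 5 spots has one fifth the variance of a gene spotted once, so the averaged values are no longer on a common scale. The factor also depends on rho, so correlations that differ between genes change each gene's precision differently.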

It might not be ideal, but I would avoid averaging the within-array 
replicates and just treat all spots as corresponding to different genes. 
Then you can be very confident that you have a reliable result if the same 
gene comes up differentially expressed several times (from different 
locations on the array).
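The reliability check described above (the same gene turning up several times among the top spots) can be tallied in a few lines of base R. This is a hypothetical sketch: the spot table, gene IDs, t-statistics and the |t| > 3.5 cut-off are all invented for illustration. In practice the statistics would come from fitting every spot as if it were a separate gene.

```r
# Hypothetical per-spot results: one row per spot, the gene ID it carries,
# and a test statistic from treating each spot as a separate "gene".
spots <- data.frame(
  gene = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC", "geneD"),
  t    = c(6.1, 5.8, 0.4, -4.9, -5.3, -0.8, 3.9),
  stringsAsFactors = FALSE
)

# Call a spot "top" if |t| exceeds an arbitrary illustrative cut-off.
top <- spots[abs(spots$t) > 3.5, ]

# For each gene among the top spots: how many of its spots made the list,
# and how many spots it has on the array in total.
hits  <- table(top$gene)
total <- table(spots$gene)[names(hits)]
support <- data.frame(gene = names(hits),
                      top_spots = as.integer(hits),
                      all_spots = as.integer(total),
                      stringsAsFactors = FALSE)

# Genes supported by several independent spots come first.
support[order(-support$top_spots), ]
```

A gene like geneA, flagged by all of its spots at different array locations, is the kind of result Gordon describes as reliable; geneD, with a single spot, rests on one measurement.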

>2. The dye-swap set up resembles the swirl example in the limma manual, but
>here the dye swaps are of technical replicates. The first idea that came to
>my mind is to fit (e.g., using the nlme package) a random effects model like:
>
>lme(log.ratio ~ the.interesting.effect, random = ~1|the.biological.replicate)
>
>but since I am only interested in the interesting effect (not the replicate
>variation) I think I can get what I want with limma doing:
>
> > design
>   Efect R1 R2 R3
>1      0  1  0  0
>2      1  1  0  0
>3      0  0  1  0
>4      1  0  1  0
>5      0  0  0  1
>6      1  0  0  1
> > lm.series(data, design)
>
>Does this make sense?

Yes, the design matrix that you propose should work in limma and will give 
you valid results. The random-effects lme approach that you mention above 
though is in principle even better. You could get the best possible results 
by taking output from lme and inputting it in the right way into ebayes. 
(This is the obvious way to handle technical replicates, but I haven't seen 
anyone do it yet.)
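A minimal base-R sketch of the design Ramón proposes, assuming the six arrays are in the row order shown in his design matrix. It builds the matrix with model.matrix and fits ordinary least squares gene by gene with lm.fit. The log-ratios are made up (noise-free, so the fit recovers them exactly), and this only illustrates the per-gene linear-model step; it does not call limma's lm.series, and it does not perform the ebayes moderation or the lme-based approach.

```r
# Six arrays, two technical-replicate (dye-swap) arrays per biological
# replicate, in the same row order as the design in the question.
biorep <- factor(c("R1", "R1", "R2", "R2", "R3", "R3"))
effect <- c(0, 1, 0, 1, 0, 1)

# Effect column plus one indicator per biological replicate, with no overall
# intercept (it would be collinear with R1 + R2 + R3).
design <- cbind(Effect = effect, model.matrix(~ 0 + biorep))
colnames(design) <- c("Effect", "R1", "R2", "R3")

# Hypothetical true coefficients for 3 genes (rows); columns match design.
true_coef <- rbind(c( 1.5, 0.2, -0.1,  0.3),
                   c( 0.0, 0.5,  0.4,  0.6),
                   c(-2.0, 0.0,  0.1, -0.2))

# Noise-free log-ratios: one row per gene, one column per array.
M <- true_coef %*% t(design)

# Ordinary least squares for each gene separately.
coefs <- t(apply(M, 1, function(y) lm.fit(design, y)$coefficients))
round(coefs, 10)
```

The design has 6 arrays and 4 coefficients, leaving 2 residual degrees of freedom per gene; that small number is why borrowing information across genes in ebayes is worthwhile here.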

Best wishes
Gordon

>  Does it make sense given the mess with the variable
>number of replicates within arrays (question 1)?
>
>
>Thanks,
>
>Ramón
>
>--
>Ramón Díaz-Uriarte
>Bioinformatics Unit
>Centro Nacional de Investigaciones Oncológicas (CNIO)
>(Spanish National Cancer Center)
>Melchor Fernández Almagro, 3
>28029 Madrid (Spain)
>Fax: +-34-91-224-6972
>Phone: +-34-91-224-6900
>
>http://bioinfo.cnio.es/~rdiaz