[BioC] analysis of reference design with even dye-swap across biological replicates

Wed Jun 22 16:51:58 CEST 2011

Hi Nadia,

>1-If I stick with the idea of a reference design, but use the fish
>from my control group as one of the treatments, and use a separate
>pooled sample as a reference, I will now be able to hybridize 6 fish
>from each treatment as in the following target file:
>
>1       ref                     control.1
>2       control.2               ref
>3       ref                     control.3
>4       control.4               ref
>5       ref                     control.5
>6       control.6               ref
>7       ref                     treat1.1
>8       treat1.2                ref
>9       ref                     treat1.3
>10      treat1.4                ref
>11      ref                     treat1.5
>12      treat1.6                ref
>13      ref                     treat2.1
>14      treat2.2                ref
>15      ref                     treat2.3
>16      treat2.4                ref
>17      ref                     treat2.5
>18      treat2.6                ref
>19      ref                     treat3.1
>20      treat3.2                ref
>21      ref                     treat3.3
>22      treat3.4                ref
>23      ref                     treat3.5
>24      treat3.6                ref

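I still don't recommend doing a reference design (see below), but if
you do, you don't need to indicate individuals here: take off the
.1 - .6 and just have control, treat1, treat2, and treat3.
modelMatrix() makes one coefficient per unique target name, so
labelling each fish separately would give you a coefficient per fish
rather than per group. Make the design matrix and do lmFit() with the
same code: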

 > design <- modelMatrix(targets, ref = "ref")
 > design <- cbind(Dye = 1, design)
 > fit <- lmFit(MA, design)
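
As a rough sketch of the relabelling step, the replicate suffixes can be
stripped in base R before the modelMatrix() call. The targets data frame
here is rebuilt by hand from the 24-array layout quoted above, and the
column names Cy3 and Cy5 are an assumption about how the targets file is
headed; limma itself is not needed to run this.

```r
# Hypothetical targets table mirroring the 24-array layout quoted above
groups <- c("control", "treat1", "treat2", "treat3")
labels <- paste(rep(groups, each = 6), rep(1:6, times = 4), sep = ".")
odd <- rep(c(TRUE, FALSE), times = 12)   # odd-numbered arrays: ref in Cy3
targets <- data.frame(
  Cy3 = ifelse(odd, "ref", labels),
  Cy5 = ifelse(odd, labels, "ref"),
  stringsAsFactors = FALSE
)

# Drop the individual suffix (.1 to .6) so each fish maps to its group
targets$Cy3 <- sub("\\.[0-9]+$", "", targets$Cy3)
targets$Cy5 <- sub("\\.[0-9]+$", "", targets$Cy5)

# Cross-tabulate group by dye to check that the design is dye-balanced
dye.balance <- table(
  group = c(targets$Cy3, targets$Cy5),
  dye = rep(c("Cy3", "Cy5"), each = nrow(targets))
)
print(dye.balance)
```

Each of the four groups should show up three times in each dye, and the
reference twelve times in each.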

Now the contrast matrix comparing each treatment to the control is much easier:

 > contrast.matrix <- makeContrasts(treat1 - control, treat2 - control,
 +                                  treat3 - control, levels = design)

 > fit2 <- eBayes(contrasts.fit(fit, contrast.matrix))
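
To make the contrasts concrete, here is a hand-built matrix of the kind
the makeContrasts() call is expected to return. This is only a sketch in
base R: the coefficient names and their order (Dye first, then one
column per group) are assumptions about what the design matrix looks
like, and contrast.by.hand is a made-up name.

```r
# Assumed coefficient names from the design matrix: Dye plus one per group
coefs <- c("Dye", "control", "treat1", "treat2", "treat3")
treatments <- c("treat1", "treat2", "treat3")

# One column per comparison, one row per coefficient, all zero to start
contrast.by.hand <- matrix(
  0, nrow = length(coefs), ncol = length(treatments),
  dimnames = list(Levels = coefs,
                  Contrasts = paste(treatments, "- control"))
)

# Each contrast is (treatment vs ref) minus (control vs ref)
contrast.by.hand["control", ] <- -1
for (j in seq_along(treatments)) {
  contrast.by.hand[treatments[j], j] <- 1
}
print(contrast.by.hand)
```

The Dye row stays at zero: the dye term is fitted as its own coefficient
and does not enter a difference between two group coefficients.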

To see the numbers of up- and down-regulated genes at the default FDR cutoff of 0.05, do:

 > coded.results <- decideTests(fit2)
 > summary(coded.results)

To get the topTable results for any particular 
pairwise comparison, you have to specify the coefficient, e.g.,:

 > topTable(fit2, coef=1)
         # This gets treat1 - control; BH/fdr is the default
         # adjustment, so you don't have to specify it

To get the F-test for a oneway ANOVA of all 4 
groups, call topTable without specifying the 
coef, which by default will combine all 3 together into the appropriate F-test:

 > topTable(fit2)

But I still think you'd get better power with a loop design. See
Tempelman, Vet Immunol Immunopathol. 2005 May 15;105(3-4):175-86. I
haven't read it in a while, but loop designs, especially
interconnected loops, are statistically more powerful and efficient
than common reference designs in most cases (Tempelman used technical
dye-swaps in each calculation, but the same results would hold without
technical replicates in either array design).

Do a standard loop with 4 replicates in each group, dye-flipping so
each group has two in each color (8 arrays here). This gives two
direct hybs between control and treat1 and between control and treat3,
but none between control and treat2. Then use two more replicates in
each group to do comparisons between control and treat2, and between
treat1 and treat3 (4 arrays here, for a total of 12). This is an
"interconnected" loop. You're still just using 6 replicates per group,
but you'll get more power to assess differences between the treatments
and the control than in the reference design, with only 12 arrays.

You could even use all 8 replicates in each group by throwing in one
more outside loop (4 more arrays). This would leave slightly less
power for treat2 v control than for treat1 or treat3 v control, but
the difference would be trivial and you'd still add power to treat2 v
control compared with using only 6 reps. Best of all, this takes only
16 arrays total compared with the reference design's 24, AND you have
much more power to detect the differences you want!
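
One possible way to lay out these loops as targets tables, sketched in
base R. The order of the groups around the loop and the object names
(loop.pass, flipped, cross, design12, design16) are assumptions for
illustration, not a prescribed layout; each row is one array and each
entry stands for a different fish from that group.

```r
# One pass around the loop control -> treat1 -> treat2 -> treat3 -> control
loop.pass <- data.frame(
  Cy3 = c("control", "treat1", "treat2", "treat3"),
  Cy5 = c("treat1", "treat2", "treat3", "control"),
  stringsAsFactors = FALSE
)
# Second pass with the dyes flipped (8 arrays so far)
flipped <- data.frame(Cy3 = loop.pass$Cy5, Cy5 = loop.pass$Cy3,
                      stringsAsFactors = FALSE)
# Cross links between the groups that are not neighbours in the loop,
# one array in each dye orientation
cross <- data.frame(
  Cy3 = c("control", "treat2", "treat1", "treat3"),
  Cy5 = c("treat2", "control", "treat3", "treat1"),
  stringsAsFactors = FALSE
)

design12 <- rbind(loop.pass, flipped, cross)   # 12 arrays, 6 fish per group
design16 <- rbind(design12, loop.pass)         # 16 arrays, 8 fish per group

# Fish used per group and dye
dye.use <- function(d) {
  table(group = c(d$Cy3, d$Cy5),
        dye = rep(c("Cy3", "Cy5"), each = nrow(d)))
}
# Number of arrays hybridizing each treatment directly against control
direct.vs.control <- function(d) {
  partner <- c(d$Cy5[d$Cy3 == "control"], d$Cy3[d$Cy5 == "control"])
  table(partner)
}

print(dye.use(design12))
print(direct.vs.control(design12))
print(direct.vs.control(design16))
```

In this layout the 12-array design puts every treatment on two arrays
with the control, and the extra outside loop raises that to three for
treat1 and treat3 while treat2 stays at two.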

HTH,
Jenny

>2-You suggest that I do a standard loop design. However, the control
>will not be directly hybridized with all treatments with 24 hybs (at
>least not with 6 or 8 biological replicates), unless I am not
>understanding how you would do the loop design? I have done loop
>designs before but I wanted comparisons of all groups to each other,
>here I am really interested in the contrast between each treatment and
>the control. Using fewer arrays and a loop design would be great and I
>am not attached to reference designs per se, but I want to make sure
>that I have the optimal statistical power for the contrast of interest.
>
>Thank you
>
>Nadia
>
>Nadia Aubin-Horth
>Assistant professor
>Biology Department
>Institute of Integrative and Systems Biology
>Room 1241, Charles-Eugène-Marchand Building
>1030, Ave. de la Médecine
>Laval University
>Quebec City (QC) G1V 0A6
>Canada
>
>Phone: 418.656.3316
>Fax: 418.656.7176
>
>web page: http://wikiaubinhorth.ibis.ulaval.ca/Main_Page
>
>
>
>On Jun 20, 2011, at 2:46 PM, Jenny Drnevich wrote:
>
>>Hi Nadia,
>>
>>If the main goal of your experiment is to compare
>>each of the treatments to the control, then DO
>>NOT pool the control samples! Even though you do
>>not care about individual variation, you cannot
>>do an accurate statistical test of the difference
>>of the means without the estimate of the
>>variance within the controls. Do a standard
>>loop-design and make sure the groups are
>>dye-balanced (4 replicates in each dye); you do
>>not need to do technical dye-swaps to account for
>>the dye effect in the model. This will give 4
>>groups * 8 replicates / 2 channels = 16 arrays.
>>
>>That's my 2-cents,
>>Jenny
>>
>>At 01:31 PM 6/20/2011, Aubin-Horth Nadia wrote:
>>>Hi everybody,
>>>
>>>I am planning to analyse a microarray experiment (Agilent, 2 colors)
>>>and I would like to make sure I can include dye effect with the hyb
>>>design used.
>>>
>>>I have 4 groups: a control group ("wild type") and 3 treatments. We
>>>are interested in the effect of each treatment on gene expression
>>>compared to the control. My plan is to maximize the statistical power
>>>to find differences between the control and each treatment by using a
>>>reference design and having the control in each hyb. Of course, I
>>>lose statistical power to find differences between treatments.
>>>
>>>I have 8 biological replicates (fish) per group available.
>>>
>>>I am interested to know if I can correctly take dye-bias into account
>>>using LIMMA and the following design. I am not interested in
>>>individual gene expression level, only mean and variance for each
>>>treatment.
>>>
>>>The 24 hybs are performed using the control group (all 8 individuals
>>>pooled) as the reference and the 8 individuals from each of the 3
>>>treatments used in only one hyb (no technical replicate). For each
>>>treatment, 4 biological replicates would be labelled in cy3 and 4
>>>biological replicates would be labelled in cy5 (assigned at random
>>>within treatment). I would thus get an even design in terms of dye
>>>labelling for the reference and the treatments, but no dye swap/
>>>technical replicate for a specific fish. The goal is to capture as
>>>much biological variance as possible here (8 fish instead of 4 fish
>>>with dye swap) for the 24 hybs we can do.
>>>
>>>The target file would look like this (T1, T2 and T3 are treatments
>>>and the following number represents a biological replicate)
>>>HYB             Cy3             Cy5
>>>1               ref             T1.1
>>>2               ref             T1.2
>>>3               ref             T1.3
>>>4               ref             T1.4
>>>5               T1.5            ref
>>>6               T1.6            ref
>>>7               T1.7            ref
>>>8               T1.8            ref
>>>9               ref             T2.1
>>>10              ref             T2.2
>>>11              ref             T2.3
>>>12              ref             T2.4
>>>13              T2.5            ref
>>>14              T2.6            ref
>>>15              T2.7            ref
>>>16              T2.8            ref
>>>17              ref             T3.1
>>>18              ref             T3.2
>>>19              ref             T3.3
>>>20              ref             T3.4
>>>21              T3.5            ref
>>>22              T3.6            ref
>>>23              T3.7            ref
>>>24              T3.8            ref
>>>
>>>The comparison of interest is the average difference between the
>>>control and a given treatment, including dye effects.
>>>
>>>I thought I could then use the example in section 7.3 of the limma
>>>user guide on common reference designs, but including multiple
>>>biological replicates and a dye effect (from section 8.2).
>>>
>>>Here the contrast matrix is made for treatment 1, T1
>>>
>>>design <- modelMatrix(targets, ref = "ref")
>>>design <- cbind(Dye = 1, design)
>>>fit <- lmFit(MA, design)
>>>cont.matrix <- makeContrasts((T1.1+T1.2+T1.3+T1.4+T1.5+T1.6+T1.7+T1.8)/8,
>>>                             levels = design)
>>>fit2 <- contrasts.fit(fit, cont.matrix)
>>>fit2 <- eBayes(fit2)
>>>topTable(fit2, adjust = "BH")
>>>
>>>Could someone please tell me if
>>>1) the contrast is appropriate?
>>>2) it will be possible to estimate the dye effect as presented in the
>>>manual with my own hybridization design?
>>>
>>>The hybs have not been performed yet but I assume that one can still
>>>tell if the design is balanced. I could use a loop design as is
>>>normally used in our lab but as I simply want to know what is the
>>>effect of each treatment, I thought a reference design was
>>>appropriate, especially with such a large number of biological
>>>replicates.
>>>
>>>Thank you!
>>>
>>>Nadia Aubin-Horth
>>>