[BioC] When to treat technical reps as biological reps? WAS:Re: 2x2 factorial loop without common reference (pool)

Wed Apr 26 18:00:45 CEST 2006

Hi everyone,

Comments from Naomi and Gordon (below) about the technical replication in 
the 2x2 factorial loop experiment are very close to an issue I have been 
struggling with for several analyses: When (if ever) is it OK to treat 
technical replicates as biological replicates? Often this is done when 
there is more than one random effect (e.g. also have duplicate spots, 
blocking effects, etc.) because as Gordon has said previously, the between 
gene smoothing of limma cannot currently be done with more than one random 
effect. I know there have been many discussions on this on the list 
previously, but I can see two problems with treating tech reps as 
biological reps, and only one of them has been addressed:

1. There is likely to be artificially decreased variance within treatment 
groups because tech reps should have higher correlations than biological 
reps. This problem has been addressed several times and probably the best 
answer has come from Gordon along the lines of: often measurement error is 
larger than biological variation, so IF there are not higher correlations 
among tech reps then variance estimates should not be artificially decreased.

2. The DF is artificially increased due to pseudoreplication of the 
biological replicates, which leads to artificially lower p-values. This 
combined with even minor changes to the variance components can lead to 
large changes in p-values in my experience.
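
To put rough numbers on the two problems above, here is a minimal sketch
in base R. The variance components and the t-statistic are made up for
illustration; only the replicate counts (3 biological reps, 4 technical
reps each) come from the 2x2 loop discussed in this thread. The effective
sample size uses the standard design-effect formula for equally sized
clusters, which is an assumption about how to summarize the correlation
and not something limma computes.

```r
# All variance components and the t-statistic are hypothetical.

# Intraclass correlation among technical reps of one biological rep
icc <- function(var_bio, var_tech) var_bio / (var_bio + var_tech)

# Effective number of independent observations for clusters of size m
n_eff <- function(n_total, m, rho) n_total / (1 + (m - 1) * rho)

n_bio   <- 3            # biological reps per group
m       <- 4            # technical reps per biological rep
n_total <- n_bio * m    # 12 observations per group

n_eff(n_total, m, rho = 0)               # 12: tech reps carry full information
n_eff(n_total, m, rho = 1)               # 3: tech reps add nothing
n_eff(n_total, m, rho = icc(0.8, 0.2))   # about 3.5

# Same t-statistic, two-group comparison, different residual DF
t_stat <- 3
p_bio  <- 2 * pt(-abs(t_stat), df = 2 * (n_bio - 1))    # DF = 4
p_tech <- 2 * pt(-abs(t_stat), df = 2 * (n_total - 1))  # DF = 22
c(p_bio = p_bio, p_tech = p_tech)
```

With the t-statistic held fixed, the p-value is about 0.04 on 4 DF and
below 0.01 on 22 DF. In practice the standard error also shrinks when N
goes from 3 to 12, so the two effects compound.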

As far as I know, this second problem has not been addressed. As a case in 
point, in the 2x2 factorial loop from before, each of the three biological 
replicates has 4 technical replicates, and even if there are not higher 
correlations, treating them as biological reps yields N=12 for each group 
instead of N=3. Shouldn't we be worried about this effect as well? In such 
cases, when the experimental design really has more than one random effect, 
wouldn't it be better to model the random effects properly with a 
multilevel model such as lme/nlme, rather than keep the benefits of the 
empirical Bayes shrinkage by either ignoring technical replication or 
averaging dye swaps?
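
As a sketch of the comparison I have in mind, the code below simulates
one gene for two groups with 3 biological reps and 4 technical reps each,
then fits it both ways. The data, the standard deviations, and the column
names (trt, biorep, y) are all hypothetical, and the simulation has no
true treatment effect. It uses lme() from the nlme package with a random
intercept per biological replicate.

```r
library(nlme)  # recommended package, shipped with standard R installations

set.seed(1)
n_bio <- 3; n_tech <- 4

# One row per technical replicate: 2 groups x 3 bio reps x 4 tech reps
d <- expand.grid(tech = seq_len(n_tech), bio = seq_len(n_bio),
                 trt = c("a", "b"))
d$trt    <- factor(d$trt)
d$biorep <- factor(paste0(d$trt, d$bio))       # 6 biological units

# Hypothetical variance components: biological SD 1, technical SD 0.5
bio_eff <- rnorm(nlevels(d$biorep), sd = 1)
names(bio_eff) <- levels(d$biorep)
d$y <- bio_eff[as.character(d$biorep)] + rnorm(nrow(d), sd = 0.5)

# Naive fit: every technical rep treated as an independent observation
fit_naive <- lm(y ~ trt, data = d)
df_naive  <- df.residual(fit_naive)            # 24 - 2 = 22

# Multilevel fit: random intercept for each biological replicate
fit_mixed <- lme(y ~ trt, random = ~ 1 | biorep, data = d)
tt        <- summary(fit_mixed)$tTable
df_mixed  <- tt["trtb", "DF"]                  # based on 6 bio units

c(df_naive = df_naive, df_mixed = df_mixed)
summary(fit_naive)$coefficients["trtb", ]
tt["trtb", ]
```

The treatment test in the lme fit is carried out on the DF implied by the
6 biological units, not the 24 measurements, which is the behaviour the
question above is about. What this per-gene fit gives up is the
between-gene information that the empirical Bayes step would borrow.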

Thanks,
Jenny

Naomi's comment:
I would use single channel analysis for
this.  The only problem is that Limma allows only
1 level of random effects.  Hence, you will need to average the dye-swaps.
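
One possible reading of "average the dye-swaps", sketched here on
log-ratios and not on single-channel intensities, is to collapse each
forward/swap pair into one value per biological pair. The matrix below is
made-up data for two genes on slides 1-6 of the a-vs-b block (slides 1-3
have Cy3 = a, Cy5 = b; slides 4-6 are their dye swaps); the object names
are hypothetical.

```r
# Hypothetical log-ratios M = log2(Cy5/Cy3) for two genes, slides 1-6
M <- rbind(gene1 = c(1.2, 0.8, 1.0, -0.8, -1.2, -1.0),
           gene2 = c(0.3, 0.1, 0.2,  0.1,  0.3,  0.2))
colnames(M) <- paste0("slide", 1:6)

fwd <- M[, 1:3]   # Cy3 = a, Cy5 = b
swp <- M[, 4:6]   # Cy3 = b, Cy5 = a (same biological pairs, dyes swapped)

# Half the difference estimates b - a with the gene-specific dye bias
# cancelled; half the sum estimates the dye bias itself.
b_minus_a <- (fwd - swp) / 2
dye_bias  <- (fwd + swp) / 2
colnames(b_minus_a) <- colnames(dye_bias) <- paste0("biorep", 1:3)

b_minus_a
rowMeans(dye_bias)
```

After this step there is one column per biological replicate pair (3
instead of 6), so the technical level of replication no longer needs its
own random effect. In this toy example gene2 turns out to be almost pure
dye bias.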

Gordon's comment:
>PS. Although you don't say explicitly, I'm assuming that a1, a2 etc
>represent some sort of biological replication. The above analysis
>does not keep track of which array has which biological replicate of
>each treatment. If you wanted to do a careful job of that, you would
>have no choice but to do a "separate channel" analysis, as Naomi
>Altman has suggested separately. If your biological replicates a1, a2
>etc are not very different, compared to microarray measurement error,
>then the above simpler analysis may be good enough.
>
>Date: Sun, 23 Apr 2006 13:41:22 -0400
> >From: "francois fauteux" <francois.fauteux at gmail.com>
> >Subject: [BioC] 2x2 factorial loop without common reference (pool)
> >To: bioconductor at stat.math.ethz.ch, "François fauteux"
> >         <francois.fauteux at gmail.com>, "Richard Bélanger"
> >         <richard.belanger at plg.ulaval.ca>
> >Message-ID:
> >         <53328b400604231041v51db3863i8bb48b2fbf725229 at mail.gmail.com>
> >Content-Type: text/plain; charset=ISO-8859-1
> >
> >Hi;
> >
> >We are doing an experiment with agilent 44K (3 biological reps,
> >complete dye-swap):
> >
> >a - control
> >b - treatment 1
> >c - treatment 2
> >d - treatment 1 + treatment 2
> >
> >and I would like to output evidence of the interaction between two
> >treatments and effect on gene expression.
> >
> >24 chips:
> >
> >SlideNumber     Cy3     Cy5
> >1       a1      b1
> >2       a2      b2
> >3       a3      b3
> >4       b1      a1
> >5       b2      a2
> >6       b3      a3
> >7       a1      c1
> >8       a2      c2
> >9       a3      c3
> >10      c1      a1
> >11      c2      a2
> >12      c3      a3
> >13      b1      d1
> >14      b2      d2
> >15      b3      d3
> >16      d1      b1
> >17      d2      b2
> >18      d3      b3
> >19      c1      d1
> >20      c2      d2
> >21      c3      d3
> >22      d1      c1
> >23      d2      c2
> >24      d3      c3
> >
> >I've done several tests with limma to isolate significant results in
> >the following:
> >1- a vs b;
> >2- a vs c;
> >3- b vs d;
> >4- c vs d;
> >
> >with this "targets.txt":
> >
> >SlideNumber     Cy3     Cy5
> >1       a       b
> >2       a       b
> >3       a       b
> >4       b       a
> >5       b       a
> >6       b       a
> >7       a       c
> >8       a       c
> >9       a       c
> >10      c       a
> >11      c       a
> >12      c       a
> >13      b       d
> >14      b       d
> >15      b       d
> >16      d       b
> >17      d       b
> >18      d       b
> >19      c       d
> >20      c       d
> >21      c       d
> >22      d       c
> >23      d       c
> >24      d       c
> >
> >First option:
> >
> > > f <- paste(targets$Cy3, targets$Cy5, sep = ".")
> > > f <- factor(f, levels = c("a.b", "b.a", "a.c", "c.a", "b.d",
> > "d.b", "c.d", "d.c"))
> > > design1 <- model.matrix(~0 + f)
> >
> > > design1
> >    a.b b.a a.c c.a b.d d.b c.d d.c
> >1    1   0   0   0    0    0    0    0
> >2    1   0   0   0    0    0    0    0
> >3    1   0   0   0    0    0    0    0
> >4    0   1   0   0    0    0    0    0
> >5    0   1   0   0    0    0    0    0
> >6    0   1   0   0    0    0    0    0
> >7    0   0   1   0    0    0    0    0
> >8    0   0   1   0    0    0    0    0
> >9    0   0   1   0    0    0    0    0
> >10   0   0   0   1    0    0    0    0
> >11   0   0   0   1    0    0    0    0
> >12   0   0   0   1    0    0    0    0
> >13   0   0   0   0    1    0    0    0
> >14   0   0   0   0    1    0    0    0
> >15   0   0   0   0    1    0    0    0
> >16   0   0   0   0    0    1    0    0
> >17   0   0   0   0    0    1    0    0
> >18   0   0   0   0    0    1    0    0
> >19   0   0   0   0    0    0    1    0
> >20   0   0   0   0    0    0    1    0
> >21   0   0   0   0    0    0    1    0
> >22   0   0   0   0    0    0    0    1
> >23   0   0   0   0    0    0    0    1
> >24   0   0   0   0    0    0    0    1
> >
> >This gives significant results for each one of the "levels" but does
> >not take into account the dye-swap (i.e "a.b" and "b.a" are considered
> >independent).
> >
> >Other tested option is:
> > > design2 <- modelMatrix(targets,ref="a")
> >
> > > design2
> >       p  s sp
> >ab1   0  1  0
> >ab2   0  1  0
> >ab3   0  1  0
> >ba1   0 -1  0
> >ba2   0 -1  0
> >ba3   0 -1  0
> >ac1   1  0  0
> >ac2   1  0  0
> >ac3   1  0  0
> >ca1  -1  0  0
> >ca2  -1  0  0
> >ca3  -1  0  0
> >bd1   0 -1  1
> >bd2   0 -1  1
> >bd3   0 -1  1
> >db1   0  1 -1
> >db2   0  1 -1
> >db3   0  1 -1
> >cd1  -1  0  1
> >cd2  -1  0  1
> >cd3  -1  0  1
> >dc1   1  0 -1
> >dc2   1  0 -1
> >dc3   1  0 -1
> >
> >This gives results for "b" effect, "c" effect, and "d" effect.
> >However, I couldn't get results for the 4 comparisons of interest
> >(even though the matrix is coherent).
> >
> >Questions:
> >
> >1 - What would be the best option (design and operations) to get to
> >contrasts of interest considering that the experiment has 4
> >treatments in a factorial design without common reference  (a vs b, a
> >vs c, b vs d, c vs d) and taking into account the dye-effect;
> >
> >2- Is this method (4 contrasts) the best one considering that
> >treatment "d" is a combination of treatments "b" and "c" (factorial
> >type design)? How could one directly identify genes
> >differentially expressed due to the interaction between treatment "b"
> >and treatment "c" (i.e. effect of "d" over "b" and "c")?
> >
> >In Limma Users Guide and elsewhere on this forum, I could not find a
> >clear description of how this type of analysis should be performed,
> >even though it is a simple design (i.e 2X2 factorial without a common
> >reference - two color arrays - complete dye swap).
> >
> >Thanks for your time, best regards.
> >
> >François Fauteux
> >Master's student in plant biology
> >Centre de recherche en horticulture
> >Université Laval
> >francois.fauteux at gmail.com
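
For reference, the four comparisons and the interaction asked about in
the original post can all be written as linear combinations of the "b",
"c" and "d" coefficients of a reference design with "a" as the reference.
The sketch below builds that design by hand in base R from the targets
table in the post and checks the contrasts on noiseless, made-up
coefficient values. It only mimics what limma's modelMatrix(),
makeContrasts() and contrasts.fit() would be used for; the "Dye"
intercept column, the contrast names and the coefficient values are
assumptions for illustration.

```r
# Targets as in the post: 8 slide types, 3 biological reps each
targets <- data.frame(
  Cy3 = rep(c("a", "b", "a", "c", "b", "d", "c", "d"), each = 3),
  Cy5 = rep(c("b", "a", "c", "a", "d", "b", "d", "c"), each = 3)
)

# Log-ratio is Cy5 minus Cy3: +1 for the Cy5 treatment, -1 for the Cy3
# treatment, with "a" as the reference (no column of its own)
trts   <- c("b", "c", "d")
design <- sapply(trts, function(trt) {
  (targets$Cy5 == trt) - (targets$Cy3 == trt)
})
design <- cbind(Dye = 1, design)  # intercept absorbs a common dye effect

# Contrasts as combinations of the coefficients (Dye, b, c, d)
contrasts <- cbind(
  b_minus_a   = c(0,  1,  0, 0),
  c_minus_a   = c(0,  0,  1, 0),
  d_minus_b   = c(0, -1,  0, 1),
  d_minus_c   = c(0,  0, -1, 1),
  interaction = c(0, -1, -1, 1)   # d - b - c (+ a, which is zero here)
)
rownames(contrasts) <- colnames(design)

# Noiseless check with hypothetical coefficients for one gene
true_coef <- c(Dye = 0.2, b = 1, c = 2, d = 4.5)
M   <- drop(design %*% true_coef)
est <- coef(lm(M ~ 0 + design))
res <- drop(est %*% contrasts)
res
```

Note that this treats all 24 log-ratios as independent, so it does not
track which array carries which biological replicate; that is the
simplification under discussion in this thread.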

Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu