# [BioC] duplicateCorrelation and design matrix

Gordon K Smyth smyth at wehi.EDU.AU
Fri Jul 1 14:15:21 CEST 2005

```
> Date: Thu, 30 Jun 2005 11:44:02 +0000
> From: Carolyn Fitzsimmons <Carolyn.Fitzsimmons at imbim.uu.se>
> Subject: [BioC] duplicateCorrelation and design matrix
> To: Bioconductor list <bioconductor at stat.math.ethz.ch>
>
> Hello,
>
> I need an explanation of how the design matrix influences the consensus
> correlation of the duplicateCorrelation function when accounting for technical
> replicates.  Here is my specific example:
>
> Design matrix:
>> design
>    RJf RJm WLf WLm
> 1    0   0   0   1
> 2    0   0   0   1
> 3    0   0   0   1
> 4    0   0   0   1
> 5    0   0   0   1
> 6    0   0   0   1
> 7    0   0   0   1
> 8    0   0   0   1
> 9    0   0   1   0
> 10   0   0   1   0
> 11   0   0   1   0
> 12   0   0   1   0
> 13   0   0   1   0
> 14   0   0   1   0
> 15   0   0   1   0
> 16   0   0   1   0
> 17   0   1   0   0
> 18   0   1   0   0
> 19   0   1   0   0
> 20   0   1   0   0
> 21   0   1   0   0
> 22   0   1   0   0
> 23   0   1   0   0
> 24   0   1   0   0
> 25   1   0   0   0
> 26   1   0   0   0
> 27   1   0   0   0
> 28   1   0   0   0
> 29   1   0   0   0
> 30   1   0   0   0
> 31   1   0   0   0
> 32   1   0   0   0
> #
> Every second slide is a replicate of the one before it (e.g. 1 and 2 are replicates,
> then 3 and 4, etc.). There are also 4 groups that I want to compare, with 4
> individuals in each group (each duplicated). So I continue with
> duplicateCorrelation:
> #
>> cor <- duplicateCorrelation(Mmatrix_ny, design=design,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> cor$cor
> [1] -0.03060575
> #
> which is a pretty poor correlation, so I should probably just treat the technical
> replicates as biological replicates (as the limma user guide says). But in
> another comparison I want to put all the arrays into 2 groups; see the design
> matrix:
>> designWLRJ
>    RJ WL
> 1   0  1
> 2   0  1
> 3   0  1
> 4   0  1
> 5   0  1
> 6   0  1
> 7   0  1
> 8   0  1
> 9   0  1
> 10  0  1
> 11  0  1
> 12  0  1
> 13  0  1
> 14  0  1
> 15  0  1
> 16  0  1
> 17  1  0
> 18  1  0
> 19  1  0
> 20  1  0
> 21  1  0
> 22  1  0
> 23  1  0
> 24  1  0
> 25  1  0
> 26  1  0
> 27  1  0
> 28  1  0
> 29  1  0
> 30  1  0
> 31  1  0
> 32  1  0
> #
> When I then run duplicateCorrelation with this design, I get a different correlation:
> #
>> corWLRJ <- duplicateCorrelation(Mmatrix_ny, design=designWLRJ,
> +     block=c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16))
>> corWLRJ$cor
> [1] 0.01745252
> #
> Moreover, when I compute the consensus correlation without a design matrix
> I get 0.1073055. I know from looking through previous posts, and from a lot of help
> from Johan L., that the way the blocking is set up and the way the design matrix is
> used in these situations are correct.

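You've used three different, non-equivalent design matrices (leaving out the design matrix is
the same as using an intercept-only design, i.e. a single column of ones).  No more than one of
these can be correct.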

> So how is the consensus correlation actually
> being calculated in the above situations? (In loose mathematical terms if
> possible, as you can probably tell from my question.)

In loose terms, the correlation measures the variability between blocks that is not explained by
the design matrix, relative to the variation within blocks.  Over-simplifying the design matrix
will increase the between-block variation, because it will now reflect differences between your
treatments as well as differences between biological replicates.  Hence the estimated correlation
increases.

Gordon

> Thanks a lot for your time,  Carolyn
>
> --
> Carolyn Fitzsimmons
> Dept. Medical Biochemistry and Microbiology
> Uppsala University
> Box 597/BMC
> SE-751 24
> SWEDEN
>
> E-mail: Carolyn.Fitzsimmons at imbim.uu.se
> Tel: +46 (0)18 471 4593
> Mobile: +46 (0)73 704 1248

```
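For readers who want the "loose mathematical terms" made a little more concrete, here is a sketch of the model as I understand it from the current limma documentation for `duplicateCorrelation`. Treat the details as an assumption to check against `?duplicateCorrelation` in your installed version, since the 2005 implementation may have differed. Each gene $g$ is fitted (by REML) with a linear mixed model in which the arrays $j$ within block $i$ share a random block effect $b_{gi}$. The intra-block correlation is the between-block share of the total variance, and the consensus value is the back-transformed trimmed mean (default `trim = 0.15`) of the gene-wise atanh (Fisher z) values:

$$
y_{gij} = x_{ij}^{\top}\beta_g + b_{gi} + \varepsilon_{gij}, \qquad
b_{gi} \sim N(0, \sigma^2_{b,g}), \quad \varepsilon_{gij} \sim N(0, \sigma^2_g)
$$

$$
\rho_g = \frac{\sigma^2_{b,g}}{\sigma^2_{b,g} + \sigma^2_g}, \qquad
\hat\rho = \tanh\!\Big( \operatorname{mean}_{\mathrm{trim}=0.15}\big\{ \operatorname{atanh}(\hat\rho_g) \big\} \Big)
$$

Any systematic difference between blocks that the columns of the design matrix do not capture, such as a treatment effect dropped from an over-simplified design, is absorbed into $b_{gi}$. This inflates $\hat\sigma^2_{b,g}$ and therefore $\hat\rho_g$.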
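To see Gordon's point numerically, here is a minimal base-R sketch on simulated data. It deliberately does not call limma. The helpers `pair_corr()` and `consensus_corr()` are hypothetical and were written only for this illustration. `pair_corr()` swaps in a simple method-of-moments (ANOVA-type) estimator of the intra-block correlation for blocks of exactly two technical replicates, instead of the REML fit that limma uses. The genes are then combined with a trimmed mean on the atanh scale. The layout mirrors Carolyn's (16 individuals in 4 groups, each hybridised twice), but all effect sizes are invented.

```r
# Minimal base-R sketch (no limma needed). pair_corr() and consensus_corr()
# are hypothetical helpers for illustration, NOT limma's implementation.

# Method-of-moments estimate of the intra-block correlation for one gene,
# assuming each block holds exactly two replicates and the design matrix
# is constant within each block.
pair_corr <- function(y, block, design) {
  f <- factor(block)
  # Within-block variance: each pair contributes its sum of squares on 1 df.
  ss_within <- tapply(y, f, function(v) sum((v - mean(v))^2))
  s2_within <- sum(ss_within) / length(ss_within)
  # Residual variance of the block means after fitting the design.
  ybar <- as.vector(tapply(y, f, mean))
  Xb <- design[match(levels(f), f), , drop = FALSE]  # one design row per block
  fit <- lm.fit(Xb, ybar)
  s2_means <- sum(fit$residuals^2) / fit$df.residual
  # Var(block mean) = sigma_b^2 + sigma^2 / 2, so solve for sigma_b^2.
  s2_between <- s2_means - s2_within / 2
  rho <- s2_between / (s2_between + s2_within)
  min(max(rho, -0.99), 0.99)  # clamp so atanh() stays finite (my choice)
}

# Consensus: trimmed mean of atanh-transformed gene-wise values, back-transformed.
consensus_corr <- function(Y, block, design, trim = 0.15) {
  rho <- apply(Y, 1, pair_corr, block = block, design = design)
  tanh(mean(atanh(rho), trim = trim))
}

set.seed(2005)
n_genes <- 2000
block <- rep(1:16, each = 2)                # same as c(1,1,2,2,...,16,16)
group <- factor(rep(c("WLm", "WLf", "RJm", "RJf"), each = 8))
design4 <- model.matrix(~ 0 + group)        # four-group design
design1 <- matrix(1, nrow = 32, ncol = 1)   # intercept only

# Invented effect sizes: group SD 1, individual SD 0.5, residual SD 1,
# so the true intra-block correlation is 0.25 / (0.25 + 1) = 0.2.
simulate_gene <- function() {
  rnorm(4, sd = 1)[as.integer(group)] +
    rep(rnorm(16, sd = 0.5), each = 2) +
    rnorm(32, sd = 1)
}
Y <- t(replicate(n_genes, simulate_gene()))  # genes in rows, arrays in columns

cons4 <- consensus_corr(Y, block, design4)
cons1 <- consensus_corr(Y, block, design1)
print(c(four_groups = cons4, intercept_only = cons1))
```

With the four-group design, the consensus estimate should land close to the simulated value of 0.2. The intercept-only design (which is what omitting `design` amounts to) instead absorbs the group differences into the between-block variance and returns a noticeably larger correlation. The exact numbers depend on the seed. The argument is named `design` rather than `X` because `apply()` already has a formal argument called `X`.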