[BioC] Question on unbalanced paired design

Mon Apr 8 08:10:16 CEST 2013

Dear List:

We are analyzing Agilent microarray data for a study in which the samples are related. After quantile normalization of 'gProcessedSignal', averaging of replicate spots, and log transformation, we are trying to use LIMMA for differential expression analysis.
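
To make the preprocessing order concrete, here is a minimal base-R sketch of the three steps. It is only an illustration, not the code we ran and not limma's implementation: the intensities, the probe and array names, the helper `quantile_normalize`, and the choice of log base 2 are all assumptions.

```r
# Invented intensities: 4 spots (2 probes, each spotted twice) x 3 arrays.
raw <- matrix(
  c(100, 200, 400, 800,
     50, 100, 200, 400,
    200, 400, 800, 1600),
  nrow = 4,
  dimnames = list(c("p1", "p1", "p2", "p2"), c("s1", "s2", "s3"))
)

# Quantile normalization: give every array the same empirical distribution
# by replacing each value with the mean of the values that share its rank.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  target <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(x)
  out
}

normalized <- quantile_normalize(raw)

# Average replicate spots that share a probe ID (rows come back sorted by ID).
spot_counts <- as.vector(table(rownames(normalized)))
averaged <- rowsum(normalized, rownames(normalized)) / spot_counts

# Log transformation (base 2 is an assumption here).
log_expr <- log2(averaged)
print(log_expr)
```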

The design is as follows:
4 Treatment groups - A, B, C and D
3 Doses per Treatment group, but 4 doses for Treatment A  (Total 13 Treatment-Dose combinations)
There are 8 patient samples in each Treatment-Dose combination (Total 104 samples)

We are interested in comparing Dose effects within Treatments and in the overlaps across Treatment-Dose combinations. We are not interested in between-Treatment comparisons such as A vs. B.

Patient samples are related within a Treatment group, but the patients differ from Treatment to Treatment. So this is a nested design in which the samples are related/paired within each Treatment. The samples come from 32 patients.

Out of 104 samples, 12 failed Extraction/Hybridization QC, so we are currently analyzing 92 samples. As a result, a few of the paired samples are missing in each Treatment-Dose group.

Here are a few lines of the targets file (the full targets file is attached):

SampleName   Trt   Dose   SibShip
A-01-001     A     1      1
A-03-001     A     3      1
A-04-001     A     4      1
A-01-012     A     1      6
A-02-012     A     2      6
A-04-012     A     4      6
A-01-031     A     1      14
A-02-031     A     2      14
A-03-031     A     3      14
A-04-031     A     4      14
A-01-040     A     1      17
A-02-040     A     2      17
A-03-040     A     3      17
A-04-040     A     4      17
.            .     .      .
.            .     .      .
.            .     .      .
B-01-013     B     1      7
B-02-013     B     2      7
B-03-013     B     3      7
B-01-016     B     1      10
B-02-016     B     2      10
B-03-016     B     3      10
B-01-024     B     1      12
B-02-024     B     2      12
B-03-024     B     3      12
.            .     .      .
.            .     .      .
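
A quick way to see where the pairing is incomplete is to cross-tabulate SibShip against the Treatment-Dose factor. The sketch below uses only the Treatment A rows listed above, typed in by hand; the object names `counts` and `incomplete` are just for illustration. The comparison against the number of columns makes sense only within a single Treatment, since the Treatments have different numbers of doses.

```r
# Treatment A rows copied from the targets excerpt above.
targets <- data.frame(
  Trt     = "A",
  Dose    = c(1, 3, 4,  1, 2, 4,  1, 2, 3, 4,  1, 2, 3, 4),
  SibShip = c(1, 1, 1,  6, 6, 6,  14, 14, 14, 14,  17, 17, 17, 17)
)

TD <- factor(paste(targets$Trt, targets$Dose, sep = "_"))
Sibship <- factor(targets$SibShip)

# Rows are patients, columns are Treatment-Dose groups; a 0 marks a missing sample.
counts <- table(Sibship, TD)
print(counts)

# Patients that do not have a sample at every dose of this Treatment.
incomplete <- rownames(counts)[rowSums(counts) < ncol(counts)]
print(incomplete)
```

In this excerpt, SibShip 1 has no dose 2 sample and SibShip 6 has no dose 3 sample, while SibShip 14 and 17 are complete.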

R-code-
-------------
> library(limma)
> targets_design <- readTargets("targets_design.txt")
> TD <- factor(paste(targets_design$Trt, targets_design$Dose, sep="_"))
> Sibship <- factor(targets_design$SibShip)
> design <- model.matrix(~0+TD+Sibship)
> fit <- lmFit(ldt, design)
Coefficients not estimable: Sibship27 Sibship31 Sibship32
Warning message:
Partial NA coefficients for 34127 probe(s)
> cont.matrix <- makeContrasts(
+ TDA_2 - TDA_1,
+ TDA_3 - TDA_2,
+ TDA_4 - TDA_3,
+ TDB_2 - TDB_1,
+ TDB_3 - TDB_2,
+ levels = design)
> fit1 <- contrasts.fit(fit, cont.matrix)
> fit2 <- eBayes(fit1)
> fit2$coefficients[1:5,]
              Contrasts
               TDA_2 - TDA_1 TDA_3 - TDA_2 TDA_4 - TDA_3 TDB_2 - TDB_1 TDB_3 - TDB_2
  A_23_P146146    -0.2176523    0.14287127   -0.05801898     0.3476315   -0.25312193
  A_23_P42935      0.1718808    0.18653560   -0.20015286    -0.2664990   -0.04537665
  A_23_P117082     0.1545347    0.32006311   -0.16050816     1.0063268   -1.01438229
  A_23_P2683      -0.2549002   -0.16453369    0.27796574     0.2916715   -0.79682996
  A_24_P358131    -0.4647673    0.09824839    0.22298962    -0.4026419    0.53349466

When I run the above code using only the patient samples for which we have observations on all treatments, the results seem to be correct, because the logFC values match my own calculations. So, is my design matrix correct?
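
On the "Coefficients not estimable" message: one possible reading is that, because each patient belongs to exactly one Treatment, the Sibship columns of a Treatment add up to the same vector as that Treatment's TD columns, so the design matrix cannot have full column rank. Three non-estimable Sibship coefficients would be consistent with four Treatments, one dependency per Treatment, with one of them absorbed by the dropped baseline Sibship level. The toy layout below is hypothetical (2 Treatments, 2 doses, 2 patients per Treatment) and uses base R only.

```r
# Hypothetical nested layout: patients 1-2 get Treatment A, patients 3-4 get B.
toy <- expand.grid(Dose = 1:2, Patient = 1:4)
toy$Trt <- ifelse(toy$Patient <= 2, "A", "B")

TD <- factor(paste(toy$Trt, toy$Dose, sep = "_"))
Sibship <- factor(toy$Patient)

# Same formula as in the transcript above.
design <- model.matrix(~ 0 + TD + Sibship)

# More columns than rank means some coefficients cannot be estimated.
n_cols <- ncol(design)
design_rank <- qr(design)$rank
print(c(columns = n_cols, rank = design_rank))
```

Here the design has 7 columns but rank 6: one dependency per Treatment, of which the one for Treatment A is removed together with the baseline patient level.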

But when I include all 92 samples, the logFC values do not match, because the data are unbalanced and LIMMA does not ignore non-paired samples, as discussed in
https://stat.ethz.ch/pipermail/bioconductor/2011-August/040875.html
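
The mismatch can be reproduced on a tiny example with ordinary least squares in base R (`lm`, standing in here for the per-probe fit; the numbers are invented). Patient 3 has no dose 2 sample, so it is dropped from a paired dose 2 vs. dose 1 comparison, but in the linear model its dose 3 vs. dose 1 difference still feeds into the dose estimates.

```r
# Invented data: 3 patients, 3 doses, patient 3 is missing dose 2.
toy <- data.frame(
  patient = factor(c(1, 1, 1,  2, 2, 2,  3, 3)),
  dose    = factor(c(1, 2, 3,  1, 2, 3,  1, 3)),
  y       = c(1, 2, 4,  2, 4, 5,  0, 6)
)

# Paired calculation: dose 2 minus dose 1, using only patients that have both.
complete <- droplevels(toy[toy$patient %in% c("1", "2"), ])
paired_mean <- mean(complete$y[complete$dose == "2"] -
                    complete$y[complete$dose == "1"])

# Additive model with a patient blocking factor, fitted to the complete patients only.
fit_complete <- lm(y ~ 0 + dose + patient, data = complete)
lm_complete <- unname(coef(fit_complete)["dose2"] - coef(fit_complete)["dose1"])

# Same model fitted to all samples, including the incomplete patient.
fit_all <- lm(y ~ 0 + dose + patient, data = toy)
lm_all <- unname(coef(fit_all)["dose2"] - coef(fit_all)["dose1"])

print(c(paired = paired_mean, lm_complete = lm_complete, lm_all = lm_all))
```

With complete patients only, the model estimate equals the paired mean difference (1.5). Once patient 3 is included, the estimate moves to 2, even though patient 3 contributes nothing to the direct paired calculation.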

Should I go ahead with the analysis (assuming that the design matrix is correct), or is it better to do individual paired t-tests, ignoring data from non-paired samples at each comparison level?

Can you suggest an easy way to explain to non-statisticians why the values do not match?

Thanks,
Sandhya

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: targets_design.txt
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130408/150b318f/attachment.txt>