[BioC] Possible Bugs in Contrasts Design of edgeR

Yang Liu zjubioly at gmail.com
Tue Apr 30 07:01:35 CEST 2013


Dear Prof. Smyth,

Thanks for your reply. Great to hear back from you. I see your points, but
it still confuses me.

First, thank you for confirming that my interpretation of the intercept is
correct. To me, it also cancels out in any contrast, as you may find the
1st coefficient of my contrast is also zero.

Also, I agree that we are fitting an additive model (patient + treatment)
within each disease group, that is why there is no interaction between
patient and treatment.

However, the case is that we are estimating the patient's effect nested in
each disease as a fixed effect (not a random effect). Then, the patients in
Healthy group are different from the patients in Disease1 group, hence
patient's nested effects could not cancel out in the contrast.

Generally, based on the design (see the attached figure), if we are trying
to find genes that respond differently to the hormone in disease1 vs
healthy patients, we should average observation 8, 10 and 12 (the average
log-fold-change after hormone treatment for patients in disease group 1),
then subtract the average of observation 2, 4 and 6 (the average
log-fold-change after hormone treatment for healthy patients). And it ends
up with my contrast showed in my previous thread (i.e. *lrt <- glmLRT(fit,
contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1, 1, 0))*)

Also, we can find in the design, the observation 1 (first row in the
design, healthy patient 1 without treatment) has only coefficient on
Intercept, which is consistent with the concept that the intercept
(baseline) represents the combined effect from Healthy, Patient 1, and None
Treatment. Meanwhile, besides intercept, the observation 2 (second row in
the design, healthy patient 1 with treatment) has another coefficient on
"DiseaseHealthy:TreatmentHormone". Hence, we could say that the coefficient
"DiseaseHealthy:TreatmentHormone" only represents the difference of
log-fold-change between "with and without hormone treatment" for healthy
patient 1. We can also check other observations. Especially, for the
observation 4 (healthy patient 2 with treatment), it has coefficients on
intercept, "DiseaseHealthy:Patient2", and
"DiseaseHealthy:TreatmentHormone".

Hence, I still insist that my contrast would be more appropriate. Hope my
points make sense to you. If not, please feel free to correct me. Thanks
for your time and helps.

Sincerely,

Yang



On Mon, Apr 29, 2013 at 4:17 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:

> Dear Yang Liu,
>
> I believe that the documentation is correct as it stands.
>
> The coefficient "DiseaseHealthy:**TreatmentHormone" represents the
> average log-fold-change after hormone treatment for healthy patients.  The
> coefficient "DiseaseDisease1:**TreatmentHormone" represents the average
> log-fold-change after hormone treatment for patients in disease group 1. So
> taking the difference of these two coefficients represents the contrast as
> described in the documentation.
>
> The linear model here is equivalent to three paired t-tests, one for each
> disease group.  In effect we are fitting an additive model
> (patient+treatment) within each disease group.
>
> You are correct in your interpretation of the intercept term, but this is
> not relevant because it cancels out of any contrast.
>
> Best wishes
> Gordon
>
> ---------------- original message ----------------
>
> [BioC] Possible Bugs in Contrasts Design of edgeR
> Yang Liu [guest] guest at bioconductor.org
> Thu Apr 25 17:59:21 CEST 2013
>
> Hi,
>
> I am a PhD student from Dept. Statistics of Penn State University. I have
> been using edgeR for my RNA-seq data recently. As I moved forward with
> edgeR, I found there are probably some bugs in the latest documentation.
>
> My experimental design is exactly the same as the design (Comparisons Both
> Between and Within Subject) on page 32 in the latest documentation (last
> revised on 31 March 2013). From page 32 to 34, everything looks fine to me
> except for the contrasts defined at the end.
>
> If I understand it properly, with the procedure showed on those pages, I
> can estimate each effect in the design, which can be showed with
> colnames(design). Then, all contrasts should be based on those estimated
> effects. For the first contrast on page 34, it is trying to find genes that
> respond differently to the hormone in disease1 vs healthy patients. The
> contrast looks as follows:
>
> lrt <- glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0))
>
> However, it looks not right to me, and it only put weights on the 10th and
> 11th effects ("-1" and "1"), which are corresponding to "DiseaseHealthy:**TreatmentHormone"
> and "DiseaseDisease1:**TreatmentHormone". As I see, the intercept
> (baseline) represents the combined effect from Healthy, Patient 1, and None
> Treatment. In order to find genes that respond differently to the hormone
> in disease1 vs healthy patients (regardless which patient it is), you could
> not leave the weights for all 4th to 9th effects as zeros. Also, as the
> contrast compares disease1 to healthy, the weight in the contrast for the
> 2nd effect should not be zero as well. I think the right contrast should be
> as follows:
>
> lrt <- glmLRT(fit, contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1,
> 1, 0))
>
> Please feel free to correct me if I am wrong.
>
> Thanks,
>
> Yang Liu
>
>  -- output of sessionInfo():
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_**64 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.**UTF-8/C/en_US.UTF-8/en_US.UTF-**8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.0.8  limma_3.14.4
>
> --
>
> ______________________________**______________________________**__________
> The information in this email is confidential and intended solely for the
> addressee.
> You must not disclose, forward, print or use it without the permission of
> the sender.
> ______________________________**______________________________**__________
>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: design.png
Type: image/png
Size: 36262 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20130430/4bfc9f02/attachment.png>


More information about the Bioconductor mailing list