[BioC] Possible Bugs in Contrasts Design of edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Tue Apr 30 10:00:06 CEST 2013


Dear Yang,

On Tue, 30 Apr 2013, Yang Liu wrote:

> Dear Prof. Smyth,
>
> Thanks for your reply. Great to hear back from you. I see your points, but
> it still confuses me.
>
> First, thank you for confirming that my interpretation of the intercept is
> correct. To me, it also cancels out in any contrast, as you may find the
> 1st coefficient of my contrast is also zero.
>
> Also, I agree that we are fitting an additive model (patient + treatment)
> within each disease group, that is why there is no interaction between
> patient and treatment.
>
> However, the case is that we are estimating the patient's effect nested in
> each disease as a fixed effect (not a random effect). Then, the patients in
> Healthy group are different from the patients in Disease1 group, hence
> patient's nested effects could not cancel out in the contrast.

You are not fully understanding how a paired analysis works.  Treatments 
are compared only within patients.  The between-patient error strata is 
not used at all in the analysis.

> Generally, based on the design (see the attached figure), if we are 
> trying to find genes that respond differently to the hormone in disease1 
> vs healthy patients, we should average observation 8, 10 and 12 (the 
> average log-fold-change after hormone treatment for patients in disease 
> group 1), then subtract the average of observation 2, 4 and 6 (the 
> average log-fold-change after hormone treatment for healthy patients).

No, this is in incorrect.  You have failed to adjust for the baseline 
expression level for each individual patient when treatment="none".

If this is still unclear, your PhD advisor or professors might be able to 
help.

Best wishes
Gordon



> And it ends up with my contrast showed in my previous thread (i.e. *lrt 
> <- glmLRT(fit, contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1, 1, 
> 0))*)
>
> Also, we can find in the design, the observation 1 (first row in the
> design, healthy patient 1 without treatment) has only coefficient on
> Intercept, which is consistent with the concept that the intercept
> (baseline) represents the combined effect from Healthy, Patient 1, and None
> Treatment. Meanwhile, besides intercept, the observation 2 (second row in
> the design, healthy patient 1 with treatment) has another coefficient on
> "DiseaseHealthy:TreatmentHormone". Hence, we could say that the coefficient
> "DiseaseHealthy:TreatmentHormone" only represents the difference of
> log-fold-change between "with and without hormone treatment" for healthy
> patient 1. We can also check other observations. Especially, for the
> observation 4 (healthy patient 2 with treatment), it has coefficients on
> intercept, "DiseaseHealthy:Patient2", and
> "DiseaseHealthy:TreatmentHormone".
>
> Hence, I still insist that my contrast would be more appropriate. Hope my
> points make sense to you. If not, please feel free to correct me. Thanks
> for your time and helps.
>
> Sincerely,
>
> Yang
>
>
>
> On Mon, Apr 29, 2013 at 4:17 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Yang Liu,
>>
>> I believe that the documentation is correct as it stands.
>>
>> The coefficient "DiseaseHealthy:**TreatmentHormone" represents the
>> average log-fold-change after hormone treatment for healthy patients.  The
>> coefficient "DiseaseDisease1:**TreatmentHormone" represents the average
>> log-fold-change after hormone treatment for patients in disease group 1. So
>> taking the difference of these two coefficients represents the contrast as
>> described in the documentation.
>>
>> The linear model here is equivalent to three paired t-tests, one for each
>> disease group.  In effect we are fitting an additive model
>> (patient+treatment) within each disease group.
>>
>> You are correct in your interpretation of the intercept term, but this is
>> not relevant because it cancels out of any contrast.
>>
>> Best wishes
>> Gordon
>>
>> ---------------- original message ----------------
>>
>> [BioC] Possible Bugs in Contrasts Design of edgeR
>> Yang Liu [guest] guest at bioconductor.org
>> Thu Apr 25 17:59:21 CEST 2013
>>
>> Hi,
>>
>> I am a PhD student from Dept. Statistics of Penn State University. I have
>> been using edgeR for my RNA-seq data recently. As I moved forward with
>> edgeR, I found there are probably some bugs in the latest documentation.
>>
>> My experimental design is exactly the same as the design (Comparisons Both
>> Between and Within Subject) on page 32 in the latest documentation (last
>> revised on 31 March 2013). From page 32 to 34, everything looks fine to me
>> except for the contrasts defined at the end.
>>
>> If I understand it properly, with the procedure showed on those pages, I
>> can estimate each effect in the design, which can be showed with
>> colnames(design). Then, all contrasts should be based on those estimated
>> effects. For the first contrast on page 34, it is trying to find genes that
>> respond differently to the hormone in disease1 vs healthy patients. The
>> contrast looks as follows:
>>
>> lrt <- glmLRT(fit, contrast = c(0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 1, 0))
>>
>> However, it looks not right to me, and it only put weights on the 10th and
>> 11th effects ("-1" and "1"), which are corresponding to "DiseaseHealthy:**TreatmentHormone"
>> and "DiseaseDisease1:**TreatmentHormone". As I see, the intercept
>> (baseline) represents the combined effect from Healthy, Patient 1, and None
>> Treatment. In order to find genes that respond differently to the hormone
>> in disease1 vs healthy patients (regardless which patient it is), you could
>> not leave the weights for all 4th to 9th effects as zeros. Also, as the
>> contrast compares disease1 to healthy, the weight in the contrast for the
>> 2nd effect should not be zero as well. I think the right contrast should be
>> as follows:
>>
>> lrt <- glmLRT(fit, contrast = c(0, 1, 0, -1/3, 1/3, 0, -1/3, 1/3, 0, -1,
>> 1, 0))
>>
>> Please feel free to correct me if I am wrong.
>>
>> Thanks,
>>
>> Yang Liu
>>
>>  -- output of sessionInfo():
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: x86_64-apple-darwin9.8.0/x86_**64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.**UTF-8/C/en_US.UTF-8/en_US.UTF-**8
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] edgeR_3.0.8  limma_3.14.4
>>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list