[BioC] edgeR: design matrix for different condition

Mon Aug 6 06:16:26 CEST 2012

Dear Prof Gordon,

Good day.

Thanks for your time to update the edgeR User's Guide. It is useful.

I combined all my experiment factors into one combined factor and
described the experiment matrix like the example in the User's guide:

  > Group <- factor(paste(targets$treat, targets$tree,sep="."))
  > cbind(targets,Group=Group)
  > hl.design <- model.matrix(~0+Group)

But, I encountered an error when I performed the estimate dispersion
for my data

  > hl <- estimateGLMCommonDisp(hl, hl.design)
  Warning message:
  In estimateGLMCommonDisp.default(y = y$counts, design = design,  :
    No residual df: setting dispersion to NA

I tried to figure out what I have done wrong, unfortunately, I have no
luck on that. Could you or the community kindly please light me for
this matter?

Thank you very much for your time.

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i486-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=C                LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_2.6.10 limma_3.12.1

loaded via a namespace (and not attached):
[1] tcltk_2.15.1 tools_2.15.1

Best regards,
KJ Lim

On 31 July 2012 02:24, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear KH Kim,
>
> No, you have not understood me correctly.  I did not suggest that you change
> from edgeR to limma.  I suggested that you read the limma documentation
> because the design matrix is the same for edgeR as it is for limma, so the
> limma documentation would help you create the design matrix for your edgeR
> analysis.
>
> I thought from your last email that you had already done this and that you
> had completed the edgeR analysis satisfactorily.
>
> I wrote more documentation for the edgeR User's Guide a couple of days ago,
> trying to give advice for experiments such as yours.  You might find that
> Section 3.3 of
>
> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>
> gives more explanation.
>
> Best wishes
> Gordon
>
> ---------------------------------------------
> Professor Gordon K Smyth,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> Tel: (03) 9345 2326, Fax (03) 9347 0852,
> http://www.statsci.org/smyth
>
>
> On Mon, 30 Jul 2012, KJ Lim wrote:
>
>> Dear Prof Gordon,
>>
>> Good day.
>>
>> I have read the section 8.5 of the Limma manual as you suggested in
>> previous emails. Thanks.
>>
>> If I understand you correctly, you would suggest me to carry out my DE
>> analysis with limma package if I would like to learn the which genes
>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H.
>>
>> May I ask how can I generate the EList, "eset", in order to fit in the
>> linear model as mentioned in the limma manual
>>
>>  fit <- lmFit(eset, design)
>>  fit <- eBayes(fit)
>>
>> Please correct me if I'm wrong.
>>
>> Thank you very much for your time and help.
>>
>> Best regards,
>> KJ Lim
>>
>>
>>
>> On 27 July 2012 02:31, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>> Dear KJ Lim,
>>>
>>> Thanks for the rephrasing, which is clearer.  I would have liked you to
>>> mention however whether you read the documentation that I refered you do.
>>>
>>>
>>> On Thu, 26 Jul 2012, KJ Lim wrote:
>>>
>>>> Dear Prof Gordon,
>>>>
>>>> Good day. Thanks for your prompt replied.
>>>>
>>>> Please allow me to rephrase my previous question.
>>>>
>>>> This model,  ~tree+treat,  is assumed effect of the time of treatment
>>>> will be the same irrespective of genotype.
>>>
>>>
>>>
>>> Yes, this model makes that assumption.
>>>
>>>
>>>> Thus, test for the coef=3 will gives me the differential expression at
>>>> 96H
>>>> irrespective of genotype.
>>>
>>>
>>>
>>> Actually coef=3 refers to 24H, according to the design matrix in your
>>> original email.  A test for coef=3 would test for DE at 24H vs 0H,
>>> irrespective of genotype.  But only if the assumption mentioned above is
>>> true, which is unlikely given the rest of your email.
>>>
>>>
>>>> I would like to learn the differential expression of treeHS vs treeLS
>>>> at specific time points i.e. 24H or if possible across the whole time
>>>> points. Should I fit my model as
>>>>
>>>>  model A: ~tree*treat    OR   model B: ~tree+tree:treat ?
>>>
>>>
>>>
>>> These models are equivalent, so it is just a matter of convenience which
>>> one
>>> you use, as I tried to explain in the limma documentation I refered you
>>> to.
>>>
>>>
>>>> The design matrix columns of model B give:
>>>>  "(Intercept)"  "treeLS"  "treeHS:treat03H"  "treeLS:treat03H"
>>>> "treeHS:treat24H"  "treeLS:treat24H"  "treeHS:treat96H"
>>>> "treeLS:treat96H"
>>>>
>>>> Am I doing right if I fit the model B and test for the coef=3:4 to
>>>> learn the differential expression of treeHS vs treeLS at specific time
>>>> points i.e. 03H? Does this test could tells me which set of genes
>>>> up/down-regulated in treeLS or treeHS?
>>>
>>>
>>>
>>> Yes.  Coef 3 tests for a 3H effect in HS.  Coef 4 tests for a 3H in LS.
>>> So
>>> testing for both coefficients coef=3:4 tests for a 3H effect in either
>>> treeLS or treeHS.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>
>>>> I'm not good in statistic and I'm learning,  kindly please correct me
>>>> if I'm wrong.
>>>>
>>>> Thank you very much for your time and suggestion.
>>>>
>>>> Best regards,
>>>> KJ Lim
>>>>
>>>>
>>>>
>>>>
>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>
>>>>>
>>>>>
>>>>> Dear KJ Lim,
>>>>>
>>>>> I don't quite understand your question, because you seem to be asking
>>>>> for
>>>>> something that isn't a test for differential expression, which is what
>>>>> edgeR
>>>>> does.  You question "I would like to learn the genes are express in
>>>>> treeHS
>>>>> but not treeLS and vice versa" seems to be about absolute expression
>>>>> levels.
>>>>> edgeR doesn't test for genes that are not expressed in particular
>>>>> conditions.
>>>>>
>>>>> I'll refer you to the limma section on interaction models in case that
>>>>> helps, see Section 8.5 of:
>>>>>
>>>>>
>>>>>
>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>>>>
>>>>> Setting up a design matrix is the same for edgeR as it is for limma.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300
>>>>>> From: KJ Lim <jinkeanlim at gmail.com>
>>>>>> To: Bioconductor mailing list <bioconductor at r-project.org>
>>>>>> Subject: [BioC] edgeR: design matrix for different condition
>>>>>>
>>>>>> Dear the edgeR community,
>>>>>>
>>>>>> Good day.
>>>>>>
>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2
>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H,
>>>>>> 24H, 96H).
>>>>>>
>>>>>> I first assumed that the treatment effect will be the same for each
>>>>>> genotype and I have the design matrix as:
>>>>>>
>>>>>>  design <- model.matrix(~tree+treat)
>>>>>>
>>>>>>> design
>>>>>>
>>>>>>
>>>>>>
>>>>>>   (Intercept)  treat03H  treat24H  treat96H  treeLS
>>>>>> 1             1        0        0        0      0
>>>>>> 2             1        0        0        0      0
>>>>>> 3             1        1        0        0      0
>>>>>> 4             1        1        0        0      0
>>>>>> 5             1        0        1        0      0
>>>>>> 6             1        0        1        0      0
>>>>>> 7             1        0        0        1      0
>>>>>> 8             1        0        0        1      0
>>>>>> 9             1        0        0        0      1
>>>>>> 10           1        0        0        0      1
>>>>>> 11           1        1        0        0      1
>>>>>> 12           1        1        0        0      1
>>>>>> 13           1        0        1        0      1
>>>>>> 14           1        0        1        0      1
>>>>>> 15           1        0        0        1      1
>>>>>> 16           1        0        0        1      1
>>>>>>
>>>>>> I used coef=4 to test for the differential expressions between treeHS
>>>>>> and treeLS within the time of treatment, coef=3 to learn the
>>>>>> differential expressions in 2 genotypes at time of treatment 96H.
>>>>>>
>>>>>> ** I would like to learn the genes are express in treeHS but not
>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or
>>>>>> across the whole time of treatment, should I have the design matrix as
>>>>>> below or more steps I need to do?
>>>>>>
>>>>>>  design <- model.matrix(~tree*treat)
>>>>>>
>>>>>> Kindly please light me on this. I appreciate very much for your help
>>>>>> and
>>>>>> time.
>>>>>>
>>>>>> Have a nice day.
>>>>>>
>>>>>> Best regards,
>>>>>> KJ Lim
>>>
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________
>>
>>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}