[BioC] edgeR: design matrix for different condition

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jul 31 01:24:43 CEST 2012


Dear KH Kim,

No, you have not understood me correctly.  I did not suggest that you 
change from edgeR to limma.  I suggested that you read the limma 
documentation because the design matrix is the same for edgeR as it is for 
limma, so the limma documentation would help you create the design matrix 
for your edgeR analysis.

I thought from your last email that you had already done this and that you 
had completed the edgeR analysis satisfactorily.

I wrote more documentation for the edgeR User's Guide a couple of days 
ago, trying to give advice for experiments such as yours.  You might find 
that Section 3.3 of

http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

gives more explanation.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
http://www.statsci.org/smyth

On Mon, 30 Jul 2012, KJ Lim wrote:

> Dear Prof Gordon,
>
> Good day.
>
> I have read the section 8.5 of the Limma manual as you suggested in
> previous emails. Thanks.
>
> If I understand you correctly, you would suggest me to carry out my DE
> analysis with limma package if I would like to learn the which genes
> are express in treeHS compare to treeLS (vice versa) at i.e. 24H.
>
> May I ask how can I generate the EList, "eset", in order to fit in the
> linear model as mentioned in the limma manual
>
>  fit <- lmFit(eset, design)
>  fit <- eBayes(fit)
>
> Please correct me if I'm wrong.
>
> Thank you very much for your time and help.
>
> Best regards,
> KJ Lim
>
>
>
> On 27 July 2012 02:31, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear KJ Lim,
>>
>> Thanks for the rephrasing, which is clearer.  I would have liked you to
>> mention however whether you read the documentation that I refered you do.
>>
>>
>> On Thu, 26 Jul 2012, KJ Lim wrote:
>>
>>> Dear Prof Gordon,
>>>
>>> Good day. Thanks for your prompt replied.
>>>
>>> Please allow me to rephrase my previous question.
>>>
>>> This model,  ~tree+treat,  is assumed effect of the time of treatment
>>> will be the same irrespective of genotype.
>>
>>
>> Yes, this model makes that assumption.
>>
>>
>>> Thus, test for the coef=3 will gives me the differential expression at 96H
>>> irrespective of genotype.
>>
>>
>> Actually coef=3 refers to 24H, according to the design matrix in your
>> original email.  A test for coef=3 would test for DE at 24H vs 0H,
>> irrespective of genotype.  But only if the assumption mentioned above is
>> true, which is unlikely given the rest of your email.
>>
>>
>>> I would like to learn the differential expression of treeHS vs treeLS
>>> at specific time points i.e. 24H or if possible across the whole time
>>> points. Should I fit my model as
>>>
>>>  model A: ~tree*treat    OR   model B: ~tree+tree:treat ?
>>
>>
>> These models are equivalent, so it is just a matter of convenience which one
>> you use, as I tried to explain in the limma documentation I refered you to.
>>
>>
>>> The design matrix columns of model B give:
>>>  "(Intercept)"  "treeLS"  "treeHS:treat03H"  "treeLS:treat03H"
>>> "treeHS:treat24H"  "treeLS:treat24H"  "treeHS:treat96H"
>>> "treeLS:treat96H"
>>>
>>> Am I doing right if I fit the model B and test for the coef=3:4 to
>>> learn the differential expression of treeHS vs treeLS at specific time
>>> points i.e. 03H? Does this test could tells me which set of genes
>>> up/down-regulated in treeLS or treeHS?
>>
>>
>> Yes.  Coef 3 tests for a 3H effect in HS.  Coef 4 tests for a 3H in LS. So
>> testing for both coefficients coef=3:4 tests for a 3H effect in either
>> treeLS or treeHS.
>>
>> Best wishes
>> Gordon
>>
>>
>>> I'm not good in statistic and I'm learning,  kindly please correct me
>>> if I'm wrong.
>>>
>>> Thank you very much for your time and suggestion.
>>>
>>> Best regards,
>>> KJ Lim
>>>
>>>
>>>
>>>
>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>
>>>>
>>>> Dear KJ Lim,
>>>>
>>>> I don't quite understand your question, because you seem to be asking for
>>>> something that isn't a test for differential expression, which is what edgeR
>>>> does.  You question "I would like to learn the genes are express in treeHS
>>>> but not treeLS and vice versa" seems to be about absolute expression levels.
>>>> edgeR doesn't test for genes that are not expressed in particular
>>>> conditions.
>>>>
>>>> I'll refer you to the limma section on interaction models in case that
>>>> helps, see Section 8.5 of:
>>>>
>>>>
>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>>>
>>>> Setting up a design matrix is the same for edgeR as it is for limma.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300
>>>>> From: KJ Lim <jinkeanlim at gmail.com>
>>>>> To: Bioconductor mailing list <bioconductor at r-project.org>
>>>>> Subject: [BioC] edgeR: design matrix for different condition
>>>>>
>>>>> Dear the edgeR community,
>>>>>
>>>>> Good day.
>>>>>
>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2
>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H,
>>>>> 24H, 96H).
>>>>>
>>>>> I first assumed that the treatment effect will be the same for each
>>>>> genotype and I have the design matrix as:
>>>>>
>>>>>  design <- model.matrix(~tree+treat)
>>>>>
>>>>>> design
>>>>>
>>>>>
>>>>>   (Intercept)  treat03H  treat24H  treat96H  treeLS
>>>>> 1             1        0        0        0      0
>>>>> 2             1        0        0        0      0
>>>>> 3             1        1        0        0      0
>>>>> 4             1        1        0        0      0
>>>>> 5             1        0        1        0      0
>>>>> 6             1        0        1        0      0
>>>>> 7             1        0        0        1      0
>>>>> 8             1        0        0        1      0
>>>>> 9             1        0        0        0      1
>>>>> 10           1        0        0        0      1
>>>>> 11           1        1        0        0      1
>>>>> 12           1        1        0        0      1
>>>>> 13           1        0        1        0      1
>>>>> 14           1        0        1        0      1
>>>>> 15           1        0        0        1      1
>>>>> 16           1        0        0        1      1
>>>>>
>>>>> I used coef=4 to test for the differential expressions between treeHS
>>>>> and treeLS within the time of treatment, coef=3 to learn the
>>>>> differential expressions in 2 genotypes at time of treatment 96H.
>>>>>
>>>>> ** I would like to learn the genes are express in treeHS but not
>>>>> treeLS and vice versa at certain time of treatment let's say 24H or
>>>>> across the whole time of treatment, should I have the design matrix as
>>>>> below or more steps I need to do?
>>>>>
>>>>>  design <- model.matrix(~tree*treat)
>>>>>
>>>>> Kindly please light me on this. I appreciate very much for your help and
>>>>> time.
>>>>>
>>>>> Have a nice day.
>>>>>
>>>>> Best regards,
>>>>> KJ Lim
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list