[BioC] edgeR: design matrix for different condition

KJ Lim jinkeanlim at gmail.com
Mon Aug 6 06:44:15 CEST 2012


Dear Prof Gordon,

Good day. Thanks for you prompt replied.

Below is the output of cbind(targets,Group=Group)

> cbind(targets,Group=Group)
                  files treat tree  X  Group
1  ./rawData/HS0H01.txt   00H   H1 NA 00H.H1
2  ./rawData/HS0H02.txt   00H   H2 NA 00H.H2
3  ./rawData/HS3H01.txt   03H   H3 NA 03H.H3
4  ./rawData/HS3H02.txt   03H   H4 NA 03H.H4
5  ./rawData/HS1D01.txt   24H   H5 NA 24H.H5
6  ./rawData/HS1D02.txt   24H   H6 NA 24H.H6
7  ./rawData/HS4D01.txt   96H   H7 NA 96H.H7
8  ./rawData/HS4D02.txt   96H   H8 NA 96H.H8
9  ./rawData/LS0H01.txt   00H   L1 NA 00H.L1
10 ./rawData/LS0H02.txt   00H   L2 NA 00H.L2
11 ./rawData/LS3H01.txt   03H   L3 NA 03H.L3
12 ./rawData/LS3H02.txt   03H   L4 NA 03H.L4
13 ./rawData/LS1D01.txt   24H   L5 NA 24H.L5
14 ./rawData/LS1D02.txt   24H   L6 NA 24H.L6
15 ./rawData/LS4D01.txt   96H   L7 NA 96H.L7
16 ./rawData/LS4D02.txt   96H   L8 NA 96H.L8

Thank you for your time.

Best regards,
KJ Lim


On 6 August 2012 07:33, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> The error message is pretty self-explanatory.  Apparently your design matrix
> has as many columns as there are libraries, so there are no degrees of
> freedom left from which to estimate variability.  According to the code and
> output you have given, this message should be impossible, so you must have
> changed the data in some way that I cannot see.
>
> If you had shown the output from cbind(targets,Group=Group), then I might
> have been able to say something useful.
>
> Best wishes
> Gordon
>
>
> On Mon, 6 Aug 2012, KJ Lim wrote:
>
>> Dear Prof Gordon,
>>
>> Good day.
>>
>> Thanks for your time to update the edgeR User's Guide. It is useful.
>>
>> I combined all my experiment factors into one combined factor and
>> described the experiment matrix like the example in the User's guide:
>>
>>  > Group <- factor(paste(targets$treat, targets$tree,sep="."))
>>  > cbind(targets,Group=Group)
>>  > hl.design <- model.matrix(~0+Group)
>>
>> But, I encountered an error when I performed the estimate dispersion
>> for my data
>>
>>  > hl <- estimateGLMCommonDisp(hl, hl.design)
>>  Warning message:
>>  In estimateGLMCommonDisp.default(y = y$counts, design = design,  :
>>    No residual df: setting dispersion to NA
>>
>> I tried to figure out what I have done wrong, unfortunately, I have no
>> luck on that. Could you or the community kindly please light me for
>> this matter?
>>
>> Thank you very much for your time.
>>
>>> sessionInfo()
>>
>> R version 2.15.1 (2012-06-22)
>> Platform: i486-pc-linux-gnu (32-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>> [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>> [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
>> [7] LC_PAPER=C                LC_NAME=C
>> [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] edgeR_2.6.10 limma_3.12.1
>>
>> loaded via a namespace (and not attached):
>> [1] tcltk_2.15.1 tools_2.15.1
>>
>> Best regards,
>> KJ Lim
>>
>>
>>
>> On 31 July 2012 02:24, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>> Dear KH Kim,
>>>
>>> No, you have not understood me correctly.  I did not suggest that you
>>> change
>>> from edgeR to limma.  I suggested that you read the limma documentation
>>> because the design matrix is the same for edgeR as it is for limma, so
>>> the
>>> limma documentation would help you create the design matrix for your
>>> edgeR
>>> analysis.
>>>
>>> I thought from your last email that you had already done this and that
>>> you
>>> had completed the edgeR analysis satisfactorily.
>>>
>>> I wrote more documentation for the edgeR User's Guide a couple of days
>>> ago,
>>> trying to give advice for experiments such as yours.  You might find that
>>> Section 3.3 of
>>>
>>>
>>> http://bioconductor.org/packages/2.11/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>>>
>>> gives more explanation.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> ---------------------------------------------
>>> Professor Gordon K Smyth,
>>> Bioinformatics Division,
>>> Walter and Eliza Hall Institute of Medical Research,
>>> 1G Royal Parade, Parkville, Vic 3052, Australia.
>>> Tel: (03) 9345 2326, Fax (03) 9347 0852,
>>> http://www.statsci.org/smyth
>>>
>>>
>>> On Mon, 30 Jul 2012, KJ Lim wrote:
>>>
>>>> Dear Prof Gordon,
>>>>
>>>> Good day.
>>>>
>>>> I have read the section 8.5 of the Limma manual as you suggested in
>>>> previous emails. Thanks.
>>>>
>>>> If I understand you correctly, you would suggest me to carry out my DE
>>>> analysis with limma package if I would like to learn the which genes
>>>> are express in treeHS compare to treeLS (vice versa) at i.e. 24H.
>>>>
>>>> May I ask how can I generate the EList, "eset", in order to fit in the
>>>> linear model as mentioned in the limma manual
>>>>
>>>>  fit <- lmFit(eset, design)
>>>>  fit <- eBayes(fit)
>>>>
>>>> Please correct me if I'm wrong.
>>>>
>>>> Thank you very much for your time and help.
>>>>
>>>> Best regards,
>>>> KJ Lim
>>>>
>>>>
>>>>
>>>> On 27 July 2012 02:31, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>
>>>>>
>>>>> Dear KJ Lim,
>>>>>
>>>>> Thanks for the rephrasing, which is clearer.  I would have liked you to
>>>>> mention however whether you read the documentation that I refered you
>>>>> do.
>>>>>
>>>>>
>>>>> On Thu, 26 Jul 2012, KJ Lim wrote:
>>>>>
>>>>>> Dear Prof Gordon,
>>>>>>
>>>>>> Good day. Thanks for your prompt replied.
>>>>>>
>>>>>> Please allow me to rephrase my previous question.
>>>>>>
>>>>>> This model,  ~tree+treat,  is assumed effect of the time of treatment
>>>>>> will be the same irrespective of genotype.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yes, this model makes that assumption.
>>>>>
>>>>>
>>>>>> Thus, test for the coef=3 will gives me the differential expression at
>>>>>> 96H
>>>>>> irrespective of genotype.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Actually coef=3 refers to 24H, according to the design matrix in your
>>>>> original email.  A test for coef=3 would test for DE at 24H vs 0H,
>>>>> irrespective of genotype.  But only if the assumption mentioned above
>>>>> is
>>>>> true, which is unlikely given the rest of your email.
>>>>>
>>>>>
>>>>>> I would like to learn the differential expression of treeHS vs treeLS
>>>>>> at specific time points i.e. 24H or if possible across the whole time
>>>>>> points. Should I fit my model as
>>>>>>
>>>>>>  model A: ~tree*treat    OR   model B: ~tree+tree:treat ?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> These models are equivalent, so it is just a matter of convenience
>>>>> which
>>>>> one
>>>>> you use, as I tried to explain in the limma documentation I refered you
>>>>> to.
>>>>>
>>>>>
>>>>>> The design matrix columns of model B give:
>>>>>>  "(Intercept)"  "treeLS"  "treeHS:treat03H"  "treeLS:treat03H"
>>>>>> "treeHS:treat24H"  "treeLS:treat24H"  "treeHS:treat96H"
>>>>>> "treeLS:treat96H"
>>>>>>
>>>>>> Am I doing right if I fit the model B and test for the coef=3:4 to
>>>>>> learn the differential expression of treeHS vs treeLS at specific time
>>>>>> points i.e. 03H? Does this test could tells me which set of genes
>>>>>> up/down-regulated in treeLS or treeHS?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Yes.  Coef 3 tests for a 3H effect in HS.  Coef 4 tests for a 3H in LS.
>>>>> So
>>>>> testing for both coefficients coef=3:4 tests for a 3H effect in either
>>>>> treeLS or treeHS.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>>
>>>>>> I'm not good in statistic and I'm learning,  kindly please correct me
>>>>>> if I'm wrong.
>>>>>>
>>>>>> Thank you very much for your time and suggestion.
>>>>>>
>>>>>> Best regards,
>>>>>> KJ Lim
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 26 July 2012 03:39, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Dear KJ Lim,
>>>>>>>
>>>>>>> I don't quite understand your question, because you seem to be asking
>>>>>>> for
>>>>>>> something that isn't a test for differential expression, which is
>>>>>>> what
>>>>>>> edgeR
>>>>>>> does.  You question "I would like to learn the genes are express in
>>>>>>> treeHS
>>>>>>> but not treeLS and vice versa" seems to be about absolute expression
>>>>>>> levels.
>>>>>>> edgeR doesn't test for genes that are not expressed in particular
>>>>>>> conditions.
>>>>>>>
>>>>>>> I'll refer you to the limma section on interaction models in case
>>>>>>> that
>>>>>>> helps, see Section 8.5 of:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://bioconductor.org/packages/2.11/bioc/vignettes/limma/inst/doc/usersguide.pdf
>>>>>>>
>>>>>>> Setting up a design matrix is the same for edgeR as it is for limma.
>>>>>>>
>>>>>>> Best wishes
>>>>>>> Gordon
>>>>>>>
>>>>>>>> Date: Tue, 24 Jul 2012 17:12:00 +0300
>>>>>>>> From: KJ Lim <jinkeanlim at gmail.com>
>>>>>>>> To: Bioconductor mailing list <bioconductor at r-project.org>
>>>>>>>> Subject: [BioC] edgeR: design matrix for different condition
>>>>>>>>
>>>>>>>> Dear the edgeR community,
>>>>>>>>
>>>>>>>> Good day.
>>>>>>>>
>>>>>>>> I'm analyzing my RNA-Seq experiment with edgeR. My study has 2
>>>>>>>> different genotypes (treeHS, treeLS) and time of treatment (0H, 3H,
>>>>>>>> 24H, 96H).
>>>>>>>>
>>>>>>>> I first assumed that the treatment effect will be the same for each
>>>>>>>> genotype and I have the design matrix as:
>>>>>>>>
>>>>>>>>  design <- model.matrix(~tree+treat)
>>>>>>>>
>>>>>>>>> design
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>   (Intercept)  treat03H  treat24H  treat96H  treeLS
>>>>>>>> 1             1        0        0        0      0
>>>>>>>> 2             1        0        0        0      0
>>>>>>>> 3             1        1        0        0      0
>>>>>>>> 4             1        1        0        0      0
>>>>>>>> 5             1        0        1        0      0
>>>>>>>> 6             1        0        1        0      0
>>>>>>>> 7             1        0        0        1      0
>>>>>>>> 8             1        0        0        1      0
>>>>>>>> 9             1        0        0        0      1
>>>>>>>> 10           1        0        0        0      1
>>>>>>>> 11           1        1        0        0      1
>>>>>>>> 12           1        1        0        0      1
>>>>>>>> 13           1        0        1        0      1
>>>>>>>> 14           1        0        1        0      1
>>>>>>>> 15           1        0        0        1      1
>>>>>>>> 16           1        0        0        1      1
>>>>>>>>
>>>>>>>> I used coef=4 to test for the differential expressions between
>>>>>>>> treeHS
>>>>>>>> and treeLS within the time of treatment, coef=3 to learn the
>>>>>>>> differential expressions in 2 genotypes at time of treatment 96H.
>>>>>>>>
>>>>>>>> ** I would like to learn the genes are express in treeHS but not
>>>>>>>> treeLS and vice versa at certain time of treatment let's say 24H or
>>>>>>>> across the whole time of treatment, should I have the design matrix
>>>>>>>> as
>>>>>>>> below or more steps I need to do?
>>>>>>>>
>>>>>>>>  design <- model.matrix(~tree*treat)
>>>>>>>>
>>>>>>>> Kindly please light me on this. I appreciate very much for your help
>>>>>>>> and
>>>>>>>> time.
>>>>>>>>
>>>>>>>> Have a nice day.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> KJ Lim
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}



More information about the Bioconductor mailing list