[BioC] Contrast among a subset of interactions, glm in edgeR

Ryan C. Thompson rct at thompsonclan.org
Wed Aug 6 01:46:52 CEST 2014


Hi, sorry for taking a long time to reply to this, I was quite busy.

I believe that expression "cepa:tiempo:inf" already includes all the 
degrees of freedom for the three variables, their pairwise 
interactions, and the three-way interaction. So including the variables 
individually into the model as well is redundant, and is adding extra 
columns to the design matrix. But I'm not quite sure how model.matrix 
works with suce complex designs, so I can't tell you for sure. But I 
think this is what is happening based on the number of columns in each 
design.

-Ryan

On Wed 18 Jun 2014 02:31:17 AM PDT, Maximo Rivarola wrote:
>
> Hi ryan and everyone!, thanks again for your help,
> I have one more question? One more Error...
> I get an error when i try to estimateGLMcommondispersion,
>
> When i use the design:
> designformula_er <- terms(~ 0 + cepa + tiempo + inf + cepa:tiempo:inf +  linea, keep.order = TRUE)
> design7_er <- model.matrix(designformula_er)
> rownames(design7_er) <- sinfo$names
> colnames(design7_er) <- make.names(colnames(design7_er))
> colnames(design7_er)
> dim(design7_er)
> #[1] 56 30
>
> Colnames look like this:
>> colnames(design7_er)
>   [1] "cepaHA853"                    "cepaHA89"                     "cepaRK416"                    "tiempo4"                      "tiempo8"
>   [6] "infinoc"                      "cepaHA853.tiempo0.infcontrol" "cepaHA89.tiempo0.infcontrol"  "cepaRK416.tiempo0.infcontrol" "cepaHA853.tiempo4.infcontrol"
> [11] "cepaHA89.tiempo4.infcontrol"  "cepaRK416.tiempo4.infcontrol" "cepaHA853.tiempo8.infcontrol" "cepaHA89.tiempo8.infcontrol"  "cepaRK416.tiempo8.infcontrol"
> [16] "cepaHA853.tiempo0.infinoc"    "cepaHA89.tiempo0.infinoc"     "cepaRK416.tiempo0.infinoc"    "cepaHA853.tiempo4.infinoc"    "cepaHA89.tiempo4.infinoc"
> [21] "cepaRK416.tiempo4.infinoc"    "cepaHA853.tiempo8.infinoc"    "cepaHA89.tiempo8.infinoc"     "cepaRK416.tiempo8.infinoc"    "linea2"
> [26] "linea3"                       "linea4"                       "linea5"                       "linea6"                       "linea7"
>>
>
> I checked every column and there is no interaction for which I have all zeros.
>
> Up to there ok, but....... when i try ...
>
> My error is when i try to :
> dge7_er <- estimateGLMCommonDisp(dge1, design7_er, verbose = TRUE)  #To estimate common dispersion
> # Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  :
> #  Design matrix not of full rank.  The following coefficients not estimable:
> # cepaRK416:tiempo8:infcontrol cepaRK416:tiempo0:infinoc cepaRK416:tiempo4:infinoc cepaHA853:tiempo8:infinoc cepaHA89:tiempo8:infinoc cepaRK416:tiempo8:infinoc
>
> Does this mean i am trying to estimate "too" many parameters? I can not estimate the main factors?
>
> When i use the other design: which i estimate all 3 way interaction and batch effect, but NO main factors: This works!!!
> designformula <- terms(~ 0 + cepa:tiempo:inf +  linea, keep.order = TRUE)
> design7 <- model.matrix(designformula)
> dim(design7)
> # [1] 56 24
> dge7 <- estimateGLMCommonDisp(dge1, design7, verbose = TRUE)  #To estimate common dispersion
> #  Disp = 0.03873 , BCV = 0.1968
> All good!
>
> What do you think????
> thanks a million!!!!
> great software!
> cheers,
> maximo
>
> --------------------------------------------------------
> Maximo Rivarola PhD.
>
> ________________________________________
> From: Ryan [rct at thompsonclan.org]
> Sent: Tuesday, June 17, 2014 1:39 PM
> To: Maximo Rivarola
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] Contrast among a subset of interactions, glm in edgeR
>
> Hi Maximo,
>
> I believe that you want to use the following:
>
> designformula <- terms(~ 0 + cepa:tiempo:inf +  linea, keep.order =
> TRUE)
> design7 <- model.matrix(designformula)
>
> This will give you one coefficient for each combination of cepa,
> timepo, and inf. Also, after doing that, you might also want to replace
> the colons in the column names with periods, to make syntacically valid
> names, which will make things easier when you start constructing
> contrasts:
>
> colnames(design7) <- make.names(colnames(design7))
>
> -Ryan
>
> On Tue Jun 17 09:28:17 2014, Maximo Rivarola wrote:
>> Hello everyone,
>>
>> I am writing my first email to this list. I have a question in regards to contrasting a few interactions...
>> I have an RNA-seq experiment with the following structure:
>> - 3 cepa (genotypes)
>> - 3 tiempo (time points)
>> - inf (Inoculated or not)
>>
>> Basically, three different genotypes were infected or not (inf) and then 3 time points were take and sequenced. Most samples have 2 biol replicates and some 4 biol replicates.
>>
>> My question is the following:
>> I would like to make a contrast were I compare the interaction of cepaHA853:tiempo4:infinoc - cepaHA853:tiempo8:infinoc  ????
>> My problem is I do not know where the column for this interaction is???
>> I placed the 0 as for it not to intercept in the model, but the later factors are always shown from level 2 on...
>> How can I "see" all the interaction from my design???
>>
>> Below is some code:
>> Thanks in advanced!!! cheers,
>>
>> Design implemented:
>> design7 <- model.matrix(~ 0 + cepa*tiempo*inf +  linea)
>>
>>> dim(design7)
>> [1] 56 24
>>
>>> head(design7)
>>     cepaHA853 cepaHA89 cepaRK416 tiempo4 tiempo8 infinoc linea2 linea3 linea4 linea5 linea6 linea7 cepaHA89:tiempo4 cepaRK416:tiempo4
>> 1         0        1         0       0       0       1      0      1      0      0      0      0                0                 0
>> 2         0        1         0       0       0       1      0      0      1      0      0      0                0                 0
>> 3         0        1         0       0       0       1      0      0      0      1      0      0                0                 0
>> 4         0        1         0       0       0       1      0      0      0      0      1      0                0                 0
>> 5         1        0         0       0       0       1      0      0      0      0      0      1                0                 0
>> 6         1        0         0       0       0       1      0      0      0      0      0      0                0                 0
>>     cepaHA89:tiempo8 cepaRK416:tiempo8 cepaHA89:infinoc cepaRK416:infinoc tiempo4:infinoc tiempo8:infinoc cepaHA89:tiempo4:infinoc
>> 1                0                 0                1                 0               0               0                        0
>> 2                0                 0                1                 0               0               0                        0
>> 3                0                 0                1                 0               0               0                        0
>> 4                0                 0                1                 0               0               0                        0
>> 5                0                 0                0                 0               0               0                        0
>> 6                0                 0                0                 0               0               0                        0
>>     cepaRK416:tiempo4:infinoc cepaHA89:tiempo8:infinoc cepaRK416:tiempo8:infinoc
>> 1                         0                        0                         0
>> 2                         0                        0                         0
>> 3                         0                        0                         0
>> 4                         0                        0                         0
>> 5                         0                        0                         0
>> 6                         0                        0                         0
>>>
>>
>> dge7 <- estimateGLMCommonDisp(dge1, design7, verbose = TRUE)  #To estimate common dispersion
>> #  Disp = 0.03873 , BCV = 0.1968
>>
>> dge7 <- estimateGLMTagwiseDisp(dge7, design7) #To estimate tagwise dispersions
>> # $tagwise.dispersion
>> # [1] 0.21290421 0.01218267 0.01580605 0.00906055 0.02124947
>>
>> fit7 <- glmFit(dge7, design7, dispersion=dge7$tagwise.dispersion)
>>
>>> class(fit7)
>> [1] "DGEGLM"
>> attr(,"package")
>> [1] "edgeR"
>>
>>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-pc-linux-gnu (64-bit)
>>
>> locale:
>>    [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8
>>    [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] edgeR_3.6.2  limma_3.20.4
>>
>> thanks a lot!
>> hope is explained well....
>>
>> --------------------------------------------------------
>> Maximo Rivarola
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list