[BioC] Contrast among a subset of interactions, glm in edgeR

Maximo Rivarola rivarola.maximo at inta.gob.ar
Wed Jun 18 11:31:17 CEST 2014


Hi ryan and everyone!, thanks again for your help,
I have one more question? One more Error...
I get an error when i try to estimateGLMcommondispersion,

When i use the design:
designformula_er <- terms(~ 0 + cepa + tiempo + inf + cepa:tiempo:inf +  linea, keep.order = TRUE)
design7_er <- model.matrix(designformula_er)
rownames(design7_er) <- sinfo$names
colnames(design7_er) <- make.names(colnames(design7_er))
colnames(design7_er)
dim(design7_er)
#[1] 56 30

Colnames look like this:
> colnames(design7_er)
 [1] "cepaHA853"                    "cepaHA89"                     "cepaRK416"                    "tiempo4"                      "tiempo8"                     
 [6] "infinoc"                      "cepaHA853.tiempo0.infcontrol" "cepaHA89.tiempo0.infcontrol"  "cepaRK416.tiempo0.infcontrol" "cepaHA853.tiempo4.infcontrol"
[11] "cepaHA89.tiempo4.infcontrol"  "cepaRK416.tiempo4.infcontrol" "cepaHA853.tiempo8.infcontrol" "cepaHA89.tiempo8.infcontrol"  "cepaRK416.tiempo8.infcontrol"
[16] "cepaHA853.tiempo0.infinoc"    "cepaHA89.tiempo0.infinoc"     "cepaRK416.tiempo0.infinoc"    "cepaHA853.tiempo4.infinoc"    "cepaHA89.tiempo4.infinoc"    
[21] "cepaRK416.tiempo4.infinoc"    "cepaHA853.tiempo8.infinoc"    "cepaHA89.tiempo8.infinoc"     "cepaRK416.tiempo8.infinoc"    "linea2"                      
[26] "linea3"                       "linea4"                       "linea5"                       "linea6"                       "linea7"                              
> 

I checked every column and there is no interaction for which I have all zeros.

Up to there ok, but....... when i try ...

My error is when i try to :
dge7_er <- estimateGLMCommonDisp(dge1, design7_er, verbose = TRUE)  #To estimate common dispersion
# Error in glmFit.default(y, design = design, dispersion = dispersion, offset = offset,  : 
#  Design matrix not of full rank.  The following coefficients not estimable:
# cepaRK416:tiempo8:infcontrol cepaRK416:tiempo0:infinoc cepaRK416:tiempo4:infinoc cepaHA853:tiempo8:infinoc cepaHA89:tiempo8:infinoc cepaRK416:tiempo8:infinoc

Does this mean i am trying to estimate "too" many parameters? I can not estimate the main factors?

When i use the other design: which i estimate all 3 way interaction and batch effect, but NO main factors: This works!!!
designformula <- terms(~ 0 + cepa:tiempo:inf +  linea, keep.order = TRUE)
design7 <- model.matrix(designformula)
dim(design7)
# [1] 56 24
dge7 <- estimateGLMCommonDisp(dge1, design7, verbose = TRUE)  #To estimate common dispersion
#  Disp = 0.03873 , BCV = 0.1968 
All good!

What do you think????
thanks a million!!!!
great software!
cheers,
maximo

--------------------------------------------------------
Maximo Rivarola PhD.

________________________________________
From: Ryan [rct at thompsonclan.org]
Sent: Tuesday, June 17, 2014 1:39 PM
To: Maximo Rivarola
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Contrast among a subset of interactions, glm in edgeR

Hi Maximo,

I believe that you want to use the following:

designformula <- terms(~ 0 + cepa:tiempo:inf +  linea, keep.order =
TRUE)
design7 <- model.matrix(designformula)

This will give you one coefficient for each combination of cepa,
timepo, and inf. Also, after doing that, you might also want to replace
the colons in the column names with periods, to make syntacically valid
names, which will make things easier when you start constructing
contrasts:

colnames(design7) <- make.names(colnames(design7))

-Ryan

On Tue Jun 17 09:28:17 2014, Maximo Rivarola wrote:
> Hello everyone,
>
> I am writing my first email to this list. I have a question in regards to contrasting a few interactions...
> I have an RNA-seq experiment with the following structure:
> - 3 cepa (genotypes)
> - 3 tiempo (time points)
> - inf (Inoculated or not)
>
> Basically, three different genotypes were infected or not (inf) and then 3 time points were take and sequenced. Most samples have 2 biol replicates and some 4 biol replicates.
>
> My question is the following:
> I would like to make a contrast were I compare the interaction of cepaHA853:tiempo4:infinoc - cepaHA853:tiempo8:infinoc  ????
> My problem is I do not know where the column for this interaction is???
> I placed the 0 as for it not to intercept in the model, but the later factors are always shown from level 2 on...
> How can I "see" all the interaction from my design???
>
> Below is some code:
> Thanks in advanced!!! cheers,
>
> Design implemented:
> design7 <- model.matrix(~ 0 + cepa*tiempo*inf +  linea)
>
>> dim(design7)
> [1] 56 24
>
>> head(design7)
>    cepaHA853 cepaHA89 cepaRK416 tiempo4 tiempo8 infinoc linea2 linea3 linea4 linea5 linea6 linea7 cepaHA89:tiempo4 cepaRK416:tiempo4
> 1         0        1         0       0       0       1      0      1      0      0      0      0                0                 0
> 2         0        1         0       0       0       1      0      0      1      0      0      0                0                 0
> 3         0        1         0       0       0       1      0      0      0      1      0      0                0                 0
> 4         0        1         0       0       0       1      0      0      0      0      1      0                0                 0
> 5         1        0         0       0       0       1      0      0      0      0      0      1                0                 0
> 6         1        0         0       0       0       1      0      0      0      0      0      0                0                 0
>    cepaHA89:tiempo8 cepaRK416:tiempo8 cepaHA89:infinoc cepaRK416:infinoc tiempo4:infinoc tiempo8:infinoc cepaHA89:tiempo4:infinoc
> 1                0                 0                1                 0               0               0                        0
> 2                0                 0                1                 0               0               0                        0
> 3                0                 0                1                 0               0               0                        0
> 4                0                 0                1                 0               0               0                        0
> 5                0                 0                0                 0               0               0                        0
> 6                0                 0                0                 0               0               0                        0
>    cepaRK416:tiempo4:infinoc cepaHA89:tiempo8:infinoc cepaRK416:tiempo8:infinoc
> 1                         0                        0                         0
> 2                         0                        0                         0
> 3                         0                        0                         0
> 4                         0                        0                         0
> 5                         0                        0                         0
> 6                         0                        0                         0
>>
>
> dge7 <- estimateGLMCommonDisp(dge1, design7, verbose = TRUE)  #To estimate common dispersion
> #  Disp = 0.03873 , BCV = 0.1968
>
> dge7 <- estimateGLMTagwiseDisp(dge7, design7) #To estimate tagwise dispersions
> # $tagwise.dispersion
> # [1] 0.21290421 0.01218267 0.01580605 0.00906055 0.02124947
>
> fit7 <- glmFit(dge7, design7, dispersion=dge7$tagwise.dispersion)
>
>> class(fit7)
> [1] "DGEGLM"
> attr(,"package")
> [1] "edgeR"
>
>> sessionInfo()
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=es_ES.UTF-8
>   [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] edgeR_3.6.2  limma_3.20.4
>
> thanks a lot!
> hope is explained well....
>
> --------------------------------------------------------
> Maximo Rivarola
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


More information about the Bioconductor mailing list