[BioC] edgeR estimateGLMCommonDisp Error on RNA analysis

Gordon K Smyth smyth at wehi.EDU.AU
Mon Mar 24 05:00:12 CET 2014


Dear Justin,

You are creating design matrices that have more columns than you have 
samples. So it's not surprising that edgeR tells you there are no residual 
df available to estimate the dispersion.

You want to test the treatment effect for each individual diseased patient 
vs the average of the treatment effect for healthy patients.  Making 
design matrices for hypotheses like this isn't automatic in R.  Here is 
one way to do it:

  > samples
      Disease Patient Treatment
  1   Healthy      H1       npc
  2   Healthy      H1        ne
  3   Healthy      H2       npc
  4   Healthy      H2        ne
  5   Healthy      H3       npc
  6   Healthy      H3        ne
  7  Disease1      D4       npc
  8  Disease1      D4        ne
  9  Disease2      D5       npc
  10 Disease2      D5        ne
  11 Disease3      D6       npc
  12 Disease3      D6        ne
  > design1 <- model.matrix(~Patient,data=samples)
  > design2 <- model.matrix(~Disease*Treatment,data=samples)
  > design <- cbind(design1,design2[,5:8])
  > colnames(design)
   [1] "(Intercept)"                  "PatientD5"
   [3] "PatientD6"                    "PatientH1"
   [5] "PatientH2"                    "PatientH3"
   [7] "Treatmentnpc"                 "DiseaseDisease1:Treatmentnpc"
   [9] "DiseaseDisease2:Treatmentnpc" "DiseaseDisease3:Treatmentnpc"

Then you can test for coefficients 8, 9 and 10.

Best wishes
Gordon


> Date: Fri, 21 Mar 2014 17:09:37 +0800
> From: "Justin Jeyakani (GIS)" <gnanakkan at gis.a-star.edu.sg>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] edgeR estimateGLMCommonDisp Error on RNA analysis
>
> Hello Sir,
>
> I'm doing differential gene exp. by the edgeR package. My data seems to 
> suitable to analyse by glm method (As per edgeR userguide section 3.5 
> Comparisons Both Between and Within Subjects).
>
> I have two different matrix to design.
> Matrix One:

> I have 12 dataset 3 samples from different healthy individuals (Healthy) 
> of two different stages (npc,ne) and 3 samples from different 
> individuals (Disease1) of two different stages (npc,ne), so 6 data from 
> each healthy(6) and Disese1 (6) below is my code and works well. But for 
> my another design matrix it's not!
>
> Matrix Two:

> But I want to comapre the All the Healthy (6data from 3 indiauals) with 
> each of the Disease1 individual (npc,ne). B'ze I'm getting 125 diff.exp 
> genes btn healthyVsDisease1 and I'm expecting more, If I do individual 
> test of 3 disease1 with Healthy but the I'm getting error in the 
> "estimateGLMCommonDisp" steps. I couldn't solve the problem. I have the 
> same problem with my another matrix also. Can guide me to solve this 
> issue highly appreciated ... below is my code for the above mentiond two 
> matrix...do I need to define anthing in the generate factor Level "gl" 
> further? Since Healthy has 3 sets and Disease 1, 2, 3 has one set 
> causing dispersion error.
>
> Matrix1.txt: my input (works fine)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7  Disease1       4       npc
> 8  Disease1       4        ne
> 9  Disease1       5       npc
> 10 Disease1       5        ne
> 11 Disease1       6       npc
> 12 Disease1       6        ne
>
> SUPT1<-read.table("matrix1.txt",header=TRUE)
> summary(SUPT1)
> Patient<-gl(3,2,length=12)
> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1"))
> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne"))
>
> data.frame(Disease,Patient,Treatment)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7  Disease1       1       npc
> 8  Disease1       1        ne
> 9  Disease1       2       npc
> 10 Disease1       2        ne
> 11 Disease1       3       npc
> 12 Disease1       3        ne
>
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
> [1] "(Intercept)"                 "DiseaseDisease1"
> [3] "DiseaseHealthy:Patient2"     "DiseaseDisease1:Patient2"
> [5] "DiseaseHealthy:Patient3"     "DiseaseDisease1:Patient3"
> [7] "DiseaseHealthy:Treatmentne"  "DiseaseDisease1:Treatmentne"
>
> library(edgeR)
> count<-read.table("RHN035-46_s1.txt",header=TRUE)
> head (count)
> colnames(count)
> samplename=colnames(count)
> cds01<-DGEList(count,group=samplename)
> head(cds01)
> cds01
> summary(cds01)
> dim(cds01)
>
>
> keep<-rowSums(cpm(cds01)>2)>=4
> cds01<-cds01[keep,]
> dim(cds01)
>
> cds01$sample$lib.size<-colSums(cds01$counts)
> y <- estimateGLMCommonDisp(cds01,design)
> y <- estimateGLMTrendedDisp(y,design)
> y <- estimateGLMTagwiseDisp(y,design)
> fit <- glmFit(y, design)
>
> lrt <- glmLRT(fit, coef="DiseaseHealthy:Treatmentne")
> topTags(lrt)
> detags <- rownames(topTags(lrt)$table)
> cpm(y)[detags, order(y$samples$group)]
> summary(de <- decideTestsDGE(lrt))
>
> lrt <- glmLRT(fit, coef="DiseaseDisease1:Treatmentne")
> topTags(lrt)
> detags <- rownames(topTags(lrt)$table)
> cpm(y)[detags, order(y$samples$group)]
> summary(de <- decideTestsDGE(lrt))
>
> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,-1,1))
> topTags(lrt)
> detags <- rownames(topTags(lrt)$table)
> cpm(y)[detags, order(y$samples$group)]
> summary(de <- decideTestsDGE(lrt))
> Matrix2.txt:my input (gives Error) ( I have split the Disease1 into 3 Disease sets and need to compare Healthy Vs Disease1/Disease2/Disese3 and genes respond to "ne" in Disease1/2/3)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7  Disease1       4       npc
> 8  Disease1       4        ne
> 9  Disease2       5       npc
> 10 Disease2       5        ne
> 11 Disease3       6       npc
> 12 Disease3       6        ne
>
> SUPT1<-read.table("matrix2.txt",header=TRUE)
> summary(SUPT1)
> Patient<-gl(3,2,length=12)
> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3"))
> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne"))
>
> data.frame(Disease,Patient,Treatment)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7  Disease1       1       npc
> 8  Disease1       1        ne
> 9  Disease2       2       npc
> 10 Disease2       2        ne
> 11 Disease3       3       npc
> 12 Disease3       3        ne
>
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
> [1] "(Intercept)"                 "DiseaseDisease1"
> [3] "DiseaseDisease2"             "DiseaseDisease3"
> [5] "DiseaseHealthy:Patient2"     "DiseaseDisease1:Patient2"
> [7] "DiseaseDisease2:Patient2"    "DiseaseDisease3:Patient2"
> [9] "DiseaseHealthy:Patient3"     "DiseaseDisease1:Patient3"
> [11] "DiseaseDisease2:Patient3"    "DiseaseDisease3:Patient3"
> [13] "DiseaseHealthy:Treatmentne"  "DiseaseDisease1:Treatmentne"
> [15] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne"
>
> library(edgeR)
> count<-read.table("RHN035-46_s1.txt",header=TRUE)
> head (count)
> colnames(count)
> samplename=colnames(count)
> cds01<-DGEList(count,group=samplename)
> head(cds01)
> cds01
> summary(cds01)
> dim(cds01)
>
>
> keep<-rowSums(cpm(cds01)>2)>=4
> cds01<-cds01[keep,]
> dim(cds01)
>
> cds01$sample$lib.size<-colSums(cds01$counts)
>
> y <- estimateGLMCommonDisp(cds01,design)
> Warning message:
> In estimateGLMCommonDisp.default(y = y$counts, design = design,  :
>  No residual df: setting dispersion to N
> Matri3.txt- my input (gives Error)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7   Healthy       4       npc
> 8   Healthy       4        ne
> 9  Disease1       5       npc
> 10 Disease1       5        ne
> 11 Disease2       6       npc
> 12 Disease2       6        ne
> 13 Disease2       7       npc
> 14 Disease2       7        ne
> 15 Disease3       8       npc
> 16 Disease3       8        ne
>
> SUPT1<-read.table("matrix2.txt",header=TRUE)
> summary(SUPT1)
> Patient<-gl(4,2,length=16)
> Disease <- factor(SUPT1$Disease, levels=c("Healthy","Disease1","Disease2","Disease3"))
> Treatment <- factor(SUPT1$Treatment, levels=c("npc","ne"))
>
> data.frame(Disease,Patient,Treatment)
>    Disease Patient Treatment
> 1   Healthy       1       npc
> 2   Healthy       1        ne
> 3   Healthy       2       npc
> 4   Healthy       2        ne
> 5   Healthy       3       npc
> 6   Healthy       3        ne
> 7   Healthy       4       npc
> 8   Healthy       4        ne
> 9  Disease1       1       npc
> 10 Disease1       1        ne
> 11 Disease2       2       npc
> 12 Disease2       2        ne
> 13 Disease2       3       npc
> 14 Disease2       3        ne
> 15 Disease3       4       npc
> 16 Disease3       4        ne
>
> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
> colnames(design)
> [1] "(Intercept)"                 "DiseaseDisease1"
> [3] "DiseaseDisease2"             "DiseaseDisease3"
> [5] "DiseaseHealthy:Patient2"     "DiseaseDisease1:Patient2"
> [7] "DiseaseDisease2:Patient2"    "DiseaseDisease3:Patient2"
> [9] "DiseaseHealthy:Patient3"     "DiseaseDisease1:Patient3"
> [11] "DiseaseDisease2:Patient3"    "DiseaseDisease3:Patient3"
> [13] "DiseaseHealthy:Patient4"     "DiseaseDisease1:Patient4"
> [15] "DiseaseDisease2:Patient4"    "DiseaseDisease3:Patient4"
> [17] "DiseaseHealthy:Treatmentne"  "DiseaseDisease1:Treatmentne"
> [19] "DiseaseDisease2:Treatmentne" "DiseaseDisease3:Treatmentne"
>
> library(edgeR)
> count<-read.table("RHN035-46-47-53_51-57_s1-s2gtf_tophatuniq.count.txt",header=TRUE)
> head (count)
> colnames(count)
> samplename=colnames(count)
> cds01<-DGEList(count,group=samplename)
> head(cds01)
> cds01
> summary(cds01)
> dim(cds01)
>
> keep<-rowSums(cpm(cds01)>1)>=4
> cds01<-cds01[keep,]
> dim(cds01)
>
> cds01$sample$lib.size<-colSums(cds01$counts)
> y <- estimateGLMCommonDisp(cds01,design)
> Error in return(NA, ntags) : multi-argument returns are not permitted
> In addition: Warning message:
> In estimateGLMTrendedDisp.default(y = y$counts, design = design,  :
>  No residual df: cannot estimate dispersion
>
> Thanks and regards,
> Justin

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list