[BioC] EdgeR classic vs. glm

Thu Mar 13 23:59:56 CET 2014

With regard to the exception that you get, it means what it says. You 
are testing 3 coefficients, so you have 3 logFC values, and it wouldn't 
make sense to just pick one at random and plot it. Try running glmLRT 
again with only one coef at a time, and you will be able to do 
plotSmear with the result.

On Thu 13 Mar 2014 03:09:47 PM PDT, Son Pham wrote:
> Thank you Yunshun!
> But there are some confusion remains:
> I did a similar comparison: finding DEGs in disease2 vs. control: while
> classic EdgeR gives some DEGs, GLM approach gives 0 DEG, which is strange.
> I believe that when using GLM approach, we also consider the patients
> information and thus, the DEGs list should be larger than disregarding such
> information (edgeR classic).
>
>
> Also, when I want to find the response to hormone in any disease group, it
> yields exceptions:
>
> Also, the
> #Genes that response to hormone in any disease group
> lrt <- glmLRT(fit, coef=10:12)
> topTags(lrt)
> summary(de <- decideTestsDGE(lrt))
> pdf("hormoneEffectInAnyThing.pdf")
> detags <- rownames(y)[as.logical(de)]
> plotSmear(lrt, de.tags=detags)
> abline(h=c(-1, 1), col="blue")
> dev.off()
>
>
>   It yields an exception:
>
>> plotSmear(lrt, de.tags=detags)
> Error in plotSmear(lrt, de.tags = detags) :
>    table$logFC slot in DGELRT object is NULL. We cannot produce an MA
> (smear) plot if more than one coefficient from the GLM is being tested in
> the likelihood ratio test as this results in more one logFC value per
> gene---one for each coefficient.
>> abline(h=c(-1, 1), col="blue")
>
> Any suspect? The design is absolutely the same as the one in the example.
> Son.
>
>
> On Tue, Mar 11, 2014 at 6:54 PM, Yunshun Chen <yuchen at wehi.edu.au> wrote:
>
>> Hi Son,
>>
>>
>>
>> First of all, under the classic edgeR you estimated the dispersions only
>> from a subset of the data.
>>
>> The dispersion estimates were not as reliable as those estimated under the
>> GLM approach (since the information of the other 12 samples were totally
>> ignored).
>>
>>
>>
>> Secondly, your classic edgeR analysis did not account for the patient
>> effect at all.
>>
>> These 6 samples are actually paired (by patient) and cannot be treated as
>> independent samples.
>>
>> This is the main reason why you didn't get any DE genes under the classic
>> edgeR.
>>
>>
>>
>> As stated in the users' guide, the classic edgeR is only applicable on
>> datasets with a single factor design.
>>
>> Hence, you have to use the GLM approach for your data.
>>
>>
>>
>> Cheers,
>>
>> Yunshun
>>
>>
>>
>>
>>
>>
>>
>>
>> --------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>>
>>
>>
>> Hi All,
>>
>> I have an experiment that is exactly the same as the one described in
>>
>> section 3.5 (Comparisons Both Between and Within Subjects)  in the
>>
>> manual<
>> http://www.bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>>>
>>
>> :
>>
>> The experiment has 18 RNA samples collected from 9 subjects. The samples
>>
>> correspond to cells from 3 healthy patients, either treated or not with a
>>
>> hormone; cells from 3 patients with disease 1, either treated or not with
>>
>> the hormone; and cells from 3 patients with disease 2, either treated or
>>
>> not with the hormone.
>>
>>
>>
>> I now find  genes responding to the hormone in disease1 patients using two
>>
>> approachs glm and edgeR classic. The results are dramatically different !!!
>>
>> glm gives hundreds of differentially expressed genes, while edgeR classic
>>
>> gives 0.
>>
>> Could anyone tell me that is what one can expect or I'm totally wrong !
>>
>>
>>
>>
>>
>> data.frame(Disease,Patient,Treatment)
>>
>> design <- model.matrix(~Disease+Disease:Patient+Disease:Treatment)
>>
>> colnames(design)
>>
>> y <- estimateGLMCommonDisp(y,design)
>>
>> y <- estimateGLMTrendedDisp(y,design)
>>
>> y <- estimateGLMTagwiseDisp(y,design)
>>
>> fit <- glmFit(y, design)
>>
>>
>>
>> lrt <- glmLRT(fit, coef="DiseaseDisease1:TreatmentHormone")
>>
>> topTags(lrt)
>>
>> detags <- rownames(topTags(lrt)$table)
>>
>> cpm(y)[detags, order(y$samples$group)]
>>
>> summary(de <- decideTestsDGE(lrt))
>>
>> This gives  612 (-1) and 1129 (+1) genes.
>>
>>
>>
>> I now try to just use 3 patients with Disease1, treated and non treated
>>
>> with hormone as 2 groups and apply the classic edgeR to find the effect of
>>
>> hormon to disease1:
>>
>>
>>
>> targets <-readTargets()
>>
>> x <-read.delim("new.counts", row.names="id")
>>
>> x_c_n <- x[c("d1_1_hormone", "d1_2_hormone", "d1_3_hormone", "d_1", "d_2",
>>
>> "d_3")]
>>
>> g <- factor(c(1,1,1,2,2,2))
>>
>> y <- DGEList(counts=x_c_n, group=g)
>>
>> y <- calcNormFactors(y)
>>
>> y <- estimateCommonDisp(y)
>>
>> y <- estimateTagwiseDisp(y)
>>
>> et <- exactTest(y)
>>
>> topTags(et)
>>
>> summary(de <- decideTestsDGE(et))
>>
>> This gives 0 (-1) and also 0 (+1) gene !!!!
>>
>>
>>
>> How can there be such a significant difference between these two
>>
>> approaches? Or I'm wrong?
>>
>>
>>
>> Thank you,
>>
>>
>>
>> Son.
>>
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and inte...{{dropped:10}}
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor