[BioC] Why are the results so different between the classic and the glm methods?

Sat Dec 14 08:55:43 CET 2013

Dear Jiantao,

On Fri, 13 Dec 2013, Jiantao Yu wrote:

> Dear Dr. Smyth,
>
> I don't know whether I should reply to you directly with my script or
> post the script to bioconductor at stat.math.ethz.ch. I will do so next time
> if you could tell me which I should do. For now I have attached my script to
> this email, and I hope you can help me solve the problem. Thanks!

Always post to the list -- I've cc'd this reply to the list.

The problem with your glm code is that you are testing whether the second 
group mean is equal to zero instead of testing for a difference between 
the two groups.  The edgeR User's Guide Sections 3.2.3 and 3.2.4 show two 
different (correct) ways to do the test.  You have mixed up some code from 
Section 3.2.3 with some code from Section 3.2.4.
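
To make the distinction concrete, here is a minimal sketch in base R of the two usual ways to parametrize a two-group design. It is not the poster's actual script, which is not shown in this thread. The group labels and their order are assumptions taken from the column names of the attached counts, and the fit objects named in the closing comments (fit_a, fit_b) are hypothetical.

```r
# Hypothetical sample layout, taken from the column names of the attached counts
group <- factor(c("mu", "mu", "mu", "wt", "wt", "wt"))

# Parametrization A: intercept plus difference.
# Column 1 is the baseline group ("mu"); column 2 ("groupwt") is the
# wt-versus-mu difference, so testing coefficient 2 compares the groups.
design_a <- model.matrix(~group)

# Parametrization B: one column per group, no intercept.
# Here coefficient 2 is the wt group level itself, not a difference,
# so comparing the groups needs an explicit contrast.
design_b <- model.matrix(~0 + group)
colnames(design_b) <- levels(group)
contrast_b <- c(mu = -1, wt = 1)

print(design_a)
print(design_b)

# With edgeR, assuming fit_a and fit_b come from glmFit() on each design:
#   glmLRT(fit_a, coef = 2)               # tests wt versus mu
#   glmLRT(fit_b, contrast = contrast_b)  # tests wt versus mu
#   glmLRT(fit_b, coef = 2)               # tests the wt level alone (the mix-up)
```

Either of the first two calls tests for a difference between the groups. The third pairs the no-intercept design with coef = 2, which is the kind of mix-up that would flag nearly every expressed gene.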

> BTW: There is a sentence in the manual, 'As we discussed in the previous 
> section, the exact test is only applicable to experiments with a single 
> factor.' What does 'single factor' mean here? For example, my input file 
> for edgeR has 7 columns: the first column is the gene locus, 
> columns 2-4 are wildtype biological replicates, and columns 5-7 
> are mutant biological replicates. Does this mean my data is 'multiple 
> factor', not 'single factor', so I should use GLM rather than the 'classic 
> method'? Thank you!

This is one factor with two levels.  Have a look at the pdf manual called 
"An Introduction to R" that comes with R.  Section 4 gives a quick 
introduction to factors.  This might get you started.
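
As a small illustration in base R, the six samples described above can be encoded as a single factor with two levels. The labels are assumptions based on the column description in the question. The batch variable is purely hypothetical and is not part of the poster's data; it only shows what a second factor would look like.

```r
# Hypothetical labels for columns 2-7 of the input file described above
genotype <- factor(c("wt", "wt", "wt", "mu", "mu", "mu"),
                   levels = c("wt", "mu"))

# One factor (genotype) with two levels (wt, mu)
print(levels(genotype))
print(nlevels(genotype))
print(table(genotype))

# A design only becomes multi-factor when a second variable is added,
# for example a hypothetical batch label for each replicate
batch <- factor(c(1, 2, 3, 1, 2, 3))
design_two <- model.matrix(~batch + genotype)
print(colnames(design_two))
```

With only genotype in the model, the experiment has a single factor, so both the classic exact test and the glm approach apply.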

Best wishes
Gordon

>
> Regards
> Jiantao
>
>
>
> 2013/12/12 Gordon K Smyth <smyth at wehi.edu.au>
>
>> Hi Joel,
>>
>> The mailing list will not distribute large attachments, so we can't see
>> your code script.
>>
>> Most likely you've made an error in the script, because the classic and
>> glm pipelines in edgeR should give very similar results for a two group
>> comparison. We would need to see the code to tell what the error is. We
>> shouldn't need to see your data, just the code.
>>
>> Best wishes
>> Gordon
>>
>> --------- original message -----------
>> Jiantao Yu joelyu.2003 at gmail.com
>> Thu Dec 12 22:54:05 CET 2013
>>
>> Dear Sir/Madam,
>>
>> When I used edgeR for RNA-Seq analysis, I found that the sets of
>> DEGs produced by the 'Classic' and 'GLM' methods are very different:
>> the former contains ~460 DEGs, whereas the latter contains ~27,000 DEGs. I
>> don't know why this is happening. I have attached the scripts and the
>> source data I used, and hope you can help me explain this.
>>
>> Joel
>> -------------- next part --------------
>> "mu1.bam" "mu2.bam" "mu3.bam" "wt1.bam" "wt2.bam" "wt3.bam"
>> "AT1G01010" 50 89 54 69 71 56
>> "AT1G01020" 218 261 198 309 248 241
>> "AT1G01030" 27 47 26 50 48 23
>> "AT1G01040" 582 676 466 202 229 830
>> "AT1G01046" 10 11 8 6 6 5
>> ...
>>
