[BioC] edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Mon Jun 13 01:02:02 CEST 2011


Dear Sumathy,

Thanks for giving your code.  I can now see what you are referring to.

You are quite correct that after your code sequence

   summary(decideTestsDGE(de.tgw,p.value=0.01))
   [,1]
   -1    96
   0  12343
   1    200
   detop.tgw <- topTags (de.tgw, n=296)

you should see exactly 96 negative and exactly 200 positive 
log-fold-changes in the data.frame detop.tgw.  Try:

   table(detop.tgw$table$logFC > 0)

to see how many there are up and down.

In all my testing of decideTestsDGE() and topTags(), this is exactly how 
they behave.  If this isn't what you see, is it possible that you have 
simply made a mistake transferring to csv or with the counting?

edgeR uses statistical significance to decide on the number of 
differentially expressed genes, rather than arbitrary fold change cutoff. 
The approach is explained in the documentation and the cited papers.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
Tel: (03) 9345 2326, Fax (03) 9347 0852,
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Sun, 12 Jun 2011, puvan001 at umn.edu wrote:

>
>
> I am sorry. I'll describe what I have done.
> I am looking for differentially expressed genes in two groups and each group 
> has three replicates.
> First I used common dispersion (counts.csv is my data file and totalreads.csv 
> is about my library size).
> counts <-read.csv("counts.csv",row.names=1)
> libsize <- read.csv("totalreads.csv", row.names=1)
> d <-DGEList (counts, group=c(2,2,2,1,1,1), lib.size=libsize$total
> d <- DGEList(counts, group=c(2,2,2,1,1,1), lib.size=libsize$Total)
> I removed reads < than 5 ( according to the manual )
> d <- d[rowSums(d$counts) >= 5, ]
> d <- estimateCommonDisp(d)
>> d$common.dispersion
> [1] 0.008390605
> de.com <- exactTest(d)
> summary(decideTestsDGE(de.com, p.value=0.01))
> -1    92
> 0  12357
> 1    190
>
> I decided to use moderated tagwise dispersion and prior. n 10 (according to 
> the manual I thought this would be better )
> summary(d$tagwise.dispersion)
>   Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 0.006051 0.006051 
> 0.011150 0.008987 0.011150 0.037460 
> summary(decideTestsDGE(de.tgw,p.value=0.01))
>  [,1] -1    96
> 0  12343
> 1    200
>
> (What my understanding is 96 genes are down regulated and 200 genes are up 
> regulated and I wanted to get the all these differentially expressed genes)
>
> detop.tgw <-topTags (de.tgw, n=296)
> topTags(de.tgw)
>
> my results-only a part
>
>              ID    logConc    logFC      P.Value     	 FDR
>> ENSSSCT00000011438 -15.33546 4.312736 1.806225e-97 2.282888e-93
>> ENSSSCT00000004165 -15.34398 2.862672 1.236263e-88 7.812563e-85
>> ENSSSCT00000013211 -14.81470 3.003443 1.145003e-77 4.823900e-74
>> ENSSSCT00000011437 -16.88133 4.625023 4.776290e-73 1.509188e-69
>> ENSSSCT00000010657 -18.66400 6.789652 1.482137e-71 3.746546e-68
>> ENSSSCT00000013209 -18.85435 7.246780 1.757241e-67 3.701628e-64
>> ENSSSCT00000019271 -15.67690 3.069440 7.803066e-66 1.408899e-62
>> ENSSSCT00000017966 -15.86827 3.089246 7.932011e-63 1.253159e-59
>> ENSSSCT00000013215 -16.27135 3.161484 2.973959e-58 4.176430e-55
>> ENSSSCT00000011439 -16.20229 3.461734 2.624576e-55 3.317202e-52
>> ENSSSCT00000010656 -34.07557 31.8809589 1.646600e-16 4.729859e-14
>> ENSSSCT00000010084 -15.37232  1.3166928 3.997218e-16 1.122685e-13
>> ENSSSCT00000003998 -15.32474 -1.3062885 4.289687e-16 1.178638e-13
>> ENSSSCT00000018978 -14.38557  0.9368015 1.582580e-15 4.255793e-13
>> ENSSSCT00000010832 -21.11786  5.8784200 7.617618e-15 2.005814e-12
>> ENSSSCT00000012919 -19.87289  3.7289682 2.159284e-14 5.569629e-12
>> ENSSSCT00000002141 -15.34515  1.2304344 2.586146e-14 6.537259e-12
>> ENSSSCT00000015009 -16.73944  1.3602630 1.537101e-13 3.809299e-11
>> ENSSSCT00000007972 -13.99426 -0.8061797 4.105738e-13 9.934890e-11
>> ENSSSCT00000009832 -20.70634  4.7030671 4.166067e-13 9.934890e-11
>> ENSSSCT00000017656 -15.16996  1.1164872 8.232265e-13 1.926807e-10
>> ENSSSCT00000003967 -12.27699 -0.9214116 1.414101e-12 3.249604e-10
>> ENSSSCT00000017482 -13.37026  0.7404062 2.139773e-12 4.829390e-10
>> ENSSSCT00000019546 -13.99564  0.9811760 2.192554e-12 4.861701e-10
>
> My confusion starts here. I thought the fold change for down regulated genes 
> are lower than 0. But when I counted after changing into csv. format I got 
> more than 96 genes (According to the summary, -1 is 96). Am I doing something 
> wrong?
> Generally, more than 2 fold difference is considered as differential 
> expression. I am little bit confused about the value used by edgeR for 
> differential expression. These questions may be too simple but my background 
> is not statistics and I don't have a very good understanding. I'd really 
> appreciate your comments.
>
>
> Thanks
> Sumathy

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list