[BioC] edgeR - R script - results compared to DESeq

Smith, Hilary A hilary.smith at gatech.edu
Wed Nov 30 14:30:36 CET 2011


In case it helps the discussion ... I also tried running GLMs in both DESeq and edgeR. I likewise found that edgeR yielded more differentially expressed tags or genes. I know Dr. Gordon Smyth mentioned calcNormFactors and tagwise dispersion; I did use both of these options. 

If it helps, an abstract description of the model comparison used in both programs is below (and below that if helpful, full code for edgeR -- I am using the newest release). I assume the differences are coming from the ways DESeq and edgeR estimate dispersion, but I'm eager to learn more about the rationale (especially given I just started using R/Bioconductor a few weeks ago) and note the results below in case they are of use in identifying the differences. I am glad to hear this is not just a factor of my dataset and is a common feature to have edgeR find more genes.  

My models for main effects A and B (with 3 biological reps. each) and their interaction are: 
Full: (A + B + A:B)
Reduced1: (A + B)
Reduced 2: (A)

The comparison of the Full vs. Reduced1 yields the genes impacted by the interaction term A:B. To obtain genes impacted by the main effect A, I perform a comparison of Full vs. Reduced1 and a comparison of Full vs. Reduced 2 -- those genes found at P.adj<0.05 in the Full vs. Reduced 2 comparison but not found in Full vs. Reduced 1 are what I am noting as genes impacted by main effect A. (To the best of my knowledge I cannot simply drop term A and leave in the interaction term, so this is my attempt to isolate term A. If there's a better way to do this, I'd be glad to know.).

My results for the interaction term are:
edgeR: 173 genes
DESeq: 38 genes


For the main effect A:
edgeR: 261 genes
DESeq: 61 genes
**NOTE: For this comparison of term A, of the 61 genes found by DESeq, about 44 (or ~72%) were also found by edgeR.

I did have warnings in running DESeq that the Full model GLM didn't converge which is disconcerting... edgeR didn't give these warnings but still found more components.

Best,
Hilary


~~In the code below, main effect "A" above is "Season," and main effect "B" above is "Hydroperiod." 

> library(edgeR)
> library(limma)
> raw.data = read.csv("2011.11.14counts.csv", header=TRUE, stringsAsFactors=FALSE)
> d = raw.data[,2:13]
> rownames(d) = raw.data[,1]
> head(d)
          X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0      1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
comp1       534   675   739   236   451   799   544   341   333   690   502
comp10    37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300

comp100    1065  1332  1620   658   861  1370  1060   999   918  1697  1117
comp1000    157   266   247   135   188   244   130   229   141   263   182
comp10000    14    37    47    17    21    64    35    15    10    28    22
          X3P_F
comp0      1645
comp1       571
comp10    44575
comp100    1336
comp1000    168
comp10000    12

> Hydroperiod = factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P"))
> Season = factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F"))
> design = model.matrix(~Hydroperiod + Season + Hydroperiod:Season)
> design
   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1            1            0       1                    0
2            1            0       1                    0
3            1            0       1                    0
4            1            1       1                    1
5            1            1       1                    1
6            1            1       1                    1
7            1            0       0                    0
8            1            0       0                    0
9            1            0       0                    0
10           1            1       0                    0
11           1            1       0                    0
12           1            1       0                    0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"

attr(,"contrasts")$Season
[1] "contr.treatment"

> d.GLM = DGEList(d, group = c("ER", "ER", "ER", "PR", "PR", "PR", "EF", "EF", "EF", "PF", "PF", "PF"))
Calculating library sizes from column totals.
> d.GLM = calcNormFactors(d.GLM)
> d.GLM
An object of class "DGEList"
$samples
      group lib.size norm.factors
X1E_R    ER 23295633    0.9559226
X2E_R    ER 25882545    1.1040337
X3E_R    ER 29401480    1.0236513
X1P_R    PR 20877015    0.8199915
X2P_R    PR 26649613    0.8869479
7 more rows ...

$counts
         X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
comp1      534   675   739   236   451   799   544   341   333   690   502
comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
comp1000   157   266   247   135   188   244   130   229   141   263   182
         X3P_F
comp0     1645
comp1      571
comp10   44575

comp100   1336
comp1000   168
25055 more rows ...

$all.zeros
   comp0    comp1   comp10  comp100 comp1000 
   FALSE    FALSE    FALSE    FALSE    FALSE 
25055 more elements ...

> nrow(d.GLM)
[1] 25060
> dim(d.GLM)
[1] 25060    12


> design
   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1            1            0       1                    0
2            1            0       1                    0
3            1            0       1                    0
4            1            1       1                    1
5            1            1       1                    1
6            1            1       1                    1
7            1            0       0                    0
8            1            0       0                    0
9            1            0       0                    0
10           1            1       0                    0
11           1            1       0                    0
12           1            1       0                    0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"

attr(,"contrasts")$Season
[1] "contr.treatment"

> d.GLM = estimateGLMCommonDisp(d.GLM, design)
> names(d.GLM)
[1] "samples"           "counts"            "all.zeros"        
[4] "common.dispersion"
> d.GLM$common.dispersion 
[1] 0.1488192
> sqrt(d.GLM$common.dispersion)
[1] 0.3857709
> # 0.3857709 is the Coefficient of Biological Variation
> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
Loading required package: splines
> summary(d.GLM$trended.dispersion)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.07541 0.10240 0.19030 0.25500 0.37530 1.26900 
> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
> d.GLM$prior.n
NULL
> d$prior.n
NULL
> ls()
[1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"     
[6] "raw.data"   
> d.GLM
An object of class "DGEList"
$samples
      group lib.size norm.factors
X1E_R    ER 23295633    0.9559226
X2E_R    ER 25882545    1.1040337
X3E_R    ER 29401480    1.0236513

X1P_R    PR 20877015    0.8199915
X2P_R    PR 26649613    0.8869479
7 more rows ...

$counts
         X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
comp1      534   675   739   236   451   799   544   341   333   690   502
comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
comp1000   157   266   247   135   188   244   130   229   141   263   182
         X3P_F
comp0     1645
comp1      571
comp10   44575
comp100   1336
comp1000   168
25055 more rows ...

$all.zeros
   comp0    comp1   comp10  comp100 comp1000 
   FALSE    FALSE    FALSE    FALSE    FALSE 
25055 more elements ...

$common.dispersion
[1] 0.1488192

$trended.dispersion
[1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516
25055 more elements ...

$abundance
     comp0      comp1     comp10    comp100   comp1000 
 -9.737514 -10.720757  -6.331670  -9.937984 -11.724981 
25055 more elements ...

$bin.dispersion
[1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
22 more elements ...

$bin.abundance
[1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
22 more elements ...

$tagwise.dispersion
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404
25055 more elements ...

> ?getPriorN   
> getPriorN(d.GLM, design=design)
[1] 2.5
> head(d.GLM$tagwise.dispersion)
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
> summary(d.GLM$tagwise.dispersion)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.05139 0.09363 0.18700 0.25910 0.35480 1.71800 
> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
> topTags(lrt.tgw)
Coefficient:  HydroperiodP:SeasonR 
             logConc        logFC       LR      P.Value          FDR
comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08


comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
> summary(decideTestsDGE(lrt.tgw))
   [,1] 
-1    57
0  24887
1    116


> design  
   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1            1            0       1                    0
2            1            0       1                    0
3            1            0       1                    0
4            1            1       1                    1
5            1            1       1                    1
6            1            1       1                    1
7            1            0       0                    0
8            1            0       0                    0
9            1            0       0                    0
10           1            1       0                    0
11           1            1       0                    0
12           1            1       0                    0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"

attr(,"contrasts")$Season
[1] "contr.treatment"

> lrt.coef4 = glmLRT(d.GLM, glmfit.tgw, coef=4)
> topTags(lrt.coef4)
Coefficient:  HydroperiodP:SeasonR 
             logConc        logFC       LR      P.Value          FDR
comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
> lrt.coef34 = glmLRT(d.GLM, glmfit.tgw, coef=3:4)
> topTags(lrt.coef34)
Coefficient:  SeasonR HydroperiodP:SeasonR 
            logConc       SeasonR HydroperiodP.SeasonR       LR      P.Value
comp11779 -13.60929  1.442695e+08        -1.442695e+08 88.20296 7.030231e-20
comp21414 -13.64545  5.616807e+00         1.442695e+08 71.78587 2.581642e-16
comp6411  -10.57883  1.671168e+00         2.006498e+00 67.75411 1.938124e-15
comp6417  -10.10518  1.510699e+00         1.224445e+00 65.09872 7.311274e-15
comp13665 -10.93273 -2.193560e+00         1.068620e+01 62.70305 2.422182e-14
comp1872  -12.53141  3.893662e+00        -2.007081e+00 62.49057 2.693670e-14
comp5005  -11.13142  4.565629e-01         3.063432e+00 61.87012 3.673456e-14
comp15150 -13.28156  5.032869e+00         1.442695e+08 58.96222 1.572234e-13
comp2502  -10.47976 -4.007870e-01         5.723705e+00 56.89575 4.418199e-13
comp19402 -11.60722  5.375493e+00        -5.180038e+00 52.82636 3.379876e-12
                   FDR
comp11779 1.761776e-15
comp21414 3.234797e-12
comp6411  1.618980e-11
comp6417  4.580513e-11
comp13665 1.125056e-10
comp1872  1.125056e-10
comp5005  1.315097e-10
comp15150 4.925022e-10
comp2502  1.230223e-09
comp19402 8.469969e-09
> ls()  
 [1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"     
 
   
 [6] "glmfit.tgw"  "lrt.coef34"  "lrt.coef4"   "lrt.tgw"     "oo"         
[11] "raw.data"   
> head(lrt.coef34)
An object of class "DGELRT"
$samples
      group lib.size norm.factors
X1E_R    ER 23295633    0.9559226
X2E_R    ER 25882545    1.1040337
X3E_R    ER 29401480    1.0236513
X1P_R    PR 20877015    0.8199915
X2P_R    PR 26649613    0.8869479
7 more rows ...

$all.zeros
    comp0     comp1    comp10   comp100  comp1000 comp10000 
    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE 

$common.dispersion
[1] 0.1488192

$trended.dispersion
[1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 0.20365547

$abundance
     comp0      comp1     comp10    comp100   comp1000  comp10000 
 -9.737514 -10.720757  -6.331670  -9.937984 -11.724981 -13.712600 

$bin.dispersion
[1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
22 more elements ...

$bin.abundance
[1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
22 more elements ...

$tagwise.dispersion
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508

$coef
[1] 4

$table
             logConc logFC.SeasonR logFC.HydroperiodP.SeasonR LR.statistic
comp0      -9.736436   -0.24995924                 -0.3214735   4.34114504
comp1     -10.739594    0.20037356                 -0.3870300   0.77512680
comp10     -6.341516    0.37480089                 -0.5276106   1.59407277
comp100    -9.940774   -0.06226595                 -0.3360123   2.12228134
comp1000  -11.726085   -0.07140854                  0.1131265   0.05488506
comp10000 -13.750173    0.21120837                  0.5708073   1.99318234
            p.value
comp0     0.1141123
comp1     0.6787086
comp10    0.4506626
comp100   0.3460608
comp1000  0.9729306
comp10000 0.3691356

$coefficients.full
          (Intercept) HydroperiodP     SeasonR HydroperiodP:SeasonR
comp0       -9.620221  0.028007573 -0.17325854          -0.22282846
comp1      -10.774172  0.058157626  0.13888837          -0.26826873
comp10      -6.555230  0.335378111  0.25979218          -0.36571177
comp100     -9.871905  0.009775718 -0.04315947          -0.23290600
comp1000   -11.662643 -0.118031564 -0.04949663           0.07841331
comp10000  -13.805669 -0.272566725  0.14639848           0.39565348

$coefficients.null
          (Intercept) HydroperiodP
comp0       -9.703284  -0.06737999
comp1      -10.701998  -0.07653693
comp10      -6.416913   0.14550436
comp100     -9.893311  -0.09719967
comp1000   -11.687332  -0.07884619
comp10000  -13.727706  -0.04514128

$design.full
  (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1           1            0       1                    0
2           1            0       1                    0
3           1            0       1                    0
4           1            1       1                    1
5           1            1       1                    1
7 more rows ...

$design.null
  (Intercept) HydroperiodP
1           1            0
2           1            0
3           1            0
4           1            1
5           1            1
7 more rows ...

$dispersion.used
[1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508

$comparison
[1] "SeasonR"              "HydroperiodP:SeasonR"

>  > RemoveSeasonRmHydBySeason = topTags(lrt.coef34, number =25060)
Error in topTags(lrt.coef34, number = 25060) : 
  unused argument(s) (number = 25060)
> ?topTags
> RemoveSeasonRmHydBySeason = topTags(lrt.coef34, n =25060)
> ls()
 [1] "Hydroperiod"               "RemoveSeasonRmHydBySeason"
 [3] "Season"                    "d"                        
 [5] "d.GLM"                     "design"                   
 [7] "glmfit.tgw"                "lrt.coef34"               
 [9] "lrt.coef4"                 "lrt.tgw"                  
[11] "oo"                        "raw.data"                 
> RemoveHydBySeason = topTags(lrt.coef4, n=25060)
> write.csv(RemoveHydBySeason, "RemoveHydBySeason.csv")
> write.csv(RemoveSeasonRmHydBySeason, "RemoveSeasonRmHydBySeason.csv")
> summary(decideTestsDGE(lrt.coef4)
+ 
> summary(decideTestsDGE(lrt.coef4))
   [,1] 
-1    57
0  24887
1    116
> summary(decideTestsDGE(lrt.coef34))
Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  : 
  attempt to set an attribute on NULL
> design
   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
1            1            0       1                    0
2            1            0       1                    0
3            1            0       1                    0
4            1            1       1                    1
5            1            1       1                    1
6            1            1       1                    1
7            1            0       0                    0
8            1            0       0                    0
9            1            0       0                    0
10           1            1       0                    0
11           1            1       0                    0
12           1            1       0                    0
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Hydroperiod
[1] "contr.treatment"

attr(,"contrasts")$Season
[1] "contr.treatment"



More information about the Bioconductor mailing list