[BioC] edgeR - R script - results compared to DESeq

Gordon K Smyth smyth at wehi.EDU.AU
Fri Dec 2 00:57:02 CET 2011


Hi Hilary,

> Date: Wed, 30 Nov 2011 08:30:36 -0500 (EST)
> From: "Smith, Hilary A" <hilary.smith at gatech.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] edgeR - R script - results compared to DESeq
>
> In case it helps the discussion ... I also tried running GLMs in both 
> DESeq and edgeR. I likewise found that edgeR yielded more differentially 
> expressed tags or genes. I know Dr. Gordon Smyth mentioned 
> calcNormFactors and tagwise dispersion; I did use both of these options.
>
> If it helps, an abstract description of the model comparison used in 
> both programs is below (and below that if helpful, full code for edgeR 
> -- I am using the newest release). I assume the differences are coming 
> from the ways DESeq and edgeR estimate dispersion, but I'm eager to 
> learn more about the rationale (especially given I just started using 
> R/Bioconductor a few weeks ago) and note the results below in case they 
> are of use in identifying the differences. I am glad to hear this is not 
> just a factor of my dataset and is a common feature to have edgeR find 
> more genes.

The rationale of the edgeR package is explained in:

   http://bioinformatics.oxfordjournals.org/content/23/21/2881

The p-values from edgeR are very slightly liberal because it treats the 
dispersions as known in the testing procedure.  I think other packages are 
also subject this same assumption however.

> My models for main effects A and B (with 3 biological reps. each) and their interaction are:
> Full: (A + B + A:B)
> Reduced1: (A + B)
> Reduced 2: (A)
>
> The comparison of the Full vs. Reduced1 yields the genes impacted by the 
> interaction term A:B. To obtain genes impacted by the main effect A, I 
> perform a comparison of Full vs. Reduced1 and a comparison of Full vs. 
> Reduced 2 -- those genes found at P.adj<0.05 in the Full vs. Reduced 2 
> comparison but not found in Full vs. Reduced 1 are what I am noting as 
> genes impacted by main effect A. (To the best of my knowledge I cannot 
> simply drop term A and leave in the interaction term, so this is my 
> attempt to isolate term A. If there's a better way to do this, I'd be 
> glad to know.).

You are correct that you can't remove A and keep A:B.  However many 
statisticians (John Nelder for example) have argued that the concept of a 
main effect for A is not meaningful in the presence of a nonzero 
interaction A:B.  If A:B is nonzero, then A can actually be interpretted 
as having two different main effects, one for each level of B.  To find 
genes responding to A in either level of B, you would compare the Full 
model with just B.  This would test for an A effect in either or both 
levels of B.

Alternatively, and this is what I usually recommend, you can perform a 
separate test for A within each level of B.  To do this, you use glmFit to 
fit the model (B + A:B), which will have four parameters if A and B each 
have two levels.  This is actually the same model as your Full model but 
parametrized differently.  Then

   glmLRT(d,fit,coef=3)

tests for an A effect when B is at its first level, and

   glmLRT(d,fit,coef=4)

tests for A when B is at its second level.

BTW, you can perform these same tests after fitting A+B+A:B using the 
contrast argument of glmLRT, but reparametrizing to B+A:B makes it 
simpler.

Best wishes
Gordon

> My results for the interaction term are:
> edgeR: 173 genes
> DESeq: 38 genes
>
>
> For the main effect A:
> edgeR: 261 genes
> DESeq: 61 genes

> **NOTE: For this comparison of term A, of the 61 genes found by DESeq, 
> about 44 (or ~72%) were also found by edgeR.
>
> I did have warnings in running DESeq that the Full model GLM didn't 
> converge which is disconcerting... edgeR didn't give these warnings but 
> still found more components.
>
> Best,
> Hilary
>
>
> ~~In the code below, main effect "A" above is "Season," and main effect "B" above is "Hydroperiod."
>
>> library(edgeR)
>> library(limma)
>> raw.data = read.csv("2011.11.14counts.csv", header=TRUE, stringsAsFactors=FALSE)
>> d = raw.data[,2:13]
>> rownames(d) = raw.data[,1]
>> head(d)
>          X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0      1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1       534   675   739   236   451   799   544   341   333   690   502
> comp10    37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
>
> comp100    1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000    157   266   247   135   188   244   130   229   141   263   182
> comp10000    14    37    47    17    21    64    35    15    10    28    22
>          X3P_F
> comp0      1645
> comp1       571
> comp10    44575
> comp100    1336
> comp1000    168
> comp10000    12
>
>> Hydroperiod = factor(c("E", "E", "E", "P", "P", "P", "E", "E", "E", "P", "P", "P"))
>> Season = factor(c("R", "R", "R", "R", "R", "R", "F", "F", "F", "F", "F", "F"))
>> design = model.matrix(~Hydroperiod + Season + Hydroperiod:Season)
>> design
>   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> d.GLM = DGEList(d, group = c("ER", "ER", "ER", "PR", "PR", "PR", "EF", "EF", "EF", "PF", "PF", "PF"))
> Calculating library sizes from column totals.
>> d.GLM = calcNormFactors(d.GLM)
>> d.GLM
> An object of class "DGEList"
> $samples
>      group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $counts
>         X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1      534   675   739   236   451   799   544   341   333   690   502
> comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
> comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000   157   266   247   135   188   244   130   229   141   263   182
>         X3P_F
> comp0     1645
> comp1      571
> comp10   44575
>
> comp100   1336
> comp1000   168
> 25055 more rows ...
>
> $all.zeros
>   comp0    comp1   comp10  comp100 comp1000
>   FALSE    FALSE    FALSE    FALSE    FALSE
> 25055 more elements ...
>
>> nrow(d.GLM)
> [1] 25060
>> dim(d.GLM)
> [1] 25060    12
>
>
>> design
>   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> d.GLM = estimateGLMCommonDisp(d.GLM, design)
>> names(d.GLM)
> [1] "samples"           "counts"            "all.zeros"
> [4] "common.dispersion"
>> d.GLM$common.dispersion
> [1] 0.1488192
>> sqrt(d.GLM$common.dispersion)
> [1] 0.3857709
>> # 0.3857709 is the Coefficient of Biological Variation
>> d.GLM = estimateGLMTrendedDisp(d.GLM, design)
> Loading required package: splines
>> summary(d.GLM$trended.dispersion)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.07541 0.10240 0.19030 0.25500 0.37530 1.26900
>> d.GLM = estimateGLMTagwiseDisp(d.GLM, design)
>> d.GLM$prior.n
> NULL
>> d$prior.n
> NULL
>> ls()
> [1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"
> [6] "raw.data"
>> d.GLM
> An object of class "DGEList"
> $samples
>      group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
>
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $counts
>         X1E_R X2E_R X3E_R X1P_R X2P_R X3P_R X1E_F X2E_F X3E_F X1P_F X2P_F
> comp0     1159  1572  1817   605  1113  1732  1065  1207  1477  1841  1915
> comp1      534   675   739   236   451   799   544   341   333   690   502
> comp10   37677 54466 58271 34712 40312 51243 30423 28044 23961 53852 59300
> comp100   1065  1332  1620   658   861  1370  1060   999   918  1697  1117
> comp1000   157   266   247   135   188   244   130   229   141   263   182
>         X3P_F
> comp0     1645
> comp1      571
> comp10   44575
> comp100   1336
> comp1000   168
> 25055 more rows ...
>
> $all.zeros
>   comp0    comp1   comp10  comp100 comp1000
>   FALSE    FALSE    FALSE    FALSE    FALSE
> 25055 more elements ...
>
> $common.dispersion
> [1] 0.1488192
>
> $trended.dispersion
> [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516
> 25055 more elements ...
>
> $abundance
>     comp0      comp1     comp10    comp100   comp1000
> -9.737514 -10.720757  -6.331670  -9.937984 -11.724981
> 25055 more elements ...
>
> $bin.dispersion
> [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
> 22 more elements ...
>
> $bin.abundance
> [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
> 22 more elements ...
>
> $tagwise.dispersion
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404
> 25055 more elements ...
>
>> ?getPriorN
>> getPriorN(d.GLM, design=design)
> [1] 2.5
>> head(d.GLM$tagwise.dispersion)
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>> summary(d.GLM$tagwise.dispersion)
>   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> 0.05139 0.09363 0.18700 0.25910 0.35480 1.71800
>> glmfit.tgw = glmFit(d.GLM, design, dispersion=d.GLM$tagwise.dispersion)
>> lrt.tgw = glmLRT(d.GLM, glmfit.tgw)
>> topTags(lrt.tgw)
> Coefficient:  HydroperiodP:SeasonR
>             logConc        logFC       LR      P.Value          FDR
> comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
> comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
> comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
>
>
> comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
> comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
> comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
> comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
> comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
> comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
> comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
>> summary(decideTestsDGE(lrt.tgw))
>   [,1]
> -1    57
> 0  24887
> 1    116
>
>
>> design
>   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"
>
>> lrt.coef4 = glmLRT(d.GLM, glmfit.tgw, coef=4)
>> topTags(lrt.coef4)
> Coefficient:  HydroperiodP:SeasonR
>             logConc        logFC       LR      P.Value          FDR
> comp13665 -10.932729 1.068620e+01 59.37370 1.304000e-14 3.267824e-10
> comp15478 -12.077538 7.742379e+00 47.67249 5.037112e-12 5.552272e-08
> comp370   -13.588446 1.442695e+08 46.86820 7.592458e-12 5.552272e-08
> comp10848 -11.836655 8.970575e+00 46.56512 8.862366e-12 5.552272e-08
> comp13403  -9.315242 3.773345e+00 44.92234 2.050057e-11 1.027488e-07
> comp7180  -11.518762 7.869625e+00 43.18096 4.990399e-11 2.084323e-07
> comp2502  -10.479763 5.723705e+00 41.56745 1.138735e-10 4.076673e-07
> comp2740  -10.814075 4.666231e+00 38.00241 7.065735e-10 2.213342e-06
> comp4314  -13.104853 4.837400e+00 35.44570 2.622602e-09 7.302491e-06
> comp13675 -12.930253 4.442508e+00 34.46100 4.348772e-09 1.089802e-05
>> lrt.coef34 = glmLRT(d.GLM, glmfit.tgw, coef=3:4)
>> topTags(lrt.coef34)
> Coefficient:  SeasonR HydroperiodP:SeasonR
>            logConc       SeasonR HydroperiodP.SeasonR       LR      P.Value
> comp11779 -13.60929  1.442695e+08        -1.442695e+08 88.20296 7.030231e-20
> comp21414 -13.64545  5.616807e+00         1.442695e+08 71.78587 2.581642e-16
> comp6411  -10.57883  1.671168e+00         2.006498e+00 67.75411 1.938124e-15
> comp6417  -10.10518  1.510699e+00         1.224445e+00 65.09872 7.311274e-15
> comp13665 -10.93273 -2.193560e+00         1.068620e+01 62.70305 2.422182e-14
> comp1872  -12.53141  3.893662e+00        -2.007081e+00 62.49057 2.693670e-14
> comp5005  -11.13142  4.565629e-01         3.063432e+00 61.87012 3.673456e-14
> comp15150 -13.28156  5.032869e+00         1.442695e+08 58.96222 1.572234e-13
> comp2502  -10.47976 -4.007870e-01         5.723705e+00 56.89575 4.418199e-13
> comp19402 -11.60722  5.375493e+00        -5.180038e+00 52.82636 3.379876e-12
>                   FDR
> comp11779 1.761776e-15
> comp21414 3.234797e-12
> comp6411  1.618980e-11
> comp6417  4.580513e-11
> comp13665 1.125056e-10
> comp1872  1.125056e-10
> comp5005  1.315097e-10
> comp15150 4.925022e-10
> comp2502  1.230223e-09
> comp19402 8.469969e-09
>> ls()
> [1] "Hydroperiod" "Season"      "d"           "d.GLM"       "design"
>
>
> [6] "glmfit.tgw"  "lrt.coef34"  "lrt.coef4"   "lrt.tgw"     "oo"
> [11] "raw.data"
>> head(lrt.coef34)
> An object of class "DGELRT"
> $samples
>      group lib.size norm.factors
> X1E_R    ER 23295633    0.9559226
> X2E_R    ER 25882545    1.1040337
> X3E_R    ER 29401480    1.0236513
> X1P_R    PR 20877015    0.8199915
> X2P_R    PR 26649613    0.8869479
> 7 more rows ...
>
> $all.zeros
>    comp0     comp1    comp10   comp100  comp1000 comp10000
>    FALSE     FALSE     FALSE     FALSE     FALSE     FALSE
>
> $common.dispersion
> [1] 0.1488192
>
> $trended.dispersion
> [1] 0.07619372 0.08679353 0.10444478 0.07739597 0.10629516 0.20365547
>
> $abundance
>     comp0      comp1     comp10    comp100   comp1000  comp10000
> -9.737514 -10.720757  -6.331670  -9.937984 -11.724981 -13.712600
>
> $bin.dispersion
> [1] 0.7340070 0.6589801 0.5901124 0.5500868 0.4865594
> 22 more elements ...
>
> $bin.abundance
> [1] -17.08166 -16.46619 -16.13033 -15.84028 -15.59074
> 22 more elements ...
>
> $tagwise.dispersion
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>
> $coef
> [1] 4
>
> $table
>             logConc logFC.SeasonR logFC.HydroperiodP.SeasonR LR.statistic
> comp0      -9.736436   -0.24995924                 -0.3214735   4.34114504
> comp1     -10.739594    0.20037356                 -0.3870300   0.77512680
> comp10     -6.341516    0.37480089                 -0.5276106   1.59407277
> comp100    -9.940774   -0.06226595                 -0.3360123   2.12228134
> comp1000  -11.726085   -0.07140854                  0.1131265   0.05488506
> comp10000 -13.750173    0.21120837                  0.5708073   1.99318234
>            p.value
> comp0     0.1141123
> comp1     0.6787086
> comp10    0.4506626
> comp100   0.3460608
> comp1000  0.9729306
> comp10000 0.3691356
>
> $coefficients.full
>          (Intercept) HydroperiodP     SeasonR HydroperiodP:SeasonR
> comp0       -9.620221  0.028007573 -0.17325854          -0.22282846
> comp1      -10.774172  0.058157626  0.13888837          -0.26826873
> comp10      -6.555230  0.335378111  0.25979218          -0.36571177
> comp100     -9.871905  0.009775718 -0.04315947          -0.23290600
> comp1000   -11.662643 -0.118031564 -0.04949663           0.07841331
> comp10000  -13.805669 -0.272566725  0.14639848           0.39565348
>
> $coefficients.null
>          (Intercept) HydroperiodP
> comp0       -9.703284  -0.06737999
> comp1      -10.701998  -0.07653693
> comp10      -6.416913   0.14550436
> comp100     -9.893311  -0.09719967
> comp1000   -11.687332  -0.07884619
> comp10000  -13.727706  -0.04514128
>
> $design.full
>  (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1           1            0       1                    0
> 2           1            0       1                    0
> 3           1            0       1                    0
> 4           1            1       1                    1
> 5           1            1       1                    1
> 7 more rows ...
>
> $design.null
>  (Intercept) HydroperiodP
> 1           1            0
> 2           1            0
> 3           1            0
> 4           1            1
> 5           1            1
> 7 more rows ...
>
> $dispersion.used
> [1] 0.06348981 0.06769437 0.07385569 0.05410498 0.08450404 0.19604508
>
> $comparison
> [1] "SeasonR"              "HydroperiodP:SeasonR"
>
>> > RemoveSeasonRmHydBySeason = topTags(lrt.coef34, number =25060)
> Error in topTags(lrt.coef34, number = 25060) :
>  unused argument(s) (number = 25060)
>> ?topTags
>> RemoveSeasonRmHydBySeason = topTags(lrt.coef34, n =25060)
>> ls()
> [1] "Hydroperiod"               "RemoveSeasonRmHydBySeason"
> [3] "Season"                    "d"
> [5] "d.GLM"                     "design"
> [7] "glmfit.tgw"                "lrt.coef34"
> [9] "lrt.coef4"                 "lrt.tgw"
> [11] "oo"                        "raw.data"
>> RemoveHydBySeason = topTags(lrt.coef4, n=25060)
>> write.csv(RemoveHydBySeason, "RemoveHydBySeason.csv")
>> write.csv(RemoveSeasonRmHydBySeason, "RemoveSeasonRmHydBySeason.csv")
>> summary(decideTestsDGE(lrt.coef4)
> +
>> summary(decideTestsDGE(lrt.coef4))
>   [,1]
> -1    57
> 0  24887
> 1    116
>> summary(decideTestsDGE(lrt.coef34))
> Error in array(x, c(length(x), 1L), if (!is.null(names(x))) list(names(x),  :
>  attempt to set an attribute on NULL
>> design
>   (Intercept) HydroperiodP SeasonR HydroperiodP:SeasonR
> 1            1            0       1                    0
> 2            1            0       1                    0
> 3            1            0       1                    0
> 4            1            1       1                    1
> 5            1            1       1                    1
> 6            1            1       1                    1
> 7            1            0       0                    0
> 8            1            0       0                    0
> 9            1            0       0                    0
> 10           1            1       0                    0
> 11           1            1       0                    0
> 12           1            1       0                    0
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$Hydroperiod
> [1] "contr.treatment"
>
> attr(,"contrasts")$Season
> [1] "contr.treatment"

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list