[BioC] EdgeR multi-factor design questions

James W. MacDonald jmacdon at uw.edu
Wed Mar 27 15:39:56 CET 2013


Hi Morgan,

On 3/26/2013 4:40 PM, Morgan Mouchka wrote:
> Hi Jim,
>
> On Mar 25, 2013, at 4:01 PM, James W. MacDonald wrote:
>
>> Hi Morgan,
>>
>> On 3/25/2013 3:15 PM, Morgan Mouchka wrote:
>>> Hello all,
>>>
>>>   I’m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual.
>>>
>>>
>>> More specifically, I have two questions.  The first is whether I have chosen the correct method for analyzing my multi-factorial experiment.  I have 2 factors, state (apo, sym) and treatment (control, treatment).  I’m interested if there is a differential response due to the treatment that varies by state.  In other words, do apos respond to the treatment in the same way that syms respond?
>>>
>>>
>>> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups.  I have pasted my session output and preliminary code below.  Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question.
>>>
>>> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other?  For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state?  It seems that this could be some sort of interaction term, but I’m uncertain as to the best way to set this up.
>> This is definitely an interaction term. How you set it up is dependent on what parameterization makes the most sense to you. Following your method below, you want
>>
>> glmLRT(fit, contrast = c(1,-1,-1,1))
> I'm a bit confused, but wouldn't this model be combining the controls (apo and sym) and comparing it to the treatments (apo and sym) in an additive fashion?  When I run this model, I get different results than when I run your second model below.  Are they suppose to provide the same output or did I misunderstand your post?  (Certainly likely, I'm very new to GLM and edgeR).

This contrast is the same as (Sym trt - Sym ctrl) - (Apo trt - Apo 
ctrl), which tests to see if the difference between treatment and 
control varies according to state. This is the same as the coefficient 
treat2:state2 in the alternative parameterization below.

But two things here. First, I had a brain cramp in my earlier email to 
you. The correct interpretations of the coefficients are

Apo ctrl
Apo trt - Apo ctrl
Sym ctrl - Apo ctrl
(Sym trt - Sym ctrl) - (Apo trt - Apo ctrl)

So coef = 3 doesn't give you Sym trt - Sym ctrl. It is more difficult to 
get that contrast from this parameterization, so sticking with what you 
already had is the easiest way to go.

Second, I don't see in your code below where you re-fit the model using 
the alternative parameterization, so I tried this with some data I have 
in hand. Note that you have to go all the way back to the beginning, 
recreating the dgeList object with the different design matrix. So, 
using your parameterization I get

 > topTags(glmLRT(fit, contrast = c(1,-1,-1,1)))

symbol egids logFC logCPM LR PValue FDR
319317 Snhg11 319317 -8.012675 8.499244 333.5330 1.634236e-74 2.088554e-70
21938 Tnfrsf1b 21938 6.895792 4.632327 268.2860 2.682053e-60 1.713832e-56
16365 Irg1 16365 7.709183 4.697518 248.7724 4.809099e-56 2.048676e-52
18133 Nov 18133 6.798803 6.007596 237.2266 1.582884e-53 5.057313e-50
13058 Cybb 13058 7.395509 5.371628 233.3577 1.104324e-52 2.822652e-49
20302 Ccl3 20302 7.776920 4.644823 220.7970 6.060828e-50 1.290956e-46
22671 Rnf112 22671 -6.945110 6.711583 217.2831 3.540102e-49 6.463215e-46
13733 Emr1 13733 6.855894 5.131320 210.6611 9.854565e-48 1.574267e-44
20266 Scn1b 20266 -5.007392 7.965721 206.7374 7.074398e-47 1.004565e-43
110834 Chrna3 110834 6.344460 4.341593 206.1794 9.363937e-47 1.196711e-43

and then re-parameterizing

 > treat <- factor(rep(c(1,2,1,2), c(5,3,1,3)))
 > type <- factor(rep(c(1,2,1,2), c(3,2,4,3)))
 > design2 <- model.matrix(~treat*type)
 > dgelst2 <- estimateGLMCommonDisp(DGEList(counts=cnts, genes=tx.filt), 
design2)
Calculating library sizes from column totals.
 > dgelst2 <- estimateGLMTrendedDisp(dgelst2, design2)
 > dgelst2 <- estimateGLMTagwiseDisp(dgelst2, design2)
 > fit2 <- glmFit(dgelst2, design2)
 > topTags(glmLRT(fit2, coef=4))
Coefficient: treat2:type2
symbol egids logFC logCPM LR PValue FDR
319317 Snhg11 319317 -7.869265 8.428033 388.4536 1.796893e-86 2.296429e-82
13058 Cybb 13058 7.824446 5.392286 319.0685 2.310988e-71 1.476721e-67
16365 Irg1 16365 7.773614 4.714316 249.1532 3.972389e-56 1.692238e-52
13733 Emr1 13733 7.444357 5.144043 246.3510 1.621694e-55 4.197947e-52
21938 Tnfrsf1b 21938 6.915307 4.651340 246.3258 1.642389e-55 4.197947e-52
20266 Scn1b 20266 -4.879057 7.909679 238.6288 7.828831e-54 1.667541e-50
18133 Nov 18133 6.871500 6.026197 237.6275 1.294259e-53 2.362947e-50
20302 Ccl3 20302 7.784824 4.663569 209.2621 1.990055e-47 3.179113e-44
100039795 Ildr2 100039795 4.813985 6.396443 199.6276 2.518196e-45 
3.575838e-42
110834 Chrna3 110834 6.382514 4.361077 196.0642 1.509239e-44 1.928808e-41

So you can see that the likelihood ratios are the same - the only 
difference here is likely due to the fact that we are estimating the 
dispersions on different coefficients (e.g., the other coefficients in 
the model), so the denominator of the statistic will change, resulting 
in slightly different p-values and ordering. But all in all, very 
similar results.

Best,

Jim


>> or you could do
>>
>> state<- factor(rep(1:2, each = 8))
>> treat<- factor(rep(1:2, each=4, times=2))
>>
>> design<- model.matrix(~treat*state)
>>
>> and then
>>
>> glmLRT(fit, coef=4)
>>
>> will test the interaction, and coefficients 2 and 3 test the main effects for treatment and state, as you have done below.
> When I use coefficients 2 and 3 in the above model, I get different results from when I compare A and B (apo control to apo treatment) and C and D (sym control to sym treatment) in my original model.  Should they be the same? I have posted my most recent session below using topTags to demonstrate the differing results.  I ran my model first and then ran your recommended models, followed by your model with coefficients 2 and 3.  Hopefully this all makes sense?
> Thanks so much,
> Morgan
>
>> library(edgeR)
>> #reading the counts from a file
>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>> head (x)
>                                                            AC3  AC4  AC5  AC7  AT1  AT2
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     72   39   55   31   36   55
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     3    2    3    4    0    2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260  576  957  529  663  961
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436    17    7    6    6    7    8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  7774 3857 5588 3835 3633 4470
>                                                            AT4  AT5  SC1  SC2  SC3  SC5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     22   63   20   55   32   49
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     0    5    4    4    1    2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  516  903  407 1072  345  788
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     4    5    3    0    2    5
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3668 5489 1092 2288 1381 2194
>                                                            ST1  ST2  ST3  ST5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     39   65   40   82
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     7    0    1    3
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  958 1103  900 1488
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     7   11    6    8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3629 4837 3383 4221
>
> #MY MODEL
>
>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>> y<- DGEList (counts = x, group = group)
> Calculating library sizes from column totals.
>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>> keep<- rowSums (cpm(y)>1)>= 4
>> y2<- y[keep,]
>> dim(y2)
> [1] 17739    16
>> y2$samples$lib.size<- colSums(y2$counts)
>> # Normalize for sample-specific effects
>> y3<- calcNormFactors (y2)
>> y3$samples
>      group lib.size norm.factors
> AC3     A 11095093    1.0237872
> AC4     A  5604841    0.9912734
> AC5     A  7517691    0.9824405
> AC7     A  5918039    0.9919439
> AT1     B  6004830    0.9609672
> AT2     B  7692419    0.9543765
> AT4     B  5197815    0.9575609
> AT5     B  7759281    0.9643070
> SC1     C  3542875    1.0693587
> SC2     C  6720709    1.0488297
> SC3     C  3881774    1.0022528
> SC5     C  5109584    1.0429925
> ST1     D 10429666    0.9815467
> ST2     D  9662842    1.0103866
> ST3     D  8267838    1.0198095
> ST5     D 10698815    1.0069064
>> design<- model.matrix(~0+group, data=y3$samples)
>> colnames(design)<- c("A", "B", "C", "D")
>> design
>      A B C D
> AC3 1 0 0 0
> AC4 1 0 0 0
> AC5 1 0 0 0
> AC7 1 0 0 0
> AT1 0 1 0 0
> AT2 0 1 0 0
> AT4 0 1 0 0
> AT5 0 1 0 0
> SC1 0 0 1 0
> SC2 0 0 1 0
> SC3 0 0 1 0
> SC5 0 0 1 0
> ST1 0 0 0 1
> ST2 0 0 0 1
> ST3 0 0 0 1
> ST5 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
> Disp = 0.02061 , BCV = 0.1436
>> #estimate gene-wise dispersion
>> y5<- estimateGLMTrendedDisp(y4, design)
> Loading required package: splines
>> y6<- estimateGLMTagwiseDisp(y5,design)
>> fit<- glmFit(y6, design)
>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>> #pairwise comparison of B and A
>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>> #table of top DE tags
>> topTags(lrt.BvsA)
> Coefficient:  -1*A 1*B
>                                                              logFC   logCPM       LR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866  4.506598 8.226827 461.5764
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 4.620617 7.708649 442.6438
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705   4.281604 4.374277 339.8684
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 2.661941 7.444760 280.1675
> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840  2.227577 6.807869 256.4828
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793  2.451201 9.549261 252.1390
> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099  2.119708 7.827086 246.6841
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724  2.080116 6.409451 246.1767
> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963  2.124257 8.493526 244.0451
> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818  3.282939 6.540550 239.7896
>                                                                  PValue          FDR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866  2.181936e-102 3.870537e-98
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907  2.877748e-98 2.552418e-94
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705    6.816117e-76 4.030370e-72
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372  6.903808e-63 3.061666e-59
> Locus_18344_Transcript_1/1_Confidence_0.000_Length_3840   1.002773e-57 3.557638e-54
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793   8.874275e-57 2.623679e-53
> Locus_16729_Transcript_1/1_Confidence_1.000_Length_3099   1.371971e-55 3.476771e-52
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724   1.770015e-55 3.924786e-52
> Locus_21357_Transcript_1/1_Confidence_1.000_Length_4963   5.160772e-55 1.017188e-51
> Locus_21472_Transcript_1/1_Confidence_0.000_Length_3818   4.371068e-54 7.753838e-51
>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>> topTags(lrt.DvsC)
> Coefficient:  -1*C 1*D
>                                                               logFC   logCPM        LR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866   2.477714 8.226827 157.20112
> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425  -1.986510 5.205879 131.37771
> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676  -1.451476 6.950002 103.49253
> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096  -1.574898 6.554798  88.60841
> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 -2.204284 3.908595  87.97649
> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455   2.060784 3.603245  80.94718
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705    2.280971 4.374277  80.10507
> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588  -2.016184 5.268778  80.01864
> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889  -2.035882 3.040181  76.14863
> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 -1.538410 5.854172  72.23494
>                                                                 PValue          FDR
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866  4.625963e-36 8.205996e-32
> Locus_15861_Transcript_1/1_Confidence_1.000_Length_3425  2.047045e-30 1.815627e-26
> Locus_60846_Transcript_1/1_Confidence_0.000_Length_4676  2.613781e-24 1.545529e-20
> Locus_96758_Transcript_1/1_Confidence_0.000_Length_8096  4.812378e-21 2.134169e-17
> Locus_118088_Transcript_1/1_Confidence_0.000_Length_1895 6.623716e-21 2.349962e-17
> Locus_81760_Transcript_1/1_Confidence_0.000_Length_2455  2.318323e-19 6.854121e-16
> Locus_15721_Transcript_1/1_Confidence_1.000_Length_705   3.550203e-19 8.224107e-16
> Locus_49952_Transcript_1/1_Confidence_0.000_Length_1588  3.708938e-19 8.224107e-16
> Locus_16439_Transcript_2/3_Confidence_1.000_Length_2889  2.630983e-18 5.185667e-15
> Locus_103898_Transcript_1/1_Confidence_0.000_Length_2826 1.910427e-17 3.388906e-14
>
> #JIM'S 1ST MODEL
>> #interactions effect to test if Sym Treat diff from Apo Treat
>> lrt.interaction<- glmLRT(fit, contrast = c(1,-1,-1,1))
>> topTags(lrt.interaction)
> Coefficient:  1*A -1*B -1*C 1*D
>                                                               logFC   logCPM        LR
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 -4.134843 7.708649 220.54182
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 -1.899300 7.444760  76.74320
> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 -1.867935 6.863793  71.58929
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724  -1.547852 6.409451  71.51138
> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460  -1.489774 6.525675  64.83901
> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213  -1.546709 7.548864  61.65883
> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 -1.360292 8.389845  59.29519
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793  -1.606681 9.549261  59.26803
> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295  -1.274685 7.759520  58.79090
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866  -2.028884 8.226827  57.76618
>                                                                 PValue          FDR
> Locus_106552_Transcript_1/1_Confidence_0.000_Length_5907 6.889670e-50 1.222159e-45
> Locus_124248_Transcript_1/1_Confidence_0.000_Length_2372 1.946973e-18 1.726868e-14
> Locus_122099_Transcript_1/1_Confidence_0.000_Length_1201 2.649905e-17 1.222497e-13
> Locus_37293_Transcript_1/1_Confidence_0.000_Length_2724  2.756632e-17 1.222497e-13
> Locus_87983_Transcript_1/1_Confidence_0.000_Length_3460  8.127420e-16 2.883446e-12
> Locus_51175_Transcript_1/1_Confidence_0.000_Length_2213  4.084321e-15 1.207530e-11
> Locus_101327_Transcript_1/1_Confidence_0.000_Length_7106 1.357077e-14 3.050973e-11
> Locus_46118_Transcript_1/1_Confidence_0.000_Length_8793  1.375939e-14 3.050973e-11
> Locus_58146_Transcript_1/1_Confidence_0.000_Length_2295  1.753507e-14 3.456161e-11
> Locus_32435_Transcript_1/1_Confidence_0.000_Length_2866  2.952016e-14 5.236581e-11
>
> #ALTERNATIVE
>
>> state<- factor(rep(1:2, each = 8))
>> treat<- factor(rep(1:2, each=4, times=2))
>> design<- model.matrix(~treat*state)
>> design
>     (Intercept) treat2 state2 treat2:state2
> 1            1      0      0             0
> 2            1      0      0             0
> 3            1      0      0             0
> 4            1      0      0             0
> 5            1      1      0             0
> 6            1      1      0             0
> 7            1      1      0             0
> 8            1      1      0             0
> 9            1      0      1             0
> 10           1      0      1             0
> 11           1      0      1             0
> 12           1      0      1             0
> 13           1      1      1             1
> 14           1      1      1             1
> 15           1      1      1             1
> 16           1      1      1             1
> attr(,"assign")
> [1] 0 1 2 3
> attr(,"contrasts")
> attr(,"contrasts")$treat
> [1] "contr.treatment"
>
> attr(,"contrasts")$state
> [1] "contr.treatment"
>
>> lrt.interaction2<- glmLRT(fit, coef=4)
>> topTags(lrt.interaction2)
> Coefficient:  D
>                                                               logFC      logCPM       LR
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492    -23.55939 -0.05067457 1620.221
> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731   -22.84920 -0.18609176 1955.056
> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391 -22.37722 -0.88447008 1923.027
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338   -22.19649 -0.63260884 3087.708
> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648  -22.02352 -0.80276576 2491.409
> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731    -21.87997 -0.75239480 3251.146
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672  -21.87383 -0.23507267 1632.186
> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357  -21.62135 -0.13684358 1905.960
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964   -21.40488 -0.62850673 2271.751
> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992   -21.40385 -0.62656773 2492.533
>                                                           PValue FDR
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492         0   0
> Locus_52010_Transcript_1/1_Confidence_0.000_Length_731        0   0
> Locus_113534_Transcript_1/1_Confidence_0.000_Length_1391      0   0
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338        0   0
> Locus_124155_Transcript_1/1_Confidence_1.000_Length_648       0   0
> Locus_1722_Transcript_1/1_Confidence_0.000_Length_731         0   0
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672       0   0
> Locus_87931_Transcript_1/1_Confidence_0.000_Length_1357       0   0
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964        0   0
> Locus_47566_Transcript_1/1_Confidence_0.000_Length_992        0   0
>
> #JIM'S 1ST MODEL: SAME AS C AND D COMPARISON (SYM CONTROL TO SYM TREATMENT)?
>> lrt.3<- glmLRT(fit, coef=3)
>> topTags(lrt.3)
> Coefficient:  C
>                                                              logFC      logCPM       LR
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338  -23.20368 -0.63260884 2954.522
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492   -22.24145 -0.05067457 1529.706
> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117 -22.23039 -0.57979449 2739.489
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964  -22.22741 -0.62850673 2193.659
> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290  -21.91166 -0.09022093 2767.658
> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304 -21.91140 -0.57058617 1714.034
> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899  -21.90846 -0.07570223 3706.103
> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098 -21.90643 -0.74190749 2140.909
> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380   -21.65265 -0.07156613 1494.994
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672 -21.45320 -0.23507267 1558.686
>                                                          PValue FDR
> Locus_52006_Transcript_1/1_Confidence_0.000_Length_338       0   0
> Locus_1443_Transcript_1/1_Confidence_0.000_Length_492        0   0
> Locus_16904_Transcript_1/1_Confidence_0.000_Length_7117      0   0
> Locus_99002_Transcript_1/1_Confidence_0.000_Length_964       0   0
> Locus_34242_Transcript_1/1_Confidence_0.000_Length_290       0   0
> Locus_11024_Transcript_1/1_Confidence_0.000_Length_3304      0   0
> Locus_50361_Transcript_1/1_Confidence_0.000_Length_899       0   0
> Locus_77224_Transcript_1/1_Confidence_0.000_Length_5098      0   0
> Locus_8741_Transcript_2/2_Confidence_1.000_Length_380        0   0
> Locus_60778_Transcript_1/1_Confidence_0.000_Length_1672      0   0
>
> #JIM'S 1ST MODEL: SAME AS A AND B COMPARISON (APO CONTROL AND APO TREATMENT)?
>> lrt.2<- glmLRT(fit, coef=2)
>> topTags(lrt.2)
> Coefficient:  B
>                                                               logFC   logCPM       LR
> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017 -27.77711 6.430503 3886.142
> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422  -27.77711 3.945365 2525.716
> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333  -27.77711 3.041338 2328.365
> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222  -27.77711 6.168745 2973.233
> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091  -27.77711 4.657887 2296.824
> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842 -27.77711 3.300734 2269.282
> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211  -27.77711 2.771320 2360.788
> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406  -27.77711 4.930170 3599.471
> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360 -27.77711 2.112580 1802.150
> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925  -27.77711 6.014430 2939.592
>                                                           PValue FDR
> Locus_125980_Transcript_1/1_Confidence_1.000_Length_5017      0   0
> Locus_62669_Transcript_1/1_Confidence_0.000_Length_2422       0   0
> Locus_95409_Transcript_1/1_Confidence_0.000_Length_2333       0   0
> Locus_32782_Transcript_1/1_Confidence_0.000_Length_2222       0   0
> Locus_66293_Transcript_1/1_Confidence_0.000_Length_3091       0   0
> Locus_117458_Transcript_1/1_Confidence_1.000_Length_1842      0   0
> Locus_12642_Transcript_1/1_Confidence_1.000_Length_1211       0   0
> Locus_32505_Transcript_1/1_Confidence_1.000_Length_3406       0   0
> Locus_113391_Transcript_1/1_Confidence_0.000_Length_1360      0   0
> Locus_41152_Transcript_1/1_Confidence_0.000_Length_1925       0   0
>> Best,
>>
>> Jim
>>
>>
>>> I’m using the 3.0.8 version of EdgeR and the 2.15.2 version of R.  I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment.
>>>
>>> Thanks so much,
>>>
>>> Morgan
>>>
>>> Morgan Mouchka
>>> PhD Candidate
>>> E231 Corson Hall
>>> Dept. Ecology and Evolutionary Biology
>>> Cornell University
>>> mep74 at cornell.edu
>>> http://www.eeb.cornell.edu/mouchka
>>>
>>>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>>>> head (x)
>>>                                                            AC3  AC4  AC5  AC7  AT1  AT2
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     72   39   55   31   36   55
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     3    2    3    4    0    2
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260  576  957  529  663  961
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436    17    7    6    6    7    8
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  7774 3857 5588 3835 3633 4470
>>>                                                            AT4  AT5  SC1  SC2  SC3  SC5
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     22   63   20   55   32   49
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     0    5    4    4    1    2
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  516  903  407 1072  345  788
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     4    5    3    0    2    5
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3668 5489 1092 2288 1381 2194
>>>                                                            ST1  ST2  ST3  ST5
>>> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     39   65   40   82
>>> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     7    0    1    3
>>> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  958 1103  900 1488
>>> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     7   11    6    8
>>> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0
>>> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3629 4837 3383 4221
>>>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>>>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>>>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>>>> y<- DGEList (counts = x, group = group)
>>> Calculating library sizes from column totals.
>>>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>>>> keep<- rowSums (cpm(y)>1)>= 4
>>>> y2<- y[keep,]
>>>> dim(y2)
>>> [1] 17739    16
>>>> y2$samples$lib.size<- colSums(y2$counts)
>>>> # Normalize for sample-specific effects
>>>> y3<- calcNormFactors (y2)
>>>> y3$samples
>>>      group lib.size norm.factors
>>> AC3     A 11095093    1.0237872
>>> AC4     A  5604841    0.9912734
>>> AC5     A  7517691    0.9824405
>>> AC7     A  5918039    0.9919439
>>> AT1     B  6004830    0.9609672
>>> AT2     B  7692419    0.9543765
>>> AT4     B  5197815    0.9575609
>>> AT5     B  7759281    0.9643070
>>> SC1     C  3542875    1.0693587
>>> SC2     C  6720709    1.0488297
>>> SC3     C  3881774    1.0022528
>>> SC5     C  5109584    1.0429925
>>> ST1     D 10429666    0.9815467
>>> ST2     D  9662842    1.0103866
>>> ST3     D  8267838    1.0198095
>>> ST5     D 10698815    1.0069064
>>>> #plot biological coefficient of variation (BVC)between samples
>>>> plotMDS(y3)
>>>> #check levels
>>>> levels (y3$samples$group)
>>> [1] "A" "B" "C" "D"
>>>> #design matrix to describe treatment conditions
>>>> design<- model.matrix(~0+group, data=y3$samples)
>>>> colnames(design)<- c("A", "B", "C", "D")
>>>> design
>>>      A B C D
>>> AC3 1 0 0 0
>>> AC4 1 0 0 0
>>> AC5 1 0 0 0
>>> AC7 1 0 0 0
>>> AT1 0 1 0 0
>>> AT2 0 1 0 0
>>> AT4 0 1 0 0
>>> AT5 0 1 0 0
>>> SC1 0 0 1 0
>>> SC2 0 0 1 0
>>> SC3 0 0 1 0
>>> SC5 0 0 1 0
>>> ST1 0 0 0 1
>>> ST2 0 0 0 1
>>> ST3 0 0 0 1
>>> ST5 0 0 0 1
>>> attr(,"assign")
>>> [1] 1 1 1 1
>>> attr(,"contrasts")
>>> attr(,"contrasts")$group
>>> [1] "contr.treatment"
>>>
>>>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
>>> Disp = 0.02061 , BCV = 0.1436
>>>> #estimate gene-wise dispersion
>>>> y5<- estimateGLMTrendedDisp(y4, design)
>>>> y6<- estimateGLMTagwiseDisp(y5,design)
>>>> plotBCV(y6)
>>>> fit<- glmFit(y6, design)
>>>> # make contrasts
>>>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>>>> #pairwise comparison of B and A (apo treatmetn and apo control)
>>>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>>>> # summary of model fitting funtions
>>>> summary(de<-decideTestsDGE(lrt.BvsA))
>>>     [,1]
>>> -1   562
>>> 0  16091
>>> 1   1086
>>>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>>>> summary(de<-decideTestsDGE(lrt.DvsC))
>>>     [,1]
>>> -1   496
>>> 0  16710
>>> 1    533
>>> 	[[alternative HTML version deleted]]
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> -- 
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
>>
> Morgan Mouchka
> PhD Candidate
> E231 Corson Hall
> Dept. Ecology and Evolutionary Biology
> Cornell University
> mep74 at cornell.edu
> http://www.eeb.cornell.edu/mouchka
>
>
>
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list