[BioC] EdgeR multi-factor design questions

James W. MacDonald jmacdon at uw.edu
Mon Mar 25 21:01:24 CET 2013


Hi Morgan,

On 3/25/2013 3:15 PM, Morgan Mouchka wrote:
> Hello all,
>
>   I’m currently using EdgeR to analyze RNASeq data and have very much appreciated the software and its incredibly helpful user manual.
>
>
> More specifically, I have two questions.  The first is whether I have chosen the correct method for analyzing my multi-factorial experiment.  I have 2 factors, state (apo, sym) and treatment (control, treatment).  I’m interested if there is a differential response due to the treatment that varies by state.  In other words, do apos respond to the treatment in the same way that syms respond?
>
>
> To determine which genes were differentially expressed between the control and treatment for each data set (i.e. apo and sym), I used a GLM approach with contrasts to do pairwise comparisons between groups.  I have pasted my session output and preliminary code below.  Specifically, I could use advice on whether 1) my design matrix and 2) the way I have called my contrasts are appropriate for my question.
>
> Second, for genes that are differentially expressed in both the apo and sym data sets, is it possible to test if the genes are significantly differentially expressed from each other?  For instance, if a gene is 9-fold up regulated in apos, but only 3-fold up regulated in syms, are these significantly different such that I could say that this gene is significantly more up regulated in the apo state?  It seems that this could be some sort of interaction term, but I’m uncertain as to the best way to set this up.

This is definitely an interaction term. How you set it up is dependent 
on what parameterization makes the most sense to you. Following your 
method below, you want

glmLRT(fit, contrast = c(1,-1,-1,1))

or you could do

state <- factor(rep(1:2, each = 8))
treat <- factor(rep(1:2, each=4, times=2))

design <- model.matrix(~treat*state)

and then

glmLRT(fit, coef=4)

will test the interaction, and coefficients 2 and 3 test the main 
effects for treatment and state, as you have done below.

Best,

Jim


>
> I’m using the 3.0.8 version of EdgeR and the 2.15.2 version of R.  I have 4 biological reps per treatment/state with the nomenclature of AC for apo control, AT for apo treatment, SC for sym control, and ST for sym treatment.
>
> Thanks so much,
>
> Morgan
>
> Morgan Mouchka
> PhD Candidate
> E231 Corson Hall
> Dept. Ecology and Evolutionary Biology
> Cornell University
> mep74 at cornell.edu
> http://www.eeb.cornell.edu/mouchka
>
>> x<- read.delim("AC_AT_SC_ST_counts_cnidaln.txt", row.names="Contig")
>> head (x)
>                                                            AC3  AC4  AC5  AC7  AT1  AT2
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     72   39   55   31   36   55
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     3    2    3    4    0    2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519 1260  576  957  529  663  961
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436    17    7    6    6    7    8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  7774 3857 5588 3835 3633 4470
>                                                            AT4  AT5  SC1  SC2  SC3  SC5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     22   63   20   55   32   49
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     0    5    4    4    1    2
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  516  903  407 1072  345  788
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     4    5    3    0    2    5
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3668 5489 1092 2288 1381 2194
>                                                            ST1  ST2  ST3  ST5
> Locus_34979_Transcript_1/1_Confidence_1.000_Length_792     39   65   40   82
> Locus_125599_Transcript_2/4_Confidence_1.000_Length_409     7    0    1    3
> Locus_100348_Transcript_1/1_Confidence_0.000_Length_1519  958 1103  900 1488
> Locus_129112_Transcript_1/1_Confidence_0.000_Length_436     7   11    6    8
> Locus_5661_Transcript_2/8_Confidence_1.000_Length_1090      0    0    0    0
> Locus_79922_Transcript_1/1_Confidence_0.000_Length_6499  3629 4837 3383 4221
>> #assign samples to groups, A=AC, B=AT, C=SC, D=ST
>> group<- factor(c("A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"))
>> # Define data object (list-based, includes info for counts and info about samples) and will calculate library size
>> y<- DGEList (counts = x, group = group)
> Calculating library sizes from column totals.
>> #filter out very lowly expressed genes, based on number of lowest number of reps (4 in this case)
>> keep<- rowSums (cpm(y)>1)>= 4
>> y2<- y[keep,]
>> dim(y2)
> [1] 17739    16
>> y2$samples$lib.size<- colSums(y2$counts)
>> # Normalize for sample-specific effects
>> y3<- calcNormFactors (y2)
>> y3$samples
>      group lib.size norm.factors
> AC3     A 11095093    1.0237872
> AC4     A  5604841    0.9912734
> AC5     A  7517691    0.9824405
> AC7     A  5918039    0.9919439
> AT1     B  6004830    0.9609672
> AT2     B  7692419    0.9543765
> AT4     B  5197815    0.9575609
> AT5     B  7759281    0.9643070
> SC1     C  3542875    1.0693587
> SC2     C  6720709    1.0488297
> SC3     C  3881774    1.0022528
> SC5     C  5109584    1.0429925
> ST1     D 10429666    0.9815467
> ST2     D  9662842    1.0103866
> ST3     D  8267838    1.0198095
> ST5     D 10698815    1.0069064
>> #plot biological coefficient of variation (BVC)between samples
>> plotMDS(y3)
>> #check levels
>> levels (y3$samples$group)
> [1] "A" "B" "C" "D"
>> #design matrix to describe treatment conditions
>> design<- model.matrix(~0+group, data=y3$samples)
>> colnames(design)<- c("A", "B", "C", "D")
>> design
>      A B C D
> AC3 1 0 0 0
> AC4 1 0 0 0
> AC5 1 0 0 0
> AC7 1 0 0 0
> AT1 0 1 0 0
> AT2 0 1 0 0
> AT4 0 1 0 0
> AT5 0 1 0 0
> SC1 0 0 1 0
> SC2 0 0 1 0
> SC3 0 0 1 0
> SC5 0 0 1 0
> ST1 0 0 0 1
> ST2 0 0 0 1
> ST3 0 0 0 1
> ST5 0 0 0 1
> attr(,"assign")
> [1] 1 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$group
> [1] "contr.treatment"
>
>> y4<- estimateGLMCommonDisp(y3, design, verbose=TRUE)
> Disp = 0.02061 , BCV = 0.1436
>> #estimate gene-wise dispersion
>> y5<- estimateGLMTrendedDisp(y4, design)
>> y6<- estimateGLMTagwiseDisp(y5,design)
>> plotBCV(y6)
>> fit<- glmFit(y6, design)
>> # make contrasts
>> my.contrasts<- makeContrasts(BvsA=B-A, DvsC=D-C, levels=design)
>> #pairwise comparison of B and A (apo treatmetn and apo control)
>> lrt.BvsA<- glmLRT(fit, contrast=my.contrasts[,"BvsA"])
>> # summary of model fitting funtions
>> summary(de<-decideTestsDGE(lrt.BvsA))
>     [,1]
> -1   562
> 0  16091
> 1   1086
>> lrt.DvsC<- glmLRT(fit, contrast=my.contrasts[,"DvsC"])
>> summary(de<-decideTestsDGE(lrt.DvsC))
>     [,1]
> -1   496
> 0  16710
> 1    533
> 	[[alternative HTML version deleted]]
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



More information about the Bioconductor mailing list