[BioC] design matrix edge R pairwise comparison at different time points after infection with replicates

Fri Jun 22 12:02:57 CEST 2012

Hi Kaat,

It is probably better to fit all your data with a single call to glmFit(), over all 18 samples; you can then test the differences of interest through the 'coef' or 'contrast' argument of glmLRT().  That would afford you more residual degrees of freedom and presumably better estimates of dispersion.

From your description, I can't quite figure out your design matrix. You have three factors: treatment, test and time point.  First, you need to input all 18 samples and extend your 'treatment' and 'test' factor variables to have 18 values (corresponding to the columns of your table).  Then include a time variable in your design as well.  You will also need to decide which interactions to include.
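
As a rough sketch of what such a design could look like: the sample ordering, the level names ("t1" to "t3", "C"/"T") and the choice of formula below are my assumptions for illustration, not something taken from your data. This particular formula gives a test (batch) effect, a time effect in the controls, and a separate infection effect at each time point; other parameterisations are equally possible.

```r
# Hypothetical layout of the 18 libraries: 3 tests x 3 time points x 2 treatments.
# The rows here must be in the same order as the columns of the real count table.
samples <- expand.grid(
  test      = factor(1:3),
  time      = factor(c("t1", "t2", "t3")),
  treatment = factor(c("C", "T"))
)

# One possible design (an assumption): additive test and time effects,
# plus an infected-vs-control effect nested within each time point.
design <- model.matrix(~ test + time + time:treatment, data = samples)
colnames(design)

# Residual degrees of freedom of the joint fit; a single time point
# analysed alone with ~ test + treatment has only 6 - 4 = 2.
resid_df <- nrow(design) - ncol(design)

# Contrast asking whether the infection effect differs between t3 and t1.
contrast <- numeric(ncol(design))
names(contrast) <- colnames(design)
contrast["timet3:treatmentT"] <- 1
contrast["timet1:treatmentT"] <- -1

# With edgeR loaded and 'y' a DGEList holding all 18 libraries (not run here),
# the tests would then be along these lines:
#   fit <- glmFit(y, design)
#   lrt <- glmLRT(fit, coef = which(colnames(design) == "timet1:treatmentT"))
#   lrt <- glmLRT(fit, contrast = contrast)
# The argument order of glmLRT() has changed between edgeR releases
# (your script uses glmLRT(y, fit)), so check ?glmLRT for your version.
```

Testing one of the three 'time:treatment' coefficients corresponds to the infected versus control comparison at that time point, which is what you were doing separately before.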

Hope that gets you started.

Best,
Mark

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin

----------
http://www.fgcz.ch/Bioconductor2012

On 21.06.2012, at 11:42, Kaat De Cremer wrote:

> Dear all,
> 
> 
> I am using edgeR to find genes differentially expressed between infected and mock-infected control plants, at 3 time points after infection.
> I  have RNAseq data for 3 independent tests, so for every single test I have 6 libraries  (control + infected at 3 time points).
> Having three replicates this makes 18 libraries in total.
> 
> What I have done until now is look at each time point separately and calculate DE genes at that time point, as shown in this script:
> 
>> head(x)
>    C1  C2  C3  T1  T2  T3
> 1   0   1   2   0   0   0
> 2  13   6   4  10   8  12
> 3  17  16   9  10   8  11
> 4   2   1   2   2   3   2
> 5. 1   3   1   2   1   3   0
> 6  958 457 438 565 429 518
> 
>> treatment<-factor(c("C","C","C","T","T","T"))
>> test<-factor(c(1,2,3,1,2,3))
>> y<-DGEList(counts=x,group=treatment)
> Calculating library sizes from column totals.
>> cpm.y<-cpm(y)
>> y<-y[rowSums(cpm.y>2)>=3,]
>> y<-calcNormFactors(y)
>> design<-model.matrix(~test+treatment)
>> y<-estimateGLMCommonDisp(y,design,verbose=TRUE)
> Disp = 0.0265 , BCV = 0.1628
>> y<-estimateGLMTrendedDisp(y,design)
> Loading required package: splines
>> y<-estimateGLMTagwiseDisp(y,design)
>> fit<-glmFit(y,design)
>> lrt<-glmLRT(y,fit)
> 
> 
> This works fine but I wonder if I should do the analysis of the different time points all at once? Will this make a difference?
> Unfortunately I cannot figure out how to design the matrix.
> 
> I hope someone can help me,
> 
> Kaat
> 
> 