[BioC] edgeR : input and design matrix help

Mark Robinson mark.robinson at imls.uzh.ch
Fri May 18 10:32:03 CEST 2012


Hi KJ Lim,

> trees <-
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))

This one seems ok, although could be written more compactly and I'll give it a different name:

geno <- factor( rep(c("HS","LS"),each=8))

> treat <-
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))

Maybe it's better to call this 'time' and you'll need to change this to something like:

time <- factor( rep(c("C","3h","24h","48h"), each=4) )

These two factor vectors match the columns of your count table, so you should be ok there.

If I understand your description, you are interested primarily in the differences in genotypes.  I would suggest starting with:
 
design <- model.matrix(~time+geno)

> design
   (Intercept) time3h time48h timeC genoLS
1            1      0       0     1      0
2            1      0       0     1      0
3            1      0       0     1      0
4            1      0       0     1      0
5            1      1       0     0      0
6            1      1       0     0      0
7            1      1       0     0      0
8            1      1       0     0      0
9            1      0       0     0      1
10           1      0       0     0      1
11           1      0       0     0      1
12           1      0       0     0      1
13           1      0       1     0      1
14           1      0       1     0      1
15           1      0       1     0      1
16           1      0       1     0      1
attr(,"assign")
[1] 0 1 1 1 2
attr(,"contrasts")
attr(,"contrasts")$time
[1] "contr.treatment"

attr(,"contrasts")$geno
[1] "contr.treatment"

Then, you can follow the usual steps as per the edgeR user's guide -- estimate dispersion, fit the glm with glmFit() and do the LR testing with glmLRT() and so on.

Hope that gets you started.

Best,
Mark




On 18.05.2012, at 05:15, KJ Lim wrote:

> Dear edgeR users,
> 
> I am not a statistician nor R programming geek, please forgive me if I ask
> stupid question.
> 
> I have RNA-seq data for 2 different genotype of trees with different time
> points 0hour(control),3hr,24hours,and 48hours. Each time point has two
> replicates.  The experiment design like following:
> 
>                         Sample harvested after treatment at
> Tree H1     Ctrl    3hrs  24hrs  48hrs
> Tree H2     Ctrl    3hrs  24hrs  48hrs
> 
> Tree L1     Ctrl    3hrs  24hrs  48hrs
> Tree L2     Ctrl    3hrs  24hrs  48hrs
> 
> I would like to study genes that are differentially expressed (DE)
> throughout the time points in these 2 genotype of trees with edgeR. I read
> from the edgeR user guide, the suitable DE analysis method for my expriment
> is GLM likelihood ratio test.
> 
> After read case study in the user guide, I have the RNA-Seq counts in a
> file as below in order to input into the edgeR package.
> 
> Ref Tags H1_C H2_C H1_3H H2_3H H1_1D H2_1D H1_2D H2_2D L1_C L2_C L1_3H
> L2_3H L1_1D L2_1D L1_2D L2_2D
> AA212259 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> AB216460 1 0 0 1 2 1 6 2 1 1 1 0 5 1 3 1
> AC221873 3 0 1 0 3 0 0 2 1 0 1 0 2 1 2 0
> AD235900 3 1 6 0 5 4 4 4 7 2 4 3 3 0 8 0
> 
> I used the following commands to input the counts file into edgeR:
> 
> rawdata <- read.delim("file.txt", sep="\t", check.name=FALSE,
> stringsAsFactor=FALSE)
> trees <-
> factor(c("HS","HS","HS","HS","HS","HS","HS","HS","LS","LS","LS","LS","LS","LS","LS","LS"))
> treat <-
> factor(c("C1","C2","3h1","3h2","24h1","24h2","48h1","48h2","C1","C2","3h1","3h2","24h1","24h2","48h1","48h2"))
> 
> I'm not good in R programming, thus, having this file input into the edgeR
> and assign the factors as well as design-matrix is a challenge. I'm stuck
> how to tell the edgeR about my design matrix.
> 
> Could you guys kindly help me on this? Have I input my RNA-Seq counts
> correctly? Please correct me if there is any mistake I have done in my
> edgeR input.
> 
> I appreciate very much for your help and time. Have a nice day.
> 
> Best regards,
> KJ Lim
> 
> 	[[alternative HTML version deleted]]
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

----------
Prof. Dr. Mark Robinson
Bioinformatics
Institute of Molecular Life Sciences
University of Zurich
Winterthurerstrasse 190
8057 Zurich
Switzerland

v: +41 44 635 4848
f: +41 44 635 6898
e: mark.robinson at imls.uzh.ch
o: Y11-J-16
w: http://tiny.cc/mrobin



More information about the Bioconductor mailing list