[BioC] how to design a model matrix and test by edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Fri Jul 27 02:11:39 CEST 2012


> Date: Wed, 25 Jul 2012 22:07:35 -0400
> From: wang peter <wng.peter at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] how to design a model matrix and test by edgeR
>
> how to design a model matrix and test by edgeR
> dear all:
> my data is composed of 35 samples, have two factor
> time and treatment
>
> 1. The first question is
> i want to find DE genes (6h.treatment-0h.treatment)-(6h.control-0h.control)
> how to design the model? please tell me if the design and test is right
> such is my coding.

As you can see for yourself, coef=13 does not relate to 0h:

   colnames(design)[13]
   [1] "time48h:treatmenttreated"

On the other hand, coef=14 does:

   colnames(design)[14]
   [1] "time6h:treatmenttreated"

So you could use either

   glmLRT(dge, glmfit.dge, coef=14)

or

   glmLRT(dge, glmfit.dge, coef="time6h:treatmenttreated")

to find the DE genes for the interaction you want.

> 2. the second question is
> should i use samples involved 6h and 0h to do glmFitting
> or use all of 35 samples to do?
> i think i should use all

Yes, it is better to use all.

There are some strange aspects to your code.  You use common dispersion, 
although the edgeR User's Guide and our published papers make it clear 
that we recommend tagwise dispersions.  You actually estimate tagwise 
dispersions, but then don't use them.  Are you sure you need to sort genes 
by logFC?

Best wishes
Gordon

> raw.data <- read.table("expression-table.txt",row.names=1)
> lib_size <- read.table("lib_size.txt");
> lib_size <- unlist(lib_size)
> d <- DGEList(counts = raw.data, lib.size = lib_size)
> #normalization
> dge <- calcNormFactors(dge)
>
> treatment=factor(c(rep('control',6),rep('treated',24),rep('control',5)))
> time=factor(c('0h','0h','0h','24h','24h','24h','0h','0h','0h','6h','6h','6h','6h','12h','12h','12h','12h','18h','18h','18h','18h',
>             '24h','24h','24h','36h','36h','36h','48h','48h','48h','6h','12h','18h','36h','48h'))
> design <- model.matrix(~time*treatment)
>
> dge <- estimateGLMCommonDisp(dge, design)
> dge <- estimateGLMTagwiseDisp(dge, design)
> glmfit.dge <- glmFit(dge, design,dispersion=dge$common.dispersion)
> lrt.dge <- glmLRT(dge, glmfit.dge, coef=13)
> result <- topTags(lrt.dge, adjust.method="BH", sort.by="logFC")
>
> -- 
> shan gao
> Room 231(Dr.Fei lab)
> Boyce Thompson Institute
> Cornell University
> Tower Road, Ithaca, NY 14853-1801
> Office phone: 1-607-254-1267(day)
> Official email:sg839 at cornell.edu
> Facebook:http://www.facebook.com/profile.php?id=100001986532253


______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list