[BioC] How to specify factors for a RCBD in edgeR

Gordon K Smyth smyth at wehi.EDU.AU
Fri Feb 3 02:16:00 CET 2012


Dear Tilahun,

There is no way to specify a random block in edgeR, nor in any other 
existing statistical software designed to analyse RNA-Seq counts. 
However there is no purpose in doing so for your experiment.  The only 
reason that you might want to define block as random rather than fixed is 
to recover inter-block information about the experimental treatments. 
Your experiment is entirely balanced, in that all treatment conditions 
appear in each block, so there is no inter-block information to recover. 
Hence the edgeR analysis that I have suggested, which uses intra-block 
information only, is fully efficient.

If you want to test for a treatment effect in each individual tissue:

   design <- model.matrix(~Tissue+Tissue:Treatment+Block)

Then

   lrtA <- glmLRT(d, fit, coef="TissueA:Treatmentstressed")

finds genes DE between stress and control in tissue A,

   lrtB <- glmLRT(d, fit, coef="TissueB:Treatmentstressed")

finds genes DE between stress and control in tissue B.  And so on.

You will find it useful to examine colnames(design).

Best wishes
Gordon

> Date: Tue, 31 Jan 2012 11:24:49 -0600
> From: Tilahun Abebe <tilahun.abebe at uni.edu>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] Bioconductor Digest, Vol 107, Issue 31
>
> Dear Gordon,
>
> Thank you for the tip. I think I am making some progress. I should have
> phrased my last question better. I meant what code do I use to get genes
> differentially expressed in one tissue under control and stress conditions.
>
> I have one more question. Is it possible to specify the random effect in
> the design matrix in edgeR? I am thinking of something similar to the
> RANDOM statement in SAS that will allow me to treat "block" as a random
> effect?
>
> Cheers.
>
> Tilahun
> ----------------------------
>
>> Date: Tue, 31 Jan 2012 11:18:59 +1100 (AUS Eastern Daylight Time)
>> From: Gordon K Smyth <smyth at wehi.EDU.AU>
>> To: Tilahun Abebe <tilahun.abebe at uni.edu>
>> Cc: Bioconductor mailing list <bioconductor at r-project.org>
>> Subject: [BioC] Please help! How to specify factors for a RCBD in
>>        edgeR
>>
>> Dear Tilahun,
>>
>> The first step is that you need to create a data frame containing the
>> experimental factors, just as you would for a SAS analysis.  So you need
>> to create three factors:
>>
>>   Tissue
>>   Treatment
>>   Block
>>
>> each containing 24 values, one for each RNA sample.  Then the design marix
>> is formed by:
>>
>>   design <- model.matrix(~Tissue*Treatment+Block)
>>
>> Type colnames(design) to see how the coefficients are defined.  You will
>> see that the interaction coefficients are coefficients 6 to 8.
>>
>> After fitting your linear model, you could find genes that show
>> significant Tissue*Treatment interaction (on 3df) by
>>
>>   lrt <- glmLRT(d, fit, coef=6:8)
>>
>> and so on.
>>
>> I don't understand your question "How do I obtain differentially expressed
>> genes in each Tissue*Stress combination?", so I can't give specific advice
>> on that.  Differentially expressed compared to what?
>>
>> Best wishes
>> Gordon
>>
>> ---------------- original message -------------------
>> [BioC] Please help! How to specify factors for a RCBD in edgeR
>> Tilahun Abebe tilahun.abebe at uni.edu
>> Wed Jan 25 22:16:41 CET 2012
>>
>> Hi,
>>
>> I am learning how to use edgeR to analyze RNA-seq data generated from
>> Illumina GAII. The experimental design is fairly complex.
>>
>> I have a mixed 4x2 factorial randomized complete block design (RCBD)
>> consisting of:
>>
>> 4 tissues: A, B, C, D
>> 2 treatments: control, stressed
>> 3 blocks: Block1, Block2, Block3
>>
>> Tissue and treatment are fixed effects and block is a random effect.
>>
>> Here is the code I tried to use in edgeR:
>>
>>> counts <- read.delim( file = "Mycounts.txt", header = TRUE)
>>> rownames <-counts [ , 1 ]
>>> d <- counts [, 2:25]    # counts are in columns 2-25
>>> d
>>> group <- c(rep("Control", 3), rep("Stress", 3), rep("Control", 3),
>> rep("Stress", 3), rep("Control", 3), rep("Stress", 3), rep("Control", 3),
>> rep("Stress", 3))
>>> d <- DGEList(counts = d, group = group)
>>> design <- model.matrix(~group)
>>> d <- calcNormFactors(d)
>>> d$samples
>>> d <- estimateGLMCommonDisp(d, design)
>>> d <- estimateGLMTagwiseDisp(d, design)
>>> d$common.dispersion
>>> fit <- glmFit(d, design)
>>> lrt <- glmLRT(d, fit, coef=2)
>>> topTags(lrt, n=4)
>>
>> I am interested to know genes differentially expressed in each of the four
>> tissues under stress. However, I feel like I am not specifying the factors
>> correctly in the design statement. My questions are:
>>
>> 1) How do I specify the fixed effects Tissue, Stress, and Tissue*Stress
>> interaction in the model?
>> 2) How do I tell edgeR to use block as a random effect?
>> 3) How do I obtain differentially expressed genes in each Tissue*Stress
>> combination?
>>
>> I appreciate your help.
>>
>> Cheers.
>>
>> Tilahun Abebe, Ph.D.
>> University of northern Iowa
>> Cedar Falls, IA

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list