[BioC] edgeR ANOVA testing

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jun 25 04:30:32 CEST 2013


Dear Michael,

The analysis not correct at the moment because you have not declared 
Patient to be a factor in R.  Because the patient identifiers are numbers 
("4060" etc), R by default thinks that Patient is to be treated as a 
numerical covariate.  You need to use the factor() function to tell R that 
the entries for Patient are just identifiers, not numerical quantities.

Otherwise the analysis might be fine, but I can't tell for sure without 
seeing all 128 rows of the targets frame.  I think that you may be 
following Section 3.5 of the edgeR User's Guide.  Look carefully at how 
Patient is defined in that example.

There needs to be a column in the design matrix for each Patient, except 
for the first in each condition group.

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
http://www.statsci.org/smyth

On Mon, 24 Jun 2013, Michael Breen wrote:

> Gordon,
>
> Any insight on the below?
>
> I am looking for a bit of enlightenment regarding edgeR. We are applying
> edgeR using ANOVA-like testing. We have 2 conditions (Cases and Controls)
> sampled at 2 treatments (untreated and treated)  with 64 biological
> replicates per condition (64 X 2 = 128 samples).
>
> We want to test 3 ways:
>
> 1.     DE between untreated and treated for cases.
>
> 2.     DE between untreated and treated for controls.
>
> 3.     DE between untreated and treated responding differently between
> Cases and Controls.
>
> I am curious if my design matrix is set up properly and if we are testing
> with the right coefficients and contrasts as seen below to address these
> three approaches.
>
> Any insight is helpful.
>
> Yours,
> Michael
>
>
>
> rawdata <- read.delim("Matrix.txt", check.names=FALSE,
> stringsAsFactors=FALSE)
> targets <- read.delim("Described.txt", check.names=FALSE,
> stringsAsFactors=FALSE)
> head(targets)
>
>  Condition Patient Treatments
>
> 1      Case    4060     untreated
>
> 2      Case    4060     treated
>
> 3      Case    4096     untreated
>
> 4      Case    4096     treated
>
> 5      Case    4099     untreated
>
> 6      Case    4099     treated
>
> y <- DGEList(counts=rawdata[,2:129], genes=rawdata[,1:1])
>
> keep <- rowSums(cpm(y)>1) >= 64
> y <- y[keep,]
> dim (y)
>
> y <- calcNormFactors(y)
> y$samples
>
> design <- model.matrix(~Condition+Condition:Patient+Condition: Treatments,
> data=targets)
> colnames(design)
>
> [1] "(Intercept)"              "ConditionControl"
>
> [3] "ConditionCase:Patient"    "ConditionControl:Patient"
>
> [5] "ConditionCase:Treatments "      "ConditionControl:Treatments "
>
>
>
> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
> y <- estimateGLMTrendedDisp(y, design)
> y <- estimateGLMTagwiseDisp(y, design)
>
> fit <- glmFit(y, design)
> colnames(fit)
>
> # DE between untreated and treated for cases
> lrt <- glmLRT(fit, coef="ConditionCase: Treatments ")
>
> # DE between untreated and treated for cases
> lrt2 <- glmLRT(fit, coef="ConditionControl:Treatments ")
>
> # DE between untreated and treated responding differently between cases and
> controls
> lrt3 <- glmLRT(fit, contrast=c(0,0,0,0,1,-1)
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list