[BioC] edgeR glm

Gordon K Smyth smyth at wehi.EDU.AU
Tue Nov 20 04:27:13 CET 2012


Dear Upendra,

Statistical theory warns strongly against testing for main effects in a 
linear model when interactions are present, because it is too difficult to 
interpret exactly what you are testing.  This what you are doing with:

   glmLRT(data.int,fit.int,coef=2:3)

Unless you are very familiar and confident with interaction model formula, 
you may be better off following the advice in Chapter 3 of the edgeR 
User's Guide:

http://bioconductor.org/packages/release/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf

Best wishes
Gordon

PS. Are you working with the current release of edgeR and Bioconductor?


------------ original message -----------
[BioC] edgeR glm
upendra kumar devisetty udevisetty at ucdavis.edu
Sun Nov 18 16:36:02 CET 2012

Hi Gordon,

For my experiment, I have three levels of factors for genotype ("B2, 
"Ein40","Ein9") and two levels of factors as treatment ("Sun", "Shade"). I 
did a nested interaction method to detect the differential gene 
expression.

Basically i am interested in the following:

1. Those genes that are differentially expressed among all three 
genotypes.

2. Those genes that are differentially expressed between wild-type (B2) 
and two mutants(Ein40 & Ein9) for Shade treatment.

Here is my code for the above:

library(edgeR)
library(reshape)

counts <- read.delim("sam2countsResults.tsv",row.names=NULL)
head(counts)
counts<-counts[counts$gene!="*",]
row.names(counts) <- counts$gene
counts <- counts[,-1]
head(counts)
counts[is.na(counts)] <- 0
names(counts)
summary(counts)
sort(colSums(counts))
counts <- counts[,colSums(counts) > 100000]
names(counts)
dim(counts)
samples <- data.frame(
   file=names(counts),
gt=unlist(strsplit(names(counts),"[_\\.]"))[c(1,7,13,19,25,31,37,43,49,55,61,67,73,79,85,91,97,103)],
trt=unlist(strsplit(names(counts),"[_\\.]"))[c(3,9,15,21,27,33,39,45,51,57,63,69,75,81,87,93,99,105)],
rep=unlist(strsplit(names(counts),"[_\\.]"))[c(4,10,16,22,28,34,40,46,52,58,64,70,76,82,88,94,100,106)]
)

data.int <- DGEList(counts,group=group)
data.int$samples
data.int <- data.int[rowSums(cpm(data.int) > 1) >= 3,]
dim(data.int)
data.int <- calcNormFactors(data.int)
design.int <- model.matrix(~gt*trt)
rownames(design.int) <- colnames(data.int)
design.int
data.int <- estimateGLMCommonDisp(data.int,design.int)
data.int <- estimateGLMTrendedDisp(data.int,design.int)
fit.int <- glmFit(data.int,design.int)
colnames(fit.int$design)

> colnames(fit.int$design)
[1] "(Intercept)"    "gtein40"        "gtein9"         "trtSun" 
"gtein40:trtSun"
[6] "gtein9:trtSun"

For my objective 1 i am doing the following:

int.lrt <- glmLRT(data.int,fit.int,coef=2:3)
topTags(int.lrt)

and for my objective 2 i am doing the following:
int.lrt <- glmLRT(data.int,fit.int,coef=5:6)
topTags(int.lrt)

Am i doing correct?

Thanks in advance for your help.

Upendra
-------------------------------------------------
Upendra Kumar Devisetty, Ph.D
Postdoctoral Scholar
Dept. of Plant Biology, UC Davis
Davis, CA 95616

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list