[BioC] Has anyone successfully used sva packages?

Thu Dec 15 15:29:35 CET 2011

Depending on the cell lines you are using, two cell lines could be very different.  

Can you do some test on each batch (without any batch effect removal), and see how many differentially expressed genes?

Also I assume you have similar proportion of two cell lines each every batch, i.e., fully random design.  No batch effects removal is perfect, you should minimize the effects by randomization.

Jianping

> ------------------------------
> 
> Message: 10
> Date: Thu, 15 Dec 2011 15:53:47 +0800
> From: Fabrice Tourre <fabrice.ciup at gmail.com>
> To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] Has anyone successfully used sva packages?
> Message-ID:
> 	<CAN31xkcf+F0qxNoF0HWqXLANvOtpXHHc=g03jBhWdM2vjqP8Nw at mail.gmail.com>
> Content-Type: text/plain; charset="ISO-8859-1"
> 
> Dear list,
> 
> Fellow the paper "Tackling the widespread and critical impact of batch
> effects in high-throughput data" and the code at
> http://rafalab.jhsph.edu/batch/, I used sva in my analysis. But I
> always get more than 90% of gene with p.adjust<0.05 (compared in two
> cell line). When I try limma package without sva, it give me about
> 3000 genes with p.adjust<0.05. It seems limma is reasonable. But it
> seems that limma can only remove tow batch each time using
> removeBatchEffect.
> 
> removeBatchEffect(x,batch,batch2=NULL,design=matrix(1,ncol(x),1))
> 
> Does anyone have suggestions?
> 
> library(preprocessCore)
> #cel=read.csv("celfies/cel_files.txt",header=F,skip=1)
> tab=read.table("output/ExpressionInput/groups.dis.txt",as.is=TRUE)
> mat<-read.table("gene/apt/rma-sketch.summary.txt",check.names =
> FALSE,header=T,row.names=1)
> colnames(mat)->c
> rownames(mat)->r
> merge(c,tab,by.y=1,by.x=1)->b
> tab<-b
> mat<-normalize.quantiles(as.matrix(mat))
> colnames(mat)<-c
> rownames(mat)<-r
> dates=vector("character",ncol(mat))
> for(i in seq(along=dates)){
> 	tmp=affyio::read.celfile.header(file.path("celfies",tab[i,1]),info="full")$ScanDate
> 	dates[i]=strsplit(tmp,"T")[[1]][1]
> }
> dates=as.Date(dates)
> batch=as.numeric(dates-min(dates))
> batch=as.numeric(cut(batch,c(unique(batch),1)))
> batch[is.na(batch)]=1
> 
> library(corpcor)
> library(affy)
> library("sva")
> library(limma)
> f<-factor(tab[,3])
> mod = model.matrix(~f)
> mod0=model.matrix(~1,data=tab)
> colnames(mod)<-levels(f)
> n.sv = num.sv(mat,mod,method="leek")
> svobj = sva(mat,mod,mod0,n.sv=n.sv)
> modSv = cbind(mod,svobj$sv)
> mod0Sv = cbind(mod0,svobj$sv)
> pValuesSv = f.pvalue(mat,modSv,mod0Sv)
> qValuesSv = p.adjust(pValuesSv,method="BH")
> 
> fit = lmFit(edata,modSv)
> contrast.matrix<- ???? How to construct this matrix?
> fitContrasts = contrasts.fit(fit,contrast.matrix)
> eb = eBayes(fitContrasts)
> topTableF(eb, adjust="BH")
> 
> 
> 
> 
> head(tab)
>                     x V2  V3
> 1 2A6_Hex_07Jul_09.CEL  1 dis
> 2 2A7_Hex_07Jul_09.CEL  1 dis
> 3 2A8_Hex_07Jul_09.CEL  1 dis
> 4 2A9_Hex_07Jul_09.CEL  1 norm
> 5 2AA_Hex_07Jul_09.CEL  1 norm
> 6 2AB_Hex_07Jul_09.CEL  1 norm
> 
> 
> 
> ------------------------------
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> 
> 
> End of Bioconductor Digest, Vol 106, Issue 14
> *********************************************