[BioC] Surrogate Variable Analysis

Wed Mar 19 21:22:42 CET 2014

Hi Jerry,

As Jeff mentioned, "Batch information is often annotated in a data set." 

You mentioned 5 batches, so it seems you know which batch each sample is from. In that case, the function 'removeBatchEffect' in the limma package may be helpful. Note that it is not intended for use prior to linear modelling. For linear modelling, it is better to account for batch in the linear model itself, for example in the following way, which treats batch as a blocking factor (a random effect) and is useful when the number of batches is large (in your case it is 5, i.e. > 3).

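library(limma)
# Estimate the within-batch correlation, treating batch as a blocking factor
dupcor <- duplicateCorrelation(data, design, block = batch)
dupcor$consensus.correlation
# Fit the linear model using the estimated consensus correlation
fit <- lmFit(data, design, block = batch, correlation = dupcor$consensus.correlation)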
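
For comparison, here is a minimal sketch of the two other options mentioned above: 'removeBatchEffect' for exploratory plots, and batch as an ordinary (fixed) factor in the design matrix for the linear model. The data are simulated, and the names expr, group, batch_shift, expr_clean and design_b are made up for illustration; they are not from your data set.

```r
library(limma)

set.seed(1)
n_genes <- 200
# Hypothetical layout: 5 batches of 4 samples, 2 normal + 2 disease per batch
batch <- factor(rep(1:5, each = 4))
group <- factor(rep(c("normal", "disease"), times = 10),
                levels = c("normal", "disease"))

# Simulated log-expression matrix (genes x samples) with a per-batch shift
expr <- matrix(rnorm(n_genes * 20), nrow = n_genes)
batch_shift <- rnorm(5, sd = 2)
expr <- sweep(expr, 2, batch_shift[as.integer(batch)], "+")

# Option 1: remove the batch effect for plots (clustering, PCA, heatmaps),
# protecting the group effect by passing it in through 'design'
design <- model.matrix(~ group)
expr_clean <- removeBatchEffect(expr, batch = batch, design = design)

# Option 2: for differential expression, keep the original data and
# put batch into the design matrix as a fixed effect
design_b <- model.matrix(~ batch + group)
fit <- eBayes(lmFit(expr, design_b))
topTable(fit, coef = "groupdisease")
```

The batch-corrected matrix from option 1 is meant for visualisation only; for testing, option 2 (or the blocking approach) keeps the batch adjustment inside the model.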

Hope this helps.

Di

----
Di Wu
Postdoctoral fellow
Harvard University, Statistics Department
Harvard Medical School
Science Center, 1 Oxford Street, Cambridge, MA 02138-2901 USA

________________________________________
From: bioconductor-bounces at r-project.org [bioconductor-bounces at r-project.org] on behalf of Jeff Leek [jtleek at gmail.com]
Sent: Wednesday, March 19, 2014 2:13 PM
To: Jerry Cholo
Cc: bioconductor at r-project.org
Subject: Re: [BioC] Surrogate Variable Analysis

Hi Jerry,

Batch information is often annotated in a data set. If it is not, one way
to annotate batches is to identify what time each sample was run and then
see if they cluster into distinct groups - which you could call batches.
Finally, the surrogate variable analysis approach with the sva() function
takes as input the data matrix (normalized) and the corresponding
information about the primary variables you care about and attempts to
recover the batches from the microarray data themselves.
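
As a rough sketch of what that looks like in code, here is a small simulated example. The objects edata, group and hidden_batch are invented for illustration, and n.sv = 1 is fixed by hand only to keep the example short; leaving it out lets sva() estimate the number of surrogate variables itself.

```r
library(sva)

set.seed(1)
n_genes <- 500
n_samples <- 20

# Primary variable of interest (known)
group <- factor(rep(c("normal", "disease"), each = 10),
                levels = c("normal", "disease"))
# Hypothetical batch indicator that is NOT given to sva()
hidden_batch <- rep(c(0, 1), times = 10)

# Simulated normalized expression matrix (genes x samples);
# the first 200 genes are shifted in the hidden batch
edata <- matrix(rnorm(n_genes * n_samples), nrow = n_genes)
edata[1:200, ] <- edata[1:200, ] + outer(rnorm(200, sd = 2), hidden_batch)

# Full model (with the primary variable) and null model (intercept only)
mod <- model.matrix(~ group)
mod0 <- model.matrix(~ 1, data = data.frame(group))

# Estimate one surrogate variable from the data
svobj <- sva(edata, mod, mod0, n.sv = 1)

# Append the surrogate variable to the design for downstream modelling
mod_sv <- cbind(mod, svobj$sv)
```

The augmented design matrix mod_sv can then be used in place of mod in the downstream linear model.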

I hope that helps.

Jeff

On Mon, Mar 17, 2014 at 9:00 PM, Jerry Cholo <jerrycholo at gmail.com> wrote:

> Hello,
>
> I would like to remove the batch effects from a gene expression data set using
> Surrogate Variable Analysis (SVA).  When I looked at the sva package (
> http://www.bioconductor.org/packages/release/bioc/html/sva.html) and
> "bladderbatch", I noticed that for 57 different samples, there are 5
> different batches.  Could someone let me know how I could define these
> batches for my own data?  My datasets include normal and disease samples,
> two different tissues, and two different chip arrays.
>
> Thanks,
>
> Jerry
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
