[BioC] frma normalization and batch effects

Wolfgang Huber whuber at embl.de
Fri Aug 23 13:56:14 CEST 2013


Hi Judith

I am sure the frma people will have more specific recommendations. In addition, both of your questions below can be read as questions of parameter choice in a classifier (a somewhat complex one, since it includes the preprocessing and batch adjustment steps). An often useful way to make such choices is cross-validation on a dataset that mimics the kind of data you expect to see in the future.
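
To make the cross-validation idea a bit more concrete, here is a minimal sketch in Python using only the standard library. It does not call frma, ComBat or any Bioconductor code. Everything in it is an assumption made for illustration: the toy two-feature data, the batch sizes and shifts, the nearest-centroid classifier, and per-batch mean centering as a crude stand-in for a batch adjustment step. The function names are hypothetical. The point is only the structure: each batch is held out in turn, so that the held-out batch plays the role of future data, and the preprocessing choice is treated as one more parameter to compare.

```python
import random
from statistics import mean


def make_toy_data(seed=0):
    """Synthetic stand-in for expression data: (features, subtype, batch) rows."""
    rng = random.Random(seed)
    centers = {"A": [0.0, 0.0], "B": [2.0, 2.0]}
    # Batches of differing subtype composition, each with an additive shift
    # (all values here are made up for illustration).
    design = {
        "batch1": (["A"] * 8 + ["B"] * 2, 0.0),
        "batch2": (["A"] * 3 + ["B"] * 7, 0.5),
        "batch3": (["A"] * 5 + ["B"] * 5, -0.5),
    }
    rows = []
    for batch, (labels, shift) in design.items():
        for lab in labels:
            x = [c + shift + rng.gauss(0, 0.5) for c in centers[lab]]
            rows.append((x, lab, batch))
    return rows


def identity(rows):
    """Candidate preprocessing 1: no batch adjustment."""
    return list(rows)


def center_per_batch(rows):
    """Candidate preprocessing 2: subtract each batch's own feature means."""
    out = []
    for b in sorted({r[2] for r in rows}):
        members = [r for r in rows if r[2] == b]
        mu = [mean(col) for col in zip(*(r[0] for r in members))]
        out += [([v - m for v, m in zip(x, mu)], lab, b) for x, lab, b in members]
    return out


def fit_centroids(rows):
    """Nearest-centroid classifier: one mean vector per subtype."""
    cents = {}
    for lab in sorted({r[1] for r in rows}):
        xs = [r[0] for r in rows if r[1] == lab]
        cents[lab] = [mean(col) for col in zip(*xs)]
    return cents


def predict(cents, x):
    """Return the subtype whose centroid is closest in squared distance."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, cents[lab])))


def leave_one_batch_out(rows, preprocess):
    """Hold out each batch in turn; labels of the held-out batch are never used for fitting."""
    accs = {}
    for held in sorted({r[2] for r in rows}):
        train = preprocess([r for r in rows if r[2] != held])
        test = preprocess([r for r in rows if r[2] == held])
        cents = fit_centroids(train)
        hits = [predict(cents, x) == lab for x, lab, _ in test]
        accs[held] = sum(hits) / len(hits)
    return accs


rows = make_toy_data()
candidates = [("no_adjustment", identity), ("center_per_batch", center_per_batch)]
results = {name: leave_one_batch_out(rows, fn) for name, fn in candidates}
for name, accs in results.items():
    overall = mean(list(accs.values()))
    print(name, {b: round(a, 2) for b, a in accs.items()}, "mean:", round(overall, 2))
```

Note that centering the held-out batch by its own means only mimics the case where new samples arrive as a set. It cannot be applied to a single new sample, so that use case would need to be evaluated with a preprocessing step that works one array at a time.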

You might also enjoy Jeff Leek's recent talk, which covers frozen sva and top scoring pairs: http://www.birs.ca/events/2013/5-day-workshops/13w5083/videos/watch/201308151110-Leek.mp4
	
Best wishes
Wolfgang

On 23 Aug 2013, at 10:55, Judith Boer [guest] <guest at bioconductor.org> wrote:

> 
> Dear all,
> 
> I am working on expression classifiers for leukemic subtypes using Affymetrix Plus2 arrays. The training data consists of several batches. The developed classifier will be used to predict the subtype of new sets of samples as well as single samples. So far, I co-normalized new arrays with the training set, but this is not ideal.
> 
> I have read the frma paper by McCall et al., and it seems to be the perfect solution. Before I start, I have a few conceptual questions:
> 
> 1. The training data consists of several batches of different sizes, some of them biased towards a single subtype. Does normalization per batch using summarize="random_effect" remove biology in this case? ComBat clearly did, and I ended up not correcting for batch effects, which worked fine for the classifiers I am using. Any suggestions as to which summarization method would be best to use in this case?
> 
> 2. Is there a minimum number of arrays needed to use summarize="random_effect"?
> 
> Any suggestions on how to best implement frma in this project are very welcome!
> 
> Cheers, Judith
> 
> 
> -- output of sessionInfo(): 
> 
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
> 
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
> [3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
> [5] LC_TIME=English_United Kingdom.1252    
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> 
> --
> Sent via the guest posting facility at bioconductor.org.
> 


