[BioC] How to cope with arrays hybridized at significantly different time.

Michal Okoniewski michal at fgcz.ethz.ch
Fri Mar 13 09:09:58 CET 2009

Dear Triantafillos,

Your question touches on a serious problem in real (clinical)
applications of microarrays.
To tell the truth, not many people have such big datasets, and many are
not aware of the sources of variability, especially at the stage of RNA
extraction. Affy hybridization itself most often does not add more
variability than the extraction conditions (the patient's stress, sample
degradation, and the habits and moods of the person who collects the
material and extracts the RNA).
Anyway, there are some "rules of good practice" that can be applied, e.g.:

* keep precise and detailed annotation of the samples - you can then
use ANOVA to estimate the strength of the influencing factors
* try to extract RNA under the same or similar conditions - if that is
not possible, randomize the extractions
* use as many replicates in the experiment as you can afford :)
* do not pool unless you have a really good reason for it
* define your goal and adjust the subset of your data and the types of
analysis to it - e.g. if you need just an "expression signature"
of 10-100 probesets, apply different methods and check how their results
overlap to avoid false positives;
if you need an answer to a "biological question", use e.g. limma (ANOVA
with contrasts) and play with pathways...
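
To make the first point a bit more concrete, here is a minimal sketch of
estimating the "strength" of annotated factors for a single probeset. It
is plain Python rather than limma, and everything in it is an assumption
for illustration: the factor names (batch, operator), the eight made-up
log2 values, and the choice of eta-squared (between-group sum of squares
divided by total sum of squares from a one-way ANOVA) as the measure.
Each factor is looked at separately, so confounded factors will share
the credit.

```python
from collections import defaultdict


def eta_squared(values, labels):
    """Fraction of the total variance in `values` explained by grouping
    on `labels` (between-group SS / total SS of a one-way ANOVA)."""
    n = len(values)
    grand_mean = sum(values) / n
    ss_total = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values belonging to each level of the factor.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Between-group sum of squares, weighted by group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


# Hypothetical sample annotation for eight arrays (not real data).
annotation = {
    "batch": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "operator": ["x", "y", "x", "y", "x", "y", "x", "y"],
}

# Made-up log2 expression values of one probeset on the same eight arrays.
expression = [7.0, 7.2, 6.9, 7.1, 8.0, 8.2, 7.9, 8.1]

# Eta-squared per annotated factor, strongest factor first.
strength = {
    factor: eta_squared(expression, labels)
    for factor, labels in annotation.items()
}
for factor, eta2 in sorted(strength.items(), key=lambda kv: -kv[1]):
    print(f"{factor}: {eta2:.3f}")
```

In this toy example the batch dominates and the operator hardly matters.
On real data one would repeat this over all probesets and look at the
distribution per factor, or fit all factors jointly in a linear model.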

The list is far from complete, but I think it would be interesting to
discuss good practices in the applications of big microarray datasets,
because this is the case where the science becomes directly applicable
and useful...

all the best,

Triantafillos Paparountas wrote:
> Dear list,
> I would like to have your opinions on the following subject.
> In hospital studies we most often get more than 200 arrays per
> study. It is evident that the arrays differ significantly from one
> another due to different array batches and many other conditions, such as
> technical competence, hybridization differences due to the time span,
> circadian rhythm, fresh sample or not (a different time from RNA
> extraction to hybridization), and others. How can we cope with the many
> uncontrollable factors and use 80, 200 or even more arrays in the same
> analysis while correcting for the uncontrollable effects?
> I am using mostly Affymetrix arrays (Hu133plus2, MOE Gene 1 St, Moe 430 2),
> and currently my favorite software apart from Bioconductor is Partek's
> Gene Suite (which, at least according to the manual, can correct for
> uncontrolled effects) and Genespring, due to the magnificent clustering
> algorithm that it incorporates.
> Thanks in advance.
> T. Paparountas
> www.bioinformatics.gr
