[BioC] RMA + loess normalisation and filtering

Tue Apr 19 17:48:26 CEST 2005

Hi Katleen,

> question 1: I have performed *RMA normalisation *of my Affymetrix data. 
> However, for further analysis I think it is necessary to *filter* the 
> data (non-expressed genes or below background). However I don't know the 
> best way to filter the genes that are not expressed or very low 
> expressed (below the background), based on the RMA normalisation data.

My preference is to select genes based on their overall variability, 
using a criterion such as

    z = apply(exprs(x), 1, IQR)

(see also rowQ from Biobase-devel, or rowSds from the vsn package). The 
rationale is that it is difficult to decide on an absolute number that 
corresponds to "present" or "absent" (e.g. due to different AT-content), 
but if the values vary across the experiment there is some hope this is 
really detecting a transcript. I have no good suggestion on deciding a 
threshold though - I'd usually take the top 50% or so, depending on 
chip type, and how the histogram of "z" looks.
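
As a rough sketch of that filtering step, something along these lines should 
work. The matrix "e" here is simulated and only stands in for exprs(x); the 
names e, cutoff, keep and e_filtered, and the 50% cutoff itself, are 
assumptions for illustration rather than a recommendation.

```r
# Simulated stand-in for exprs(x): 1000 probe sets x 8 arrays, log2 scale.
set.seed(1)
e <- matrix(rnorm(1000 * 8, mean = 8, sd = 0.2), nrow = 1000)
rownames(e) <- paste0("probe", seq_len(nrow(e)))

# Make the first 100 probe sets vary more strongly across the arrays.
e[1:100, ] <- e[1:100, ] + rnorm(100 * 8, sd = 2)

# Per-probe-set interquartile range across arrays.
z <- apply(e, 1, IQR)

# Histogram counts of z, computed without plotting; drop plot = FALSE
# to look at the shape before settling on a cutoff.
h <- hist(z, breaks = 50, plot = FALSE)

# Hypothetical threshold: keep the half of the probe sets with the largest IQR.
cutoff <- quantile(z, 0.5)
keep <- z > cutoff
e_filtered <- e[keep, , drop = FALSE]
```

With a real data set one would replace "e" by exprs(x) and pick the quantile 
after looking at the histogram of z.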

> question 2: In a paper of Choe et al (2005, Genome Biology) I have read 
> that *loess normalisation *after the first normalisation step is 
> important in order to detect most true positive differentially expressed 
> genes. However when I perform
> />normdatabis<-normalize.exprSet.loess(RMAdata,transfn="antilog")/
> following warnings appear: /k-d tree limited by memory ncmax=5002/
> I guess that the loess normalization was only based on the 5002 first 
> probe set id's or what does this mean?
> Is it ok or do I need to follow another strategy for the second loess 
> normalisation step?

I don't think combining multiple normalization steps in this way is 
appropriate. RMA is a model-based normalization method and the results 
from it should be fine as is. If they aren't, then the model does not 
fit -- which means that either you have a data quality problem or you 
shouldn't use RMA in the first place.

Also, with so much normalization you are likely to remove not just 
technical variation but also biological signal, and hence to find *fewer* 
differentially expressed genes.

Best regards
   Wolfgang

-------------------------------------
Wolfgang Huber
European Bioinformatics Institute
European Molecular Biology Laboratory
Cambridge CB10 1SD
England
Phone: +44 1223 494642
Fax:   +44 1223 494486
Http:  www.ebi.ac.uk/huber