[BioC] Question on filtering in the Category package

James W. MacDonald jmacdon at med.umich.edu
Tue Aug 14 16:04:31 CEST 2007


Hi Boel,


Boel Brynedal wrote:
> Dear list,
> 
> I have a theoretical question regarding Filtering on Variation in the
> Category package. I've performed an analysis that closely resembles the
> Vignette, but I am still a bit uncertain about the filtering. 
> In the Vignette the following code is used:
> lowQ <- rowQ(eset, floor(0.25 * NumArrays))
> upQ <- rowQ(eset, ceiling(0.75 * NumArrays))
> iqrs <- upQ - lowQ
> select <- iqrs > 0.5
> 
> My question is, why is this filtering necessary? I have performed my
> analysis without filtering, and the results were strange.
> My guess is that this filtering is intended to eliminate the probe-sets
> that aren't expressed at all (and would cause categories containing them
> to be associated). But the reason for eliminating the probe-sets with
> the highest variability is less clear to me. Would these include
> probe-sets where something has gone wrong, or probe-sets that are not
> expressed at all in some, but not all, arrays?
> What have I missed?

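I think you misunderstand the filtering being done here. It doesn't
remove probesets with variance greater than the 75th percentile.
lowQ and upQ are the (approximate) 25th and 75th percentiles of each
probeset's expression values across the arrays, so the code selects
probesets with an inter-quartile range greater than 0.5.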
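
To make that concrete, here is a minimal sketch of the same filter in
base R on simulated data. The matrix x, its dimensions, and the seed are
made up for illustration, and rowOrderStat is a hypothetical stand-in for
rowQ (assumed here to return the k-th smallest value in each row), so
treat it as a rough illustration rather than the vignette's code.

```r
set.seed(1)
# Simulated log2 expression values: 1000 probesets x 20 arrays (made-up data).
x <- matrix(rnorm(1000 * 20, mean = 8, sd = 0.3), nrow = 1000)
# Give the first 100 probesets extra spread across arrays.
x[1:100, ] <- x[1:100, ] + rnorm(100 * 20, sd = 1)

# Hypothetical stand-in for rowQ(): k-th smallest value in each row.
rowOrderStat <- function(m, k) apply(m, 1, function(r) sort(r)[k])

n <- ncol(x)
lowQ <- rowOrderStat(x, floor(0.25 * n))    # approx. 25th percentile per row
upQ <- rowOrderStat(x, ceiling(0.75 * n))   # approx. 75th percentile per row
iqrs <- upQ - lowQ
select <- iqrs > 0.5                        # TRUE for rows with IQR above 0.5

table(select)
```

Nothing here looks at the upper tail of the variance distribution; the
only quantity compared with 0.5 is each probeset's own inter-quartile
range.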

The inter-quartile range is a non-parametric measure of the variability
of each probeset, and it won't be adversely affected by outliers (unless
you have lots of them, in which case they really aren't outliers ;-D).

This is a pretty reasonable way to filter probesets, as it protects 
against a single outlier making it look like there is a lot of 
variability in the expression values.
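
As a quick illustration of that point, the sketch below compares the
variance and the IQR for a single made-up probeset that is flat across 20
arrays except for one aberrant value. The numbers are invented; var() and
IQR() are the standard functions from the stats package.

```r
# One made-up probeset on 20 arrays: constant, then with a single outlier.
flat <- rep(8, 20)
spiked <- flat
spiked[1] <- 14

# The variance reacts strongly to the single outlier...
c(flat = var(flat), spiked = var(spiked))

# ...while the IQR does not move, so this probeset would still fail
# an "IQR > 0.5" filter.
c(flat = IQR(flat), spiked = IQR(spiked))
```

A filter based on the variance would keep this probeset on the strength
of one array, whereas the IQR filter would drop it.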

Best,

Jim



> 
> What kind of filtering are you using, and why?
> 
> Is there an article out there discussing the variability, and cause of
> the variability, on arrays? 
> 
> Any comments would be helpful.
> Thank you!
> 
> Best,
> Boel Brynedal
> 
> 
> --~*~**~***~*~***~**~*~--
> Boel Brynedal, MSc, PhD student
> Karolinska  Institutet
> Department of Clinical neuroscience
> 
> Karolinska University hospital Huddinge
> Division of Neurology, R54
> 141 86 Stockholm
> SWEDEN
> Phone: +46 8 585 819 27
> Fax:   +46 8 585 870 80
> E-mail: boel.brynedal at ki.se
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


