[BioC] Question on filtering in the Category package

Seth Falcon sfalcon at fhcrc.org
Tue Aug 14 15:46:15 CEST 2007


Hi Boel,

Boel Brynedal <boel.brynedal at ki.se> writes:
> I have a theoretical question regarding Filtering on Variation in the
> Category package. I've performed an analysis that closely resembles the
> Vignette, but I am still a bit uncertain about the filtering. 
> In the Vignette the following code is used:
> lowQ<-rowQ(eset,floor(0.25*NumArrays)) 
> upQ<-rowQ(eset,ceiling(0.75*NumArrays))
> iqrs<-upQ-lowQ
> select<-(upQ-lowQ)>0.5
>
> My question is, why is this filtering necessary? I have performed my
> analysis without filtering, and the results were strange.

If you inflate your universe of possible genes with genes that
essentially cannot end up in your selected gene list, then you will
get strange results.
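
As a rough illustration of that effect, here is a small base R sketch of
a hypergeometric over-representation test. All of the counts are made up
for the example (they are not from your data or from the vignette), and
overrep_p() is just a hypothetical helper name. The same selected list
and the same overlap are tested against a filtered universe and against
one padded with genes that could never have been selected:

```r
# Hypothetical counts, chosen only to illustrate the effect of universe size.
n_selected <- 100   # genes in the selected (e.g. differentially expressed) list
n_overlap  <- 5     # selected genes that fall in the category of interest

# P(X >= n_overlap) under the hypergeometric null.
overrep_p <- function(universe_size, category_size) {
  phyper(n_overlap - 1,
         category_size,                  # category genes in the universe
         universe_size - category_size,  # all other genes in the universe
         n_selected,                     # number of genes drawn
         lower.tail = FALSE)
}

# Filtered universe: only genes that could plausibly be selected.
p_filtered <- overrep_p(universe_size = 2000, category_size = 40)

# Inflated universe: the same data plus 8000 non-varying genes,
# 20 of which happen to be annotated to the category.
p_inflated <- overrep_p(universe_size = 10000, category_size = 60)

print(c(filtered = p_filtered, inflated = p_inflated))
```

The expected overlap drops from 2 genes to 0.6 genes once the universe is
padded, so the same 5 observed genes look far more surprising than they
should.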

> My guess is that this filtering is intended to eliminate the probe-sets
> that aren't expressed at all (and would cause categories containing them
> to be associated). But the reason for eliminating the probe-sets with
> the highest variability is less clear for me. Would these include probe-
> sets where something has gone wrong, or probe-sets that are not
> expressed at all in some, but not all, arrays? 
> What have I missed?

Probesets with _low_ variance across samples are eliminated.  The high
variance ones are kept.  Take a look at the GOstats vignette from a
recent version.  There is a new function nsFilter() that makes the
filtering easier to perform and the vignette and man page discuss some
details.  nsFilter() is in the genefilter package.
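
To see which way the filter cuts, here is a minimal sketch of the same
IQR-style selection on simulated data. It uses only base R:
row_order_stat() is a hypothetical stand-in for rowQ(), and the matrix x
and its values are invented for the example, so treat it as an
illustration of the idea rather than the vignette's code.

```r
set.seed(1)
n_arrays <- 20

# Simulated log-scale expression: 5 nearly flat probesets, 5 variable ones.
flat     <- matrix(rnorm(5 * n_arrays, mean = 8, sd = 0.05), nrow = 5)
variable <- matrix(rnorm(5 * n_arrays, mean = 8, sd = 2),    nrow = 5)
x <- rbind(flat, variable)
rownames(x) <- paste0("ps", 1:10)

# k-th smallest value in each row (stand-in for rowQ(eset, k)).
row_order_stat <- function(m, k) apply(m, 1, function(r) sort(r)[k])

lowQ <- row_order_stat(x, floor(0.25 * n_arrays))
upQ  <- row_order_stat(x, ceiling(0.75 * n_arrays))
iqrs <- upQ - lowQ

# TRUE means the probeset is kept: its spread across arrays exceeds 0.5.
selected <- iqrs > 0.5
print(round(iqrs, 2))
print(names(selected)[selected])
```

Only the probesets with a large spread pass the cutoff. The flat ones,
which could never show up as differentially expressed, are the ones
dropped from the universe.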

You didn't tell us sessionInfo(), but I think you are not using the
current release of R and BioC packages.  It would be good to upgrade.

+ seth

-- 
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
BioC: http://bioconductor.org/
Blog: http://userprimary.net/user/


