[BioC] Genefilter parameters for mouse 430 2

Richard Friedman friedman at cancercenter.columbia.edu
Wed Mar 19 20:10:07 CET 2008

Dear Bioconductor Users,

	I am using genefilter to filter an ExpressionSet of 4 Mouse 430 2 chips
preprocessed with gcrma prior to analysis with limma.

Here is a description of the ExpressionSet.

 > xen2dataeset
ExpressionSet (storageMode: lockedEnvironment)
assayData: 45101 features, 4 samples
   element names: exprs
   sampleNames: A_xen_1_21.cel, A_xen_2_22.cel, D_nodal_1_27.cel,  
   varLabels and varMetadata description:
     sample: arbitrary numbering
   featureNames: 1415670_at, 1415671_at, ..., AFFX-r2-P1-cre-5_at   
(45101 total)
   fvarLabels and fvarMetadata description: none
experimentData: use 'experimentData(object)'
Annotation: mouse4302

Here is my session information.

 > sessionInfo()
R version 2.6.1 (2007-11-26)


attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
  [1] mouse4302probe_2.0.0 mouse4302cdf_2.0.0   mouse4302.db_2.0.2
  [4] limma_2.12.0         geneplotter_1.16.0   lattice_0.17-2
  [7] annotate_1.16.1      AnnotationDbi_1.0.6  RSQLite_0.6-3
[10] DBI_0.2-3            RColorBrewer_1.0-1   affyPLM_1.14.0
[13] xtable_1.5-2         simpleaffy_2.14.05   gcrma_2.10.0
[16] matchprobes_1.10.0   genefilter_1.16.0    survival_2.34
[19] annaffy_1.10.1       KEGG_2.0.1           GO_2.0.1
[22] affy_1.16.0          preprocessCore_1.0.0 affyio_1.6.1
[25] Biobase_1.16.3

loaded via a namespace (and not attached):
[1] KernSmooth_2.22-21 grid_2.6.1         tools_2.6.1

I have tried the filtering parameters from the chapter by Scholtens and
von Heydebreck, on p. 233 of the book by Gentleman et al.:

 > f1<-pOverA(0.25,log2(100))  # assumed definition, following the book; f1 was not shown here
 > f2<-function(x)(IQR(x)>0.5)
 > ff<-filterfun(f1,f2)
 > selected <-genefilter(xen2dataeset,ff)
 > sum(selected)
[1] 289

This seemed a bit small, so I tried the effect of each of the
filters individually:

 > selectedp025A <-genefilter(xen2dataeset,f1)
 > sum(selectedp025A)
[1] 9681
 > selectedIQRgtp5 <-genefilter(xen2dataeset,f2)
 > sum(selectedIQRgtp5)
[1] 731
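
As a rough illustration of what the two filters test, they can be mimicked in base R. This is only a sketch on a simulated matrix (the values are made up and are not the actual chips), and it assumes that pOverA(0.25, log2(100)) keeps a probeset when at least 25% of its samples exceed log2(100), which with 4 chips means at least 1 chip. The names x, pass_intensity, pass_iqr and pass_both are hypothetical.

```r
# Simulated log2-scale matrix standing in for exprs(xen2dataeset);
# the values are invented purely for illustration.
set.seed(1)
x <- matrix(rnorm(1000 * 4, mean = 6, sd = 2), nrow = 1000, ncol = 4)

# Intensity filter: at least 25% of samples above log2(100),
# i.e. at least 1 of the 4 chips (assumed to mirror pOverA(0.25, log2(100))).
pass_intensity <- apply(x, 1, function(v) mean(v > log2(100)) >= 0.25)

# Variability filter: interquartile range across samples above 0.5.
pass_iqr <- apply(x, 1, function(v) IQR(v) > 0.5)

# filterfun() requires every filter to pass (logical AND), so the joint
# count can never exceed the smaller of the two individual counts.
pass_both <- pass_intensity & pass_iqr

c(intensity = sum(pass_intensity), iqr = sum(pass_iqr), both = sum(pass_both))
```

Because the combined filter is an AND, the 289 probesets passing both filters above cannot exceed the 731 that pass the IQR filter alone.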

My questions:

1. Is the log2(100) intensity cutoff good for all chips?
	If not, can someone recommend a good intensity cutoff for mouse 4302?
2. Is the only effect of filtering to reduce the multiplier in the
	false discovery analysis, or does it reduce false positives in other ways:
	A. In the case of intensity filters, by reducing the number of large
	    fold changes resulting from the ratios of small numbers?
	B. In the case of IQR filters, by eliminating large t-statistics
	    resulting from genes with small variation across samples but
	    fortuitously low standard deviations?
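
The "multiplier" in question 2 can be made concrete with a small simulation. The sketch below uses invented p-values and an idealized, hypothetical filter that discards most null probesets at random while keeping every truly changed one. Real filters will not behave this cleanly, so it shows only how the number of tests enters the Benjamini-Hochberg adjustment and does not answer parts A and B.

```r
set.seed(42)
m_null <- 9000   # hypothetical number of unchanged probesets
m_alt  <- 300    # hypothetical number of truly changed probesets

# Invented p-values: uniform for nulls, skewed toward 0 for true changes.
p <- c(runif(m_null), rbeta(m_alt, 0.1, 5))
is_alt <- rep(c(FALSE, TRUE), c(m_null, m_alt))

# Idealized filter (an assumption): keeps every true change and
# roughly 20% of the nulls, chosen at random.
keep <- is_alt | (runif(length(p)) < 0.2)

# Benjamini-Hochberg adjustment with and without the filter.
adj_all  <- p.adjust(p, method = "BH")
adj_kept <- p.adjust(p[keep], method = "BH")

c(tests_all  = length(p),  hits_all  = sum(adj_all < 0.05),
  tests_kept = sum(keep),  hits_kept = sum(adj_kept < 0.05))
```

Comparing hits_all with hits_kept shows how much of the gain comes purely from adjusting over fewer tests under these assumptions.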

	Up until now I have not filtered, because the filtering parameters
looked arbitrary and I thought that it was cheating to reduce the number
of tests used to compute the FDR. From reading and further reflection I
now believe otherwise. But although I now believe I should filter, I am
not at all sure what parameters to use, or how sensitive my final list of
differentially expressed genes will be to the choice of those parameters.
In particular, I wonder whether the intensity filter cutoff should vary
with chip type and preprocessing method (e.g. GCRMA).
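
One way to gauge the sensitivity described above is to count how many probesets survive over a small grid of cutoffs. The following is a minimal base-R sketch on simulated data. The helper count_pass and the grid values are hypothetical choices for illustration, not recommended cutoffs. With real data, x would be replaced by exprs(xen2dataeset).

```r
# Simulated log2-scale matrix; replace with exprs(xen2dataeset) for real data.
set.seed(7)
x <- matrix(rnorm(2000 * 4, mean = 6, sd = 2), nrow = 2000)

# Hypothetical helper: number of rows with at least a fraction p of samples
# above intensity A and an IQR across samples above iqr_cut.
count_pass <- function(x, A, iqr_cut, p = 0.25) {
  above  <- rowMeans(x > A) >= p
  spread <- apply(x, 1, IQR) > iqr_cut
  sum(above & spread)
}

# Grid of illustrative cutoffs (intensity on the log2 scale).
grid <- expand.grid(A = log2(c(50, 100, 200)), iqr_cut = c(0.25, 0.5, 1))
grid$n_pass <- mapply(function(A, iqr_cut) count_pass(x, A, iqr_cut),
                      grid$A, grid$iqr_cut)
grid
```

If n_pass changes sharply between neighbouring cutoffs, the downstream gene list is likely to be sensitive to the choice as well.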

	Any thoughts and guidance would be appreciated.

Thanks as always,
Richard A. Friedman, PhD
Biomedical Informatics Shared Resource
Herbert Irving Comprehensive Cancer Center (HICCC)
Department of Biomedical Informatics (DBMI)
Educational Coordinator
Center for Computational Biology and Bioinformatics (C2B2)
National Center for Multiscale Analysis of Genomic Networks (MAGNet)
Box 95, Room 130BB or P&S 1-420C
Columbia University Medical Center
630 W. 168th St.
New York, NY 10032
(212)305-6901 (5-6901) (voice)
friedman at cancercenter.columbia.edu

"Sure I am willing to stop watching television
to get a better education."
-Rose Friedman, age 11
