[BioC] Removing probes before or after normalization

Naomi Altman naomi at stat.psu.edu
Sat Apr 22 05:51:20 CEST 2006

Speaking as a statistician and not from experience with this type of 
array, I would think that you would want to remove foreign probes 
before normalization.

If they hybridize to the sample, who knows what they are doing.  If 
they do not hybridize, they should add a huge number of data 
points at the very low expression values.  These probes are all 
"negative controls" and do not have the same sources of variation as 
the real probes.  Hence, I think that they could adversely affect the 
normalization of low expression genes, which are often the most 
interesting genes in the data.

Again, this argument is not based on experience.  However, I have 
used arrays with about a hundred negative controls and these controls 
did have some surprisingly consistent patterns, showing they were not 
entirely negative.
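
To make the concern above concrete, here is a minimal toy sketch. It is 
not from the thread: the intensities are invented, the names are 
hypothetical, and quantile normalization is assumed only because it is 
a common choice (no method was named in the question). The target 
probes are identical on both arrays, so any difference after 
normalization is an artifact of the foreign probes.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length lists of log2 intensities (one list per array).

    Ties are not handled specially; the toy data below has none within an array.
    """
    n = len(arrays[0])
    sorted_cols = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices, low to high
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Invented log2 intensities: the target organism is unchanged between arrays.
target = [5.0, 6.0, 7.0, 8.0, 10.0, 12.0]
# Assumption: foreign probes stay low on array 1 but two of them
# cross-hybridize on array 2.
foreign_1 = [4.0, 4.1, 4.2, 4.3]
foreign_2 = [4.0, 4.1, 6.5, 7.5]

n_target = len(target)

# Option A: normalize with foreign probes included, then keep target probes only.
norm_1, norm_2 = quantile_normalize([target + foreign_1, target + foreign_2])
diff_with_foreign = [a - b for a, b in zip(norm_1[:n_target], norm_2[:n_target])]

# Option B: remove foreign probes first, then normalize.
clean_1, clean_2 = quantile_normalize([target, target])
diff_removed_first = [a - b for a, b in zip(clean_1, clean_2)]

print("array1 - array2, foreign probes included:", diff_with_foreign)
print("array1 - array2, foreign probes removed :", diff_removed_first)
```

In this toy case the three lowest target probes pick up apparent 
differences of roughly 0.75 to 1.35 log2 units when the foreign probes 
are included, while the three highest are unaffected; removing the 
foreign probes first leaves no difference at all. Real arrays will 
behave less cleanly, so treat this only as an illustration of the 
mechanism.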


At 02:59 PM 4/21/2006, Jenny Drnevich wrote:
>Hi Daniel,
>I have been wondering about this myself recently. I think all examples of
>filtering genes that I have seen do the filtering after the pre-processing
>steps, which is what I routinely do. I don't think I've seen a formal
>argument for this anywhere, but it seems that genes that are "Absent" (Affy
>calls) from all arrays and/or genes that have little variation across
>arrays (although I don't personally filter on this) are a part of those
>genes that do not change expression with treatment. Given that most
>normalization methods assume that most genes are not changing, you would
>not want to remove a portion of these genes before normalization, else you
>are increasing the proportion of genes that do change and perhaps
>decreasing the efficacy of the normalization? On the other hand, I have
>also worked with Affy's soybean chips, which have probe sets from two other
>species (pests, I believe) in addition to soybeans. In this case, we
>removed the non-soybean genes before pre-processing, mostly because we were
>running into memory problems. I hope we are not being arbitrary in removing
>non-species-of-interest genes before normalization and then filtering
>species-specific genes after normalization using different criteria! Any
>other thoughts?
>
>At 01:34 PM 4/21/2006, Bornman, Daniel M wrote:
> >Dear BioC,
> >
> >I have a custom chip with multiple microbial organisms but I am currently
> >only interested in the results for one of these.  At what step in the
> >analysis process is it advised to remove the other organisms from
> >analysis?  I worry that probes specific to those 'other' organisms may
> >contribute to the background noise.  In that case maybe I should remove
> >them prior to normalization and background correction.  Otherwise, maybe
> >prior to independent testing and p-value adjustment. And, if not there,
> >then prior to annotation.
> >
> >
> >Thank You,
> >
> >Daniel Bornman
> >Researcher
> >Battelle Memorial Institute
> >505 King Ave
> >Columbus, OH 43201
> >614.424.3229
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>Jenny Drnevich, Ph.D.
>Functional Genomics Bioinformatics Specialist
>W.M. Keck Center for Comparative and Functional Genomics
>Roy J. Carver Biotechnology Center
>University of Illinois, Urbana-Champaign
>330 ERML
>1201 W. Gregory Dr.
>Urbana, IL 61801
>ph: 217-244-7355
>fax: 217-265-5066
>e-mail: drnevich at uiuc.edu

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
