[BioC] removal of genes with low expression values

Wolfgang Huber whuber at embl.de
Fri Jul 16 09:39:35 CEST 2010


Hernando

The article that Robert mentions below also demonstrates that for 
Affymetrix data, filtering by overall variance is preferable to 
filtering by average level when your goal is to detect differentially 
expressed genes.

In a nutshell, you want a filter criterion that is independent of your 
test statistic *under the null*, but correlated with it under the 
alternative.

  Wolfgang
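
A minimal sketch of the two filter criteria discussed in this thread,
written in plain Python rather than R, so it does not reproduce the
genefilter API. The function names (filter_by_mean, filter_by_variance),
the keep_fraction parameter and the toy expression values are all
hypothetical, chosen only for illustration. The variance filter uses no
group labels, which is the kind of criterion described above as
independent of the test statistic under the null.

```python
from statistics import mean, variance

def filter_by_mean(expr, cutoff):
    """Keep genes whose average expression across samples is >= cutoff."""
    return {gene: values for gene, values in expr.items()
            if mean(values) >= cutoff}

def filter_by_variance(expr, keep_fraction):
    """Keep the keep_fraction of genes with the largest overall variance.

    The variance is taken across all samples, ignoring group membership.
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    kept = set(ranked[:n_keep])
    # Preserve the original gene order in the result.
    return {gene: values for gene, values in expr.items() if gene in kept}

# Toy data (made up): gene name -> expression values across four samples.
expr = {
    "geneA": [2.1, 2.0, 2.2, 2.1],   # low level, flat
    "geneB": [5.0, 5.2, 9.1, 9.3],   # high level, variable
    "geneC": [8.0, 8.1, 7.9, 8.0],   # high level, flat
    "geneD": [3.0, 6.5, 3.2, 6.8],   # moderate level, variable
}

high_mean = filter_by_mean(expr, cutoff=4.0)
high_var = filter_by_variance(expr, keep_fraction=0.5)
```

On this toy data the mean filter keeps the flat gene geneC, while the
variance filter drops it and keeps only the two variable genes. The
cutoff and the fraction retained are arbitrary here; as noted further
down the thread, there is no standard cutoff.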

On Jul/8/10 7:32 PM, Robert Gentleman wrote:
> Hi,
>
>
> On Wed, Jul 7, 2010 at 6:22 AM, Naomi Altman <naomi at stat.psu.edu> wrote:
>> Isn't filtering on spread like pretesting for differential expression?
>>   Maybe not such a good idea.
>
>    No it isn't like pretesting and it is quite often a good idea.
> Please have a look at:
>
> Independent filtering increases detection power for high-throughput experiments.
> Bourgon R, Gentleman R, Huber W.
> Proc Natl Acad Sci U S A. 2010 May 25;107(21):9546-51. Epub 2010 May 11.
>   The paper explains why filtering is not like a pretest and lays out
> the potential benefits.
>
>
>
>>
>> When using MAS5, it is traditional to use the Affy presence/absence score.
>>   (Not necessarily optimal ...)
>>
>> --Naomi
>>
>> At 06:53 AM 7/7/2010, Yuan Hao wrote:
>>>
>>> Usually genes can be filtered based on IQR and/or variance across
>>> samples, which may be worth considering in addition to the 'average'.
>>>
>>> Yuan
>>>
>>> On 7 Jul 2010, at 11:47, Sean Davis wrote:
>>>
>>>> On Wed, Jul 7, 2010 at 6:28 AM, Hernando Martínez <hernybiotec at gmail.com> wrote:
>>>>
>>>>> Hi, I need to remove genes with low expression values from an
>>>>> expression matrix. I would like to remove those whose average
>>>>> expression value is less than a certain cut-off. I was thinking of
>>>>> computing the average for each row, creating a list of the gene
>>>>> names whose average is less than the cut-off, and removing those
>>>>> genes from the initial matrix. However, I have a couple of
>>>>> questions that maybe you can help me with. Is there any package or
>>>>> function that makes this easier? And does anyone know which
>>>>> cut-off to use for data normalized with RMA and for data
>>>>> normalized with MAS5? Thanks,
>>>>>
>>>> Take a look at the genefilter package.  However, what you describe
>>>> can be done easily with standard R.
>>>>
>>>> I don't think there is such a thing as a "standard" cutoff for
>>>> microarray data.
>>>>
>>>> Sean
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>> Naomi S. Altman                                814-865-3791 (voice)
>> Associate Professor
>> Dept. of Statistics                              814-863-7114 (fax)
>> Penn State University                         814-865-1348 (Statistics)
>> University Park, PA 16802-2111
>>
>
>
>

-- 


Wolfgang Huber
EMBL
http://www.embl.de/research/units/genome_biology/huber


