[BioC] genefiltering before or after the normalization?

Sean Davis sdavis2 at mail.nih.gov
Fri Jul 11 12:47:34 CEST 2008


On Fri, Jul 11, 2008 at 5:32 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
> Dear Dr. Huber,
>
> Thank you for the advice. I tried the script you suggested, running it after
> the normalization as you mentioned, but it produced the following error. I do
> not understand it, and I am not sure whether I am using the script correctly.
>
> MA<-normalizeBetweenArrays(log2(Rgene$G), method="quantile")# normalization
>  rs = rowSds(MA)
>  fx = fx[ rs > quantile(rs, 0.05), ]
> Error: object "fx" not found

Hi, Abhilash.  I think that line should read:

fx = x[rs > quantile(rs,0.05),]

Wolfgang was simply suggesting subsetting x by the results of the sd filtering.
In your script, x is your normalized matrix MA.
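
For illustration, here is a minimal self-contained sketch of that filtering
step. The simulated matrix, the seed, and the value lambda = 0.05 are
assumptions made up for the example, and apply(x, 1, sd) is used as a base-R
stand-in for rowSds from genefilter so that the snippet runs without any
extra packages:

```r
# Simulated stand-in for a normalized log2 intensity matrix
# (rows = probes, columns = arrays); replace with your own matrix, e.g. MA.
set.seed(1)
x <- matrix(rnorm(1000 * 6, mean = 8), nrow = 1000, ncol = 6)

# Per-probe standard deviation across all arrays.
# Base-R stand-in for genefilter::rowSds(x).
rs <- apply(x, 1, sd)

# Assumed fraction of probes to discard (illustrative value only).
lambda <- 0.05

# Keep the probes whose standard deviation exceeds the lambda quantile.
# Note that the filter never looks at which arrays are normal or test.
keep <- rs > quantile(rs, lambda)
fx <- x[keep, , drop = FALSE]

dim(fx)
```

The filtered matrix fx is then what you would pass on to the linear model fit.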

Sean

> Could you advise me on this?
> Thanks in advance.
>
> Abhilash
>
> On Fri, Jul 11, 2008 at 4:06 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
>
>> Hi Abhilash
>>
>>
>>> I am working with single-color data from the Agilent platform. After the
>>> limma analysis, the adjusted p-values were above the 5% FDR threshold, so
>>> I am thinking of filtering the genes using genefilter. My data set contains
>>> only raw intensities of normal and test samples, which I log-transform and
>>> then normalize with the 'normalizeBetweenArrays' command.
>>> In this scenario I am confused about whether I should apply the filter
>>> functions before normalization, or after normalization but before
>>> fitting the linear model.
>>> As my data is not an ExpressionSet I cannot use the nonfilter commands; in
>>> that case, are there any suggestions for other filtering methods?
>>>
>>> Appreciate the suggestions
>>>
>>>
>> Such filtering is performed after normalisation, but it is essential that
>> the filter criterion does *not use the sample annotations*. E.g. you can use
>> for each gene the overall variance or IQR across the experiment.
>>
>> If x is a matrix with rows=genes and columns=samples, then this can be as
>> simple as:
>>
>>  rs = rowSds(x)
>>  fx = fx[ rs > quantile(rs, lambda), ]
>>
>> where rowSds is in the genefilter package, and lambda is a parameter
>> between 0 and 1 that contains your belief in what fraction of probes on the
>> array correspond to target molecules that are never expressed in the
>> conditions you study.
>>
>> Also note that after such filtering, strictly speaking, the nominal
>> p-values from the subsequent testing could be too small - but one can show
>> that in typical microarray applications the bias is negligible (compared to
>> the impact of other effects), and in any case the p-values can be used for
>> ranking.
>>
>>  Best wishes
>>        Wolfgang
>>
>>
>> --
>> ----------------------------------------------------
>> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>>
>
>
>
> --
>
> Regards,
> Abhilash
>