[BioC] genefiltering before or after the normalization?

Sean Davis sdavis2 at mail.nih.gov
Sat Jul 12 19:02:16 CEST 2008


On Sat, Jul 12, 2008 at 11:26 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
> Hi Sean,
>
> Yes, thank you.
>
> My problem with the data is still not resolved, though. I have tried different
> filtering methods, including gapfilter and a combination of IQR with pOverA
> or cv, but my adjusted p-values are still above the FDR limit of 0.05 after
> the limma analysis. The B values are also generally -3. As Gordon mentioned
> in one of the previous mails, this is an indication of little evidence for
> differential expression.
>
> What could be the reason for this? Is this really indicative of an absence
> of differential expression?

It sounds like it.  Though people think of filtering as a way to
reduce the number of genes and improve the strength of signal after
multiple-testing correction, I don't think that is the correct
mindset.  Filtering is useful to remove probes from analysis that are
not measuring anything interesting (no change across experiments) or
are not well-measured.  So, the thought process should not be to do
hypothesis testing and then, if negative, to do filtering to try to
improve the situation, but to do filtering based on rational
thresholds for removing uninteresting or less-than-credible values as
part of a series of preprocessing steps.
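
As a minimal sketch of what such a preprocessing step might look like
(the simulated matrix and the lambda of 0.05 are assumptions for
illustration only, and base R's apply() stands in for rowSds() from
genefilter so that the snippet runs without extra packages):

```r
# Hypothetical data: 1000 probes x 6 arrays of log2 intensities, standing in
# for an already-normalized matrix (e.g. the output of normalizeBetweenArrays).
set.seed(1)
x <- matrix(rnorm(1000 * 6, mean = 8, sd = 1), nrow = 1000,
            dimnames = list(paste0("probe", 1:1000), paste0("array", 1:6)))

# Per-probe standard deviation across all arrays. The group labels are never
# consulted, so the filter does not use the sample annotations.
rs <- apply(x, 1, sd)

# lambda is fixed in advance (assumed value): the fraction of probes believed
# never to be expressed in the conditions studied.
lambda <- 0.05

# Keep probes whose SD exceeds the lambda quantile. The subset is taken
# from x, the full matrix, and stored in the new object fx.
keep <- rs > quantile(rs, lambda)
fx <- x[keep, ]

dim(fx)
```

The threshold is chosen before any testing is done, and the filtered
matrix fx is what would then go on to the linear model fit.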

Sean

> On Fri, Jul 11, 2008 at 4:17 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
>
>> On Fri, Jul 11, 2008 at 5:32 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
>> > Dear Dr. Huber,
>> >
>> > Thank you for the advice. I have tried the script that you suggested.
>> > As you mentioned, I used it after the normalization, but it produced
>> > the following error, and I am not sure whether I am using it in the
>> > right way.
>> >
>> > MA <- normalizeBetweenArrays(log2(Rgene$G), method="quantile")  # normalization
>> >  rs = rowSds(MA)
>> >  fx = fx[ rs > quantile(rs, 0.05), ]
>> > Error: object "fx" not found
>>
>> Hi, Abhilash.  I think that line should read:
>>
>> fx = x[rs > quantile(rs,0.05),]
>>
>> Wolfgang was simply suggesting subsetting x by the results of sd filtering.
>>
>> Sean
>>
>> > Can you advise me on this?
>> > Thanks in advance.
>> >
>> > Abhilash
>> >
>> > On Fri, Jul 11, 2008 at 4:06 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
>> >
>> >> Hi Abhilash
>> >>
>> >>
>> >>> I am working with single color data from the Agilent platform. After the
>> >>> limma analysis the adjusted p values were higher than the 5% FDR
>> >>> threshold, so I am thinking of filtering the genes using genefilter.
>> >>> My data set contains only raw intensities of normal and test samples,
>> >>> which I log transform and then normalize with the
>> >>> 'normalizeBetweenArrays' command.
>> >>> In this scenario I am quite confused about whether I should use the filter
>> >>> functions prior to normalization, or after the normalization but before
>> >>> fitting the linear model.
>> >>> As my data is not an ExpressionSet I cannot use the nonfilter commands;
>> >>> in this case, are there any suggestions for other filtering methods?
>> >>>
>> >>> Appreciate the suggestions
>> >>>
>> >>>
>> >> Such filtering is performed after normalisation, but it is essential
>> >> that the filter criterion does *not use the sample annotations*. E.g. you
>> >> can use for each gene the overall variance or IQR across the experiment.
>> >>
>> >> If x is a matrix with rows=genes and columns=samples, then this can be
>> >> as simple as:
>> >>
>> >>  rs = rowSds(x)
>> >>  fx = fx[ rs > quantile(rs, lambda), ]
>> >>
>> >> where rowSds is in the genefilter package, and lambda is a parameter
>> >> between 0 and 1 that reflects your belief about what fraction of probes
>> >> on the array correspond to target molecules that are never expressed in
>> >> the conditions you study.
>> >>
>> >> Also note that after such filtering, strictly speaking, the nominal
>> >> p-values from the subsequent testing could be too small - but one can
>> >> show that in typical microarray applications the bias is negligible
>> >> (compared to the impact of other effects), and in any case the p-values
>> >> can be used for ranking.
>> >>
>> >>  Best wishes
>> >>        Wolfgang
>> >>
>> >>
>> >> --
>> >> ----------------------------------------------------
>> >> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
>> >>
>> >
>> >
>> >
>> > --
>> >
>> > Regards,
>> > Abhilash
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>> > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>> >
>>
>
>
>
> --
>
> Regards,
> Abhilash