[BioC] genefiltering before or after the normalization?
sdavis2 at mail.nih.gov
Fri Jul 11 12:47:34 CEST 2008
On Fri, Jul 11, 2008 at 5:32 AM, Abhilash Venu <abhivenu at gmail.com> wrote:
> Dear Dr. Huber,
> Thank you for the advice. I tried the script that you suggested. As you
> recommended, I ran it after the normalization, but it produced the
> following error, and I do not understand whether I am using it in the
> right way.
> MA<-normalizeBetweenArrays(log2(Rgene$G), method="quantile")# normalization
> rs = rowSds(MA)
> fx = fx[ rs > quantile(rs, 0.05), ]
> Error: object "fx" not found
Hi, Abhilash. I think that line should read:
fx = x[rs > quantile(rs,0.05),]
where x is the normalized matrix (MA in your code). Wolfgang was simply
suggesting subsetting x by the results of SD filtering.
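In case it helps, here is a minimal, self-contained sketch of the corrected filter. It uses base R only: apply(x, 1, sd) stands in for rowSds from genefilter, and the matrix x is simulated purely for illustration (an assumption; in your case it would be the normalized matrix MA).

```r
# Simulated stand-in for a normalized log2 intensity matrix
# (rows = genes, columns = samples). Replace with your own matrix, e.g. MA.
set.seed(1)
x <- matrix(rnorm(1000 * 6, mean = 8, sd = 1), nrow = 1000, ncol = 6)

# Per-gene standard deviation across all samples. This is a base-R
# substitute for genefilter::rowSds(x); no sample annotation is used.
rs <- apply(x, 1, sd)

# Keep the genes whose SD exceeds the 5% quantile of all SDs.
# The right-hand side subsets x, not the (not yet existing) fx.
keep <- rs > quantile(rs, 0.05)
fx <- x[keep, , drop = FALSE]

dim(fx)
```

With real data, fx would then be passed on to the linear model fit in place of the full matrix.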
> Could you please advise me on this?
> Thanks in advance.
> On Fri, Jul 11, 2008 at 4:06 AM, Wolfgang Huber <huber at ebi.ac.uk> wrote:
>> Hi Abhilash
>>> I am working with single color data from the Agilent platform. After the
>>> limma analysis, the adjusted p-values were higher than the 5% FDR
>>> threshold. At this point I am thinking of filtering the genes using
>>> genefilter. My data set contains only raw intensities of normal and test
>>> samples before normalization; I apply the 'normalizeBetweenArrays'
>>> command after log transforming.
>>> In this scenario I am quite confused about whether I should use the filter
>>> functions prior to normalization, or after the normalization but before
>>> fitting the linear model.
>>> As my data is not an ExpressionSet, I cannot use the nsFilter command. In
>>> this case, are there any suggestions for other filtering methods?
>>> I appreciate any suggestions.
>> Such filtering is performed after normalisation, but it is essential that
>> the filter criterion does *not use the sample annotations*. E.g. you can use
>> for each gene the overall variance or IQR across the experiment.
>> If x is a matrix with rows=genes and columns=samples, then this can be as
>> simple as:
>> rs = rowSds(x)
>> fx = fx[ rs > quantile(rs, lambda), ]
>> where rowSds is in the genefilter package, and lambda is a parameter
>> between 0 and 1 that contains your belief in what fraction of probes on the
>> array correspond to target molecules that are never expressed in the
>> conditions you study.
>> Also note that after such filtering, strictly speaking, the nominal
>> p-values from the subsequent testing could be too small - but one can show
>> that in typical microarray applications the bias is negligible (compared to
>> the impact of other effects), and in any case the p-values can be used for
>> Best wishes
>> Wolfgang Huber, EMBL-EBI, http://www.ebi.ac.uk/huber
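The IQR variant mentioned in the quoted advice can be sketched in the same way. This is only an illustration under stated assumptions: the matrix x is simulated, lambda = 0.1 is an arbitrary example value rather than a recommendation, and base R's IQR is applied row by row. As with the SD filter, the criterion looks only at the spread of each gene across all samples and never at the group labels.

```r
# Simulated normalized matrix (rows = genes, columns = samples);
# an assumption for illustration only.
set.seed(2)
x <- matrix(rnorm(500 * 8, mean = 8, sd = 1), nrow = 500, ncol = 8)

# lambda: assumed fraction of probes believed never to be expressed.
# 0.1 is an arbitrary example value.
lambda <- 0.1

# Per-gene interquartile range across all samples (no sample annotation used).
riqr <- apply(x, 1, IQR)

# Keep the genes whose IQR exceeds the lambda quantile of all IQRs.
fx <- x[riqr > quantile(riqr, lambda), , drop = FALSE]

dim(fx)
```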