[BioC] replicates and low expression levels

Mon Jun 2 12:27:37 MEST 2003

I think you have to normalise prior to filtering. The noise should be a
reliable component of the normalisation procedure.

The second point is interesting: how do you select a filter for low-variance,
low-expression data? In the first instance, if a gene is not varying then you
might as well filter it out, since it is not interesting for your experiment
in any case, regardless of whether it is noise or low expression!
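
As a rough sketch of that idea (not a Bioconductor function; the gene names,
the values and the 0.5 cut-off are all made up for illustration, and the data
are assumed to be normalised and on a log scale), a variance filter could
look something like this:

```python
from statistics import stdev

def variance_filter(expr, min_sd):
    """Keep only genes whose standard deviation across samples is >= min_sd.

    expr   : dict mapping a gene name to its expression values, one per sample
             (assumed already normalised and on a log scale)
    min_sd : hypothetical cut-off; it has to be chosen per experiment
    """
    return {gene: values for gene, values in expr.items()
            if stdev(values) >= min_sd}

# Made-up example data: one varying gene, one flat low gene, one flat high gene.
expr = {
    "gene_a": [7.1, 7.0, 9.4, 9.6],    # varies across samples
    "gene_b": [3.1, 3.0, 3.2, 3.1],    # flat, near background
    "gene_c": [10.0, 10.1, 9.9, 10.0], # flat, highly expressed
}

kept = variance_filter(expr, min_sd=0.5)
print(sorted(kept))  # ['gene_a']
```

Note that the flat genes are dropped whether they are low (gene_b) or high
(gene_c), which is the point: the filter does not need to decide whether a
flat gene is noise or genuinely low expression.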

The question of what constitutes present and absent is more difficult. I
would like to see a better example of spike-in data in the literature that
really focuses on low expression values (though the Gene Logic and Affy sets
are an excellent and appreciated resource for designing expression indices
generally).

Stephen
-----Original Message-----
From: Claire Wilson
To: Robert Gentleman
Cc: BioC mailing list
Sent: 02/06/03 11:17
Subject: RE: [BioC] replicates and low expression levels

>On Fri, May 30, 2003 at 05:28:45PM +0100, Crispin Miller wrote:
> > Hi,
> > Just a quick question about low expression levels on Affy systems - I
> > hope it's not too off-topic; it is about normalisation and data
> > analysis...
> > I've heard a lot of people advocating that it's a good idea to perform
> > an initial filtering on either Present Marginal or Absent calls, or on
> > gene-expression levels (so that only genes with an expression > 40, say,
> > after scaling to a TGT of 100 using the MAS5.0 algorithm, are part of the
> > further analysis). Firstly, am I right in thinking that this is to
> > eliminate data that are too close to the background noise level of the
> > system?
> >
> > I wanted to canvass opinion as to whether people feel we need to do this
> > if we have replicates and are using statistical tests - rather than just
> > fold-changes - to identify 'interesting' genes. Does the statistical
> > testing do this job for us?
>
>Hi,
>   In my opinion you should always do some sort of non-specific
>   filtering. What you have described is one form of it, others include
>   removing genes that show little or no variability across samples.
>   I think of non-specific filtering as filtering without reference to
>   phenotype (of any sort).
>
>   There are a number of reasons for doing this, some motivated by the
>   biology and some by the statistics.
>
>   First off, especially for Affy, the chip is designed for all tissue
>   types but a commonly held belief is that only about 40% of the genome
>   is expressed in any specific tissue type. So, for any experiment you
>   will have a pretty large number of probes for genes that are not
>   expressed in the tissue you are looking at.
>
>   From a statistical perspective you need to be a little bit cautious
>   if you are going to standardize genes across samples (this is pretty
>   common). If you do not remove those genes that show little
>   variability before standardization then you have just elevated the
>   noise to the same status as the signal (and if the 40% estimate is
>   right then you actually have more noise than signal - not too
>   pleasant).
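
The standardization point quoted above can be illustrated with a small
sketch. It assumes "standardize" means a per-gene z-score across samples
(subtract the gene's mean, divide by its standard deviation); the two genes
and their values are invented for the example.

```python
from statistics import mean, stdev

def standardize(values):
    """Per-gene z-score across samples: (value - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Invented values: a gene with a real change, and a flat gene near background.
signal = [6.0, 6.2, 9.8, 10.0]
noise = [3.00, 3.02, 2.98, 3.01]

z_signal = standardize(signal)
z_noise = standardize(noise)

# Before standardization the flat gene spans only about 0.04 units; afterwards
# both genes have unit standard deviation and a comparable spread.
raw_spread_noise = max(noise) - min(noise)
z_spread_signal = max(z_signal) - min(z_signal)
z_spread_noise = max(z_noise) - min(z_noise)
print(raw_spread_noise, z_spread_signal, z_spread_noise)
```

After the z-score, the tiny fluctuations of the flat gene are stretched to
the same scale as the real change, which is why removing low-variability
genes before standardizing matters.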

Hi,

Just to clarify a couple of points. This suggests to me that genes with low
expression need to be filtered out prior to normalization. How is this
achieved in Bioconductor without the use of Present/Absent calls? And,
following on from a later point:

>   you have just carried out). It seems to me to be much easier to just
>   filter those genes with no expression or little variation out at the
>   very start.

What would be your filter for no expression or little variation?

Sorry if these questions are a little basic.

Thanks

Claire
