[BioC] replicates and low expression levels

Mon Jun 2 15:07:36 MEST 2003

Quite!

Relating to the next point-- you can get quite a good appreciation of the
inherent uncertainty in expression values by using a resampling strategy.

One approach is to replace the real PM values in an AffyBatch object with
resampled MM values (possibly after probe-level normalisation of the batch),
using something like

>ABObject2 <- ABObject
>mmvals <- mm(ABObject)
>pm(ABObject2) <- matrix(sample(mmvals, replace=FALSE), nrow=nrow(mmvals))

If you then compute an expression measure such as

>false.eset <- rma(ABObject2)

you get a distribution of values generated purely from noise, which is a
useful tool when considering a filter cutoff.
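
As a rough illustration of how such a noise distribution might be turned into a
cutoff, here is a minimal sketch in base R. It uses simulated matrices rather
than real chips: noise.vals and real.vals are hypothetical stand-ins for
exprs(false.eset) and exprs(rma(ABObject)), and both the 0.95 quantile and the
"at least one chip" rule are arbitrary assumptions, not recommendations.

```r
# Simulated stand-ins (assumptions, not real chip data):
# noise.vals plays the role of exprs(false.eset), the RMA values from the
# MM-resampled batch; real.vals plays the role of exprs(rma(ABObject)).
set.seed(1)
n.genes <- 1000
n.chips <- 4
noise.vals <- matrix(rnorm(n.genes * n.chips, mean = 6, sd = 0.5), nrow = n.genes)
real.vals  <- matrix(rnorm(n.genes * n.chips, mean = 8, sd = 2), nrow = n.genes)

# Cutoff: a high quantile of the noise distribution (0.95 is arbitrary).
cutoff <- quantile(noise.vals, probs = 0.95)

# Keep a gene if it exceeds the cutoff on at least one chip.
keep <- apply(real.vals, 1, max) > cutoff
filtered <- real.vals[keep, , drop = FALSE]
```

With real data you would swap the two simulated matrices for the expression
values of the resampled and the original batch.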

-----Original Message-----
From: Crispin Miller [mailto:CMiller at PICR.man.ac.uk] 
Sent: Monday, June 02, 2003 1:07 PM
To: Stephen Henderson; Claire Wilson; Robert Gentleman
Cc: BioC mailing list
Subject: RE: [BioC] replicates and low expression levels

Hi,
In addition, filtering prior to normalisation would need a chip-specific
threshold to filter by (otherwise intensity levels between chips would be
directly comparable and we wouldn't need to normalise). Presumably, this
would be done by computing global statistics and then determining
the threshold relative to these...

This sounds pretty much like normalisation? :-)
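
To make that concrete, a minimal sketch in base R with simulated intensities
(the three chips, their scale factors and the lower-quartile threshold are all
assumptions for illustration). Filtering each chip against a threshold derived
from its own global statistic gives the same calls as scaling the chips to a
common value of that statistic and then applying one threshold.

```r
# Simulated intensities for three chips that differ only in overall scale
# (an assumption for illustration, not real data).
set.seed(2)
base <- rlnorm(500, meanlog = 5, sdlog = 1)
chips <- cbind(a = base, b = 2 * base, c = 0.5 * base)

# Chip-specific threshold: the lower quartile of each chip (arbitrary choice).
thresholds <- apply(chips, 2, quantile, probs = 0.25)

# Filtering each chip against its own threshold...
keep.raw <- sweep(chips, 2, thresholds, ">")

# ...gives the same calls as scaling every chip to a common lower quartile
# and then applying a single threshold to all of them.
scaled <- sweep(chips, 2, thresholds, "/")
keep.scaled <- scaled > 1
```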

Crispin

> -----Original Message-----
> From: Stephen Henderson [mailto:s.henderson at ucl.ac.uk]
> Sent: 02 June 2003 11:28
> To: Claire Wilson; 'Robert Gentleman '
> Cc: 'BioC mailing list '
> Subject: RE: [BioC] replicates and low expression levels
> 
> 
> I think you have to normalise prior to filtering. The noise should be a
> reliable component of the normalisation procedure.
> 
> The second point is interesting. How do you select a filter for
> low-variance, low-expression data? In the first instance, if it's not
> varying then you might as well filter it, as it is not interesting to your
> given experiment in any case, regardless of whether it is noise or low
> expression!
> 
> The question of what constitutes present and absent is more difficult. I
> would like to see a better example of spike-in data in the literature that
> really focuses on low expression values (though the GeneLogic and Affy sets
> are an excellent and appreciated resource for designing expression indices
> generally).
> 
> Stephen
> -----Original Message-----
> From: Claire Wilson
> To: Robert Gentleman
> Cc: BioC mailing list
> Sent: 02/06/03 11:17
> Subject: RE: [BioC] replicates and low expression levels
> 
> >On Fri, May 30, 2003 at 05:28:45PM +0100, Crispin Miller wrote:
> > > Hi,
> > > Just a quick question about low expression levels on Affy systems - I
> > > hope it's not too off-topic; it is about normalisation and data
> > > analysis...
> > > I've heard a lot of people advocating that it's a good idea to perform
> > > an initial filtering on either Present, Marginal or Absent calls, or on
> > > gene-expression levels (so that only genes with an expression > 40, say,
> > > after scaling to a TGT of 100 using the MAS5.0 algorithm, are part of
> > > the further analysis). Firstly, am I right in thinking that this is to
> > > eliminate data that are too close to the background noise level of the
> > > system?
> > >
> > > I wanted to canvass opinion as to whether people feel we need to do this
> > > if we have replicates and are using statistical tests - rather than just
> > > fold-changes - to identify 'interesting' genes. Does the statistical
> > > testing do this job for us?
> >
> >Hi,
> >   In my opinion you should always do some sort of non-specific
> >   filtering. What you have described is one form of it; others include
> >   removing genes that show little or no variability across samples.
> >   I think of non-specific filtering as filtering without reference to
> >   phenotype (of any sort).
> >
> >   There are a number of reasons for doing this, some motivated by the
> >   biology and some by the statistics.
> >
> >   First off, especially for Affy, the chip is designed for all tissue
> >   types but a commonly held belief is that only about 40% of the genome
> >   is expressed in any specific tissue type. So, for any experiment you
> >   will have a pretty large number of probes for genes that are not
> >   expressed in the tissue you are looking at.
> >   From a statistical perspective you need to be a little bit cautious
> >   if you are going to standardize genes across samples (this is pretty
> >   common). If you do not remove those genes that show little
> >   variability before standardization then you have just elevated the
> >   noise to the same status as the signal (and if the 40% estimate is
> >   right then you actually have more noise than signal - not too
> >   pleasant).
> 
> Hi,
> 
> Just to clarify a couple of points. This suggests to me that filtering of
> genes with low expression is required prior to normalization, and I was
> just wondering how this is achieved in Bioconductor without the use of
> Present/Absent calls. And following on from a later point:
>  
> >   you have just carried out). It seems to me to be much easier to just
> >   filter those genes with no expression or little variation out at the
> >   very start.
> 
> What would be your filter for no expression or little variation?
> 
> Sorry if these questions are a little basic
> 
> Thanks
> 
> Claire
>  
> --------------------------------------------------------
> 
>  
> This email is confidential and intended solely for the use of 
> th... {{dropped}}
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
