[BioC] normalisation assumptions (violation of)

Mon Aug 7 18:10:32 CEST 2006

Hi.

On 8/7/06, J.delasHeras at ed.ac.uk <J.delasHeras at ed.ac.uk> wrote:
> Quoting Sean Davis <sdavis2 at mail.nih.gov>:
>
> >
> >
> >
> > On 8/7/06 6:59 AM, "J.delasHeras at ed.ac.uk" <J.delasHeras at ed.ac.uk> wrote:
> >
> >>
> >> Hi,
> >>
> >> I have a set of data from an experiment where there appears to be an
> >> effect of the treatment on a large number of genes. I put scatterplots
> >> for 6 of the slides here:
> >>
> >> http://mcnach.com/MISC/scatterplots.gif
> >>
> >> these are Cy3 vs Cy5, in log scale.
> >>
> >> These show that many genes are differentially expressed, and they are
> >> mostly on one side only (upregulated; some of those slides are dye
> >> swaps).
> >>
> >> Would this appear to violate (too much) any of the assumptions made by
> >> loess normalisation? Should I investigate other normalisation
> >> procedures?
> >
> > First, I would start by doing a VERY thorough evaluation of the slide
> > quality for these slides, as these are very distorted scatterplots.  IF the
> > slide quality looks OK, then I would probably stay away from a non-linear
> > normalization method, as these will tend to make your
> > differentially-expressed genes look less differentially-expressed.
> >
> > Sean
>
> Hi Sean,
>
> thanks for your reply. The slides are good, I checked them well. The
> strong effect is not so unexpected, as it involves transfection of
> cells with a DNA-binding protein fused to a strong transactivator, so
> in theory the fusion protein could be responsible for the expression of
> a very large number of genes. There is some specificity to the binding,
> but there should be many target sites, often at promoters... So the
> effects are more or less what we expected, I suppose, and the quality
> of the slides is good. The second spike going either almost vertical or
> almost horizontal should correspond to those genes that are not
> expressed on the particular cell line, but expressed after transfection.
>
> Do you have any suggestions of what sort of methods to use, for the
> normalisation of such experiments? Until now I used loess for
> everything, but I wasn't sure it would be okay for this experiment when
> I saw these plots.

Roughly what fraction of DEs do you expect/see by visual inspection?
BTW, it is not clear if your plots in scatterplots.gif are on the
intensity or log scale, but looking at the noise structure I guess on
the log scale.

loess(), not lowess(), can be tuned to be very robust against outliers
including non-symmetric ones.  I know Gordon Smyth has done some
examples/slides on this, but I'm not sure if they're in limma or not.
In addition, in the aroma.light package you can assign weights to the
datapoints for some of the normalization methods.  Assigning a smaller
weight to a datapoint gives it less influence on the estimate of the
normalization function, but when the data are then normalized, the
fitted function is applied to all datapoints alike, regardless of
weight.  So with weights you may be able to tune your robustness
against outliers further.
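
To make the weighting idea concrete, here is a minimal sketch in plain
Python.  It is not the aroma.light or limma code, only an illustration
of the principle: a loess-like local linear fit in which each datapoint
carries a prior weight.  The function name weighted_local_linear, the
simulated M-vs-A data, the 25% one-sided DE fraction, and the prior
weight of 0.01 are all assumptions made up for this example, and it
assumes you already have some idea of which spots are likely to be DE.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, zero otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def weighted_local_linear(x, y, prior_weights, span=0.3):
    """Loess-like fit: at each x[i], fit a weighted straight line to the
    nearest span*n points and return the fitted value at x[i].

    Each point's weight is its tricube kernel weight multiplied by its
    prior weight, so downweighted points have less say in the curve.
    """
    n = len(x)
    k = max(2, int(round(span * n)))  # points per local window
    fitted = []
    for x0 in x:
        # Window half-width: distance to the k-th nearest point.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        if h == 0.0:
            h = 1e-12
        s0 = s1 = s2 = t0 = t1 = 0.0
        for xi, yi, pi in zip(x, y, prior_weights):
            d = xi - x0  # centre on x0, so the intercept is the fit at x0
            w = pi * tricube(d / h)
            s0 += w
            s1 += w * d
            s2 += w * d * d
            t0 += w * yi
            t1 += w * d * yi
        denom = s0 * s2 - s1 * s1
        if abs(denom) < 1e-12:
            fitted.append(t0 / s0)  # degenerate window: weighted mean
        else:
            fitted.append((s2 * t0 - s1 * t1) / denom)
    return fitted


# Simulated data (hypothetical): an intensity-dependent dye bias plus a
# one-sided block of upregulated genes, every 4th spot shifted by +2.
n = 200
a = [6.0 + 8.0 * i / (n - 1) for i in range(n)]   # A = average log intensity
bias = [0.1 * (ai - 10.0) for ai in a]            # true dye bias in M
is_de = [i % 4 == 0 for i in range(n)]
m = [b + (2.0 if de else 0.0) for b, de in zip(bias, is_de)]

equal = [1.0] * n
down = [0.01 if de else 1.0 for de in is_de]      # suspected DE spots count less

fit_equal = weighted_local_linear(a, m, equal)
fit_down = weighted_local_linear(a, m, down)

# The curve is estimated with weights, but it is subtracted from ALL points.
m_norm = [mi - fi for mi, fi in zip(m, fit_down)]

mean_err_equal = sum(f - b for f, b in zip(fit_equal, bias)) / n
max_err_down = max(abs(f - b) for f, b in zip(fit_down, bias))
print("mean error of equal-weight fit:", mean_err_equal)
print("max abs error of downweighted fit:", max_err_down)
```

With equal weights the one-sided DE genes drag the fitted curve towards
themselves, so their log-ratios shrink after normalization; with the
suspected DE spots downweighted the curve follows the remaining genes,
and the DE spots are still corrected by the same curve as everyone else.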

/Henrik

> Jose
>
> --
> Dr. Jose I. de las Heras                      Email: J.delasHeras at ed.ac.uk
> The Wellcome Trust Centre for Cell Biology    Phone: +44 (0)131 6513374
> Institute for Cell & Molecular Biology        Fax:   +44 (0)131 6507360
> Swann Building, Mayfield Road
> University of Edinburgh
> Edinburgh EH9 3JR
> UK