[BioC] RNA degradation trends & options for analysis

Thu Feb 23 17:25:29 CET 2006

One thing I find very useful is to look at

pairs(pm(myAffydata))

(This takes a while, since you are plotting lots of probes. I 
usually put in lower.panel=NULL to get only the upper triangle of plots.)

If the arrays are comparable, then most of the data should cluster 
pretty tightly on the diagonal.
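
A minimal base-R sketch of this check, assuming simulated data: the matrix pm_log2 below is a hypothetical stand-in for the probe-level intensities from pm(myAffydata), put on the log2 scale (using log2 values rather than raw pm() values is an assumption of the sketch). The correlation matrix is a numeric companion to the plot: comparable arrays give correlations close to 1.

```r
# Minimal sketch on simulated data. pm_log2 is a hypothetical stand-in
# for log2(pm(myAffydata)): rows = probes, columns = arrays.
set.seed(1)
n_probes <- 5000
n_arrays <- 4
true_signal <- rnorm(n_probes, mean = 8, sd = 2)  # shared log2 signal per probe
pm_log2 <- sapply(seq_len(n_arrays),
                  function(j) true_signal + rnorm(n_probes, sd = 0.3))
colnames(pm_log2) <- paste0("array", seq_len(n_arrays))

# lower.panel = NULL draws only the upper triangle of scatterplots;
# pch = "." keeps drawing fast when there are many points.
pairs(pm_log2, lower.panel = NULL, pch = ".")

# Numeric companion: comparable arrays give pairwise correlations near 1,
# i.e. points concentrated along the diagonal of each panel.
cor_mat <- cor(pm_log2)
print(round(cor_mat, 3))
```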

Incidentally, if some ambitious person would write a pairs routine 
for hexbin, that would be both faster and more informative.

--Naomi

At 09:16 AM 2/23/2006, James W. MacDonald wrote:
>Juanma Vaquerizas wrote:
> > Dear list,
> >
> > I'm trying to analyse some Affy arrays for my PhD thesis but I'm a
> > little bit stuck, so any comments on the following would be very
> > welcome.
> >
> > Basically I'm analysing a set of Affy arrays coming from 10 different
> > labs (3 biological replicates per lab) where each lab is using a
> > different RNA source. I've done some quality control using affyPLM
> > and the chips seem to be ok.
>
>Is this after processing them as one batch? If the residuals look OK,
>then this is a good indication that you can process them all together.
>
> >
> > If I have a look at the RNA digestion plot, 2 different trends are
> > clearly visible (half of the arrays follow one trend with a slope
> > around 1 and the other half with a slope around 3).
> >
> > I want to make some contrasts between the different RNA sources that
> > have been used, but as I've read in Bolstad et al. (2005,
> > Bioinformatics and Computational Biology Solutions Using R and
> > Bioconductor, Springer) and in some previous messages on this list,
> > mixing arrays with very different slopes in the RNA digestion plots
> > is not a very good idea.
>
>In my experience, the RNA degradation plots are not nearly as important
>as the density plots. What do they look like? Are the distributions all
>pretty similar in shape and fairly close together?
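
A minimal base-R sketch of the density comparison, assuming simulated log2 intensities for four hypothetical arrays, one of which (array4) is deliberately shifted. Overlaying one density curve per array makes a shifted or differently shaped distribution stand out; the medians give a quick numeric summary of how close together the distributions are.

```r
# Minimal sketch on simulated data: hypothetical log2 intensities for
# four arrays, with array4 shifted upward to mimic a problem array.
set.seed(4)
n_probes <- 5000
log2_int <- cbind(
  array1 = rnorm(n_probes, mean = 7.0, sd = 1.5),
  array2 = rnorm(n_probes, mean = 7.0, sd = 1.5),
  array3 = rnorm(n_probes, mean = 7.1, sd = 1.5),
  array4 = rnorm(n_probes, mean = 8.5, sd = 1.5)
)

# One kernel density estimate per array (column)
dens <- lapply(colnames(log2_int), function(a) density(log2_int[, a]))
names(dens) <- colnames(log2_int)

# Overlay the curves on common axes
xr <- range(unlist(lapply(dens, `[[`, "x")))
yr <- range(unlist(lapply(dens, `[[`, "y")))
plot(NULL, xlim = xr, ylim = yr, xlab = "log2 intensity", ylab = "density")
for (j in seq_along(dens)) lines(dens[[j]], col = j)
legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)

# Quick numeric summary: how far each array's median is from the typical one
meds <- apply(log2_int, 2, median)
med_shift <- meds - median(meds)
print(round(med_shift, 2))
```
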
>
> >
> > The options I'm thinking about at the moment are the following:
> >
> > Option 1:
> > 1.- Split the arrays by the lab of origin.
> > 2.- Preprocess them separately using GCRMA.
> > 3.- Combine the resulting esets into one eset.
> > 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
> > effect, trend in the RNA digestion plot)
> > 5.- Extract the contrasts I am interested in (the RNA type ones)
> >
> > Option 2:
> > 1.- Split the arrays by the trend of the RNA digestion plot.
> > 2.- Preprocess them separately using GCRMA.
> > 3.- Combine the resulting esets into one eset.
> > 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
> > effect, trend in the RNA digestion plot)
> > 5.- Extract the contrasts I am interested in (the RNA type ones)
> >
> > Option 3:
> > 1.- Do not split the arrays in groups.
> > 2.- Preprocess all of them using GCRMA.
> > 3.- Analyse using limma, modeling for 3 factors (RNA type, lab
> > effect, trend in the RNA digestion plot)
> > 4.- Extract the contrasts I am interested in (the RNA type ones)
>
>I would think this is the most reasonable method, if as you say the
>residuals from affyPLM all look good. One further check you can make is
>to do a PCA plot of the first two PCs and see how the replicated samples
>are grouping. If the replicates are all grouping together it may not
>even be necessary to model the lab effect. You could use plotPCA() in
>affycoretools to do this step.
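
A minimal sketch of the PC1/PC2 check, using base R's prcomp() on simulated data rather than plotPCA() from affycoretools (whose arguments are not reproduced here). The matrix expr (genes in rows, samples in columns) and the lab labels are hypothetical; with real data you would start from the expression matrix of the preprocessed set. The last lines test whether each sample's nearest neighbour in the PC1/PC2 plane is one of its own replicates.

```r
set.seed(2)
n_genes <- 1000
lab <- factor(rep(c("labA", "labB", "labC"), each = 3))  # hypothetical labels
# Each lab gets its own mean profile; replicates add small noise
lab_profiles <- matrix(rnorm(n_genes * nlevels(lab), mean = 8, sd = 1),
                       nrow = n_genes)
expr <- lab_profiles[, as.integer(lab)] +
  matrix(rnorm(n_genes * length(lab), sd = 0.2), nrow = n_genes)
colnames(expr) <- paste0(lab, "_rep", rep(1:3, times = 3))

# prcomp() wants samples in rows, so transpose the genes x samples matrix
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
pc12 <- pca$x[, 1:2]
plot(pc12, col = as.integer(lab), pch = 19)
text(pc12, labels = colnames(expr), pos = 3, cex = 0.7)

# Is each sample's nearest neighbour in the PC1/PC2 plane a replicate?
d <- as.matrix(dist(pc12))
diag(d) <- Inf
nearest <- apply(d, 1, which.min)
replicates_cluster <- all(lab[nearest] == lab)
print(replicates_cluster)
```
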
>
> >
> >
> > Unfortunately I can't figure out which would be the best way to
> > proceed, or even whether modeling the trend is something that would be
> > acceptable. I've seen in the vignette of the affycoretools package
> > that the arrays coming from different RNA protocols are preprocessed
> > separately and then mixed for the linear model, although it is not
> > clear to me why this option is better than any of the others.
>
>Well, the example in affycoretools is a very special case and should not
>be construed as an example that one should use for 'normal' analyses
>(which makes me wonder if I need a different example).
>
>Anyway, in that vignette the samples have been processed completely
>differently (one set amplified with the NuGen Ovation kit, and one using
>the normal Affy IVT kit), so there is no way they should be processed as
>one batch. I then stick both sets of expression values into one exprSet
>simply to make the linear modeling step easier. Since I use a cell means
>model and never make any contrasts between the groups, this analysis is
>equivalent to keeping the data separate and fitting two separate models.
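
To see why a cell means model on the combined data matches separate fits, here is a single-gene sketch using base R's lm() on simulated values (limma fits this kind of linear model gene by gene). The batch names "ivt" and "nugen" and the groups A and B are hypothetical labels. The coefficient estimates from the joint fit equal those from the two separate fits and the raw group means; the residual variance, however, is pooled across both sets in the joint fit.

```r
set.seed(3)
# Hypothetical single-gene data: two separately processed batches
# ("ivt" and "nugen"), each with groups A and B, three arrays per group
dat <- data.frame(
  batch = rep(c("ivt", "nugen"), each = 6),
  group = rep(rep(c("A", "B"), each = 3), times = 2),
  y     = rnorm(12, mean = 8)
)
dat$cell <- factor(paste(dat$batch, dat$group, sep = "."))

# Cell means model: no intercept, one coefficient per batch/group cell
fit_joint <- lm(y ~ 0 + cell, data = dat)

# The same model fitted separately within each batch
fit_ivt   <- lm(y ~ 0 + cell, data = droplevels(subset(dat, batch == "ivt")))
fit_nugen <- lm(y ~ 0 + cell, data = droplevels(subset(dat, batch == "nugen")))

joint      <- unname(coef(fit_joint))
separate   <- unname(c(coef(fit_ivt), coef(fit_nugen)))
cell_means <- as.vector(tapply(dat$y, dat$cell, mean))
print(rbind(joint, separate, cell_means))
```
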
>
>HTH,
>
>Jim
>
>
> >
> > On the other hand, some messages to the list last week argued for
> > preprocessing all the experiments at once...
> >
> > My understanding is that there is not a clear consensus about what to
> > do in those cases and I don't really know the consequences and the
> > differences between following the different approaches, so any
> > comments would be very much appreciated.
> >
> > Thank you very much for your help.
> >
> > Best wishes,
> >
> > Juanma.
> >
> >
> >
> > Juanma Vaquerizas
> > PhD Student
> > Regulation Group
> > EMBL-EBI
> > Wellcome Trust Genome Campus
> > Cambridge CB10 1SD
> > UK
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>
>
>--
>James W. MacDonald, M.S.
>Biostatistician
>Affymetrix and cDNA Microarray Core
>University of Michigan Cancer Center
>1500 E. Medical Center Drive
>7410 CCGC
>Ann Arbor MI 48109
>734-647-5623
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111