[BioC] RNA degradation trends & options for analysis

Fri Feb 24 11:12:16 CET 2006

Thank you all very much for all the answers and comments. I really  
appreciate them.

The residuals were OK when processing all the arrays as one batch, and
the MA plots seem to be OK as well. The density plots were also OK,
with more or less the same shape and all close together. So I guess I
can process them together.

My concern came from the fact that I'm going to use GCRMA, and in the
summarization step all the chips are used to model the expression
value of each probeset. Mixing chips with different trends might make
the process less accurate (or more biased) than if all chips had the
same slope, although I must say I'm not sure whether this is going to
happen or not.

Thanks very much again for all the answers.

Best wishes,

Juanma.

On 23 Feb 2006, at 16:25, Naomi Altman wrote:

> One thing I find very useful is to look at
>
> pairs(pm(myAffydata))
>
> (This takes a while, since you are plotting lots of probes, and I  
> usually put in lower.panel=NULL to get only the upper triangle of  
> plots.)
>
> If the arrays are comparable, then most of the data should cluster  
> pretty tightly on the diagonal.
>
> Incidentally, if some ambitious person would write a pairs routine  
> for hexbin, that would be both faster and more informative.
>
> --Naomi
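
A minimal sketch of the scatterplot-matrix check described above. It uses simulated intensities instead of pm(myAffydata), so it runs without any array data; the matrix sim_pm, its size, and the noise level are made up for illustration. With a real AffyBatch one would pass pm(myAffydata) instead, possibly log2-transformed (that transformation is my assumption, not part of the suggestion above).

```r
# Simulated log2-scale probe intensities: 2000 probes on 4 arrays.
# 'sim_pm' is a hypothetical stand-in for pm(myAffydata).
set.seed(1)
shared_signal <- rnorm(2000, mean = 8, sd = 2)
sim_pm <- sapply(1:4, function(i) shared_signal + rnorm(2000, sd = 0.3))
colnames(sim_pm) <- paste0("array", 1:4)

# lower.panel = NULL draws only the upper triangle of panels;
# pch = "." keeps the plot readable when there are many probes.
pairs(sim_pm, lower.panel = NULL, pch = ".")

# Numeric companion to the plot: pairwise correlations between arrays.
array_cor <- cor(sim_pm)
print(round(array_cor, 3))
```

Comparable arrays should give tight clouds along the diagonal of each panel and correspondingly high pairwise correlations.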
>
>
> At 09:16 AM 2/23/2006, James W. MacDonald wrote:
>> Juanma Vaquerizas wrote:
>> > Dear list,
>> >
>> > I'm trying to analyse some Affy arrays for my PhD thesis but I'm a
>> > little bit stuck, so any comments on the following would be very
>> > welcome.
>> >
>> > Basically I'm analysing a set of Affy arrays coming from 10 different
>> > labs (3 biological replicates per lab) where each lab is using a
>> > different RNA source. I've done some quality control using affyPLM
>> > and the chips seem to be ok.
>>
>> Is this after processing them as one batch? If the residuals look OK,
>> then this is a good indication that you can process them all together.
>>
>> >
>> > If I have a look at the RNA digestion plot, 2 different trends are
>> > clearly visible (half of the arrays follow one trend with a slope
>> > around 1 and the other half with a slope around 3).
>> >
>> > I want to make some contrasts between the different RNA sources that
>> > have been used, but as I've read in (Bolstad et al., 2005,
>> > Bioinformatics and Computational Biology Solutions Using R and
>> > Bioconductor, Springer) and in some previous messages in this list,
>> > mixing arrays with very different slopes in the RNA digestion plots
>> > is not a very good idea.
>>
>> In my experience, the RNA degradation plots are not nearly as important
>> as the density plots. What do they look like? Are the distributions all
>> pretty similar in shape and fairly close together?
>>
>> >
>> > The options I'm thinking about at the moment are the following:
>> >
>> > Option 1:
>> > 1.- Split the arrays by the lab of origin.
>> > 2.- Preprocess them separately using GCRMA.
>> > 3.- Combine the resulting esets into one eset.
>> > 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
>> > effect, trend in the RNA digestion plot)
>> > 5.- Extract the contrasts I am interested in (the RNA type ones)
>> >
>> > Option 2:
>> > 1.- Split the arrays by the trend of the RNA digestion plot.
>> > 2.- Preprocess them separately using GCRMA.
>> > 3.- Combine the resulting esets into one eset.
>> > 4.- Analyse using limma, modeling for 3 factors (RNA type, lab
>> > effect, trend in the RNA digestion plot)
>> > 5.- Extract the contrasts I am interested in (the RNA type ones)
>> >
>> > Option 3:
>> > 1.- Do not split the arrays in groups.
>> > 2.- Preprocess all of them using GCRMA.
>> > 3.- Analyse using limma, modeling for 3 factors (RNA type, lab
>> > effect, trend in the RNA digestion plot)
>> > 4.- Extract the contrasts I am interested in (the RNA type ones)
>>
>> I would think this is the most reasonable method, if as you say the
>> residuals from affyPLM all look good. One further check you can make is
>> to do a PCA plot of the first two PCs and see how the replicated samples
>> are grouping. If the replicates are all grouping together it may not
>> even be necessary to model the lab effect. You could use plotPCA() in
>> affycoretools to do this step.
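
A rough sketch of the PCA check described above, written with base R's prcomp() rather than plotPCA() so that it runs without affycoretools. The expression matrix, the two group labels, and the size of the simulated shift are all invented for illustration; with real data one would use the matrix of summarized expression values instead.

```r
# Simulated expression matrix: 500 probesets x 6 samples,
# two hypothetical RNA sources with 3 replicates each.
set.seed(2)
group <- factor(rep(c("sourceA", "sourceB"), each = 3))
expr <- matrix(rnorm(500 * 6, mean = 8, sd = 0.2), nrow = 500)
# Shift the first 50 probesets in sourceB so the groups differ.
expr[1:50, group == "sourceB"] <- expr[1:50, group == "sourceB"] + 2
colnames(expr) <- paste0(group, "_", rep(1:3, times = 2))

# prcomp() expects samples in rows, so transpose the matrix.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)
scores <- pca$x[, 1:2]

# Plot the first two PCs, coloured by group, with sample labels.
plot(scores, col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
text(scores, labels = colnames(expr), pos = 3, cex = 0.7)
legend("topright", legend = levels(group), col = 1:2, pch = 19)
```

If replicates from the same source sit close together and away from the other sources, that is the grouping pattern referred to above.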
>>
>> >
>> >
>> > Unfortunately I can't figure out which would be the best way to
>> > proceed, or even if modeling for the trend is something that would be
>> > acceptable. I've seen in the vignette of the affycoretools package
>> > that the arrays coming from different RNA protocols are preprocessed
>> > separately and then mixed for the linear model, although it is not
>> > clear to me why this option is better than any of the others.
>>
>> Well, the example in affycoretools is a very special case and should not
>> be construed as an example that one should use for 'normal' analyses
>> (which makes me wonder if I need a different example).
>>
>> Anyway, in that vignette the samples have been processed completely
>> differently (one set amplified with the NuGen Ovation kit, and one using
>> the normal Affy IVT kit), so there is no way they should be processed as
>> one batch. I then stick both sets of expression values into one exprSet
>> simply to make the linear modeling step easier. Since I use a cell means
>> model and never make any contrasts between the groups, this analysis is
>> equivalent to keeping the data separate and fitting two separate models.
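
A small base R sketch of the cell means point above, for a single simulated probeset. The cell labels ("ivt_ctrl", "ovation_trt", and so on), the sample sizes, and the values are hypothetical and are not taken from the vignette. It only illustrates that, with one coefficient per cell and no contrasts across protocols, a joint fit and two separate fits estimate the same cell means.

```r
# Simulated log-expression values for one probeset: two hypothetical
# protocols, each with control and treated samples (3 replicates per cell).
set.seed(3)
cell <- factor(rep(c("ivt_ctrl", "ivt_trt", "ovation_ctrl", "ovation_trt"),
                   each = 3))
d <- data.frame(y = rnorm(12, mean = rep(c(7, 8, 9, 11), each = 3), sd = 0.3),
                cell = cell,
                protocol = sub("_.*", "", cell))

# Cell means design: no intercept, one column per cell.
design <- model.matrix(~ 0 + cell, data = d)
colnames(design) <- levels(d$cell)

# Joint fit on all samples.
joint <- lm.fit(design, d$y)$coefficients

# Separate fits, one per protocol, each with its own cell means model.
fit_ivt <- coef(lm(y ~ 0 + cell, data = droplevels(d[d$protocol == "ivt", ])))
fit_ovation <- coef(lm(y ~ 0 + cell,
                       data = droplevels(d[d$protocol == "ovation", ])))

print(round(joint, 3))
print(round(c(fit_ivt, fit_ovation), 3))
```

The sketch compares coefficient estimates only. Residual variance is pooled across all samples in the joint fit, so standard errors are not compared here.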
>>
>> HTH,
>>
>> Jim
>>
>>
>> >
>> > On the other hand, some messages to the list last week were for
>> > preprocessing all the experiments at once...
>> >
>> > My understanding is that there is not a clear consensus about what to
>> > do in those cases and I don't really know the consequences and the
>> > differences between following the different approaches, so any
>> > comments would be very much appreciated.
>> >
>> > Thank you very much for your help.
>> >
>> > Best wishes,
>> >
>> > Juanma.
>> >
>> >
>> >
>> > Juanma Vaquerizas
>> > PhD Student
>> > Regulation Group
>> > EMBL-EBI
>> > Wellcome Trust Genome Campus
>> > Cambridge CB10 1SD
>> > UK
>> >
>> > _______________________________________________
>> > Bioconductor mailing list
>> > Bioconductor at stat.math.ethz.ch
>> > https://stat.ethz.ch/mailman/listinfo/bioconductor
>>
>>
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> Affymetrix and cDNA Microarray Core
>> University of Michigan Cancer Center
>> 1500 E. Medical Center Drive
>> 7410 CCGC
>> Ann Arbor MI 48109
>> 734-647-5623
>>
>
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
>

Juanma Vaquerizas
PhD Student
Regulation Group
EMBL-EBI
Wellcome Trust Genome Campus
Cambridge CB10 1SD
UK