[BioC] Biological replication (was RNA degradation problem)
naomi at stat.psu.edu
Fri Jan 20 01:41:30 CET 2006
The question of what is appropriate biological replication is a tough
one. The objective is to obtain results that are valid in the
population of interest, which usually is not plants grown in a single
batch in the greenhouse. But how much variability should we
induce? Should each batch of plants be grown separately but in the same
building (in different growth chambers)? In different labs? At
different universities?
In my very first Affy experiment, the investigator did the
following: 2 batches of plants grown separately, 2 samples of plants
from 1 of the batches, 2 microarrays from one of the samples, for 4
arrays in all.
The correlation among the results was ordered: 2 arrays from the same
sample > 2 samples from the same batch > 2 batches. This should be no
surprise, even though we did not have enough replication to do any formal testing.
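To illustrate why that ordering is expected, here is a minimal simulation sketch of the same 4-array layout. It is not the original data: the variance components (sd_gene, sd_batch, sd_sample, sd_array), the gene count and all names are assumptions chosen only for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_four_arrays(n_genes=5000, sd_gene=1.0, sd_batch=0.5,
                         sd_sample=0.3, sd_array=0.2, seed=1):
    """Simulate log2 expression for the nested layout described above.

    All standard deviations are hypothetical; each gene gets its own
    batch, sample and array deviations.
    """
    rng = random.Random(seed)
    arrays = {"b1_s1_a1": [], "b1_s1_a2": [], "b1_s2_a1": [], "b2_s1_a1": []}
    for _ in range(n_genes):
        mu = rng.gauss(8.0, sd_gene)           # gene-level mean
        b1 = rng.gauss(0.0, sd_batch)          # batch 1 effect
        b2 = rng.gauss(0.0, sd_batch)          # batch 2 effect
        s11 = rng.gauss(0.0, sd_sample)        # batch 1, sample 1
        s12 = rng.gauss(0.0, sd_sample)        # batch 1, sample 2
        s21 = rng.gauss(0.0, sd_sample)        # batch 2, sample 1
        arrays["b1_s1_a1"].append(mu + b1 + s11 + rng.gauss(0.0, sd_array))
        arrays["b1_s1_a2"].append(mu + b1 + s11 + rng.gauss(0.0, sd_array))
        arrays["b1_s2_a1"].append(mu + b1 + s12 + rng.gauss(0.0, sd_array))
        arrays["b2_s1_a1"].append(mu + b2 + s21 + rng.gauss(0.0, sd_array))
    return arrays

arrays = simulate_four_arrays()
r_same_sample = pearson(arrays["b1_s1_a1"], arrays["b1_s1_a2"])
r_same_batch = pearson(arrays["b1_s1_a1"], arrays["b1_s2_a1"])
r_diff_batch = pearson(arrays["b1_s1_a1"], arrays["b2_s1_a1"])
print(r_same_sample, r_same_batch, r_diff_batch)
```

The more sources of variation two arrays share, the higher their correlation, so the ordering follows from the nesting itself rather than from anything special about the experiment.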
I think at minimum that you want to achieve results that would be
replicable within your own lab. That would suggest batches of plants
grown separately from separate batches of seed.
The best plan is a randomized complete block design, with every
condition sampled in every block. If the conditions are tissues,
this is readily achieved.
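As a rough sketch of what such a layout looks like, the snippet below assigns every condition to every block in a random order within the block. The function name, the tissue labels and the idea that a block is one separately grown batch are my assumptions for illustration.

```python
import random

def randomized_complete_block(conditions, n_blocks, seed=None):
    """Return (block, position, condition) rows for an RCBD.

    Every condition appears exactly once in every block; only the
    order within each block is randomized.
    """
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(conditions)
        rng.shuffle(order)  # independent randomization per block
        for position, condition in enumerate(order, start=1):
            layout.append((block, position, condition))
    return layout

# Hypothetical example: 3 tissues, 3 separately grown batches as blocks.
layout = randomized_complete_block(["stem_top", "stem_base", "leaf"], 3, seed=0)
for row in layout:
    print(row)
```

With tissues as conditions, each block can simply be one batch of plants from which all tissues are harvested, which is why the design is easy to achieve in that case.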
Personally, I look at the density plots of the probes on the
arrays. If they have the same "shape" (which is usually a unimodal
distribution with long tail to the right on the log2 scale) then I
cross my fingers (that is supposed to bring good luck) and use
RMA. Most of the experiments I have been involved with using
Arabidopsis arrays have involved tissue differences, and the amount
of differential expression has been huge on the probeset scale (over
60% of genes), but these probe densities have been pretty similar.
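The reason the density shapes matter is that the normalization step in RMA is quantile normalization, which forces every array onto the same empirical distribution. The following is a minimal pure-Python sketch of that idea, not the affy implementation; it ignores ties and assumes arrays of equal length.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length lists of log2 intensities.

    Each value is replaced by the mean, across arrays, of the values
    that share its rank. Ties are not treated specially in this sketch.
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the k-th smallest value over arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

# Toy example with two "arrays" of three probes each.
result = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
print(result)
```

After this step every array has identical sorted values, so if the raw distributions really differ in shape, the normalization itself will distort the data.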
At 05:02 PM 1/19/2006, Matthew Hannah wrote:
>From: fhong at salk.edu [mailto:fhong at salk.edu]
>Sent: Thu 19/01/2006 21:27
>To: Matthew Hannah
>Cc: bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] RNA degradation problem
>Thank you very much for your help.
> > >It's amazing how many
> >> lab plant biologists see pooled samples from a bulk of plants grown at
> >> the same time as biological replicates when they are clearly not.
> >I would think that all plants under experiment should be grown at the same
> >time without different conditions/treatments. Biological replicates should
> >be tissue samples from different groups of plants, say a sample from 50
> >plants as replicate 1 and a sample from another 50 as replicate 2.
> >Do you think that biological replicates should be grown at different times?
>Absolutely! Biological replication must be either single plants
>grown in the same experiment (but no one wants to risk single plants
>for arrays) or large pools of plants from INDEPENDENT experiments
>(or the pools must be smaller than sample size - doesn't really
>happen for arrays) otherwise what biological variability are you
>sampling? Say you have 150 plants growing in the greenhouse and you
>harvest 3 random pools of 50 as your 3 'biological replicates' then
>you will have eliminated all variability from them and the arrays
>will be as good as technical replicates and any statistical testing is invalid.
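A small simulation sketch of the pooling argument quoted above: three pools of 50 plants are drawn either from one shared batch or from independently grown batches, and the spread of the pool means is compared. The standard deviations (1.0 between plants, 0.5 between batches) and all names are hypothetical values chosen for illustration.

```python
import random
import statistics

def pool_means(n_pools, pool_size, sd_plant, sd_batch,
               independent_batches, rng):
    """Mean expression of each pool for one simulated experiment."""
    shared_batch = rng.gauss(0.0, sd_batch)
    means = []
    for _ in range(n_pools):
        # Either every pool comes from the same batch, or each pool
        # comes from its own independently grown batch.
        batch = rng.gauss(0.0, sd_batch) if independent_batches else shared_batch
        plants = [batch + rng.gauss(0.0, sd_plant) for _ in range(pool_size)]
        means.append(sum(plants) / pool_size)
    return means

def average_spread(independent_batches, n_sim=500, seed=1):
    """Average SD among 3 pool means over many simulated experiments."""
    rng = random.Random(seed)
    sds = [statistics.stdev(pool_means(3, 50, 1.0, 0.5,
                                       independent_batches, rng))
           for _ in range(n_sim)]
    return sum(sds) / n_sim

spread_same_batch = average_spread(independent_batches=False)
spread_independent = average_spread(independent_batches=True)
print(spread_same_batch, spread_independent)
```

Pooling 50 plants shrinks the plant-to-plant standard deviation by a factor of about 7 (the square root of 50), and pools from one batch never sample the batch-to-batch component at all, so the same-batch pools vary far less than pools from independent experiments.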
> >> I find hist, RNA deg, AffyPLM and a simple RMA norm followed by
> >> plot(as.data.frame(exprs(eset.rma))) can answer in most cases for why it
> >> didn't work, or won't work - in the rare case when someone asks for QC
> > >before rather than after they realise the data is strange ;-)
> >This actually brings up another question: when the % of differentially
> >expressed genes is large, which normalization works better?
>I've posted on this a lot, about 1.5 years ago; you should find it in
>the archives - but simply no one knows or has tested it.
> >Take a look at the last plot, which clearly indicates homogeneity within
> >replicates and heterogeneity among samples.
> >(1) Will stem top and stem base differ so much? Or is it the preparation
> >process that brings in extra correlation within replicates?
> >(2) When the % of differentially expressed genes is large, which
> >normalization works better?
>Looking at these scatterplots, I can honestly say I've never seen so
>much DE. I would be surprised if samples such as different stem
>positions were so different. Something must be wrong with the
>samples or sampling in my opinion. The scatterplots are slightly
>more user friendly if you use pch="."
>Fangxin Hong Ph.D.
>Plant Biology Laboratory
>The Salk Institute
>10010 N. Torrey Pines Rd.
>La Jolla, CA 92037
>E-mail: fhong at salk.edu
>(Phone): 858-453-4100 ext 1105
Naomi S. Altman 814-865-3791 (voice)
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111