[BioC] Biological replication (was RNA degradation problem)
Hannah at mpimp-golm.mpg.de
Sun Jan 22 15:51:04 CET 2006
This is interesting, and there are certainly contrasting views. It is
also certainly not a BioC issue, but it is of interest. At the risk of
dragging it on, I don't think this statement should be left without the
> -----Original Message-----
> From: fhong at salk.edu [mailto:fhong at salk.edu]
> Sent: 20 January 2006 20:12
> To: Henk van den Toorn
> Cc: 'Naomi Altman'; Matthew Hannah; bioconductor at stat.math.ethz.ch
> Subject: RE: [BioC] Biological replication (was RNA degradation problem)
> Thank you all for the useful input and interesting discussion.
> I agree with Henk: "biological replicates" means including the
> biological variation among individual plants, not environmental
> factors such as growth chamber and climate. It is known that batch
> effects and lab effects are profound factors, which might sometimes
> mask the true signals. Since array experiments are still relatively
> expensive, we would prefer to eliminate environmental factors (conduct
> experiments at the same time, in the same growth room) and include
> biological variation (different plant samples as biological replicates).
I strongly believe that if you cannot prove that your results are
reproducible in at least 2 independent experiments, then any
interpretation of such results is far from conclusive.
Microarrays are expensive, but many other methods are also expensive and
time-consuming, and this does not exempt them from being shown to be
reproducible. It is also no more expensive (in array cost) to use 3
experiments than 3 samples from one experiment. If biological
reproducibility were not an issue, why would we use replicate plants or
experiments to measure metabolites, plant growth, etc., rather than
taking a single measurement on a huge pool of plants?
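To make the "3 experiments versus 3 samples from one experiment" point concrete, here is a small base-R simulation. It is only a sketch under made-up assumptions (no true treatment effect, a batch SD of 1 and a within-batch SD of 0.5, in arbitrary units), not anyone's actual data. When each condition comes from a single batch, the batch effect is invisible to a t-test and it reports "differences" far more often than the nominal 5%. When each of 3 independent experiments contains both conditions, the batch effect is shared within an experiment, and a paired t-test (my choice of analysis for that design) keeps its level.

```r
# Hypothetical simulation: no true treatment effect, but batch (experiment)
# effects exist. All SDs below are illustrative assumptions.
set.seed(1)
n_sim    <- 2000   # number of simulated "genes"
n_rep    <- 3      # replicates per condition
sd_batch <- 1.0    # between-experiment (batch) SD
sd_plant <- 0.5    # within-batch SD between samples

# Design A: each condition grown once; 3 samples taken from that single batch
p_one_batch <- replicate(n_sim, {
  a <- rnorm(1, 0, sd_batch) + rnorm(n_rep, 0, sd_plant)
  b <- rnorm(1, 0, sd_batch) + rnorm(n_rep, 0, sd_plant)
  t.test(a, b)$p.value
})

# Design B: 3 independent experiments, each containing both conditions;
# the batch effect is shared within an experiment, so test paired differences
p_indep <- replicate(n_sim, {
  batch <- rnorm(n_rep, 0, sd_batch)
  a <- batch + rnorm(n_rep, 0, sd_plant)
  b <- batch + rnorm(n_rep, 0, sd_plant)
  t.test(a, b, paired = TRUE)$p.value
})

# Proportion of false positives at alpha = 0.05
fp_one_batch <- mean(p_one_batch < 0.05)
fp_indep     <- mean(p_indep < 0.05)
round(c(one_batch = fp_one_batch, independent = fp_indep), 3)
```

The array cost is the same 6 arrays in both designs; only the second one samples the variability that a reader assumes "biological replicates" capture.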
What I do agree with is that you are interested in the biological
variation and not in studying environmental factors that have spurious
effects on your results. But it is also obvious that one batch of plants
grown at a single time in a single place is much more likely to yield
results where the 'biological factor' of interest is affected by a
biological factor-environment interaction, e.g., plants with higher sugar
content may be more attractive to aphids, different Arabidopsis ecotypes
have differential sensitivity to mildew or to stresses such as poor
watering, plus many other, less observable interactions.
The final point is on when biological replication becomes technical
replication. If you take sufficiently large repeated samples from a
population, those samples will, with very high probability, be almost
identical. Put differently, this is essentially the same as harvesting
all of the plants together, grinding them and then taking 2 aliquots of
the material (i.e., technical replicates). E.g., if you split a group of
1000 plants into 2 pools of 500, do you believe there would be any
difference between them compared to taking 2 aliquots of the 1000 once
ground? I think that 50 plants is already far beyond the point where
two pools of plants are essentially identical. In my experience, when
grown in randomised blocks in the same batch, 5-10 replicate plants are
usually sufficient to get virtually identical mean values for many
biological measurements. So does it then make sense to hybridise
'identical' samples and call them 'biological replicates', which could
also mislead readers who understand that term to mean something quite
different?
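Why large pools converge follows from the variance of a mean: if individual plants vary with standard deviation σ, two independent pools of n plants differ with a standard deviation of σ·√(2/n). The base-R sketch below checks this by simulation. It assumes (for illustration only) that plants within a batch are independent, that pooling tissue amounts to averaging the plants, and that σ = 1 in arbitrary units.

```r
# Hypothetical illustration: how quickly two pools from the same batch converge.
# Assumes independent plants with SD 1 (arbitrary units) and that a pooled
# sample behaves like the mean of the plants in the pool.
set.seed(2)
plant_sd   <- 1
pool_sizes <- c(1, 5, 10, 50, 500)
n_sim      <- 5000

# Simulated SD of the difference between two independent pools of size pool_n
sim_diff_sd <- sapply(pool_sizes, function(pool_n) {
  d <- replicate(n_sim,
                 mean(rnorm(pool_n, 0, plant_sd)) - mean(rnorm(pool_n, 0, plant_sd)))
  sd(d)
})

# Theory: Var(mean1 - mean2) = 2 * sigma^2 / n
theory_diff_sd <- plant_sd * sqrt(2 / pool_sizes)
round(data.frame(pool_size = pool_sizes,
                 simulated = sim_diff_sd,
                 theory    = theory_diff_sd), 3)
```

With pools of 50, the expected difference between two pools is already only about 0.2σ, and with pools of 500 about 0.06σ. Note that this only averages away plant-to-plant variation within the batch; any effect of the batch itself is shared by both pools and is not reduced at all.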
Having said all that, 'IF' you just want to identify a few highly
changed candidate genes that will be followed up (in independent
experiments), then independent array experiments are obviously not
essential. However, on the 'arrays are expensive' point, I would be
interested if anyone had data showing how cost-effective pooled samples
from the same experiment are in reducing the work for Q-PCR
verification, i.e., the % confirmation rate for genes selected on the
basis of 1, 2 or 3 arrays.
> > I think the last question is very important. I guess you don't need to
> > try to INCREASE the biological variability at all costs for a single
> > experiment.
> > If you are interested in combining experiments from the same lab in a
> > single analysis, it's probably wise to follow Naomi's advice to take
> > different replicates. A problem that might arise is that the "biological"
> > variation is influenced by many circumstances inside a greenhouse or
> > growth chamber. In our lab, it's clear that factors like the weather and
> > the season have a profound influence on the biology of the plant, even
> > though our plants are kept in climate-controlled growth chambers. If you
> > use different batches of plants, you are actually confounding these
> > factors with the batches of plants. By using different samples of
> > plants, although unfortunately pooled in the same circumstances, you
> > might actually block the circumstances for later analysis, if you're
> > willing to go that far.
> > I'm very interested to see what other people have to say about this!
> > Henk van den Toorn, MSc
> > bioinformatician, Molecular Genetics group, Utrecht University
> > -----Original Message-----
> > From: bioconductor-bounces at stat.math.ethz.ch
> > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Naomi Altman
> > Sent: 20 January 2006 01:42
> > To: Matthew Hannah; fhong at salk.edu
> > Cc: bioconductor at stat.math.ethz.ch
> > Subject: Re: [BioC] Biological replication (was RNA degradation problem)
> > The question of what is appropriate biological replication is a tough one.
> > The objective is to obtain results that are valid in the population of
> > interest, which usually is not plants grown in a single batch in the
> > greenhouse. But how much variability should we induce? Should each batch
> > of plants be grown separately but in the same building (different growth
> > chambers)? In different labs? At different universities?
> > In my very first Affy experiment, the investigator did the following:
> > 2 batches of plants grown separately, 2 samples of plants from 1 of the
> > batches, and 2 microarrays from one of the samples, for 4 arrays in all.
> > The correlation among the results ranked as follows: 2 arrays from the same sample > 2 samples from the same batch > 2 batches.
> > This should be no surprise, even though we did not have enough
> > replication to do any formal testing.
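The ordering Naomi reports is what a nested variance-components model predicts: arrays that share more levels of the hierarchy (same batch, then same sample) share more variance, so they correlate more closely across genes. The base-R sketch below simulates her 4-array layout. All standard deviations are made-up assumptions chosen only to show the ordering; they are not estimates from her experiment.

```r
# Hypothetical nested-variance simulation of the 4-array design described above:
# 2 batches; batch 1 gives 2 samples; sample 1 is hybridised twice.
set.seed(3)
n_genes  <- 10000
sd_gene  <- 2.0   # true differences between genes (shared by all arrays)
sd_batch <- 0.4   # gene-specific batch effect (assumed)
sd_samp  <- 0.3   # gene-specific sample (pool) effect (assumed)
sd_array <- 0.2   # technical hybridisation noise (assumed)

gene   <- rnorm(n_genes, 8, sd_gene)
batch1 <- rnorm(n_genes, 0, sd_batch)
batch2 <- rnorm(n_genes, 0, sd_batch)
samp1  <- rnorm(n_genes, 0, sd_samp)
samp2  <- rnorm(n_genes, 0, sd_samp)
samp3  <- rnorm(n_genes, 0, sd_samp)
noise  <- function() rnorm(n_genes, 0, sd_array)

arrays <- cbind(
  a1 = gene + batch1 + samp1 + noise(),  # batch 1, sample 1, array 1
  a2 = gene + batch1 + samp1 + noise(),  # batch 1, sample 1, array 2
  a3 = gene + batch1 + samp2 + noise(),  # batch 1, sample 2
  a4 = gene + batch2 + samp3 + noise()   # batch 2, sample 3
)
cc <- cor(arrays)
round(c(same_sample = cc["a1", "a2"],
        same_batch  = cc["a1", "a3"],
        diff_batch  = cc["a1", "a4"]), 4)
```

Under these assumptions the expected correlations are roughly 0.99, 0.97 and 0.93: all "high", which is why scatterplots of arrays from one batch can look reassuring while saying little about batch-to-batch reproducibility.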
> > I think that, at minimum, you want to achieve results that would be
> > replicable within your own lab. That would suggest batches of plants
> > grown separately from separate batches of seed.
> > The best plan is a randomized complete block design, with every
> > condition sampled in every block. If the conditions are tissues, this
> > is readily achieved.
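A minimal base-R sketch of the randomized complete block layout described here, assuming three hypothetical conditions and treating each independently grown batch of plants as one block. Every condition appears exactly once per block, in a random position within it.

```r
# Hypothetical randomized complete block design (RCBD) layout:
# each block = one independently grown batch; every condition appears
# once per block, in random order within the block.
set.seed(4)
conditions <- c("control", "treatment_A", "treatment_B")  # illustrative names
n_blocks   <- 3

design <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(block     = b,
             position  = seq_along(conditions),
             condition = sample(conditions))  # random order within the block
}))
design
table(design$block, design$condition)  # every cell should be exactly 1
```

For the analysis, block can then enter a linear model as a factor, e.g. `lm(y ~ factor(block) + condition, data = design)` for some measured response `y` (hypothetical here), so that batch-to-batch shifts are separated from the condition comparison instead of being confounded with it.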
> > Personally, I look at the density plots of the probes on the arrays.
> > If they have the same "shape" (which is usually a unimodal distribution
> > with a long tail to the right on the log2 scale), then I cross my fingers
> > (that is supposed to bring good luck) and use RMA.
> > Most of the experiments I have been involved with using Arabidopsis
> > arrays have involved tissue differences, and the amount of differential
> > expression has been huge on the probeset scale (over 60% of genes), but
> > these probe densities have been pretty similar.
> > --Naomi
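The density check Naomi describes can be sketched in base R as below. A simulated matrix stands in for real log2 probe intensities (an assumption: an exponential component plus Gaussian noise, which gives a unimodal density with a long right tail); with real data you would start from the probe-level intensities of your own arrays instead. The quantile table is a numeric companion to the plot, not part of her described procedure.

```r
# Hypothetical sketch of the density check: overlay per-array densities of
# log2 probe intensities. Simulated data stand in for real arrays.
set.seed(5)
n_probes <- 20000
# Simulated log2 intensities: right-skewed (exponential) part plus noise,
# giving a unimodal density with a long tail to the right
log_int <- sapply(1:4, function(i) 6 + rexp(n_probes, rate = 0.6) + rnorm(n_probes, 0, 0.5))
colnames(log_int) <- paste0("array", 1:4)

dens <- lapply(seq_len(ncol(log_int)), function(i) density(log_int[, i]))
ylim <- range(sapply(dens, function(d) d$y))
plot(dens[[1]], ylim = ylim, main = "Probe densities (log2 scale)",
     xlab = "log2 intensity")
for (i in 2:4) lines(dens[[i]], col = i)
legend("topright", legend = colnames(log_int), col = 1:4, lty = 1)

# Numeric companion to the plot: similar quantiles suggest a similar "shape"
round(apply(log_int, 2, quantile, probs = c(0.05, 0.5, 0.95)), 2)
```

If one array's curve is shifted or has a visibly different shape, that is the point at which to look harder before trusting a normalization such as RMA, which assumes the arrays share a common intensity distribution.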
> > At 05:02 PM 1/19/2006, Matthew Hannah wrote:
> >>From: fhong at salk.edu [mailto:fhong at salk.edu]
> >>Sent: Thu 19/01/2006 21:27
> >>To: Matthew Hannah
> >>Cc: bioconductor at stat.math.ethz.ch
> >>Subject: Re: [BioC] RNA degradation problem
> >>Hi Matthew,
> >>Thank you very much for your help.
> >> >> It's amazing how many lab plant biologists see pooled samples from a
> >> >> bulk of plants grown at the same time as biological replicates when
> >> >> they are clearly not.
> >> >I would think that all plants under experiment should be grown at
> >> >the same time without different conditions/treatments. Biological
> >> >replicates should be tissue samples from different groups of plants,
> >> >say a sample from 50 plants as replicate 1 and a sample from another
> >> >50 as replicate 2.
> >> >Do you think that biological replicates should be grown at different
> >> >times?
> >>Absolutely! Biological replication must be either single plants grown
> >>in the same experiment (but no one wants to risk single plants for
> >>arrays) or large pools of plants from INDEPENDENT experiments (or the
> >>pools must be smaller than sample size - doesn't really happen for
> >>arrays); otherwise, what biological variability are you sampling? Say
> >>you have 150 plants growing in the greenhouse and you harvest 3 random
> >>pools of 50 as your 3 'biological replicates'; then you will have
> >>eliminated all variability from them, the arrays will be as good as
> >>technical replicates, and any statistical testing is invalid.
> >> >> I find that hist, RNA deg, affyPLM and a simple RMA norm followed by
> >> >> plot(as.data.frame(exprs(eset.rma))) can answer in most cases why
> >> >> it didn't work, or won't work - in the rare case when someone asks
> >> >> for QC before rather than after they realise the data is strange ;-)
> >> >This actually raises another question: when the % of differentially
> >> >expressed genes is large, which normalization works better?
> >>I posted on this a lot about 1.5 years ago, and you should find it in
> >>the archives - but simply, no one knows or has tested it.
> >> >http://cactus.salk.edu/temp/QC_t.doc
> >> >Take a look at the last plot, which clearly indicates homogeneity
> >> >within replicates and heterogeneity among samples.
> >> >(1) Will stem top and stem base differ so much? Or is it the
> >> >preparation process that brings extra correlation within replicates?
> >> >(2) When the % of differentially expressed genes is large, which
> >> >normalization works better?
> >>Looking at these scatterplots, I can honestly say I've never seen so
> >>much DE. I would be surprised if samples such as different stem
> >>positions were so different. Something must be wrong with the samples
> >>or sampling, in my opinion. The scatterplots are slightly more
> >>user-friendly if you use pch="."
> >>Fangxin Hong Ph.D.
> >>Plant Biology Laboratory
> >>The Salk Institute
> >>10010 N. Torrey Pines Rd.
> >>La Jolla, CA 92037
> >>E-mail: fhong at salk.edu
> >>(Phone): 858-453-4100 ext 1105
> >>Bioconductor mailing list
> >>Bioconductor at stat.math.ethz.ch
> > Naomi S. Altman 814-865-3791 (voice)
> > Associate Professor
> > Dept. of Statistics 814-863-7114 (fax)
> > Penn State University 814-865-1348
> > University Park, PA 16802-2111