[BioC] Biological replication (was RNA degradation problem)

fhong@salk.edu fhong at salk.edu
Fri Jan 20 20:11:59 CET 2006

Thank you all for the useful input and interesting discussion.

I agree with Henk: "biological replicates" are meant to capture biological
variation among individual plants, not environmental factors such as
growth chamber and climate. Batch effects and lab effects are known to be
profound factors that can sometimes mask the true signal. Because array
experiments are still relatively expensive, we would prefer to eliminate
environmental factors (by conducting experiments at the same time, in the
same growth room) and include biological variation (different plant samples
as biological replicates).
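
The trade-off debated in this thread can be made concrete with a small
simulation. This is only a sketch under invented assumptions: one gene,
an additive batch effect plus plant-to-plant noise, pools of 50 plants,
and made-up standard deviations (sd_batch, sd_plant). It compares the
variance among 3 pools taken from one batch with the variance among 3
pools taken from independently grown batches.

```r
# Simulated log2 expression of one gene. All numbers are invented for illustration.
set.seed(1)
sd_batch <- 0.5   # assumed batch-to-batch (growth run) standard deviation
sd_plant <- 0.3   # assumed plant-to-plant standard deviation within a batch
n_pool   <- 50    # plants per pooled sample
n_sim    <- 2000  # number of simulated experiments per scheme

# One pooled sample: the batch effect plus the mean of n_pool plant effects.
pool_value <- function(batch_effect) {
  batch_effect + mean(rnorm(n_pool, mean = 0, sd = sd_plant))
}

# Scheme A: 3 pools drawn from a single batch (they share one batch effect).
var_same_batch <- replicate(n_sim, {
  b <- rnorm(1, mean = 0, sd = sd_batch)
  var(c(pool_value(b), pool_value(b), pool_value(b)))
})

# Scheme B: 3 pools, each from an independently grown batch.
var_indep_batch <- replicate(n_sim, {
  b <- rnorm(3, mean = 0, sd = sd_batch)
  var(vapply(b, pool_value, numeric(1)))
})

mean_same  <- mean(var_same_batch)   # theory: sd_plant^2 / n_pool
mean_indep <- mean(var_indep_batch)  # theory: sd_batch^2 + sd_plant^2 / n_pool
print(c(same_batch = mean_same, independent_batches = mean_indep))
```

Under these assumptions the pools from a single batch differ only by the
averaged plant noise, so a test built on them says nothing about
batch-to-batch reproducibility. Whether that is the variation of interest
is exactly the design question raised here.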


> I think the last question is very important. I guess you don't need to try
> to INCREASE the biological variability at all cost for a single
> experiment. If you are interested in combining experiments from the same
> lab in a single analysis, it's probably wise to follow Naomi's advice to
> take different replicates. A problem that might arise is that the
> "biological" variation is influenced by many circumstances inside a
> greenhouse or growth chamber. In our lab practice, it's clear that
> influences like the weather and the season have a profound influence on
> the biology of the plant, even though our plants are kept in
> climate-controlled growth chambers. If you use different batches of
> plants, you are actually confounding these factors with the batches of
> plants. By using different samples of plants, although unfortunately
> pooled in the same circumstances, you might actually block the
> circumstances for later analysis, if you're willing to go that far.

> I'm very interested to see what other people have to say about this!
> Henk van den Toorn, MSc
> bioinformatician, Molecular Genetics group, Utrecht University
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch
> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Naomi Altman
> Sent: 20 January 2006 01:42
> To: Matthew Hannah; fhong at salk.edu
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Biological replication (was RNA degradation problem)
> The question of what is appropriate biological replication is a tough one.
> The objective is to obtain results that are valid in the population of
> interest, which usually is not plants grown in a single batch in the
> greenhouse.  But how much variability should we induce?  Each batch of
> plants grown separately but in the same building (different growth
> chambers), grown in different labs?  different universities?
>
> In my very first Affy experiment, the investigator did the following:
> 2 batches of plants grown separately, 2 samples of plants from 1 of the
> batches, 2 microarrays from one of the samples, for 4 arrays in all.
> The correlation among the results was ordered as follows:
>   2 arrays from same sample > 2 samples from same batch > 2 batches.
> This should be no surprise, even though we did not have enough
> replication to do any formal testing.
>
> I think at minimum that you want to achieve results that would be
> replicable within your own lab.  That would suggest batches of plants
> grown separately from separate batches of seed.
>
> The best plan is a randomized complete block design, with every condition
> sampled in every block.  If the conditions are tissues, this is readily
> achieved.
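
A randomized complete block design of the kind Naomi describes can be
sketched in base R. Everything here is hypothetical: the tissue names,
the 4 batches used as blocks, and the simulated effects for a single
gene. It is not the analysis used by anyone in this thread, only an
illustration of why putting every condition in every block lets the batch
effect be separated from the tissue effect.

```r
# Hypothetical layout: 3 tissues (conditions) sampled in each of 4 batches (blocks).
set.seed(2)
design <- expand.grid(tissue = c("leaf", "stem", "root"),
                      batch  = paste0("batch", 1:4))
design$tissue <- factor(design$tissue, levels = c("leaf", "stem", "root"))
design$batch  <- factor(design$batch)

# Every tissue appears exactly once in every batch: a complete block design.
print(table(design$tissue, design$batch))

# Randomize the processing order of the tissues within each batch.
design$order <- ave(seq_len(nrow(design)), design$batch,
                    FUN = function(i) sample(length(i)))

# Simulated log2 expression for one gene; all effect sizes are invented.
tissue_effect <- c(leaf = 0, stem = 1, root = 2)
batch_effect  <- c(-0.6, -0.2, 0.2, 0.6)
design$y <- 8 +
  unname(tissue_effect[as.character(design$tissue)]) +
  batch_effect[as.integer(design$batch)] +
  rnorm(nrow(design), sd = 0.1)

# Including the block removes batch variation from the residual error.
fit_blocked   <- lm(y ~ batch + tissue, data = design)
fit_unblocked <- lm(y ~ tissue, data = design)

sigma_blocked   <- summary(fit_blocked)$sigma
sigma_unblocked <- summary(fit_unblocked)$sigma
print(c(blocked = sigma_blocked, unblocked = sigma_unblocked))
print(coef(fit_blocked)[c("tissuestem", "tissueroot")])
```

In this toy setting the tissue contrasts are estimated within batches, so
the batch differences do not inflate the error term of the blocked model.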
>
> Personally, I look at the density plots of the probes on the arrays.  If
> they have the same "shape" (which is usually a unimodal distribution with
> long tail to the right on the log2 scale) then I cross my fingers (that is
> supposed to bring good luck) and use RMA.  Most of the experiments I have
> been involved with using Arabidopsis arrays have involved tissue
> differences, and the amount of differential expression has been huge on
> the probeset scale (over 60% of genes), but these probe densities have
> been pretty similar.
> --Naomi
> At 05:02 PM 1/19/2006, Matthew  Hannah wrote:
>>From: fhong at salk.edu [mailto:fhong at salk.edu]
>>Sent: Thu 19/01/2006 21:27
>>To: Matthew Hannah
>>Cc: bioconductor at stat.math.ethz.ch
>>Subject: Re: [BioC] RNA degradation problem
>>Hi Matthew,
>>Thank you very much for your help.
>> >> It's amazing how many
>> >> lab plant biologists see pooled samples from a bulk of plants grown
>> >> at the same time as biological replicates when they are clearly not.
>> >I would think that all plants under experiment should be grown at the
>> >same time without different conditions/treatments. Biological
>> >replicates should be tissue samples from different groups of plants,
>> >say a sample from 50 plants as replicate 1 and a sample from another 50
>> >as replicate 2.
>> >Do you think that biological replicates should be grown at different
>> >times?
>>Absolutely! Biological replication must be either single plants grown
>>in the same experiment (but no one wants to risk single plants for
>>arrays) or large pools of plants from INDEPENDENT experiments (or the
>>pools must be smaller than sample size - doesn't really happen for
>>arrays) otherwise what biological variability are you sampling? Say you
>>have 150 plants growing in the greenhouse and you harvest 3 random
>>pools of 50 as your 3 'biological replicates' then you will have
>>eliminated all variability from them and the arrays will be as good as
>>technical replicates and any statistical testing is invalid.
>> >> I find hist, RNA deg, affyPLM and a simple RMA norm followed by
>> >> plot(as.data.frame(exprs(eset.rma))) can answer in most cases for
>> >> why it didn't work, or won't work - in the rare case when someone
>> >> asks for QC before rather than after they realise the data is
>> >> strange ;-)
>> >This actually raises another question: when the % of differentially
>> >expressed genes is large, which normalization works better?
>>I've posted on this a lot about 1.5 years ago, you should find it in the
>>archives - but simply no one knows or has tested it
>> >http://cactus.salk.edu/temp/QC_t.doc
>> >Take a look at the last plot, which clearly indicates homogeneity
>> >within replicates and heterogeneity among samples.
>> >(1) Will stem top and stem base differ so much? Or is it the
>> >preparation process that brings in extra correlation within replicates?
>> >(2) When the % of differentially expressed genes is large, which
>> >normalization works better?
>>Looking at these scatterplots, I can honestly say I've never seen so
>>much DE. I would be surprised if samples such as different stem
>>positions were so different. Something must be wrong with the samples
>>or sampling in my opinion. The scatterplots are slightly more user
>>friendly if you use pch="."
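
The pairwise scatterplot check mentioned above, with the pch="."
suggestion, can be tried without any array data. The sketch below uses a
simulated matrix as a stand-in for exprs(eset.rma); the array names, the
10% of changed genes and the noise levels are all invented for
illustration.

```r
# Simulated log2 expression matrix standing in for exprs(eset.rma):
# 5000 probesets x 4 arrays (two replicates for each of two samples).
set.seed(3)
n_genes <- 5000
base  <- rnorm(n_genes, mean = 8, sd = 2)
# About 10% of genes get a nonzero shift in sample B (invented proportion).
shift <- ifelse(runif(n_genes) < 0.1, rnorm(n_genes, sd = 1.5), 0)

expr <- cbind(
  A_rep1 = base + rnorm(n_genes, sd = 0.2),
  A_rep2 = base + rnorm(n_genes, sd = 0.2),
  B_rep1 = base + shift + rnorm(n_genes, sd = 0.2),
  B_rep2 = base + shift + rnorm(n_genes, sd = 0.2)
)

# Pairwise scatterplots of all arrays; pch = "." keeps dense panels readable.
plot(as.data.frame(expr), pch = ".")

# Numeric companion to the plot: pairwise correlations between arrays.
array_cor <- round(cor(expr), 3)
print(array_cor)
```

Replicate pairs should show tight diagonals, while panels comparing the
two samples spread out in proportion to the amount of differential
expression.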
> Naomi S. Altman                                814-865-3791 (voice)
> Associate Professor
> Dept. of Statistics                              814-863-7114 (fax)
> Penn State University                         814-865-1348 (Statistics)
> University Park, PA 16802-2111
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor

Fangxin Hong  Ph.D.
Plant Biology Laboratory
The Salk Institute
10010 N. Torrey Pines Rd.
La Jolla, CA 92037
E-mail: fhong at salk.edu
(Phone): 858-453-4100 ext 1105
