[BioC] Biological replication (was RNA degradation problem)

Sun Jan 22 16:55:52 CET 2006

PCR is also noisy. --Naomi

At 09:51 AM 1/22/2006, Matthew  Hannah wrote:
>This is interesting and there are certainly contrasting views. It is
>also certainly not a bioC issue but is of interest. At the risk of
>dragging it on I don't think this statement should be left without the
>comments below.
>
> > -----Original Message-----
> > From: fhong at salk.edu [mailto:fhong at salk.edu]
> > Sent: 20 January 2006 20:12
> > To: Henk van den Toorn
> > Cc: 'Naomi Altman'; Matthew Hannah; bioconductor at stat.math.ethz.ch
> > Subject: RE: [BioC] Biological replication (was RNA
> > degradation problem)
> >
> > Thank you all for the useful input and interesting discussion.
> >
> > I agree with Henk: "biological replicates" are meant to capture
> > biological variation among individual plants, not
> > environmental factors such as growth chamber and climate. It
> > is known that batch effects and lab effects are profound
> > factors, which might sometimes mask the true signals.
> > Array experiments are still relatively expensive, so we would
> > prefer to eliminate environmental factors (conduct experiments
> > at the same time, in the same growth room) and include biological
> > variation (different plant samples as biological replicates).
>
>I strongly believe that if you cannot prove that your results are
>reproducible in at least 2 independent experiments then any
>interpretation of such results is far from conclusive.
>
>Microarrays are expensive, but many other methods are also expensive and
>time-consuming, and this does not exempt them from being shown to be
>reproducible. It is also no more expensive (in array cost) to use 3
>experiments than 3 samples from one experiment. If biological
>reproducibility were not an issue, then why would we be using replicate
>plants or experiments to measure metabolites, plant growth, etc.,
>rather than taking a single measurement on a huge pool of plants?
>
>What I do agree with is that you are interested in the biological
>variation and not in environmental factors having spurious effects
>on your results. But it is also clear that one batch of plants
>grown at a single time in a single place is much more likely to yield
>results where the 'biological factor' of interest is affected by a
>biological factor-environment interaction. E.g., plants with higher sugar
>content may be more attractive to aphids, different Arabidopsis
>ecotypes have differential sensitivity to mildew or to stress such as poor
>watering, plus many other less observable interactions.
>
>The final point is on when biological replication becomes technical
>replication. If you take sufficiently large repeated
>samples from a population, those samples will have an extremely high
>probability of being 'almost' identical, or put differently, essentially
>the same as harvesting all of them together, grinding them and then
>taking 2 aliquots of the material (i.e., technical replicates). E.g., if you
>split a group of 1000 plants into 2 pools of 500, do you believe there
>would be any difference between them compared to aliquoting the 1000
>once ground? I think that 50 plants is already far beyond the point
>where two pools of plants are essentially identical. In my experience,
>when grown in randomised blocks in the same batch, 5-10 replicate plants
>are usually sufficient to get virtually identical mean values for many
>biological measurements. So does it then make sense to hybridise
>'identical' samples and call them 'biological replicates'? In
>addition, this could mislead the reader, who understands that term to mean
>something quite different.
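The pooling argument above can be illustrated with a small simulation. This is a minimal sketch, not an analysis of real data: it assumes each plant's measurement is an independent draw from a normal distribution with an arbitrary standard deviation `plant_sd`, and `pool_mean_sd` is a hypothetical helper name. Under that assumption the spread between pool means shrinks as 1/sqrt(pool size), so pools of 50 plants differ by roughly 14% of the plant-to-plant spread.

```python
import math
import random
import statistics


def pool_mean_sd(pool_size, n_pools=2000, plant_sd=1.0, seed=1):
    """Empirical SD between pool means when each pool averages `pool_size` plants.

    Assumption: plants are independent draws from N(0, plant_sd^2).
    """
    rng = random.Random(seed)
    pool_means = [
        statistics.fmean([rng.gauss(0.0, plant_sd) for _ in range(pool_size)])
        for _ in range(n_pools)
    ]
    return statistics.stdev(pool_means)


for size in (1, 5, 50):
    observed = pool_mean_sd(size)
    expected = 1.0 / math.sqrt(size)  # theoretical SD of a mean of `size` plants
    print(f"pool size {size:>3}: SD between pools {observed:.3f} (theory {expected:.3f})")
```

The sketch only covers plant-to-plant variation within one batch; it says nothing about variation between independently grown batches, which is the other half of the argument.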
>
>Having said all that, 'IF' you just want to identify a few, highly
>changed, candidate genes that will be followed up (in independent
>experiments), then independent array experiments are obviously not
>essential. However, on the 'arrays are expensive' point I would be
>interested if anyone had data to show how cost-effective using pooled
>samples from the same experiment is in reducing the work for Q-PCR
>verification, i.e., the % confirmation rate for genes selected based
>on 1, 2 or 3 arrays.
>
>Cheers,
>Matt
>
>
>
>
> > > I think the last question is very important. I guess you don't need to
> > > try to INCREASE the biological variability at all cost for a single
> > > experiment.
> > > If you would be interested in combining experiments of the same lab in
> > > a single analysis, it's probably wise to follow Naomi's advice to take
> > > different replicates. A problem that might arise is that the "biological"
> > > variation is influenced by many circumstances inside a greenhouse or
> > > growth chamber. In our lab practice, it's clear that influences like
> > > the weather and the season have a profound influence on the biology of
> > > the plant, even though our plants are kept in climate controlled
> > > growth chambers. If you would use different batches of plants, you are
> > > actually confounding these factors with the batches of plants. By using
> > > different samples of plants, although unfortunately pooled in the same
> > > circumstances, you might actually block the circumstances for later
> > > analysis, if you're willing to go that far.
> >
> >
> >
> > > I'm very interested to see what other people have to say about this!
> > >
> > > Henk van den Toorn, MSc
> > > bioinformatician, Molecular Genetics group, Utrecht University
> > >
> > >
> > > -----Original Message-----
> > > From: bioconductor-bounces at stat.math.ethz.ch
> > > [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Naomi
> > > Altman
> > > Sent: 20 January 2006 01:42
> > > To: Matthew Hannah; fhong at salk.edu
> > > Cc: bioconductor at stat.math.ethz.ch
> > > Subject: Re: [BioC] Biological replication (was RNA degradation
> > > problem)
> > >
> > > The question of what is appropriate biological replication is a tough one.
> > > The objective is to obtain results that are valid in the population of
> > > interest, which usually is not plants grown in a single batch in the
> > > greenhouse.  But how much variability should we induce?  Each batch
> > > of plants grown separately but in the same building (different growth
> > > chambers), grown in different labs?  different universities?
> > >
> > > In my very first Affy experiment, the investigator did the
> > > following:  2 batches of plants grown separately, 2 samples of plants
> > > from 1 of the batches, 2 microarrays from one of the samples, for
> > > 4 arrays in all.
> > > The correlation among the results was 2 arrays from same sample > 2
> > > samples from same batch > 2 batches.  This should be no surprise, even
> > > though we did not have enough replication to do any formal testing.
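The ordering of correlations described above is what a nested variance-components model predicts. The following is a hedged sketch with simulated data only: the four-array layout follows the description in the message, but the standard deviations (`sd_gene`, `sd_batch`, `sd_sample`, `sd_array`) are arbitrary assumptions, and `simulate_four_arrays` and `pearson` are hypothetical helper names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate_four_arrays(n_genes=5000, sd_gene=2.0, sd_batch=0.5,
                         sd_sample=0.3, sd_array=0.2, seed=42):
    """Per gene: expression = gene level + batch effect + sample effect + array noise.

    Layout: batch 1 has samples 1 and 2; sample 1 is hybridised twice;
    batch 2 contributes one sample on one array (4 arrays in all).
    """
    rng = random.Random(seed)
    arrays = {"b1_s1_a1": [], "b1_s1_a2": [], "b1_s2_a1": [], "b2_s1_a1": []}
    for _ in range(n_genes):
        gene = rng.gauss(0, sd_gene)
        batch1, batch2 = rng.gauss(0, sd_batch), rng.gauss(0, sd_batch)
        b1_s1, b1_s2, b2_s1 = (rng.gauss(0, sd_sample) for _ in range(3))
        arrays["b1_s1_a1"].append(gene + batch1 + b1_s1 + rng.gauss(0, sd_array))
        arrays["b1_s1_a2"].append(gene + batch1 + b1_s1 + rng.gauss(0, sd_array))
        arrays["b1_s2_a1"].append(gene + batch1 + b1_s2 + rng.gauss(0, sd_array))
        arrays["b2_s1_a1"].append(gene + batch2 + b2_s1 + rng.gauss(0, sd_array))
    return arrays


arrays = simulate_four_arrays()
same_sample = pearson(arrays["b1_s1_a1"], arrays["b1_s1_a2"])
same_batch = pearson(arrays["b1_s1_a1"], arrays["b1_s2_a1"])
diff_batch = pearson(arrays["b1_s1_a1"], arrays["b2_s1_a1"])
print(f"same sample: {same_sample:.3f}")
print(f"same batch:  {same_batch:.3f}")
print(f"diff batch:  {diff_batch:.3f}")
```

Each level of shared history (same batch, same sample) adds shared variance, so the correlation rises as two arrays share more of it.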
> > >
> > > I think at minimum that you want to achieve results that would be
> > > replicable within your own lab.  That would suggest batches of plants
> > > grown separately from separate batches of seed.
> > >
> > > The best plan is a randomized complete block design, with every
> > > condition sampled in every block.  If the conditions are tissues, this
> > > is readily achieved.
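A randomized complete block layout of the kind described above can be generated in a few lines. This is only a sketch: it assumes each block is one independently grown batch of plants, `rcbd_layout` is a hypothetical helper name, and of the tissue labels only stem top and stem base come from this thread ("leaf" is made up for illustration).

```python
import random


def rcbd_layout(conditions, n_blocks, seed=0):
    """Return [(block_number, randomized order of ALL conditions), ...].

    Every condition appears exactly once in every block; only the order
    (e.g. harvest or hybridisation order) is randomized within a block.
    """
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        order = list(conditions)
        rng.shuffle(order)
        layout.append((block, order))
    return layout


tissues = ["stem_top", "stem_base", "leaf"]  # "leaf" is a hypothetical extra condition
for block, order in rcbd_layout(tissues, n_blocks=3):
    print(f"batch {block}: {', '.join(order)}")
```

Because every batch contains every tissue, batch-to-batch differences can be separated from tissue differences in the analysis.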
> > >
> > > Personally, I look at the density plots of the probes on the arrays.
> > > If they have the same "shape" (which is usually a unimodal
> > > distribution with long tail to the right on the log2 scale) then I
> > > cross my fingers (that is supposed to bring good luck) and use RMA.
> > > Most of the experiments I have been involved with using Arabidopsis
> > > arrays have involved tissue differences, and the amount of
> > > differential expression has been huge on the probeset scale (over 60%
> > > of genes), but these probe densities have been pretty similar.
> > >
> > > --Naomi
> > >
> > > At 05:02 PM 1/19/2006, Matthew  Hannah wrote:
> > >>
> > >>
> > >>________________________________
> > >>
> > >>From: fhong at salk.edu [mailto:fhong at salk.edu]
> > >>Sent: Thu 19/01/2006 21:27
> > >>To: Matthew Hannah
> > >>Cc: bioconductor at stat.math.ethz.ch
> > >>Subject: Re: [BioC] RNA degradation problem
> > >>
> > >>
> > >>
> > >>Hi Matthew,
> > >>
> > >>Thank you very much for your help.
> > >>
> > >> >> It's amazing how many
> > >> >> lab plant biologists see pooled samples from a bulk of plants
> > >> >> grown at the same time as biological replicates when they are
> > >> >> clearly not.
> > >> >I would think that all plants under experiment should be grown at
> > >> >the same time without different conditions/treatments. Biological
> > >> >replicates should be tissue samples from different groups of plants,
> > >> >say a sample from 50 plants as replicate 1 and a sample from another 50
> > >> >as replicate 2.
> > >> >Do you think that biological replicates should be grown at different
> > >> >times?
> > >>
> > >>
> > >>Absolutely! Biological replication must be either single plants grown
> > >>in the same experiment (but no one wants to risk single plants for
> > >>arrays) or large pools of plants from INDEPENDENT experiments (or the
> > >>pools must be smaller than sample size - doesn't really happen for
> > >>arrays); otherwise what biological variability are you sampling? Say
> > >>you have 150 plants growing in the greenhouse and you harvest 3 random
> > >>pools of 50 as your 3 'biological replicates': then you will have
> > >>eliminated all variability from them, the arrays will be as good as
> > >>technical replicates, and any statistical testing is invalid.
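Why testing on such pools misleads can be sketched with a simulation in which there is no true treatment effect at all. Everything here is an assumption made for illustration: each group within an experiment gets one random environment shift (`sd_env`, standing in for a factor-environment interaction within a batch), pools add a small residual (`sd_pool`), and `false_positive_rate` and `t_statistic` are hypothetical helper names. With 3 pools per group from a single experiment the shift is shared by all pools of a group, so the t-test sees a tiny within-group variance and flags a difference far more often than the nominal 5%.

```python
import math
import random
import statistics

N_REP = 3            # replicates per group, as in the 3 x 50-plant example
T_CRIT_DF4 = 2.776   # two-sided 5% critical value of Student's t, df = 2*N_REP - 2 = 4


def t_statistic(group_a, group_b):
    """Pooled-variance two-sample t statistic for equal group sizes."""
    n = len(group_a)
    pooled_var = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    diff = statistics.fmean(group_a) - statistics.fmean(group_b)
    return diff / math.sqrt(2 * pooled_var / n)


def false_positive_rate(independent_experiments, n_sim=2000,
                        sd_env=1.0, sd_pool=0.2, seed=7):
    """Fraction of null simulations in which |t| exceeds the 5% critical value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        groups = []
        for _group in range(2):  # control and treated, with NO true difference
            if independent_experiments:
                # each replicate comes from its own experiment: fresh shift each time
                env = [rng.gauss(0, sd_env) for _ in range(N_REP)]
            else:
                # all pools of a group share the single experiment's shift
                env = [rng.gauss(0, sd_env)] * N_REP
            groups.append([e + rng.gauss(0, sd_pool) for e in env])
        if abs(t_statistic(groups[0], groups[1])) > T_CRIT_DF4:
            hits += 1
    return hits / n_sim


print("pools from one experiment:", false_positive_rate(independent_experiments=False))
print("independent experiments:  ", false_positive_rate(independent_experiments=True))
```

The size of the inflation depends entirely on the assumed ratio of `sd_env` to `sd_pool`; the sketch shows the direction of the problem, not its magnitude in any real experiment.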
> > >>
> > >> >> I find hist, RNA deg, affyPLM and a simple RMA norm followed by
> > >> >> plot(as.data.frame(exprs(eset.rma))) can answer in most cases for
> > >> >> why it didn't work, or won't work - in the rare case when someone
> > >> >> asks for QC
> > >> >> before rather than after they realise the data is strange ;-)
> > >> >This actually brings up another question: when the % of differential
> > >> >genes is large, which normalization works better?
> > >>I posted on this a lot about 1.5 years ago, you should find it in
> > >>the archives - but simply no one knows or has tested it
> > >>
> > >> >http://cactus.salk.edu/temp/QC_t.doc
> > >> >Take a look at the last plot, which clearly indicates homogeneity
> > >> >within replicates and heterogeneity among samples.
> > >> >(1) Will stem top and stem base differ so much? Or does the
> > >> >preparation process bring in extra correlation within replicates?
> > >> >(2) When the % of differential genes is large, which normalization
> > >> >works better?
> > >>Looking at these scatterplots, I can honestly say I've never seen so
> > >>much DE. I would be surprised if samples such as different stem
> > >>positions were so different. Something must be wrong with the samples
> > >>or sampling in my opinion. The scatterplots are slightly more user
> > >>friendly if you use pch="."
> > >>
> > >>HTH,
> > >>
> > >>Matt
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>--------------------
> > >>Fangxin Hong  Ph.D.
> > >>Plant Biology Laboratory
> > >>The Salk Institute
> > >>10010 N. Torrey Pines Rd.
> > >>La Jolla, CA 92037
> > >>E-mail: fhong at salk.edu
> > >>(Phone): 858-453-4100 ext 1105
> > >>
> > >>_______________________________________________
> > >>Bioconductor mailing list
> > >>Bioconductor at stat.math.ethz.ch
> > >>https://stat.ethz.ch/mailman/listinfo/bioconductor
> > >
> > > Naomi S. Altman                                814-865-3791 (voice)
> > > Associate Professor
> > > Dept. of Statistics                              814-863-7114 (fax)
> > > Penn State University                         814-865-1348 (Statistics)
> > > University Park, PA 16802-2111
> > >
> > >
> > >
> > >
> >
> >
> >
> >
>

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111