[BioC] Biological replication (was RNA degradation problem)

Matthew Hannah Hannah at mpimp-golm.mpg.de
Fri Jan 20 17:06:05 CET 2006


Lots of discussion here. A couple of points to make, mainly looking at
the practical side.

Perfectly reproducible growth conditions are almost non-existent.
Climate control is slightly influenced by the seasons; water status and
light conditions depend on position (edge effects) and on the presence
of other plants (light absorbance/humidity). Finally, no big facility is
immune to the odd aphid or spot of mildew from our biotic friends.
However, most of these factors (especially after pooling) are more
variable in 2 separate experiments than in 2 batches grown at the same
time. To be confident that an 'effect' does not depend on these factors,
I would like to know that it can be reproduced. This also applies to
sampling - it should be possible to identify and sample a given stage or
tissue at separate times on different plants, to be confident that its
definition is valid and that the results obtained would be reproducible
by someone repeating the experiment. I.e., the variability also measures
how well YOU can sample what you are claiming to be looking at!

As for pooling from the same large experiment, just to make things clear
- I'm talking about large pools from 1 big batch of plants. A randomised
block design (Naomi's discussion) can obviously be valid if everything
is grown at the same time, but in practice it also depends on which
factors actually vary. If there are 3 trays next to each other, are
there real block effects? E.g. water should be a block effect, but what
if light is more variable (e.g. front-middle-back) and the watering is
highly controlled? The best way to avoid this is different positions or
chambers, or better still, different times.
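
For what it's worth, here is a minimal sketch of how independent growth
experiments could enter the analysis as blocks in limma. Everything in
it is a placeholder (the ExpressionSet 'eset', the ctrl/cold labels and
the 3-experiment layout are made up for illustration), not a recipe for
any particular dataset:

library(limma)

# hypothetical layout: 3 independent experiments, each contributing one
# control and one treated pooled sample
treatment  <- factor(c("ctrl", "cold", "ctrl", "cold", "ctrl", "cold"),
                     levels = c("ctrl", "cold"))
experiment <- factor(c("exp1", "exp1", "exp2", "exp2", "exp3", "exp3"))

# experiment enters as a block factor, so the treatment effect is judged
# by how consistently it reproduces across the independent experiments
design <- model.matrix(~ experiment + treatment)

fit <- eBayes(lmFit(eset, design))   # eset = RMA-normalised ExpressionSet
topTable(fit, coef = "treatmentcold")

With one sample per condition per experiment the residual is essentially
the treatment-by-experiment inconsistency, which is exactly the
variability you want the test to be judged against.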

Also, from a cost point of view it seems a waste of money to hybridise
the same plant sample to replicate Affy arrays when Affy technical
replicates are no longer deemed useful. If you harvest 2 random sets of
50 plants from the same group then you will get an R^2 of >0.995 unless
there was a technical problem - save the money and design a better
experiment. To my understanding, a statistical test on such material
will be invalid, as you are allowing the test to use what is essentially
technical variability (arising after plant growth) as an estimate of
biological variability.
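
To illustrate the point, here is a toy simulation (all the numbers are
made-up assumptions, not estimates from any real experiment) comparing
the spread between two pools of 50 plants taken from the same batch
with the spread between pools taken from independent experiments:

set.seed(1)
n_plants <- 50
sd_plant <- 0.30   # assumed plant-to-plant SD (log2 scale)
sd_expt  <- 0.20   # assumed experiment-to-experiment SD (log2 scale)

# mean of a pool of n_plants plants grown under a given
# experiment-level shift
pool_mean <- function(expt_shift) {
  expt_shift + mean(rnorm(n_plants, mean = 0, sd = sd_plant))
}

# two pools from one batch share the same experiment-level shift, so it
# cancels out of their difference
same_batch <- replicate(2000, {
  shift <- rnorm(1, 0, sd_expt)
  pool_mean(shift) - pool_mean(shift)
})

# pools from independent experiments each get their own shift
indep_expts <- replicate(2000, {
  pool_mean(rnorm(1, 0, sd_expt)) - pool_mean(rnorm(1, 0, sd_expt))
})

sd(same_batch)   # ~0.06: pooling has averaged the per-plant biology away
sd(indep_expts)  # ~0.29: dominated by between-experiment variability

The same-batch pools are the R^2 >0.995 situation above, so a test that
treats them as biological replicates is working with a hugely
underestimated error term.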


Cheers,
Matt
 



> -----Original Message-----
> From: Suresh Gopalan [mailto:gopalans at comcast.net] 
> Sent: 20 January 2006 15:21
> To: Matthew Hannah; Naomi Altman; fhong at salk.edu
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] Biological replication (was RNA 
> degradation problem)
> 
> I agree that independent replication is the best bet as of 
> now, though it has the risk of introduction of new 
> defects/hidden variables that influence the phenotype in 
> question, which may indeed be relevant or constitute another 
> line of study.
> 
> If one decides to take this risk or for other reasons does 
> experiments in perfectly identical (reproducible?) conditions 
> and takes replicates pooled from a very large population (50 
> plants each in 3 replicates) as mentioned
> below: if that removes some variability inherent to each 
> plant, so be it. 
> Isn't the goal to study the variable of interest while masking the 
> irrelevant variables (at least in that study)?  How would 
> this make statistical testing invalid?
> 
> I wonder if in either case it is any different or worse than 
> the normalization schemes and assumptions used in many of the 
> currently used popular analysis or summary schemes?
> 
> Suresh
> 
> (Suresh Gopalan, Ph.D)
> 
> 
> ----- Original Message -----
> From: "Matthew Hannah" <Hannah at mpimp-golm.mpg.de>
> To: "Naomi Altman" <naomi at stat.psu.edu>; <fhong at salk.edu>
> Cc: <bioconductor at stat.math.ethz.ch>
> Sent: Friday, January 20, 2006 6:02 AM
> Subject: Re: [BioC] Biological replication (was RNA 
> degradation problem)
> 
> 
> >>
> >> The question of what is appropriate biological replication is
> >> a tough one.  The objective is to obtain results that are
> >> valid in the population of interest, which usually is not
> >> plants grown in a single batch in the green house.  But how
> >> much variability should we induce?  Each batch of plants
> >> grown separately but in the same building (different growth
> >> chambers), grown in different labs?  different universities?
> >
> > Yes, but this is more a question of 'some' biological 
> replication versus
> > none. Obviously, if you have perfect reproducibility of your growth
> > conditions then repeat experiments will have little 
> influence, but in my
> > experience independent experiments suitably accounts for slight
> > environmental and sampling (eg:time) variability. Plants 
> grown under the
> > same conditions are highly reproducible, so even the random 
> block design
> > might not be ideal depending on what the environmental factors are -
> > light, water, temp etc.. I would always favour separate, independent
> > experiments.
> >
> > As for reproducibility in general this is a problem. I'm sure in all
> > fields that some patterns found by a certain lab, 
> labelling, scanning
> > etc.. will not be reproducible. For example, I wonder how 
> many training
> > set - sample set molecular diagnosis studies would continue 
> to work if
> > new independent data is introduced without updating the whole study.
> >
> >> In my very first Affy experiment, the investigator did the
> >> following:  2 batches of plants grown separately, 2 samples
> >> of plants from 1 of the batches, 2 microarrays from one of
> >> the samples, for 4 arrays in all.
> >> The correlation among the results was 2 arrays from same
> >> sample > 2 samples from same batch > 2 batches.  This should
> >> be no surprise, even though we did not have enough
> >> replication to do any formal testing.
> >>
> >> I think at minimum that you want to achieve results that
> >> would be replicable within your own lab.  That would suggest
> >> batches of plants grown separately from separate batches of seed.
> >>
> >> The best plan is a randomized complete block design, with
> >> every condition sampled in every block.  If the conditions
> >> are tissues, this is readily achieved.
> >
> > I assume you mean random in each independent experiment, and then
> > independently repeated, in which case this is the best approach.
> >
> >> Personally, I look at the density plots of the probes on the
> >> arrays.  If they have the same "shape" (which is usually a
> >> unimodal distribution with long tail to the right on the log2
> >> scale) then I cross my fingers (that is supposed to bring
> >> good luck) and use RMA.  Most of the experiments I have been
> >> involved with using arabidopsis arrays have involved tissue
> >> differences, and the amount of differential expression has
> >> been huge on the probeset scale (over 60% of genes), but
> >> these probe densities have been pretty similar.
> >
> > I always look at RNAdeg and PLM as well, but in most cases 
> this is also
> > seen on the density plots.
> >
> > Cheers,
> > Matt
> >
> >> >From: fhong at salk.edu [mailto:fhong at salk.edu]
> >> >Sent: Thu 19/01/2006 21:27
> >> >To: Matthew Hannah
> >> >Cc: bioconductor at stat.math.ethz.ch
> >> >Subject: Re: [BioC] RNA degradation problem
> >> >
> >> >
> >> >
> >> >Hi Matthew,
> >> >
> >> >Thank you very much for your help.
> >> >
> >> > > >It's amazing how many
> >> > >> lab plant biologists see pooled samples from a bulk of
> >> plants grown
> >> > >> at the same time as biological replicates when they are
> >> clearly not.
> >> > >I would think that all plants under experiment should be
> >> grown at the
> >> > >same time without different conditions/treatments. Biological
> >> > >replicates should be tissue samples from different groups
> >> of plants,
> >> > >say sample from 50 plants as replicate 1 and sample from
> >> another 50 as replicate 2.
> >> > >Do you think that biological replicates should be grown at
> >> different time?
> >> >
> >> >
> >> >Absolutely! Biological replication must be either single
> >> plants grown
> >> >in the same experiment (but no one wants to risk single plants for
> >> >arrays) or large pools of plants from INDEPENDENT
> >> experiments (or the
> >> >pools must be smaller than sample size - doesn't really happen for
> >> >arrays) otherwise what biological variability are you
> >> sampling? Say you
> >> >have 150 plants growing in the greenhouse and you harvest 3 random
> >> >pools of 50 as your 3 'biological replicates' then you will have
> >> >eliminated all variability from them and the arrays will be
> >> as good as
> >> >technical replicates and any statistical testing is invalid.
> >> >
> >> > >> I find hist, RNA deg, AffyPLM and a simple RMA norm 
> followed by
> >> > >> plot(as.data.frame(exprs(eset.rma))) can answer in most
> >> cases for
> >> > >> why it didn't work, or won't work - in the rare case
> >> when someone
> >> > >> asks for QC
> >> > > >before rather than after they realise the data is strange ;-)
> >> > >This actually brings up another question: when the % of
> >> differential genes
> >> > >is large, which normalization works better?
> >> >I posted a lot on this about 1.5 years ago, you should
> >> find it in the
> >> >archives - but simply no one knows or has tested it
> >> >
> >> >
> >> > >http://cactus.salk.edu/temp/QC_t.doc
> >> > >Take a look at the last plot, which clearly indicates homogeneity
> >> > >within replicates and heterogeneity among samples.
> >> > >(1) Will stem top and stem base differ so much? Or is it the
> >> > >preparation process bringing in extra correlation within replicates?
> >> > >(2) when % of differential genes is large, which
> >> normalization
> >> > >works better?
> >> >Looking at these scatterplots, I can honestly say I've 
> never seen so
> >> >much DE. I would be surprised if samples such as different stem
> >> >positions were so different. Something must be wrong with
> >> the samples
> >> >or sampling in my opinion. The scatterplots are slightly more user
> >> >friendly if you use pch="."
> >> >
> >> >HTH,
> >> >
> >> >Matt
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >--------------------
> >> >Fangxin Hong  Ph.D.
> >> >Plant Biology Laboratory
> >> >The Salk Institute
> >> >10010 N. Torrey Pines Rd.
> >> >La Jolla, CA 92037
> >> >E-mail: fhong at salk.edu
> >> >(Phone): 858-453-4100 ext 1105
> >> >
> >> >_______________________________________________
> >> >Bioconductor mailing list
> >> >Bioconductor at stat.math.ethz.ch
> >> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>
> >> Naomi S. Altman                                814-865-3791 (voice)
> >> Associate Professor
> >> Dept. of Statistics                              814-863-7114 (fax)
> >> Penn State University                         814-865-1348
> >> (Statistics)
> >> University Park, PA 16802-2111
> >>
> >>
> >
> > _______________________________________________
> > Bioconductor mailing list
> > Bioconductor at stat.math.ethz.ch
> > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > 
> 
> 
>


