[BioC] How to use DESeq to normalize and estimate variance in a RNAseq timecourse analysis

Marie Sémon marie.semon at ens-lyon.fr
Fri May 11 14:52:50 CEST 2012


Hi Wolfgang

We first wanted to determine the set of genes whose expression level 
differs significantly between at least one time point and the controls, 
because we need to estimate the whole set of genes regulated by the 
treatment at one point or another. This is why we sequentially compared 
Ctr/T1, Ctr/T2, Ctr/T3, etc., and then took the union of these five 
lists. We performed this kind of analysis because we thought that in 
DESeq it is not possible to test whether a gene is deregulated over the 
whole time series experiment. But perhaps we are wrong here?

> While each time point does not have a replicate, if the biological 
> signal that you are interested in appears and disappears at rates lower 
> than the sampling time interval, you can still get an idea about some 
> of the variability in the data, e.g. by fitting a trend and looking at 
> the residuals.

I'm sorry but I have not understood your suggestion here...

However, we performed the clustering you suggested (as described in the 
DESeq vignette), and we reassuringly recovered the grouping of the 
samples according to our time points (controls grouped together, then 
point 1, point 2, point 3, etc.). We also obtained clusters of 
coexpressed genes that separate, somewhat reassuringly, genes known to 
be regulated early or late after treatment. I guess that p-values could 
be obtained from this clustering to statistically assess these clusters 
of genes with similar expression profiles (maybe via a bootstrap 
analysis?). Is that what you meant by "getting p-values from that"?
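
For reference, the filter-then-cluster step discussed here can be sketched 
in a few lines. This is only an illustration in plain Python, not the DESeq 
vignette's code: the gene names, the expression values, the variance 
threshold, and the choice of average-linkage clustering on correlation 
distance are all assumptions made for the example. The input is assumed to 
be already on a variance-stabilised scale.

```python
from statistics import mean, pvariance

# Hypothetical variance-stabilised profiles (control, T1..T5); values are made up.
profiles = {
    "geneA": [5.0, 8.0, 8.2, 6.0, 5.2, 5.1],   # early response
    "geneB": [4.0, 7.1, 7.0, 5.1, 4.2, 4.0],   # early response
    "geneC": [6.0, 6.1, 6.0, 8.9, 9.2, 9.0],   # late response
    "geneD": [3.0, 3.2, 3.1, 6.0, 6.1, 6.2],   # late response
    "geneE": [7.0, 7.1, 6.9, 7.0, 7.1, 7.0],   # essentially flat
}

def correlation_distance(a, b):
    """Return 1 - Pearson correlation between two profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return 1.0 - cov / (va * vb) ** 0.5

def filter_by_variance(data, min_variance):
    """Keep only genes whose overall variance reaches the (assumed) threshold."""
    return {g: v for g, v in data.items() if pvariance(v) >= min_variance}

def cluster(data, n_clusters):
    """Average-linkage agglomerative clustering on correlation distance."""
    clusters = [[g] for g in sorted(data)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = mean(correlation_distance(data[a], data[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge the two closest clusters.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

kept = filter_by_variance(profiles, min_variance=0.1)
groups = cluster(kept, n_clusters=2)
print(groups)
```

Correlation distance groups genes by the shape of their profile rather than 
by absolute level, which is why it is used in this sketch; other distances 
would be equally defensible.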

Thanks a lot again for your suggestions,

Best wishes,

Marie




On 10/05/12 21:04, Wolfgang Huber wrote:
>
> Hi Marie
>
> Simon and you raised the point that comparing each of the five time 
> points (unreplicated) against control, and then presumably comparing 
> these lists (for what? overlap?) is likely suboptimal.
>
> While each time point does not have a replicate, if the biological 
> signal that you are interested in appears and disappears at rates 
> lower than the sampling time interval, you can still get an idea about 
> some of the variability in the data, e.g. by fitting a trend and 
> looking at the residuals. The first thing I would do here, in fact, is 
> to transform the data on a variance stabilised scale (with DESeq, as 
> described in the vignette), filter out all genes that show too small 
> variability overall, and then cluster the patterns. You don't directly 
> get p-values from that (though with some imagination that can be 
> done), but it might be a lot more informative than 5 lists.
>
> In any case, having a replicate of the time course seems essential for 
> reliable inference.
>
>     Best wishes
>     Wolfgang
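
The "fitting a trend and looking at the residuals" idea in the quoted 
message can be illustrated with a minimal sketch. It assumes a straight-line 
trend over time for a single gene, which is only one possible choice of 
trend (a smooth curve would be another). The time points and expression 
values are made up, and the function names are hypothetical. The spread of 
the residuals around the fitted trend then serves as a rough stand-in for 
the variability that replicates would otherwise provide.

```python
# Hypothetical variance-stabilised expression values for one gene,
# one value per time point (no replicates). Values are made up.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
values = [5.1, 5.9, 7.2, 7.8, 9.1, 9.9]

def fit_linear_trend(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residual_sd(x, y, intercept, slope):
    """Residual standard deviation, using n - 2 degrees of freedom."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    rss = sum(r ** 2 for r in residuals)
    return (rss / (len(x) - 2)) ** 0.5

intercept, slope = fit_linear_trend(times, values)
noise = residual_sd(times, values, intercept, slope)
print(f"slope={slope:.3f}, residual SD={noise:.3f}")
```

This only makes sense under the condition stated in the message: the signal 
has to change slowly relative to the sampling interval, otherwise real 
biological changes end up in the residuals and inflate the noise estimate.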


