[BioC] RNAseq expression analysis using DESeq: technical replicates, paired samples

Michael Muratet mmuratet at hudsonalpha.org
Mon Nov 7 22:56:35 CET 2011

On Apr 15, 2010, at 3:59 AM, Simon Anders wrote:

> Dear Jinfeng
> On Wed, 14 Apr 2010 22:28:17 +0000 (UTC), Jinfeng Liu <jinfengl at gene.com>
> wrote:
>> I'm trying to use DESeq for RNAseq expression analysis. I haven't been
>> able to find information about how to deal with the following issues:
>> 1) technical replicates
>> We have two biological samples, and two libraries (of different insert
>> size) were prepared for each of them. So I have four lanes of data in
>> total and I want to do differential expression between the two samples.
>> It doesn't look quite right to me to set up the condition vector as
>> conds <- c( "Sample1", "Sample1","Sample2","Sample2") since they are only
>> technical replicates, not biological. But I'm not sure what to do.
> If you set up your test this way, DESeq will assume that the variance
> between the replicates is all there is. Hence, roughly speaking, it will
> call a difference significant if it is larger than the fluctuations
> observed between the technical replicates. This then only tells you that
> the gene might be typically different between different samples, but you
> won't know whether the difference is really due to the difference in
> treatment or whether you would have observed the same magnitude of
> difference between two samples that have been treated the same way.
> Of course, without biological replicates, there is no way to settle this
> question properly.
> The best thing you can do is to add up the counts from each sample, and
> compare just one data column with summed data from Sample 1 with one data
> column for Sample 2. Call DESeq's 'estimateVarianceFunctions' function with
> the argument 'pool=TRUE', and it will ignore the sample labels and estimate
> the variance between the conditions. Hence, it will only call those genes
> differentially expressed that have a much stronger difference between
> conditions than the other genes of similar expression strength. You might
> find only few differentially expressed genes, but these are the only ones
> for which you can be somewhat sure that they are proper hits.


I would like to verify that, 18 months later, summing the counts is
still the best approach for combining technical replicates.

We are constrained to single biological replicates, but we're
interested in using the technical replicates if we can. I recall that
limma had ways to handle this when setting up the model matrix.
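
To make the question concrete, here is a minimal base-R sketch of the
summing step described in the quoted reply. The count values and the names
`counts`, `sample_of_lane` and `summed` are made up for illustration; they
are assumptions, not anything taken from DESeq itself.

```r
# Hypothetical count matrix: 3 genes x 4 lanes (two libraries per sample).
counts <- matrix(
  c( 10,  12,  30,  28,
      0,   1,   5,   4,
    100,  90,  95, 105),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("S1_lib1", "S1_lib2", "S2_lib1", "S2_lib2"))
)

# Which biological sample each lane (column) belongs to.
sample_of_lane <- c("Sample1", "Sample1", "Sample2", "Sample2")

# rowsum() adds up rows by group, so transpose to group the lanes,
# then transpose back to get a genes x samples matrix.
summed <- t(rowsum(t(counts), group = sample_of_lane))

# One column per biological sample, so one condition label per column.
conds <- colnames(summed)
print(summed)
```

The `summed` matrix, with one column per sample, is what would then go into
DESeq, with the variance estimated while ignoring the sample labels as
described in the quoted reply.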



>> 2) Paired samples
>> We have samples from three patients. For each patient, we have matched
>> tumor and adjacent normal samples. How should we set up the analysis to
>> capture the pair information?
> Sorry, but DESeq does not support paired tests (yet). I have some ideas on
> how to add this but this might take a while.
> For now, your best option is to use DESeq's 'getVarianceStabilizedData' to
> transform your data to a scale on which it is approximately homoskedastic.
> Then, you can use a pair-wise t-test or a pair-wise z-test. (Don't do this
> with the raw data; use DESeq's variance-stabilizing transformation to make
> them homoskedastic first.)
> The pair-wise t-test should work out of the box with R's standard 't.test'
> function. A pair-wise z-test should have more power in this setting, as,
> after the variance-stabilizing transformation, you may assume that all data
> has the same variance. Estimate this variance from your genes in a pooled
> fashion (ask again if you don't know how to do that) and take the median.
> Divide the pair differences by the square root of this to get z scores,
> then use 'pnorm' to get a p value. In my experience, this should work
> reasonably well even though it may not have as much power as a proper NB
> test would have.
> Cheers
>  Simon
> +---
> | Dr. Simon Anders, Dipl.-Phys.
> | European Molecular Biology Laboratory (EMBL), Heidelberg
> | office phone +49-6221-387-8632
> | preferred (permanent) e-mail: sanders at fs.tum.de
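
For reference, the paired z-test recipe in the quoted reply might be
sketched in base R as below. This is only one reading of the recipe: it
assumes the z score for a gene is the mean of its paired differences divided
by the standard error derived from the pooled (median) variance, and that
the test is two-sided. The simulated matrices stand in for the output of
'getVarianceStabilizedData'; all names and numbers are hypothetical.

```r
set.seed(1)
n_genes <- 200
n_pairs <- 3   # three patients, each with tumor and matched normal

# Simulated stand-in for variance-stabilized data (genes x patients).
normal <- matrix(rnorm(n_genes * n_pairs, mean = 8, sd = 0.3), nrow = n_genes)
tumor  <- normal + matrix(rnorm(n_genes * n_pairs, sd = 0.3), nrow = n_genes)
tumor[1, ] <- tumor[1, ] + 3   # plant a shift in gene 1 for illustration

# Per-patient paired differences for every gene.
d <- tumor - normal

# Pooled variance: per-gene variance of the differences, then the median.
pooled_var <- median(apply(d, 1, var))

# z score of the mean difference, assuming the same variance for all genes.
z <- rowMeans(d) / sqrt(pooled_var / n_pairs)
p_z <- 2 * pnorm(-abs(z))   # two-sided p value

# Paired t-test for comparison: a one-sample t-test on the differences.
p_t <- apply(d, 1, function(x) t.test(x)$p.value)

head(cbind(z = z, p_z = p_z, p_t = p_t))
```

With only three pairs the per-gene t-test has two degrees of freedom, which
is why borrowing a common variance across genes can be expected to give more
power, as the quoted reply argues.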

Michael Muratet, Ph.D.
Senior Scientist
HudsonAlpha Institute for Biotechnology
mmuratet at hudsonalpha.org
(256) 327-0473 (p)
(256) 327-0966 (f)

Room 4005
601 Genome Way
Huntsville, Alabama 35806
