[BioC] Running DESeq with 1000 samples

Wed Jul 9 21:58:42 CEST 2014

Hi

On 09/07/14 20:58, Maoqi Xu [guest] wrote:

> I'm using DESeq to find the differentially expressed genes between 2
> populations. The RNA-seq data set has a total sample size of around
> 1000. However, even when I set the memory limit of R to 6 GB, it still
> reports the error that it cannot allocate a vector of a certain size. I
> wonder if it's possible to use DESeq on this huge data set and how
> much memory would be enough.

You really have one thousand RNA-Seq libraries? This is impressive.

First: As Steve already pointed out, please consider using DESeq2.

On the other hand, the main point of tools like DESeq2 or edgeR is to
share information across genes, for example through Bayesian shrinkage,
to get decent power even when the sample size is only modest.

With so much data, you can keep things very simple, especially if you 
really just have a standard two-group comparison with no other 
covariates. I would use DESeq2 only to normalize the data and then do a 
Wilcoxon rank-sum test on the normalized counts, for each gene 
separately, or, even better, use a permutation test.
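
To make the suggestion concrete, here is a minimal sketch of the two steps: median-of-ratios size factors (the idea behind DESeq-style normalization) followed by a per-gene label-permutation test. It is written in plain Python purely for illustration. In R one would more likely call DESeq2's estimateSizeFactors() and counts(dds, normalized=TRUE) and then run the test per gene. The function names size_factors and permutation_pvalue, the choice of the absolute difference in group means as the test statistic, and the n_perm and seed parameters are all assumptions of this sketch, not part of DESeq2.

```python
import math
import random
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (a sketch, not DESeq2's own code).

    counts: list of genes, each a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of each usable gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """Two-sided permutation p-value for one gene.

    values: normalized counts for one gene, one per sample.
    labels: booleans, True for group A and False for group B.
    Statistic (an assumption of this sketch): |mean(A) - mean(B)|.
    """
    rng = random.Random(seed)

    def stat(lab):
        a = [v for v, is_a in zip(values, lab) if is_a]
        b = [v for v, is_a in zip(values, lab) if not is_a]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]
```

With about 1000 samples, the smallest p-value this can report is 1 / (n_perm + 1), so n_perm has to be large enough to survive the multiple-testing correction over all genes. The memory footprint stays small because each gene is processed on its own.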

   Simon