[BioC] DESeq2 - regularised log transformation blind or not?

Wolfgang Huber whuber at embl.de
Mon Feb 24 16:33:50 CET 2014


Hi Mike

> If it helps in framing my question, I am more interested in how the genes cluster within a cell-type than how the cell types cluster.

I am not sure I can reconcile this question with the data (experimental design) you presented. If the aim is to cluster genes within a cell type, would you not want more than two replicates per cell type? (Clustering genes based on only two samples seems the equivalent of "underpowered".)
And does it mean you are interested in doing six different clusterings of genes, and comparing them?

I suppose these are not single-cell data? I ask because the default workflow of DESeq2 may have difficulties with such data, owing to their greater sampling noise.

	Wolfgang 



> Yours,
> 
> Mike
> 
> <Blind.png><notBlind.png>
> 
> 
> On 24 Feb 2014, at 14:46, Wolfgang Huber <whuber at embl.de> wrote:
> 
>> Hi Mike
>> 
>> Thanks.
>> The other Mike (Love) will chime in on the theoretical considerations behind the two choices (blind=FALSE or TRUE).
>> What I’d be interested in is whether the two make any significant difference to the clustering result (e.g. PCA/MDS plot) for your data?
>> 
>> 	best wishes
>> 		Wolfgang
>> 
>> On 24 Feb 2014, at 15:21, Mike Stubbington <mstubb at ebi.ac.uk> wrote:
>> 
>>> Hi,
>>> 
>>> I have just been reading the updated vignette for DESeq2 in the Bioconductor devel branch (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf) and was interested by the comments in section 2.1.1 about when it is appropriate to set the blind argument when performing the regularised log transformation. Specifically, the comment that
>>> 
>>> “...blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design…”
>>> 
>>> Given this, I would really appreciate some further advice about when one should set blind=FALSE.
>>> 
>>> For example, I am performing gene clustering using RNA-seq data for six different cell types. I would certainly expect a lot of genes to vary between the samples. Is this a case where blind=FALSE might be appropriate?
>>> 
>>> Thank you for your help,
>>> 
>>> Mike
>>> 
>> 
> 


