[BioC] DESeq2 - regularised log transformation blind or not?

Mon Feb 24 16:48:15 CET 2014

Dear Wolfgang and Mike,

These are not single-cell data. 

I agree that this is not a powerful approach! Rest assured that no major conclusions will be drawn from the gene clusters; it was more for my own interest whilst making sure that I understand the correct use of the rlog transform. I should probably have said “I am more interested *at the moment* in how the genes cluster…”, since I’m also interested in the way that the cell types cluster. Changing blind from FALSE to TRUE seemed to have a greater effect upon the gene clusters than upon the cell clusters, and that is what piqued my interest.
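
For anyone wanting to repeat the FALSE-versus-TRUE comparison described above, it can be sketched roughly as follows. This is only a minimal sketch on simulated stand-in data from makeExampleDESeqDataSet (two conditions, not the six cell types discussed in this thread), and it assumes a DESeq2 release that exports rlog(); I believe older releases use the longer name rlogTransformation() for the same function. The object names (rld_blind, gene_diff and so on) are illustrative only.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data (an assumption, not the real experiment):
# 1000 genes, 12 samples in two conditions, with true differences (betaSD = 1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12, betaSD = 1)

# blind = TRUE: dispersions are re-estimated ignoring the design
rld_blind <- rlog(dds, blind = TRUE)
# blind = FALSE: dispersions are estimated using design(dds)
rld_notblind <- rlog(dds, blind = FALSE)

# Per-gene, per-sample absolute difference between the two transformations
gene_diff <- abs(assay(rld_blind) - assay(rld_notblind))
print(summary(as.vector(gene_diff)))

# Sample-to-sample distances under each setting, and how well they agree
d_blind <- dist(t(assay(rld_blind)))
d_notblind <- dist(t(assay(rld_notblind)))
sample_cor <- cor(as.vector(d_blind), as.vector(d_notblind))
print(sample_cor)
```

Running dist() on the rows rather than the columns (that is, without t()) would give the gene-to-gene distances instead, which is the comparison relevant to gene clustering.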

---------

Mike, 

Thank you very much for your reply. It was enormously helpful. 

If I may, I would like to ask one more question: I would like to look at differential gene expression between the cell types using contrasts. Would you recommend 

1) Using DESeq2 v 1.2.10 with betaPrior=FALSE as an argument when calling DESeq

or

2) Using the development version where expanded model matrices have been implemented?
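
To make option 1 concrete, here is a minimal sketch of the kind of call in question. It runs on simulated stand-in data, and the three-level factor celltype with levels T1, T2 and T3 is hypothetical, chosen only so that a contrast between two non-reference levels can be shown. It assumes a DESeq2 version in which DESeq() accepts betaPrior and results() accepts a character-vector contrast.

```r
library(DESeq2)

set.seed(1)
# Simulated stand-in data (an assumption): 1000 genes, 12 samples
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical three-level factor standing in for the cell types
dds$celltype <- factor(rep(c("T1", "T2", "T3"), each = 4))
design(dds) <- ~ celltype

# Option 1: fit without the log fold change prior
dds <- DESeq(dds, betaPrior = FALSE)

# Contrast of two cell types: log2 fold change of T3 over T2
res <- results(dds, contrast = c("celltype", "T3", "T2"))
print(head(res[order(res$padj), ]))
```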

Thank you again for your help,

Mike

On 24 Feb 2014, at 15:33, Wolfgang Huber <whuber at embl.de> wrote:

> Hi Mike
> 
>> If it helps in framing my question, I am more interested in how the genes cluster within a cell-type than how the cell types cluster.
> 
> I am not sure I can reconcile this question and the data (experimental design) you presented. If the aim is to cluster genes within cell type, would you not use more than two replicates per cell type? (Clustering of genes based on only two samples seems the equivalent of "underpowered".)
> And does it mean you are interested in doing six different clusterings of genes, and comparing them?
> 
> I suppose these are not single-cell data? I ask because the default DESeq2 workflow may have difficulties with such data, due to their greater sampling noise.
> 
> 	Wolfgang 
> 
> 
> 
>> Yours,
>> 
>> Mike
>> 
>> <Blind.png><notBlind.png>
>> 
>> 
>> On 24 Feb 2014, at 14:46, Wolfgang Huber <whuber at embl.de> wrote:
>> 
>>> Hi Mike
>>> 
>>> Thanks.
>>> The other Mike (Love) will chime in on the theoretical considerations regarding the two choices (blind=FALSE or TRUE).
>>> What I’d be interested in is whether the two make any significant difference to the clustering result (e.g. PCA/MDS plot) for your data?
>>> 
>>> 	best wishes
>>> 		Wolfgang
>>> 
>>> On 24 Feb 2014, at 15:21, Mike Stubbington <mstubb at ebi.ac.uk> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I have just been reading the updated vignette for DESeq2 in the Bioconductor devel branch (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.pdf) and was interested in the comments in section 2.1.1 about the appropriateness of setting the blind argument when performing regularised log transformation. Specifically, the comment that
>>>> 
>>>> “...blind dispersion estimation is not the appropriate choice if one expects that many or the majority of genes (rows) will have large differences in counts which are explainable by the experimental design…”
>>>> 
>>>> Given this, I would really appreciate some further advice about when one should set blind=FALSE.
>>>> 
>>>> For example, I am performing gene clustering using RNA-seq data for six different cell types. I would certainly expect a lot of genes to vary between the samples. Is this a case when blind=FALSE might be appropriate?
>>>> 
>>>> Thank you for your help,
>>>> 
>>>> Mike
>>>> 
>>> 
>> 
>