[BioC] DESeq2 Regularised Log for Clustering of Genes

Fri May 9 13:48:42 CEST 2014

Hi Dario,

I'm CC'ing the Bioconductor mailing list.

On Wed, May 7, 2014 at 11:00 PM, Dario Strbenac
<dstr7320 at uni.sydney.edu.au> wrote:
> Hello,
>
> As section 5.3 of the vignette explains, the transformed data can be used for applications like clustering of samples. I was considering the best way to use it instead for clustering genes of a time-series experiment. I would have to account for gene length to make different genes comparable. This could be done after the transformation, by dividing by appropriate constants.

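I would divide before the transformation or subtract after it,
as log2(x * k) = log2(x) + log2(k), where k is some row-wise constant
(dividing by a gene length L corresponds to k = 1/L, i.e. subtracting
log2(L)). Both DESeq2 transformations (rlog and the variance
stabilizing transformation) are log2-like.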
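
As a rough illustration of that identity, here is a minimal base R sketch.
The toy matrix `x` and the gene lengths `len_kb` are made up for the
example and do not come from this thread. It uses a plain log2, for which
the two routes agree exactly; for the DESeq2 transformations, which are
only log2-like, the agreement would be approximate.

```r
# Hypothetical toy data: 3 genes x 4 samples of positive values,
# plus made-up gene lengths in kb (one per row).
x <- matrix(c( 10,  20,  40,  80,
              100, 100, 200, 200,
                5,  50,   5,  50),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))
len_kb <- c(2, 4, 0.5)

# Route 1: divide each row by its gene length, then take log2.
# A length-3 vector recycles down the columns, so element i hits row i.
before <- log2(x / len_kb)

# Route 2: take log2 first, then subtract log2(length) from each row.
after <- sweep(log2(x), 1, log2(len_kb), "-")

# The two routes agree up to floating point error.
all.equal(before, after)
```

With a transformed matrix such as `assay(rld)` in place of `log2(x)`, the
same `sweep` call would be the "subtract after transformation" route.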

But I would also suggest you might want to center the genes before clustering:

mat <- assay(rld)
matcenter <- sweep( mat, 1, rowMeans(mat), "-" )

Now each gene should have mean 0. This makes sense if you are
interested in clustering genes which have the same trend, but maybe
different expression strength ("up, down, up", etc.).
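
To make the effect of centering concrete, here is a small sketch with
base R only. The matrix below is a hypothetical stand-in for assay(rld),
and Euclidean distance with hclust is just one possible choice of
clustering, not a recommendation from the vignette.

```r
# Hypothetical log2-scale values: 4 genes x 4 time points.
# geneA/geneB share one trend, geneC/geneD share the opposite trend,
# but the pairs differ strongly in overall expression level.
mat <- rbind(geneA = c( 2,  4,  2,  4),
             geneB = c(10, 12, 10, 12),
             geneC = c( 4,  2,  4,  2),
             geneD = c(12, 10, 12, 10))
colnames(mat) <- paste0("t", 1:4)

# Center each gene (row) on its own mean, as in the snippet above.
matcenter <- sweep(mat, 1, rowMeans(mat), "-")

# Hierarchical clustering of genes on the centered values.
hc <- hclust(dist(matcenter))
groups <- cutree(hc, k = 2)
groups
```

After centering, geneA and geneB fall into one cluster and geneC and
geneD into the other, because only the shape of the profile is left.
Running the same clustering on the uncentered `mat` would instead group
genes by expression level.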

> Also, the counts used are probabilistically assigned counts to transcripts by RSEM. Are you aware of any previous studies which use the transformed data for such an analysis ?

Not off the top of my head.

Mike

>
> --------------------------------------
> Dario Strbenac
> PhD Student
> University of Sydney
> Camperdown NSW 2050
> Australia