[BioC] Log transformation and left censoring

Sun Feb 3 12:24:33 CET 2013

Hi Paul

Given your description, one possibility worth exploring is a variance-stabilising transformation.

For example, DESeq provides one that smoothly interpolates between the square-root function for low counts and the log transformation for higher counts; see Section 6 (and 7) of the DESeq vignette.
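
As a rough illustration of what "interpolating between square root and log" means, here is a minimal Python sketch. It is not DESeq's implementation: DESeq derives its transformation from the fitted dispersion-mean relation, which is not reproduced here. The sketch assumes a variance of the form mu + dispersion * mu^2 with a single, arbitrarily chosen dispersion of 0.1; the function names are made up for the example.

```python
import math

def asinh_vst(count, dispersion=0.1):
    """Variance-stabilising transform for variance mu + dispersion * mu**2.

    Assumption for illustration only: one fixed dispersion value.
    """
    return 2.0 / math.sqrt(dispersion) * math.asinh(math.sqrt(dispersion * count))

def sqrt_limit(count):
    """Low-count behaviour: asinh(z) is close to z for small z."""
    return 2.0 * math.sqrt(count)

def log_limit(count, dispersion=0.1):
    """High-count behaviour: asinh(z) is close to log(2 * z) for large z."""
    return math.log(4.0 * dispersion * count) / math.sqrt(dispersion)

# Print the transform next to its two limiting forms over a range of counts.
for count in (0.01, 1, 10, 100, 10_000, 1_000_000):
    print(count, asinh_vst(count), sqrt_limit(count), log_limit(count))
```

For small counts the first and second printed columns nearly agree, and for large counts the first and third do, which is the smooth hand-over described above.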

	Best wishes
	Wolfgang

On Jan 31, 2013, at 8:57 AM, Paul Harrison <Paul.Harrison at monash.edu> wrote:

> Hello,
> 
> We have been using voom and limma for some time now, and while we're
> fairly happy with it, it seems to produce significance levels that are
> on the conservative side. We also use edgeR to produce more optimistic
> results, but don't entirely trust the significance levels that it
> reports. I am looking for something in-between these extremes, and
> want to run an idea past this list as a sanity check. I would
> especially value Gordon and Charity's comments if they have time.
> 
> The voom log transformation is essentially:
> 
>  log2( (count+0.5) / library.size )
> 
> It then does some clever things with weights. What I'm considering instead is
> 
>  log2( count / library.size + moderation.amount / mean.library.size )
> 
> where moderation.amount is much larger than 0.5, say 5. A couple of things here:
> 
> - Instead of down-weighting low counts, I'm trying to get rid of the
> extra variation from low counts by artificially left censoring the
> data.
> 
> - I'm using the mean of the library sizes because I want the left
> censor to be in the same place for each sample even if the library
> sizes are different, so that if a gene is entirely switched off in one
> condition it won't look variable just because there is a different
> left censor in each sample.
> 
> I'm also using this transformation to create heatmaps.
> 
> This seems to work on the data set I am currently using: I get more
> significant results, and they seem reasonable by eye. It seems to me
> that even if this approach isn't ideal it should at least be safe: at
> worst it will cause limma to reduce the df.prior and produce less
> significant results. Anything I've missed?
> 
> -- 
> Paul Harrison
> 
> Victorian Bioinformatics Consortium / Monash University
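
The two transformations quoted above can be compared with a minimal Python sketch. It uses the simplified voom-style formula exactly as written in the message, not limma's code, and the library sizes are made-up numbers. The Python names (library_size, moderation_amount and so on) are just renderings of the identifiers in the message.

```python
import math

def voom_style_log2(count, library_size):
    """log2((count + 0.5) / library_size), the simplified form quoted above."""
    return math.log2((count + 0.5) / library_size)

def moderated_log2(count, library_size, mean_library_size, moderation_amount=5.0):
    """log2(count / library_size + moderation_amount / mean_library_size)."""
    return math.log2(count / library_size + moderation_amount / mean_library_size)

# Hypothetical library sizes for three samples (invented for the example).
library_sizes = [1_000_000, 2_000_000, 4_000_000]
mean_library_size = sum(library_sizes) / len(library_sizes)

# A gene with zero counts in every sample: where does each transform put it?
floor_voom = [voom_style_log2(0, n) for n in library_sizes]
floor_moderated = [moderated_log2(0, n, mean_library_size) for n in library_sizes]
print("voom-style value at zero counts:", floor_voom)
print("moderated value at zero counts: ", floor_moderated)

# Low counts (1 versus 4) in a sample whose library size equals the mean:
# the larger offset shrinks the apparent log2 difference.
n = mean_library_size
diff_voom = voom_style_log2(4, n) - voom_style_log2(1, n)
diff_moderated = moderated_log2(4, n, n) - moderated_log2(1, n, n)
print("log2 difference, voom-style:", diff_voom)
print("log2 difference, moderated: ", diff_moderated)
```

With the offset divided by the mean library size, a switched-off gene lands on the same value in every sample, whereas the voom-style formula gives it a different value per library size. The same offset also compresses differences between low counts, which is the intended "left censoring" effect.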