[BioC] Log transformation and left censoring

Paul Harrison Paul.Harrison at monash.edu
Thu Jan 31 08:57:24 CET 2013


Hello,

We have been using voom and limma for some time now, and while we're
fairly happy with them, they seem to produce significance levels that are
on the conservative side. We also use edgeR to produce more optimistic
results, but don't entirely trust the significance levels that it
reports. I am looking for something in-between these extremes, and
want to run an idea past this list as a sanity check. I would
especially value Gordon and Charity's comments if they have time.

The voom log transformation is essentially:

  log2( (count+0.5) / library.size )

It then does some clever things with weights. What I'm considering instead is

  log2( count / library.size + moderation.amount / mean.library.size )

where moderation.amount is much larger than 0.5, say 5. A couple of things here:

- Instead of down-weighting low counts, I'm trying to get rid of the
extra variation from low counts by artificially left censoring the
data.

- I'm using the mean of the library sizes because I want the left
censor to be in the same place for each sample even if the library
sizes are different, so that if a gene is entirely switched off in one
condition it won't look variable just because there is a different
left censor in each sample.
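
To make the second point concrete, here is a minimal sketch in plain
Python of the two transformations exactly as written above. It is not
the actual voom code; the function names and the library sizes are
made up for illustration. It evaluates both formulas for a gene with
zero counts in two samples whose library sizes differ.

```python
import math


def voom_like_log2(count, library_size, offset=0.5):
    """log2((count + 0.5) / library.size), the simplified form given above."""
    return math.log2((count + offset) / library_size)


def moderated_log2(count, library_size, mean_library_size, moderation_amount=5.0):
    """log2(count / library.size + moderation.amount / mean.library.size)."""
    return math.log2(count / library_size + moderation_amount / mean_library_size)


# Hypothetical library sizes for two samples (illustrative values only).
library_sizes = [1_000_000, 4_000_000]
mean_size = sum(library_sizes) / len(library_sizes)

# Transformed values for a gene with zero counts in both samples.
voom_floor = [voom_like_log2(0, n) for n in library_sizes]
moderated_floor = [moderated_log2(0, n, mean_size) for n in library_sizes]

# Gap between the two samples' floors, in log2 units.
voom_gap = voom_floor[0] - voom_floor[1]
moderated_gap = moderated_floor[0] - moderated_floor[1]

print("voom-like floors:", voom_floor, "gap:", voom_gap)
print("moderated floors:", moderated_floor, "gap:", moderated_gap)
```

With the per-sample offset, the zero-count value depends on the
library size, so a gene that is switched off everywhere still appears
to vary between samples (here by log2 of the library size ratio). With
the offset scaled by the mean library size, the zero-count value is
the same in every sample.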

I'm also using this transformation to create heatmaps.

This seems to be working on my current data set: I get
more significant results and they seem reasonable by eye. It seems to
me that even if this approach isn't ideal it should at least be safe;
at worst it will cause limma to reduce the df.prior and produce less
significant results. Is there anything I've missed?

-- 
Paul Harrison

Victorian Bioinformatics Consortium / Monash University


