[BioC] Log transformation and left censoring

Gordon K Smyth smyth at wehi.EDU.AU
Tue Feb 5 01:45:23 CET 2013


BTW, the coefficients and log-fold-changes returned by glmFit() and 
exactTest() in edgeR also agree with what you would get from this 
transformation (although there it is applied at the glm modelling level, 
rather than by fitting a linear model to the log-scale values).

Gordon

On Tue, 5 Feb 2013, Gordon K Smyth wrote:

> Dear Paul,
>
> The transformation that you propose is the same transformation that is done 
> by predFC(y) in the edgeR package, or by cpm(y,log=TRUE) in the development 
> version of the edgeR package.  The argument prior.count controls the 
> moderation amount.
>
> This is the same transformation that we recommend and use ourselves for 
> heatmaps.  See Section 2.10 of the edgeR User's Guide:
>
> http://bioconductor.org/packages/2.12/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf
>
> There is an example of its use on page 58 of the User's Guide.
>
> Belinda Phipson has shown as part of her PhD work that, under some 
> assumptions, this transformation comes close to minimizing the mean square 
> error when predicting the true log fold changes.
>
> Simply putting these logCPM values into limma will perform comparably to voom 
> if the library sizes are not very different, provided that you use 
> eBayes(fit,trend=TRUE).  When the library sizes are different, however, voom 
> is the clear winner.
>
> There is no censoring.  A major reason for adding an offset (aka prior.count) 
> to the counts is to avoid the need to censor, truncate or remove 
> observations.  Rather, a monotonic transformation of the counts is performed 
> for each library.
>
> Best wishes
> Gordon
>
>
> On Jan 31, 2013, 8:57 AM, Paul Harrison <Paul.Harrison at monash.edu> wrote:
>
>> Hello,
>> 
>> We have been using voom and limma for some time now, and while we're fairly 
>> happy with it, it seems to produce significance levels that are on the 
>> conservative side. We also use edgeR to produce more optimistic results, 
>> but don't entirely trust the significance levels that it reports. I am 
>> looking for something in-between these extremes, and want to run an idea 
>> past this list as a sanity check. I would especially value Gordon and 
>> Charity's comments if they have time.
>> 
>> The voom log transformation is essentially:
>>
>>  log2( (count+0.5) / library.size )
>> 
>> It then does some clever things with weights. What I'm considering instead 
>> is
>>
>>  log2( count / library.size + moderation.amount / mean.library.size )
>> 
>> where moderation.amount is much larger than 0.5, say 5. A couple of things 
>> here:
>> 
>> - Instead of down-weighting low counts, I'm trying to get rid of the extra 
>> variation from low counts by artificially left censoring the data.
>> 
>> - I'm using the mean of the library sizes because I want the left censor to 
>> be in the same place for each sample even if the library sizes are 
>> different, so that if a gene is entirely switched off in one condition it 
>> won't look variable just because there is a different left censor in each 
>> sample.
>> 
>> I'm also using this transformation to create heatmaps.
>> 
>> This seems to be working with the data set I am working with: I get more 
>> significant results and they seem reasonable by eye. It seems to me that 
>> even if this approach isn't ideal it should at least be safe; at worst it 
>> will cause limma to reduce the df.prior and produce less significant 
>> results. Anything I've missed?
>> 
>> --
>> Paul Harrison
>> 
>> Victorian Bioinformatics Consortium / Monash University
>
>
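
Editorial note: the two log transformations quoted in this thread can be compared numerically. The following is a minimal sketch in plain Python, not the edgeR or limma implementation. The function names, the two library sizes, and the count range are hypothetical and chosen only for illustration; the formulas are the ones written out in Paul's message, with moderation.amount set to the value 5 that he suggests.

```python
import math

def voom_style_log(count, library_size):
    # log2((count + 0.5) / library.size), as quoted in the thread.
    return math.log2((count + 0.5) / library_size)

def moderated_log(count, library_size, mean_library_size, moderation_amount=5.0):
    # log2(count / library.size + moderation.amount / mean.library.size),
    # the transformation proposed in the thread.
    return math.log2(count / library_size + moderation_amount / mean_library_size)

def moderated_log_scaled_offset(count, library_size, mean_library_size,
                                moderation_amount=5.0):
    # Algebraically the same quantity, written as an offset added to the count.
    # The offset is scaled by library.size / mean.library.size.
    scaled_offset = moderation_amount * library_size / mean_library_size
    return math.log2((count + scaled_offset) / library_size)

# Hypothetical library sizes, deliberately unequal.
library_sizes = [1_000_000, 4_000_000]
mean_library_size = sum(library_sizes) / len(library_sizes)

# Value assigned to a zero count in each library under each transformation.
zero_voom = [voom_style_log(0, n) for n in library_sizes]
zero_moderated = [moderated_log(0, n, mean_library_size) for n in library_sizes]

# The proposed transformation is strictly increasing in the count, so no
# observations are censored, truncated or removed.
transformed = [moderated_log(c, library_sizes[0], mean_library_size)
               for c in range(50)]
is_strictly_increasing = all(a < b for a, b in zip(transformed, transformed[1:]))

print("zero count, voom-style:", zero_voom)
print("zero count, moderated: ", zero_moderated)
print("strictly increasing:   ", is_strictly_increasing)
```

With these assumed library sizes, a zero count maps to the same value in both libraries under the proposed transformation, whereas the voom-style values for a zero count differ by log2(4) = 2. The second form shows why the transformation can be read as adding a library-size-scaled offset to each count, which is the sense in which it is a monotonic transformation rather than a censoring step.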
