[BioC] VariantTools callVariants - binomial likelihood ratio

heyi xiao xiaoheyiyh at yahoo.com
Tue Nov 19 19:33:51 CET 2013


Hi Michael,
Thanks for your answers. Now I understand the lrtFreqCutoff step.
I think the values of p0 (= p.error = 0.001) and p1 (= p.lower = 0.2) should be important for SNP calling.
The sequencing error rate, p0/p.error, can be derived from the raw read data, or better yet recalibrated as in SOAPsnp or GATK, but that is a fair amount of work. On the other hand, why did you choose p1 = p.lower = 0.2?
I much appreciate the simplicity of your method, i.e. mapping the binomial likelihood ratio to a cutoff frequency. Have you tried using a Bayesian probability score, as in other methods? I know it is more complicated, but that way you get a significance level or p-value. Thank you!

--------------------------------------------
On Sun, 11/17/13, Michael Lawrence <lawrence.michael at gene.com> wrote:

 Subject: Re: VariantTools callVariants - binomial likelihood ratio

 Cc: "bioconductor at r-project.org" <bioconductor at r-project.org>, "Michael Lawrence" <lawrence.michael at gene.com>
 Date: Sunday, November 17, 2013, 10:29 PM




 On Sun, Nov 17, 2013, heyi xiao wrote:

 Hi,

 I am trying to use the VariantTools package and am working
 through the vignette examples. I have some questions about the
 variant calling procedure. The vignette says: “The callVariants
 function uses a binomial likelihood ratio test for this purpose.
 The ratio is P(D|p = p_lower) / P(D|p = p_error).”




 To understand this statement, I located the two internal
 functions that carry out this step.



 > VariantTools:::lrtFreqCutoff
 function (p0, p1, n = if (C == 1L) 1L else n, C = 1L)
 {
     num <- (1/n) * log(C) + log(1 - p0) - log(1 - p1)
     denom <- log(p1) - log(p0) + log(1 - p0) - log(1 - p1)
     num/denom
 }



 > VariantTools:::BinomialLRFilter
 function (p.lower = 0.2, p.error = 1/1000)
 {
     function(x) {
         freq.cutoff <- lrtFreqCutoff(p.error, p.lower)
         sample.freq <- altDepth(x)/totalDepth(x)
         passed <- sample.freq >= freq.cutoff
         passed[is.na(passed)] <- FALSE
         passed
     }
 }



 However, I am now even more confused. Here are my questions:

 The binomial likelihood ratio should be
 p1^k * (1-p1)^(n-k) / (p0^k * (1-p0)^(n-k)), where n and k are the
 total number of trials and the number of hits. I don't understand
 why this becomes num/denom in the lrtFreqCutoff function.


 lrtFreqCutoff is not calculating the likelihood ratio: it's
 calculating the frequency (k/n) at which the ratio becomes greater
 than C. It takes some algebra to work this out. 'n' is the
 coverage, but it cancels out when C = 1, which is what the filter
 always uses. If it didn't, we would need to consider the coverage
 at each position. As it is now, the cutoff is a function of the two
 probabilities alone, and so is constant (~4% using the defaults)
 for every position.
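
 To spell out the algebra (a sketch, in the notation from the
 question): the test passes when

     p1^k * (1-p1)^(n-k) / (p0^k * (1-p0)^(n-k)) >= C

 Taking logs and writing f = k/n, this becomes

     f*(log(p1) - log(p0)) + (1-f)*(log(1-p1) - log(1-p0)) >= (1/n)*log(C)

 and solving for f (the coefficient of f is positive since p1 > p0)
 gives

     f >= ((1/n)*log(C) + log(1-p0) - log(1-p1)) /
          (log(p1) - log(p0) + log(1-p0) - log(1-p1))

 which is exactly num/denom in lrtFreqCutoff. With C = 1 the log(C)
 term is zero, so n drops out. A quick numerical check in R (my own
 sketch, not part of the package API):

     p0 <- 0.001  # p.error
     p1 <- 0.2    # p.lower
     f <- VariantTools:::lrtFreqCutoff(p0, p1)
     f  # ~0.0402, the ~4% mentioned above
     ## per-read log likelihood ratio at frequency f; should be log(C) = 0
     f * (log(p1) - log(p0)) + (1 - f) * (log(1 - p1) - log(1 - p0))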

  



 What are n and C in lrtFreqCutoff? Why are these arguments not
 used when BinomialLRFilter calls lrtFreqCutoff?



 In BinomialLRFilter, the criterion used is sample.freq >=
 freq.cutoff. But sample.freq <- altDepth(x)/totalDepth(x) is a
 frequency as defined, yet freq.cutoff seems to be the binomial
 likelihood ratio. How can a frequency be compared to a likelihood
 ratio?




 In BinomialLRFilter, the defaults p0 = p.error = 0.001 and
 p1 = p.lower = 0.2 are assumed. Why not use the actual sequencing
 error rate and the lowest expected variant frequency?



 The user can change the defaults. How do you
 propose we figure out the correct values automatically?  We
 could try to estimate the error rate from e.g. overlapping
 read ends, but that's just a heuristic.
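
 For example, a minimal sketch of overriding the defaults when
 constructing the calling filters (check ?VariantCallingFilters and
 ?callVariants for the exact signatures; 'tallies' below is a
 placeholder for your tallied variants object):

     library(VariantTools)
     ## hypothetical values: a noisier platform and rarer variants
     filters <- VariantCallingFilters(p.lower = 0.1, p.error = 0.005)
     ## called <- callVariants(tallies, calling.filters = filters)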

  
[[elided Yahoo spam]]


