[BioC] boot.phylo( ) function

Emmanuel Paradis paradis at isem.univ-montp2.fr
Thu Mar 22 09:15:55 CET 2007


Dear all,

I have replied privately to this query, mentioning that the BioC list 
may not be appropriate for this kind of question, but I was apparently 
wrong (I'm not on the list, so not aware of the usual topics).

Martin Morgan wrote:
> Hi Nora --
> 
> I do not have direct experience with this package, but from the help
> page
> 
>> library(ape)
>> ?boot.phylo
> 
> perhaps
> 
> allbacteria <- read.dna("allbacteriafasta","fasta")
> distallbacteria <-
>   dist.dna(allbacteria, pairwise.deletion=TRUE, as.matrix=TRUE)
> njtree <- nj(distallbacteria)
> 
> boot.phylo(njtree,
>            allbacteria,
>            FUN=function(bootstrappedData) {
>                nj(dist.dna(bootstrappedData,
>                            pairwise.deletion=TRUE,
>                            as.matrix=TRUE))
>            })
> 
> works?

Yes, this is it. The option as.matrix=TRUE is not needed, and will 
likely slow down the whole thing.

Earl Glynn asked me for an example of using boot.phylo with a "toy" 
problem. Here's what can be done in R:

library(ape)
data(woodmouse)
d <- dist.dna(woodmouse)
tr <- nj(d)
bp <- boot.phylo(tr, woodmouse, function(xx) nj(dist.dna(xx)))
plot(tr)
nodelabels(bp)

Then you can change the options of dist.dna and boot.phylo to see how 
this may affect the results. There is a more "real life" example in my 
book "APER" (this case study is actually spread in several chapters, so 
that it explains the analysis from A to Z, or nearly).

boot.phylo is fairly fast for small to moderate size, on my laptop I have:

 > system.time(bp <- boot.phylo(tr, woodmouse, function(xx) 
nj(dist.dna(xx))))
[1] 11.089  0.008 11.098  0.000  0.000

These times are expected to be quite longer for big data sets as a 
substantial part of the computation is done in R (I plan to rewrite this 
in C sometime). Also I have experimental codes for dist.dna that are 
much faster than the current ones.

> Martin
> 
> Nora Muda <noramuda at yahoo.com> writes:
> 
>> Dear BioConducter useRs,
>>  
>> I have problems in writing boot.phylo() function. 
>>  
>> Let say I have 30 aligned sequences; then I computed
>> pairwise distances with "K80" method. Then I construct
>> phylogenetic tree with neighbor-joining method and my
>> proposed method. Now I have problems in writing "FUN"
>> in boot.phylo() function. Below are examples of my
>> programs:
>>  
>> library(ape)
>> allbacteria <- read.dna("allbacteriafasta","fasta")
>> distallbacteria <-
>> dist.dna(allbacteria,pairwise.deletion=TRUE,as.matrix=TRUE)
>> plot(nj(distallbacteria))
>>  
>> boot.phylo(plot(nj(distallbacteria)),allbacteria,nj(distallbacteria))
>>  
>> What should I put as FUN in boot.phylo?
>>  
>> I make comparison between distances of "K80" in PHYLIP
>> and dist.dna("K80").There are a lot of differences esp
>> in PHYLIP there is a default for
>> transversion/transition rate; which is 2 but not in
>> ape package. How to modify it to make it the same?

Here the answer I sent to Nora:

Indeed, and these differences are expected. PHYLIP assumes a 
"theoretically expected" value of the ts/tv ratio, the distance is then 
calculated by maximizing a likelihood function. (I owe this information 
to my colleague Olivier Gascuel who dissected PHYLIP's code.) You can 
change the default of the ts/tv ratio, in which case it is also 
estimated by ML. dist.dna is faithful to Kimura's original formulae; it 
also computes the variance of the distances (which PHYLIP does not).

Emmanuel Paradis



More information about the Bioconductor mailing list