[BioC] stringDist ?

Hervé Pagès hpages at fhcrc.org
Tue Apr 9 18:34:41 CEST 2013


Hi Scott,

On 04/08/2013 11:04 AM, Scott Schwartz wrote:
> Hi Herve -- You might not be who I need to ask about this, but, the
> stringDist (biostrings) function computes a matrix of all pairwise
> distances.  Is there any way to have it just return the first row of
> this matrix to save computation time?

There is indeed room for improving the facilities for computing string
distances in Biostrings. This won't happen in the next couple of weeks
though...

In the meantime, if you need the Hamming distance (i.e. the number of
mismatches only, which requires all your strings to have the same
length), you can use neditAt():

   > neditAt("ACT", DNAStringSet(c("AAA", "ACT", "ATT")))
        [,1] [,2] [,3]
   [1,]    2    0    1

The above corresponds to the first column of the matrix below:

   > stringDist(c("ACT", "AAA", "ACT", "ATT"), method="hamming")
     1 2 3
   2 2
   3 0 2
   4 1 2 1
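
If you would rather not depend on neditAt(), the same one-against-many
Hamming distances can be sketched in base R. This is only an
illustration: hamming_to_all() is a hypothetical helper name, not a
Biostrings function, and it assumes plain character vectors whose
elements all have the same length as the query.

```r
# Hypothetical helper (not part of Biostrings): Hamming distance from one
# query string to every string in a character vector, base R only.
hamming_to_all <- function(query, subjects) {
  # Hamming distance is only defined for strings of equal length.
  stopifnot(length(query) == 1L, all(nchar(subjects) == nchar(query)))
  q <- strsplit(query, "")[[1]]
  # Count the positions where each subject differs from the query.
  vapply(strsplit(subjects, ""),
         function(s) sum(s != q),
         integer(1))
}

hamming_to_all("ACT", c("AAA", "ACT", "ATT"))
# [1] 2 0 1
```

This does one comparison per subject string, so the work grows linearly
with the number of strings instead of with the number of pairs.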

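If you need the Levenshtein distance, you could use the stringdist
package from CRAN. Its default method is "osa" (optimal string
alignment), so pass method="lv" to get plain Levenshtein:

   > library(stringdist)
   > stringdist("ACT", c("AAA", "CT", "ACTT"), method="lv")
   [1] 2 1 1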

This corresponds to the first column of the matrix below:

   > stringDist(c("ACT", "AAA", "CT", "ACTT"))
     1 2 3
   2 2
   3 1 3
   4 1 3 2
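
A possible alternative that needs no extra package is adist() from
base R (package utils), which computes a generalized Levenshtein
distance between two character vectors. The snippet below is just a
sketch on the same toy strings; I have not compared its speed with
stringdist on large inputs.

```r
# adist() returns a matrix with one row per element of the first argument
# and one column per element of the second, so a single query string
# yields a 1-row matrix.
d <- adist("ACT", c("AAA", "CT", "ACTT"))

# Drop the matrix structure to get a plain vector of distances.
d[1, ]
# [1] 2 1 1
```

As with the stringdist call, only the distances from the first string
are computed, not the full pairwise matrix.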

Cheers,
H.

>
> Thanks in advance, and apologies if this is an unwelcome request.
>
> Best,
> Scott
>
>
> Scott Schwartz, PhD
> Statistical Geneticist and Bioinformatics Scientist,
> Genomics and Bioinformatics
> Texas AgriLife Research
> Texas A&M System
> Rm 175 - Norman E. Borlaug Center
> College Station, TX 77843-2123
>
> Email: sschwartz at ag.tamu.edu
> Office: (979) 845-1068
> Cell: (210) 296-4392
> Website: http://www.txgen.tamu.edu
> <https://agrilifepeople.tamu.edu/index.cfm/event/publicDirectory/WhichEntity/3/whichunit/425/>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


