[BioC] Problem with BioStrings: patternMatch ALSO BLAST
hpages at fhcrc.org
hpages at fhcrc.org
Thu Jun 7 08:04:32 CEST 2007
Quoting Paul Leo <p.leo at uq.edu.au>:
> These commands are straight from the Biostrings; PatternMatch manual.
> Could not find any help online.
> Any advice on the key 86 error??
See below for the explanation.
> Otherwise has anyone do genomic BLAST from within R. Any advice on the
> best path.... Use Blastcl3 ? ,another standalone blast , ?
> Maybe a 1000 or do short sequences to blast....
There is no BLAST in Biostrings but there is the needwunsQS() function
for _global_ alignment of 2 sequences. I just rewrote this function in C
in Biostrings devel (2.5.9) to make it faster. Now it's possible to align
two 20000-letter sequences (nucleotides or amino acid) in 2 or 3 seconds,
granted that you have enough memory for this.
Look at the Alignment.pdf vignette, the man page for needwunsQS()
(?needwunsQS) and at ?scoring_matrices for more information on this.
> Thanks in advance
> > cI <- Mmusculus$chr13
> > length(cI)
>  120614378
> > class(cI)
>  "DNAString"
>  "Biostrings"
> > p <- "UUACAGUUGUUCAACCAGUUACU"
> *********** ERROR****************
> > matchPattern(p, cI, mismatch = 0)
> Error in CharBuffer.write(data, 1, length, value = src, enc = lkup) :
> key 85 not in lookup table
You are applying matchPattern() on a DNAString subject (cI) so the pattern (p)
is expected to be a DNAString object too or a string that can be converted
into a DNAString object. In your example p contains the letter U (looks like
an RNA sequence) so it can't be converted to a DNAString object.
key 85 is the ascii code of the U letter: this code does not belong to
the lookup table used internally to encode the letters of a DNAString object
(letters stored in a DNAString or RNAString objects are encoded, this is a
trick to allow some fast searching algorithms).
I agree that the current error message is not really helpful, sorry! I plan
to work on this and to improve the overall documentation too.
Your pattern (p) can be converted into a RNAString object first with
rnap <- RNAString(p)
and then into a DNAString object with:
dnap <- DNAString(rnap)
This will not necessarily give you what you want since when converting
from RNAString to DNAString (or inversely) the current translation is
U <-> A
G <-> C
C <-> G
A <-> T
(which mimics the transcription/reverse transcription processus) so
you might want to apply complement() (or reverseComplement()) to
> > sessionInfo()
> R version 2.5.0 (2007-04-23)
> attached base packages:
>  "tools" "stats" "graphics" "grDevices" "utils"
>  "methods" "base"
> other attached packages:
> BSgenome.Mmusculus.UCSC.mm8 BSgenome
> "1.2.0" "1.4.0"
> Biobase Biostrings
> "1.14.0" "2.4.5"
> [[alternative HTML version deleted]]
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> Search the archives:
More information about the Bioconductor