[BioC] alternative to matchprobes

Hervé Pagès hpages at fhcrc.org
Wed May 18 05:18:38 CEST 2011


Hi Katja,

On 11-05-14 06:58 AM, kloytyno at mappi.helsinki.fi wrote:
>
> Dear all,
>
> I've found out that matchprobes has been deprecated and the function in
> question has been moved to Biostrings package, that also has been at
> least partially deprecated. Biostrings manual suggested functions
> matchPDict and vmatchPDict, and other pattern matching functions I found
> were grep-family of functions and pmatch. I'd love to hear your
> suggestions and information on what to use instead of matchprobes. Any
> help is much appreciated.

Note that the matchprobes() function is not deprecated at the moment
(but might be in the near future).

What to use exactly depends on what you want to do. More precisely:

   1. What kind of sequences do you have: DNA, RNA, other?

   2. How many do you have?

      (a) Just a few short patterns and a few long subjects,
          i.e. a few short strings that you want to match
          against a few long strings.

      (b) A lot (millions) of short patterns and just a few
          long subjects.

      (c) Just a few short patterns that you want to match
          against a lot (millions) of short subjects.

      (d) A lot (millions) of very short patterns that you
          want to match against a lot of very short subjects.

   3. Do all your patterns have the same length?

   4. What kind of matching do you want to perform: exact, with
      mismatches, or maybe also with indels?

   5. If your sequences are DNA or RNA, do they contain IUPAC
      ambiguity letters? In the patterns, in the subjects, or in both?
      And if so, do you want to handle them as ambiguities?

   6. What information do you want returned:

      (a) the locations of all the matches

      (b) just the number of matches

      (c) only which patterns have matches
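
To make questions 4 and 6 concrete, here is a minimal sketch in plain
Python. It is NOT Biostrings code: the function name hamming_matches and
the toy sequences are made up for illustration, and it only handles
substitutions (no indels, no IUPAC ambiguity letters). It shows how the
three kinds of answer in question 6 differ for the same input.

```python
# Illustration only: plain Python, not the Biostrings API.
# hamming_matches() and the toy data below are hypothetical.

def hamming_matches(pattern, subject, max_mismatch=0):
    """Return the 1-based start positions where `pattern` matches
    `subject` with at most `max_mismatch` substitutions (no indels)."""
    m = len(pattern)
    starts = []
    for i in range(len(subject) - m + 1):
        window = subject[i:i + m]
        mismatches = sum(1 for a, b in zip(pattern, window) if a != b)
        if mismatches <= max_mismatch:
            starts.append(i + 1)
    return starts

patterns = ["ACGT", "TTTT", "GGCA"]
subject = "ACGTTGCAACGAACGT"

# 6(a): the locations of all the (exact) matches, per pattern
locations = {p: hamming_matches(p, subject) for p in patterns}

# 6(b): just the number of matches, per pattern
counts = {p: len(starts) for p, starts in locations.items()}

# 6(c): only which patterns have at least one match
which = [p for p, starts in locations.items() if starts]

# Question 4: allowing one mismatch can add hits that exact matching misses
relaxed = hamming_matches("ACGT", subject, max_mismatch=1)

print(locations)
print(counts)
print(which)
print(relaxed)
```

The more you ask for (locations rather than counts, mismatches rather
than exact matching), the more work has to be done, which is why the
answers matter when picking a function.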

I won't draw the decision tree that you could follow based on your
answers to all these questions, because I don't have one yet (adding
it to the Biostrings documentation has been on my TODO list for a
long time). But if you post your answers here, I will try to direct
you to the right function to use.

Cheers,
H.


>
> best,
> Katja
>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


