[BioC] CRISPRseek: searchHits() optimization

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Sun Aug 3 18:41:15 CEST 2014


Thanks so much! Yes, with a sequence containing 454 gRNAs, it took 34
minutes to perform gRNA search, restriction enzyme and paired configuration
annotation, and off-target search and score prediction in the human genome.
That is a huge speedup (> 10x)!

I notice that the genome-wide search also searches the contigs. I recall
that someone at Bioc2014 mentioned a function that returns the main set of
chromosomes for a given BSgenome, but I no longer remember its name.
Do you or anyone on the list know the function? Many thanks!
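
Until that function is identified, a name-based filter could serve as a
stopgap. The sketch below is written in Python purely for illustration. It
assumes UCSC-style sequence names (chr1..chr22, chrX, chrY, chrM for the
main chromosomes, with contigs carrying a suffix or a chrUn prefix), and
main_chromosomes() is a hypothetical helper, not the BSgenome function
asked about above.

```python
import re

# Assumption: UCSC-style naming. Main chromosomes are "chr" followed by a
# number or one of X, Y, M; anything else (e.g. "chr1_gl000191_random",
# "chrUn_gl000211") is treated as a contig or alternate haplotype.
MAIN_CHROMOSOME = re.compile(r"chr(\d+|[XYM])")


def main_chromosomes(seqnames):
    """Hypothetical helper: keep only names that look like main chromosomes."""
    return [name for name in seqnames if MAIN_CHROMOSOME.fullmatch(name)]


# Made-up example names in the UCSC style, not read from a real genome.
names = ["chr1", "chr2", "chrX", "chrY", "chrM",
         "chr1_gl000191_random", "chr6_apd_hap1", "chrUn_gl000211"]
print(main_chromosomes(names))
```

A pattern like this would need adjusting for genomes that use a different
naming scheme.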

Best regards,


On 8/3/14 7:54 AM, "Lihua Julie Zhu" <julie.zhu at umassmed.edu> wrote:

> Herve,
> Wow! Thanks so much for improving the code so quickly!
> I will play with it today.
> Best regards,
> Julie
> On 8/3/14 4:46 AM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>> Hi Julie,
>> I looked at the searchHits() function and found a way to optimize it.
>> The trick is to use matchPDict() internally instead of matchPattern()
>> and to preprocess the set of gRNAs. This allows a 2x speedup for 50
>> 23-base gRNAs with max.mismatch=4. The speedup will be more drastic
>> if there are more gRNAs or if they are longer. For example, with
>> hundreds of 23-base gRNAs, you will probably see a 4x speedup and
>> even more if the gRNAs are longer.
>> Note that preprocessing is not always possible e.g. if the gRNAs
>> are very short, or if max.mismatch is too high, or if the gRNAs
>> contain IUPAC ambiguity codes. In that case, the code will skip
>> the preprocessing step and you won't see any speedup.
>> I committed the change to the devel version of CRISPRseek and bumped
>> the version to 1.1.8. Try it with a big set of gRNAs and let me
>> know how it goes. If you use max.mismatch=4, the longer the gRNAs
>> are, the faster it's going to be. Let me know if you run into any
>> problem.
>> I enjoyed the conference. It was nice to see you again.
>> Hope you had a safe trip back home.
>> Best,
>> H.
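
To make the preprocessing idea in the quoted message concrete, here is a
minimal sketch of one general technique for searching many equal-width
patterns with mismatches: cut each pattern into max_mismatch + 1 pieces,
index the pieces once, and fully check only those windows of the subject
that share an exact piece with some pattern. With at most max_mismatch
mismatches, at least one piece must match exactly, so no hit is lost. It is
written in Python for illustration only. Whether matchPDict() works exactly
this way is an assumption, and all function names below are hypothetical.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def naive_search(patterns, subject, max_mismatch):
    """Baseline: one full scan of the subject per pattern."""
    hits = set()
    for idx, pat in enumerate(patterns):
        for start in range(len(subject) - len(pat) + 1):
            if hamming(pat, subject[start:start + len(pat)]) <= max_mismatch:
                hits.add((idx, start))
    return hits


def build_seed_index(patterns, max_mismatch):
    """Preprocess: cut each pattern into max_mismatch + 1 pieces and index them."""
    width = len(patterns[0])
    if any(len(p) != width for p in patterns):
        raise ValueError("all patterns must have the same width")
    n_pieces = max_mismatch + 1
    if width < n_pieces:
        # Patterns too short (or max_mismatch too high) to form non-empty pieces.
        raise ValueError("cannot preprocess: patterns too short for max_mismatch")
    bounds = [width * i // n_pieces for i in range(n_pieces + 1)]
    index = defaultdict(list)  # (piece offset, piece text) -> pattern ids
    for idx, pat in enumerate(patterns):
        for lo, hi in zip(bounds, bounds[1:]):
            index[(lo, pat[lo:hi])].append(idx)
    return width, bounds, index


def indexed_search(patterns, subject, max_mismatch):
    """Single scan of the subject; verify only windows sharing an exact piece."""
    width, bounds, index = build_seed_index(patterns, max_mismatch)
    hits = set()
    for start in range(len(subject) - width + 1):
        window = subject[start:start + width]
        candidates = set()
        for lo, hi in zip(bounds, bounds[1:]):
            candidates.update(index.get((lo, window[lo:hi]), ()))
        for idx in candidates:
            if hamming(patterns[idx], window) <= max_mismatch:
                hits.add((idx, start))
    return hits


# Made-up 23-base patterns and subject, for illustration only.
patterns = ["ACGTACGTACGTACGTACGTAGG", "TTTTCCCCGGGGAAAATTTTCGG"]
subject = "GG" + "ACGTACGAACGTACGTACGTAGG" + "CA" + "TTTTCCCCGGGGAAAATTTTCGG" + "TT"
assert indexed_search(patterns, subject, 4) == naive_search(patterns, subject, 4)
```

In this sketch the subject is scanned once regardless of how many patterns
there are, and longer patterns give longer pieces and therefore fewer
spurious candidates. Very short patterns or a high max_mismatch leave
pieces too short to be selective (or empty, hence the error), and an
exact-key lookup cannot represent IUPAC ambiguity codes.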
