[BioC] vmatchPDict?

Hervé Pagès hpages at fhcrc.org
Fri Dec 14 20:01:36 CET 2012


Hi David,

On 12/14/2012 03:45 AM, David Iles wrote:
> Hi,
>
> I need to re-map the probe sequences of the Affymetrix Bovine genome array to a recent draft sequence of the sheep genome (please, don't ask why...). As a first step, I successfully created a new BSgenome package from a seed file, listing individual chromosomes as 'seqnames' and unmapped, and two multiple sequence fasta files as 'mseqnames', as per the forgeBSgenomeDataPkg vignette (see session info below).
>
> When calling the matchPDict() function to map the probe sequences to the + and - strands of individual chromosomes, all went smoothly, but the following error occurred with multiple sequences:
>
>> runAnConScaff(bt.probes.all, outfile="bt.probes.2.oarv3.1.unmapped.txt")
>
> Target: strand + of Oar v3.1 sequence unmapped_scaffolds, unmapped_contigs
>>>> Finding all hits in strand + of sequence unmapped_scaffolds ...
> Error in matchPDict(pdict, subject) :
>    please use vmatchPDict() when 'subject' is an XStringSet object (multiple sequence)
>
> So, I edited my script to call vmatchPDict() instead, with the following result....
>
>> runAnConScaff(bt.probes.all, outfile="bt.probes.2.oarv3.1.unmapped.txt")
>
> Target: strand + of Oar v3.1 sequence unmapped_scaffolds, unmapped_contigs
>>>> Finding all hits in strand + of sequence unmapped_scaffolds ...
> Error in .local(pdict, subject, max.mismatch, min.mismatch, with.indels,  :
>    vmatchPDict() is not ready yet, sorry
>
> While I can work around this by splitting the multiple sequences into loads of small fasta files, each with a single sequence, I wondered, will the vmatchPDict() function be ready in the not-too-distant future?

Sure. If I remember correctly, I delayed this because (1) it required
spending a little bit of time thinking about what kind of container
would be most appropriate for storing the result of vmatchPDict()
(conceptually something like a list of lists of IRanges objects,
or a 2-D ragged array of IRanges objects, or...), and (2) I don't think
anybody asked for this before.

In the meantime the workaround, as you figured out, is to call
matchPDict() in a loop over the individual sequences. FWIW
vcountPDict() and vwhichPDict() are implemented.
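
Something along these lines should do as a minimal sketch of that loop.
The probe and subject sequences below are made-up toy data standing in
for your real probes and your multi-sequence subject, and the object
names (probes, subject, hits, counts) are only for illustration. The
result is a plain list with one MIndex object per subject sequence,
which is roughly the "list of lists of IRanges" idea mentioned above.

```r
library(Biostrings)

## Toy data (hypothetical): 3 constant-width probes and a subject
## made of 2 sequences.
probes <- DNAStringSet(c(p1="ACGT", p2="GGCC", p3="TTAA"))
subject <- DNAStringSet(c(scaffold1="AAACGTGGCCACGT",
                          contig1="TTAATTAACC"))

## Preprocess the probes once.
pdict <- PDict(probes)

## Call matchPDict() on each sequence of the subject in turn.
## subject[[i]] extracts a single DNAString, which matchPDict() accepts.
hits <- lapply(seq_along(subject),
               function(i) matchPDict(pdict, subject[[i]]))
names(hits) <- names(subject)

## hits[[i]][[j]] is an IRanges object with the matches of probe j
## in sequence i of the subject.
hits[[1]][[1]]

## If only the number of hits is needed, vcountPDict() works directly
## on the XStringSet and returns a probes x sequences integer matrix.
counts <- vcountPDict(pdict, subject)
counts
```

To cover the minus strand as well, the same loop can be run on
reverseComplement(subject[[i]]), keeping in mind that the reported
positions are then relative to the reverse-complemented sequence.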

Cheers,
H.

>
> Many thanks
>
> Dr David Iles
> School of Biology
> University of Leeds
> Leeds LS2 9JT
>
> d.e.iles at leeds.ac.uk
>
>> sessionInfo()
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BSgenome.Oaries.ISGC.Oarv3.1 BSgenome_1.26.1                  Biostrings_2.26.2
> [4] GenomicRanges_1.10.5             IRanges_1.16.4                   BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] parallel_2.15.2 stats4_2.15.2   tools_2.15.2
>>
>
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


