[BioC] Probe matching using vmatchPDict

Hervé Pagès hpages at fhcrc.org
Mon Jul 25 19:57:04 CEST 2011


Hi Ian,

On 11-07-25 08:41 AM, Ian Henry wrote:
> Hello,
>
> I'm trying to match a list of 60mer probes against a transcriptome to
> see which probes hit which transcripts.
>
> I have my 60mer probe list as a DNAStringSet and also as a "PDict"
>  > probeset <- DNAStringSet(probelist$ProbeSeq)
>  > probeset_pdict <- PDict(probeset)
>
> My transcriptome was created as follows:
>  > zv9txdb <- makeTranscriptDbFromUCSC(genome = "danRer7", tablename =
> "ensGene")
>  > zv9_tx <- extractTranscriptsFromGenome(Drerio, zv9txdb)
>
>
> To find which transcripts are hit by the probes I've used vwhichPDict:
>  > tx_matches <- vwhichPDict(probeset_pdict, zv9_tx)
> which works brilliantly!
>
> However, I also would like the locations of the matches and so tried:
>  > tx_locs <- vmatchPDict(probeset_pdict, zv9_tx)
> This doesn't work and errors to say:
> Error in .local(pdict, subject, max.mismatch, min.mismatch, with.indels, :
> vmatchPDict() is not ready yet, sorry
>
> Does this just mean it's not yet implemented and is there a
> solution/workaround?

It is not yet implemented and has been on my TODO list for a long time,
sorry...

One workaround I can think of: use matchPDict in a loop (you loop over
the transcripts). The Drerio transcriptome is not that big, so it
might run in decent time. It might help a little to reduce the
size of the problem by dropping the probes and transcripts that
have no hits (based on the result of vwhichPDict).
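
A minimal sketch of that loop, using a few toy sequences in place of
the real probes and transcriptome. The object names (probeset,
transcripts, match_one, locs) and the toy sequences are made up for
illustration, and the data.frame layout is just one possible way to
collect the results:

```r
library(Biostrings)

# Toy stand-ins (hypothetical) for the 60mer probes and the transcripts.
probeset <- DNAStringSet(c("ACGTAC", "TTTGGG", "CCCCCC"))
transcripts <- DNAStringSet(c("AAACGTACAAACGTACAA",
                              "GGGGGGGGGG",
                              "CATTTGGGACGTACT"))

# For each transcript, the indices of the probes that hit it.
hits <- vwhichPDict(PDict(probeset), transcripts)

# Keep only the probes and transcripts that have at least one hit.
probe_idx <- sort(unique(unlist(hits)))
tx_idx <- which(lengths(hits) > 0L)
small_pdict <- PDict(probeset[probe_idx])

# Match the reduced dictionary against a single transcript and return
# the match locations as a data.frame (one row per match).
match_one <- function(i) {
  m <- matchPDict(small_pdict, transcripts[[i]])
  starts <- startIndex(m)
  ends <- endIndex(m)
  n <- lengths(starts)  # number of matches per probe
  data.frame(
    probe      = rep(probe_idx, n),      # index into the original probeset
    transcript = rep(i, sum(n)),         # index into the original transcripts
    start      = as.integer(unlist(starts, use.names = FALSE)),
    end        = as.integer(unlist(ends, use.names = FALSE))
  )
}

# Loop over the transcripts that have hits and stack the results.
locs <- do.call(rbind, lapply(tx_idx, match_one))
print(locs)
```

With the real data, probeset and transcripts would be replaced by your
probeset and zv9_tx objects. The probe and transcript columns refer to
positions in the original (unfiltered) objects.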

Let me know if that's not fast enough or if you need further help with
this.

Cheers,
H.

>
> vmatchPDict(probeset_pdict, Drerio) works but I'd really like to match
> to the transcriptome rather than the genome.
>
> Thanks for any advice in advance,
>
> Ian
>
> Ian Henry
> MPI-CBG Dresden
> Pfotenhauerstrasse 108
> 01307 Dresden
> Germany


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


