[BioC] [biocpkgs] suggestions on package matchprobes

lgautier at altern.org lgautier at altern.org
Fri Sep 22 12:36:52 CEST 2006


> Thanks a lot!
>
> Maybe you can remind users that the matching is case-sensitive. For DNA
> sequences, people might tend to treat the lowercase and the uppercase
> the same.
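
To illustrate the point about case, here is a minimal base-R sketch. It is
not the 'matchprobes' implementation (which calls C code); the function name
'match_probes_sketch' and its 'ignore.case' argument are assumptions made up
for this example. It mimics the shape of the result shown later in the thread
(a list with a 'match' element holding, for each query sequence, the indices
of the probes found in it) and makes case folding an explicit, documented
choice.

```r
# Hypothetical helper, NOT the matchprobes implementation: for each query
# sequence, return the indices of the probes that occur in it.
match_probes_sketch <- function(query, records, ignore.case = FALSE) {
  if (ignore.case) {
    # Fold both sides to upper case so "acgt" and "ACGT" compare equal.
    query   <- toupper(query)
    records <- toupper(records)
  }
  match <- lapply(query, function(q) {
    hits <- vapply(records, function(p) {
      # Skip empty probes: an empty pattern would otherwise match everything.
      nzchar(p) && grepl(p, q, fixed = TRUE)
    }, logical(1), USE.NAMES = FALSE)
    which(hits)
  })
  list(match = match)
}

seqs  <- c("atggcggcgcaaagtagtgg", "atgaaaagggtaatgcaaca")
probe <- "ATGGCGGCG"
match_probes_sketch(seqs, probe)$match                      # case differs: no hits
match_probes_sketch(seqs, probe, ignore.case = TRUE)$match  # first sequence hits
```

Folding case on both the query and the probes, rather than on one side only,
avoids the situation where lower-case input silently returns no matches.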

> I was looking at the library "altcdfenvs" to create an alternative CDF
> environment. I do not see a straightforward connection between
> "altcdfenvs" and "Biostrings". Any suggestions?

You raise a good point here. I would have liked to use something generic
for DNA/RNA sequences, but "Biostrings" was in its infancy when "altcdfenvs"
was written, and because of my current occupation further work on
the package is unlikely to happen soon.

Low-energy approaches would be either to write a function that transforms
a list of 'BStringViews' into a list such as the one returned by
'matchprobes' and feed it to 'buildCdfEnv.matchprobes' (as the vignette
and documentation for 'buildCdfEnv.matchprobes' indicate), or to modify
'buildCdfEnv.matchprobes' to accept a list of 'BStringViews' as input.
In the latter case you will mostly only have to work on the following
code:

    xy <- getxy.probeseq(probeseq=probe.tab, i.row=matches$match[[i]],
                         x.colname = x.colname, y.colname = y.colname)
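
The first approach can be sketched roughly as follows. This is only an
illustration under stated assumptions: it does not use the 'BStringViews'
API at all. It assumes the hits have already been extracted from the views
into a hypothetical data frame 'hits' with columns 'target' (index of the
target sequence) and 'probe' (row index of the probe in the probe table).
The function name 'hits_to_matchprobes' is likewise made up.

```r
# Hypothetical converter: reshape a table of hits into a list shaped like
# the result of 'matchprobes', i.e. list(match = <one index vector per target>).
#   hits$target : index of the target sequence (1..n_targets)
#   hits$probe  : row index of the matching probe in the probe table
hits_to_matchprobes <- function(hits, n_targets) {
  match <- vector("list", n_targets)
  for (i in seq_len(n_targets)) {
    # Targets without any hit get a zero-length vector.
    match[[i]] <- sort(unique(hits$probe[hits$target == i]))
  }
  list(match = match)
}

hits <- data.frame(target = c(1L, 1L, 3L), probe = c(5L, 2L, 7L))
matches <- hits_to_matchprobes(hits, n_targets = 3)
str(matches)
```

Each 'matches$match[[i]]' built this way is a vector of probe row indices,
which is what the 'i.row' argument in the 'getxy.probeseq' call above is
given.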


Hoping this helps,


Laurent


> BTW, my previous reply was held because this email address was not
> subscribed to this list. Now it should work.
>
> Best,
> Xinxia
>
> -----Original Message-----
> From: Wolfgang Huber [mailto:huber at ebi.ac.uk]
> Sent: Thursday, September 14, 2006 2:31 AM
> To: Robert Gentleman
> Cc: Xinxia Peng; Bioconductor
> Subject: Re: [BioC] [biocpkgs] suggestions on package matchprobes
>
> Hi Xinxia,
>
> thanks!
> 1. The problem with the cases was simple: the function 'matchprobes'
> calls C code to do the actual work, and it was:
>
>   matchprobes <- function(query, records, probepos=FALSE)
>      .Call("MP_matchprobes", toupper(query), records, probepos,
>          PACKAGE="matchprobes")
>
> I removed the "toupper" in matchprobes_1.5.1, this should make you
> happier. There is no good reason why it should have been there, and that
> it was not documented was a bug. So now it is gone.
>
> 2. As Robert said, for generic sequence matching please use
> "Biostrings", that is much better. "matchprobes" only still exists for
> backward compatibility.
>
>  Best wishes
>  Wolfgang
>
>
> Robert Gentleman wrote:
>> Please ask these sorts of questions on the Bioconductor mailing list -
>> redirected there
>> and for generic sequence matching Biostrings is a better tool - we will
>> look into this, thanks
>>  Robert
>> Xinxia Peng wrote:
>>> Dear Bioc Team,
>>> It appears that the function 'matchprobes' will not work with
>>> sequences in lower case. Also it might be nice not to match the empty
>>> string. See the following example:
>>>> test.seq
>>>  [1] "atggcggcgcaaagtagtggtgggggtggaggttgtggtgaggaagataaagatgccaaatatatgtttgataggatagggaaagaagtgcacgacgaag"
>>>  [2] "atgaaaagggtaatgcaacaatttgtggatcgtacaacacaacgatttcacgaatatgatgaaaggatgaaaactacacgccaaaaatgtaaagaacgat"
>>>  [3] "atgaaacttcactgctctaaaatattattatttttacttccattaaatatattagtaacatcattatcaaatgtgcataataataataaactatacaaca"
>>>  [4] "atgaaagtccattatattaatatattattgtttgctcttccattaaatatattggaacataataaaaatgaaccacacaccacaccaaatcatacacaaa"
>>>  [5] "atgtttacaacaaaaaaaaaaattaaatatattataattatatgtggcatctttcgaaaatatttcaaattcggaagaattattgaggttccaatgatgc"
>>>  [6] "atgaaactgcactactctaatatattattatttttctttccattaaatatattagtaacatcatatcatgtatataataaaaataaaatatacatcacac"
>>>  [7] "atgtgtgctattggagaattactatcatctacagataaggaatatactcttaatttctttggtttagttaaagatggagcatcgattgatgaaatgaaag"
>>>  [8] "atgattaagatgaaattccattatgtaggatattattctgaagaagaaaatatgaaaaatacactgaaaatttgttccgttagacaaatatttttaaatt"
>>>  [9] "atgttattatttgctttattatttaatgcacttttattatcacaaaatgtaaattgccgaaacaacaattataatataagattcactcaaacgataacac"
>>> [10] "atgatataccacagaaggattatagcttatctcataaatcatctaccattaggtatatcccttacagaagtggtcgatataaatgaagaacatatattta"
>>>> test.p
>>> [1] "atggcggcgcaaagtagtggtgggg"
>>>> matchprobes(test.seq, test.p)
>>> $match
>>> $match[[1]]
>>> numeric(0)
>>> $match[[2]]
>>> numeric(0)
>>> $match[[3]]
>>> numeric(0)
>>> $match[[4]]
>>> numeric(0)
>>> $match[[5]]
>>> numeric(0)
>>> $match[[6]]
>>> numeric(0)
>>> $match[[7]]
>>> numeric(0)
>>> $match[[8]]
>>> numeric(0)
>>> $match[[9]]
>>> numeric(0)
>>> $match[[10]]
>>> numeric(0)
>>>> matchprobes(toupper(test.seq), toupper(c(test.p, "")))
>>> $match
>>> $match[[1]]
>>> [1] 1 2
>>> $match[[2]]
>>> [1] 2
>>> $match[[3]]
>>> [1] 2
>>> $match[[4]]
>>> [1] 2
>>> $match[[5]]
>>> [1] 2
>>> $match[[6]]
>>> [1] 2
>>> $match[[7]]
>>> [1] 2
>>> $match[[8]]
>>> [1] 2
>>> $match[[9]]
>>> [1] 2
>>> $match[[10]]
>>> [1] 2
>>> Thanks,
>>> Xinxia Peng
>>> Seattle Biomedical Research Institute
>
>
> --
> ------------------------------------------------------------------
> Wolfgang Huber  EBI/EMBL  Cambridge UK  http://www.ebi.ac.uk/huber
>


