[BioC] PWMscoreStartingAt and ambiguous subject seqs

Hervé Pagès hpages at fhcrc.org
Tue May 22 07:33:01 CEST 2012


Hi Janet,

On 05/21/2012 06:34 PM, Janet Young wrote:
> Hi there,
>
> I'm using PWMscoreStartingAt from Biostrings - it's VERY useful for me - thanks!
>
> Some of the sequences I'm scanning include ambiguities (some N, some Y, etc - uses IUPAC codes).  I'm really glad that PWMscoreStartingAt works on these sequences, but I'd like to understand how scores are calculated when an N (or whatever) is present - would it be easy for you to add that to the documentation?  (or just an email response would be fine too, but seems useful to add it to the docs)

It seems that IUPAC ambiguity codes are simply ignored in the
calculation of the score, i.e. each ambiguous position contributes 0.
For example, with the 'pwm' used in the man page for
PWMscoreStartingAt():

 > dim(pwm)
[1]  4 13
 > PWMscoreStartingAt(pwm, DNAString("AAAAAAAAAAAAA"))
[1] 0.4960267
 > sum(pwm["A", ])
[1] 0.4960267
 > PWMscoreStartingAt(pwm, DNAString("NNNNNNNNNNNNN"))
[1] 0

This is probably not very satisfying. Maybe the contribution of an
ambiguity code to the score should be the average of the contributions
of the individual bases it represents (e.g. for Y, the mean of the C
and T entries in that column of the PWM)? I could implement this if
that sounds reasonable. Feedback on this is welcome; in particular, it
would be good to know how other tools handle this.
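
To make the two options concrete, here is a minimal base-R sketch that
scores one window either way. It is not the Biostrings implementation:
the helper score_with_ambiguity(), the lookup table 'iupac', and the
values in 'toy_pwm' are all made up for illustration, and the "ignore"
mode only mimics the behavior observed in the transcript above.

```r
# Toy 4 x 3 PWM; rows are bases, columns are motif positions.
# These values are invented for illustration, not the man-page 'pwm'.
toy_pwm <- matrix(c(0.10, 0.02, 0.03,
                    0.01, 0.08, 0.02,
                    0.02, 0.01, 0.09,
                    0.03, 0.05, 0.01),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(c("A", "C", "G", "T"), NULL))

# IUPAC nucleotide codes mapped to the bases they stand for.
iupac <- list(A = "A", C = "C", G = "G", T = "T",
              M = c("A", "C"), R = c("A", "G"), W = c("A", "T"),
              S = c("C", "G"), Y = c("C", "T"), K = c("G", "T"),
              V = c("A", "C", "G"), H = c("A", "C", "T"),
              D = c("A", "G", "T"), B = c("C", "G", "T"),
              N = c("A", "C", "G", "T"))

# Hypothetical helper: score 'seq' against 'pwm', aligned at its first letter.
#   ambiguity = "ignore":  ambiguous letters contribute 0
#   ambiguity = "average": they contribute the mean over the bases they cover
score_with_ambiguity <- function(pwm, seq, ambiguity = c("ignore", "average")) {
  ambiguity <- match.arg(ambiguity)
  letters_in_seq <- strsplit(seq, "")[[1]]
  stopifnot(length(letters_in_seq) == ncol(pwm),
            all(letters_in_seq %in% names(iupac)))
  contributions <- vapply(seq_along(letters_in_seq), function(i) {
    bases <- iupac[[letters_in_seq[i]]]
    if (length(bases) > 1L && ambiguity == "ignore") return(0)
    mean(pwm[bases, i])  # for a single base this is just its PWM entry
  }, numeric(1))
  sum(contributions)
}

score_with_ambiguity(toy_pwm, "ACG")             # no ambiguity involved
score_with_ambiguity(toy_pwm, "NNN", "ignore")   # every position contributes 0
score_with_ambiguity(toy_pwm, "NNN", "average")  # sum of the column means
score_with_ambiguity(toy_pwm, "AYG", "average")  # Y averages the C and T rows
```

With averaging, an all-N window scores the sum of the column means
rather than 0, so ambiguous windows are neither rewarded nor penalized
relative to a random base at each position.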

Thanks!
H.

>
> thanks very much,
>
> Janet
>
>
> -------------------------------------------------------------------
>
> Dr. Janet Young
>
> Tapscott and Malik labs
>
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Avenue N., C3-168,
> P.O. Box 19024, Seattle, WA 98109-1024, USA.
>
> tel: (206) 667 1471 fax: (206) 667 6524
> email: jayoung  ...at...  fhcrc.org
>
>
> -------------------------------------------------------------------
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


