[BioC] difference results by needle program and pairwiseAlignment

Hervé Pagès hpages at fhcrc.org
Thu Dec 6 23:06:49 CET 2012


Hi,

On 12/06/2012 01:15 PM, Wang Peter wrote:
> i used EMBOSS package needle program got the score 6
>
> pairwiseAlignment(seq1,seq2, type = "global", substitutionMatrix =
> "BLOSUM62", gapOpening = -10, gapExtension = -0.5)
>
> both use the same parameter
>
> # Matrix: EBLOSUM62
> # Gap_penalty: -10.0
> # Extend_penalty: -0.5

2 things:

1. pairwiseAlignment() interprets gapOpening/gapExtension
    not really the standard way (which is kind of unfortunate).
    See ?writePairwiseAlignments for how to translate between
    gapOpening/gapExtension and Gap_penalty/Extend_penalty.

    So to effectively achieve EMBOSS Gap_penalty and Extend_penalty
    of -10 and -0.5, respectively, you would need to set 'gapOpening'
    and 'gapExtension' to 9.5 and 0.5, respectively.

2. However, another kind of strange feature of pairwiseAlignment()
    (that I was not aware of until now), is that it will only accept
    non-positive values for gapOpening and gapExtension. More precisely,
    you can pass positive values, but, in that case, the opposite values
    will effectively (and silently) be used and stored in the returned
    object:

      pa <- pairwiseAlignment(seq1, seq2, type="global",
                              substitutionMatrix="BLOSUM62",
                              gapOpening=9.5, gapExtension=0.5)
      > pa at gapOpening
     [1] -9.5
     > pa at gapExtension
     [1] -0.5

    So there is no way you can effectively set the 'gapOpening' and
    'gapExtension' args to achieve EMBOSS Gap_penalty and Extend_penalty
    to -10 and -0.5.

BUT THE MOST IMPORTANT THING IS THIS: Gap_penalty and Extend_penalty
are really aimed to be positive! By setting them to negative values,
you encourage gaps i.e. the more gaps the better the final score will
be. That means you can align pretty much anything with anything. With
your settings, even sequences that have almost nothing in common
(like your 2 sequences) get a positive score!

I guess this is why the original author of pairwiseAlignment() didn't
want to allow positive gapOpening and/or gapExtension values. Maybe
we should allow them though, just so that people can experiment and
compare results with other tools. But if not, pairwiseAlignment()
should raise an error or at least issue a warning that the opposite
values are effectively used. What do people think?

Thanks,
H.


> 		
> sequences are
>> AAG19119.1
> mgitkqvsirsdnfipghrvfderpHILHvcdikvnpednmwefeyqkekdeissrhtkdvylymeldekqhskltsffl
> eekprdssHSFHvdisgdiiqqvtkrpgagtsdkeknkytpspirfealsdrvvietfagdnpiipyyevthhdtpqdrh
> nvsriesfiqsmqdggstpgleplptelvselqsaewgeevrHHLHegdecyrnsllhpalssyihaiewalisflkeke
> dvdiiqqekngdlyyfasgnsilgevqdtgelsqksisrikslnraerrwmghhksgeatkeeldgmrarltqileelfg
> tdsark
>> AAG19157.1
> msvaddstddHGHHlpavedwpkgfgeaswwpfitaigaagfyigaalyvlgrgesalvgpmvgpgvfiastfaflagly
> gwvyhafvaaywsndgggstalrwgmigflgseiatfgagfayyffiragtqwstaasaipdgflgslvvvntailvvss
> ftlHFAHvalrkgnrsrflallvstlvlgvvfiggqvyeyyefiahegftlsgglfesaffgltglHGLHvtmgavligi
> imvrglrgqysadrhtsvstvsmywhfvdivwiflvvvlyagsva
>
> sionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
> [2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
> [3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
> [4] LC_NUMERIC=C
> [5] LC_TIME=Chinese (Simplified)_People's Republic of China.936
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] ShortRead_1.14.4     latticeExtra_0.6-24  RColorBrewer_1.0-5
> [4] Rsamtools_1.8.6      lattice_0.20-6       Biostrings_2.24.1
> [7] GenomicRanges_1.8.13 IRanges_1.14.4       BiocGenerics_0.2.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.16.0 bitops_1.0-5   grid_2.15.1    hwriter_1.3
> [5] stats4_2.15.1  tools_2.15.1   zlibbioc_1.2.0
>
> thank you very much
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319



More information about the Bioconductor mailing list