[BioC] strategy to match/align peptide sequence to protein

Hervé Pagès hpages at fhcrc.org
Wed Jan 9 02:31:16 CET 2013


Hi Juliet,

Yes, matchPattern() should work. Did you run into any problems?
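
Since you have a list of peptides, one simple approach is to loop
matchPattern() over them. Below is a minimal sketch. The protein is a
hypothetical toy sequence standing in for the biomaRt result. With real
data you would use something like AAString(seq$peptide[1]), because
getSequence() returns a data.frame that may have more than one row.

```r
library(Biostrings)

# Hypothetical toy protein, standing in for AAString(seq$peptide[1])
protein <- AAString("MKTARVLGAPLSSARVIGAQE")

# Hypothetical list of short peptides to locate
peptides <- c("ARVLGA", "PLSS", "WWWW")

# Exact search for each peptide; keep the start position of every hit
hits <- lapply(peptides, function(p) start(matchPattern(p, protein)))
names(hits) <- peptides
print(hits)
```

A peptide with no hit simply gets integer(0), so it is easy to spot
the ones that did not map.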

However, note that by default matchPattern() does exact matching.
If you want to allow some mismatches and/or indels, use the
'max.mismatch' and 'with.indels' arguments. See ?matchPattern for
the details.
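
Here is a small sketch of the difference, again on a hypothetical toy
protein that contains ARVLGA at positions 4-9 and a one-substitution
variant (ARVIGA) at positions 14-19:

```r
library(Biostrings)

# Hypothetical toy protein: ARVLGA at 4-9, ARVIGA (L -> I) at 14-19
protein <- AAString("MKTARVLGAPLSSARVIGAQE")

# Default: exact matching only
exact <- matchPattern("ARVLGA", protein)

# Allow up to one substituted residue
fuzzy <- matchPattern("ARVLGA", protein, max.mismatch = 1)

print(start(exact))  # only the perfect hit
print(start(fuzzy))  # the perfect hit plus the one-mismatch hit
```

If you also set with.indels = TRUE, max.mismatch is interpreted as the
maximum edit distance rather than a pure substitution count.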

And if you want to use the full power of the Smith-Waterman algorithm,
you can use pairwiseAlignment(), which lets you do global, global-local,
and local alignments, with the substitution matrix and gap penalties
of your choice. See ?pairwiseAlignment for the details. There is also
a full vignette (PairwiseAlignments) dedicated to this in the
Biostrings package.
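
For your use case a "global-local" alignment seems a natural fit: the
whole peptide must align, but only against its best-matching region of
the protein. A minimal sketch, assuming BLOSUM62 scoring and using
illustrative gap penalties (gapOpening = 10, gapExtension = 4) on the
same hypothetical toy sequences:

```r
library(Biostrings)

# Hypothetical toy sequences
protein <- AAString("MKTARVLGAPLSSARVIGAQE")
peptide <- AAString("ARVLGA")

# Whole peptide vs best local region of the protein,
# BLOSUM62 substitution scores plus affine gap penalties
aln <- pairwiseAlignment(peptide, protein, type = "global-local",
                         substitutionMatrix = "BLOSUM62",
                         gapOpening = 10, gapExtension = 4)

print(aln)                 # the alignment itself
print(score(aln))          # alignment score
print(start(subject(aln))) # where the peptide lands in the protein
print(end(subject(aln)))
```

Unlike matchPattern(), this always returns the single best placement,
so you would want to check score(aln) before trusting a hit.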

I could try to help more if you had more specific questions.

Cheers,
H.


On 01/04/2013 07:20 AM, Juliet Hannah wrote:
> All,
>
> Given a list of small peptide sequences and swissprot identifiers, I
> would like to find out where the peptide aligns to the full protein.
>
> The script I am using is below. I am seeking any comments on the
> strategy (are there alternatives, is there a better way to align...etc).
>
> Thanks,
>
> Juliet
>
> # given "HEMO_HUMAN"
> # get sequence from biomart
>
> library("biomaRt")
> mart <- useMart("ensembl",dataset="hsapiens_gene_ensembl")
> seq = getSequence(id="HEMO_HUMAN", type="uniprot_swissprot",
> seqType="peptide", mart = mart)
> show(seq)
>
> library(Biostrings)
>
> # find out where short sequence toFind falls along full protein
>
> toFind <- "ARVLGA"
> matchPattern(toFind,seq$peptide)

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


