[BioC] suggested Biostrings library function for printing an alignment

Coghlan, Avril A.Coghlan at ucc.ie
Fri Nov 20 16:15:08 CET 2009


Dear Bioconductor users and developers,

I am using the Biostrings library for creating alignments, especially
the pairwiseAlignment() function, which is very useful.
I would like to suggest a new function for the Biostrings library, for
printing out a pairwise alignment in blocks.

I have looked, and there doesn't seem to be such a function at present
in Biostrings or any other R library (I have tried to find one).

Here is my suggested function:

> printPairwiseAlignment <- function(alignment, chunksize)
{
     seq1aln <- pattern(alignment) # Get the alignment for the first
sequence
     seq2aln <- subject(alignment) # Get the alignment for the second
sequence
     alnlen  <- nchar(seq1aln)     # Find the number of columns in the
alignment
     starts  <- seq(1, alnlen, by=chunksize)
     n       <- length(starts)     
     seq1alnresidues <- 0
     seq2alnresidues <- 0
     for (i in 1:n) {
        chunkseq1aln <- substring(seq1aln, starts[i],
starts[i]+chunksize-1)
	 chunkseq2aln <- substring(seq2aln, starts[i],
starts[i]+chunksize-1)
        # Find out how many gaps there are in chunkseq1aln:
        gaps1 <- countPattern("-",chunkseq1aln) # countPattern() is from
Biostrings library
        # Find out how many gaps there have been on this line of
subject3:
        gaps2 <- countPattern("-",chunkseq2aln) # countPattern() is from
Biostrings library
        # Calculate how many residues of the first sequence we have
printed so far in the alignment:
	 seq1alnresidues <- seq1alnresidues + chunksize - gaps1
        # Calculate how many residues of the second sequence we have
printed so far in the alignment:
	 seq2alnresidues <- seq2alnresidues + chunksize - gaps2
        print(paste(chunkseq1aln,seq1alnresidues))
        print(paste(chunkseq2aln,seq2alnresidues))
        print(paste(' '))
     }
}

You can then use the Biostrings library to create an alignment of two
sequences, and print it out in blocks of length 'chunksize'.
For example, here is an example, for printing out an alignment in blocks
of length 30 residues:
> s1 <- "MSHPALTQLRALRYCKEIPALDPQLLDWLLLEDSMTKRFGKTVSVTMIREGFVEQNE"
> s2 <- "MSHPALTQLRALRYFDAIPALEPHLLDWLLLEDSVTKRFEQQGKRVSVTLIREAFVGQSE"
> aln <- pairwiseAlignment(s1,s2,substitutionMatrix="BLOSUM50")
> printPairwiseAlignment(aln,20)
[1] "MSHPALTQLRALRYCKEIPALDPQLLDWLL 30"
[1] "MSHPALTQLRALRYFDAIPALEPHLLDWLL 30"
[1] " "
[1] "LEDSMTKRF---GKTVSVTMIREGFVEQNE 57"
[1] "LEDSVTKRFEQQGKRVSVTLIREAFVGQSE 60"
[1] " "

Does this seem a good idea to you all?

I am relatively new to R so I am sure that somebody else could improve
my function to make it nicer.
If you think it's a good idea to make such a function, please could
somebody add it to the Biostrings library (I don't know how to do this)?

Regards,
Avril

Avril Coghlan
University College Cork
Ireland



More information about the Bioconductor mailing list