[R] Calculating a table of symbol frequencies

Wiener, Matthew matthew_wiener at merck.com
Thu Jan 6 20:39:18 CET 2005


Kurt -- 

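If you create a vector of alignment positions, you should be able to do 

alignment.pos <- rep(1:236, each = 72)
table(data.frame(as.vector(as.matrix(align1)), alignment.pos))

(The as.matrix() step is there because align1 is a data frame, and
as.vector() alone does not flatten a data frame into a single vector.)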

You may want to coerce the flattened align1 values to a factor with the
appropriate levels (e.g. levels = AA), in case some amino acids are missing
from your data.  Otherwise there's an automatic coercion, I believe, and the
table will include only the levels actually present in your data.
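
As a minimal sketch of that point (the names AA.demo, align.demo and pos.demo are made up for illustration and stand in for AA, align1 and alignment.pos), compare the table with and without explicit factor levels when one symbol never occurs:

```r
# Hypothetical 5-symbol alphabet; "E" never occurs in the toy alignment below
AA.demo <- c("A", "B", "C", "D", "E")

# Toy alignment: 3 sequences (rows) x 4 sites (columns), filled column by column
align.demo <- matrix(c("A", "A", "B",
                       "C", "D", "D",
                       "A", "C", "C",
                       "B", "B", "B"), nrow = 3, ncol = 4)
pos.demo <- rep(1:4, each = 3)

# Automatic coercion: only the symbols actually observed get a row
counts.observed <- table(as.vector(align.demo), pos.demo)

# Explicit levels: every symbol in AA.demo gets a row, with zeros where absent
counts.full <- table(factor(as.vector(align.demo), levels = AA.demo), pos.demo)
```

Here counts.observed has four rows, while counts.full has five, with an all-zero row for "E".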

Here's an example using the first 10 letters of the alphabet instead of the
amino acid set:

> align1 <- matrix(sample(LETTERS[1:10], 200, replace = TRUE), nrow = 5, ncol = 40)
> alignment.pos <- rep(1:40, each = 5)
> table(data.frame(as.vector(align1), alignment.pos))
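
Since the original question asked for frequencies rather than counts, here is a rough sketch of one way to get there, assuming prop.table() with margin = 2 is acceptable for dividing each column by its total. The seed and the name symbols are arbitrary choices for the example, not anything from your data:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
symbols <- LETTERS[1:10]  # stand-in for the amino acid set AA
align1 <- matrix(sample(symbols, 200, replace = TRUE), nrow = 5, ncol = 40)

# as.vector() walks the matrix column by column, so each position repeats nrow times
alignment.pos <- rep(1:ncol(align1), each = nrow(align1))

# Counts per symbol (rows) and alignment position (columns), all levels kept
counts <- table(factor(as.vector(align1), levels = symbols), alignment.pos)

# Divide each column by its total to get per-site frequencies
align1.F <- prop.table(counts, margin = 2)
```

The result is a 10 x 40 table whose columns each sum to 1; with the real data and levels = AA it should come out as 22 x 236.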

Hope this helps,

Matt Wiener

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Wollenberg, Kurt R
Sent: Thursday, January 06, 2005 11:36 AM
To: 'r-help at stat.math.ethz.ch'
Subject: [R] Calculating a table of symbol frequencies


Hello all:

I have a protein sequence alignment in a data frame (align1, 72 x 236),
where each row is a protein and each column a site in the alignment. AA is
a vector of amino acid symbols plus "-" (gap). I can calculate amino acid
frequencies at each site by:

> align1.F <- matrix(0, nrow = 22, ncol = 236, dimnames = list(AA, 1:236))
> for (i in 1:236)
+   align1.F[names(summary(align1[[i]])), i] <- summary(align1[[i]]) / length(align1[[i]])

Is there a more efficient (i.e., without a loop) way to do this? Is there
some way to use table or ftable to create a 22 x 236 table of amino acid
frequencies from align1 and AA in one fell swoop?

Thanks,
Kurt Wollenberg, PhD
Tufts Center for Vision Research 
New England Medical Center
750 Washington St, Box 450 
Boston, MA, USA
kwollenberg at tufts-nemc.org 
617-636-8945 (Fax)
617-636-9028 (Lab)

The most exciting phrase to hear in science, the one that heralds new
discoveries, is not "Eureka!" (I found it!) but  "That's funny ..." 
--Isaac Asimov

