[R] visualizing nucleotide sequence properties

Jean lobry lobry at biomserv.univ-lyon1.fr
Wed Nov 28 21:08:23 CET 2007


>Hi there,
>
>I am looking for R-packages that can help me visualize properties on
>nucleotide sequences. I want to display sequences in the 1-100K base range
>as lines and plot features above and below those lines.
>
>Any ideas would be welcome.
>
>Thanks,
>
>Bernd

Hi Bernd,

not sure to understand what you mean by "properties". Let's say the
property you are interested in is the GC-skew. You can try something
along these lines:

#
# Load DNA sequence data:
#
library(seqinr)
myseq <- read.fasta(as.string = TRUE)
#
# Define a function to compute the GC skew:
#
gcskew <- function(x) {
   if (!is.character(x) || length(x) > 1)
   stop("single string expected")
   tmp <- tolower(s2c(x))
   nC <- sum(tmp == "c")
   nG <- sum(tmp == "g")
   if (nC + nG == 0) return(NA)
   return(100 * (nC - nG)/(nC + nG))
}
#
# Moving window along the sequence:
#
step <- 10000
wsize <- 10000
starts <- seq(from = 1, to = nchar(myseq), by = step)
starts <- starts[-length(starts)]
n <- length(starts)
result <- numeric(n)
for (i in seq_len(n)) {
   result[i] <- gcskew(substr(myseq, starts[i], starts[i] + wsize - 1))
}
#
# Plot the result:
#
xx <- starts/1000 # in Kbp
yy <- result
n <- length(result)
hline <- 0
plot(yy ~ xx, type = "n", axes = FALSE, ann = FALSE)
polygon(c(xx[1], xx, xx[n]), c(min(yy), yy, min(yy)),
   col = "black", border = NA)
usr <- par("usr")
rect(usr[1], usr[3], usr[2], hline, col = "white", border = NA)
lines(xx, yy)
abline(h = hline)
box()
axis(1, at = seq(0, 1000, by = 200))
axis(2, las = 1)
title(xlab = "position (Kbp)", ylab = "(C-G)/(C+G) [percent]",
   main = expression(paste("GC skew in ", italic(Chlamydia~trachomatis))))
arrows(800, 5.5, 720, 0.5, length = 0.1, lwd = 2)
text(800, 5.5, "origin of\nreplication", pos = 4)

HTH,

Jean
-- 
Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/



More information about the R-help mailing list