[BioC] kyte and doolittle hydropathy values/plot - sub + sliding window mean

Matthew Hannah Hannah at mpimp-golm.mpg.de
Thu Dec 2 18:40:03 CET 2004


Hi,

I've found one mistake: I didn't realise that strsplit returns a list
(one character vector per input string) rather than a plain character
vector, so unlist(strsplit(x,"")) allows me to crudely call the
score.assign function below. But it must be possible to replace all the
letters efficiently at once?

I wondered about the names function, but I don't see how to take a
vector of scores named by amino acid letter and use it to look up the
score for each character in another vector.
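
A minimal sketch of the named-vector lookup asked about above, assuming
the standard Kyte and Doolittle scale. The names kd, residues and
hydropathy are illustrative and do not come from the scores table in
the post. Indexing a named vector by a character vector returns the
matching value for every element in one step, so no sub/gsub is needed.

```r
# Kyte-Doolittle hydropathy scores as a named numeric vector
# (kd is an illustrative name, not the `scores` table from the post)
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5,
        Q = -3.5, E = -3.5, G = -0.4, H = -3.2, I = 4.5,
        L = 3.8, K = -3.9, M = 1.9, F = 2.8, P = -1.6,
        S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

x <- "MSETNKNAFQ"
residues <- strsplit(x, "")[[1]]    # character vector, one residue per element
hydropathy <- unname(kd[residues])  # look up all residues at once
```

If the scores already sit in a table like the one described below, the
same kind of vector could probably be built with something like
setNames(as.numeric(as.character(scores[,3])), as.character(scores[,2])),
assuming those really are the right columns. Letters that are not in
the vector come back as NA.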

Thanks in advance,
Matt


>>>>>

Hi,

This algorithm calculates the hydropathy of proteins. I've found
web-based versions, but they all return a graph, not values. I was
wondering if there is an R/BioC implementation of it, or something
similar.

Alternatively I'm trying to do something similar myself but have got
stuck with no obvious help in archives.

My protein sequences will be read in as fasta strings and converted to a
character vector.
x <- "MSETNKNAFQ"
strsplit(x,"")

I have the scores for the 20 amino acids (letters in column 2 of a
table), and the scores from -4.5 to 4.5 in another column. I want to
replace the letters with the corresponding score.

I've tried using sub and gsub, but can't work out how to replace them
all at once. But doing them individually
score.assign <- function(x) {
x <- gsub(scores[1,2],scores[1,3],x)
x <- gsub(scores[2,2],scores[2,3],x)
...
}

returns this
"c(\"4.2\", \"-0.4\", \"-4.5\", \"4.5\")"
which I can't work out how to convert to a usable vector.
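
A short sketch of where a string like that probably comes from, and of
how to get numbers back. It assumes the input was the list returned by
strsplit. gsub coerces its input with as.character, which turns a list
element into a single deparsed string. The names parts, collapsed,
score_text and score_num are illustrative.

```r
x <- "MSETNKNAFQ"
parts <- strsplit(x, "")          # a list holding one character vector
collapsed <- as.character(parts)  # roughly what gsub() sees: one deparsed string
residues <- unlist(parts)         # plain character vector of length 10

# Scores held as text convert to numbers with as.numeric()
score_text <- c("4.2", "-0.4", "-4.5", "4.5")
score_num <- as.numeric(score_text)
```

Working on residues rather than parts keeps one element per amino acid,
so the result can be passed to as.numeric directly.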

Once I have my numeric vector I want to calculate a sliding (hopefully
using different window sizes) mean of AAs 1:12, 2:13..etc.
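
One possible way to write the sliding mean, as a sketch only. The
function name sliding_mean and its arguments are illustrative. It
returns one mean per full window (positions 1:window, 2:(window+1) and
so on), so the result is window - 1 elements shorter than the input.

```r
# Mean over every run of `window` consecutive values
sliding_mean <- function(values, window) {
  n <- length(values)
  if (window < 1 || window > n) return(numeric(0))  # no full window fits
  starts <- seq_len(n - window + 1)                 # first index of each window
  vapply(starts,
         function(i) mean(values[i:(i + window - 1)]),
         numeric(1))
}

# Example scores for the residues of "MSETNKNAFQ" (Kyte-Doolittle scale)
hydropathy <- c(1.9, -0.8, -3.5, -0.7, -3.5, -3.9, -3.5, 1.8, 2.8, -3.5)
means3 <- sliding_mean(hydropathy, 3)
means5 <- sliding_mean(hydropathy, 5)
```

Different window sizes are then just different values of the second
argument.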

Finally, it would be best if I could import a large number of
sequences from fasta format and analyse them all at once. I could not
see any obvious way of handling sequence data easily in BioC; have I
just missed something?
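
In case it helps, a rough base-R sketch for reading many fasta records
at once, with no package assumed. read_fasta is an illustrative helper,
not a Bioconductor function. It assumes each record starts with a ">"
header line and that a sequence may wrap over several lines.

```r
# Read fasta records into a named character vector (one string per sequence)
read_fasta <- function(con) {
  lines <- readLines(con)
  lines <- lines[nzchar(lines)]             # drop blank lines
  is_header <- grepl("^>", lines)
  record <- cumsum(is_header)               # record number for each line
  ids <- sub("^>\\s*", "", lines[is_header])
  # Group the sequence lines by record; lines before the first header are dropped
  body <- split(lines[!is_header],
                factor(record[!is_header], levels = seq_along(ids)))
  seqs <- vapply(body, paste, character(1), collapse = "")
  names(seqs) <- ids
  seqs
}

# Usage with an in-memory example instead of a file
fasta_text <- c(">seq1 example", "MSETNK", "NAFQ", ">seq2", "AVILG")
con <- textConnection(fasta_text)
seqs <- read_fasta(con)
close(con)
residues <- strsplit(seqs, "")              # list with one character vector per sequence
```

A file name can be passed in place of the connection, since readLines
accepts either. The resulting list can be handed to lapply or sapply to
score every sequence in turn.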

Thanks a lot,

Matt


