[R] motif search

Thu Dec 11 17:37:40 CET 2008

Dear Alessia,

>  I am very new to R and wanted to know if there is a package that, given very
>  long nucleotide sequences, searches for and identifies short (7-10 nt) motifs.
>  I would like to look for enrichment of certain motifs in genomic sequences.
>
>  I tried using MEME (not an R package, I know), but the online version only
>  allows sequences of up to 60000 nucleotides, and that's too short for my
>  needs.

You may try this:

#
# Load the seqinr package:
#
   library(seqinr)
#
# An example FASTA file that ships with seqinr; it contains
# the complete genome sequence of Chlamydia trachomatis:
#
   fastafile <- system.file("sequences/ct.fasta", package = "seqinr")
#
# Import the sequence as a string of characters (read.fasta returns
# a list with one element per sequence, so keep the first one):
#
   myseq <- read.fasta(fastafile, as.string = TRUE)[[1]]
   nchar(myseq) # 1042519, that is about 1 Mb
#
# Look for motif "atatatat", with possible overlap:
#
   words.pos("atatatat", myseq, extended = TRUE)
#
# This returns the positions where the motif is found, that
# is: 236501 236503 283987 687083 792792 792794
#
   substr(myseq, 236501, 236501 + 8)
#
# Should be (the 8-nt motif plus the following base):
# [1] "atatatata"
#
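
If you want to see what "with possible overlap" means without seqinr, here is a minimal base-R sketch on a toy sequence. The helper name motif_positions, the toy string and the crude observed/expected ratio are my own illustrations, not part of seqinr, and the expected count assumes independent bases, which is only a rough null model for real genomes. The motif is assumed to contain plain letters only (no regex metacharacters).

```r
# Hypothetical helper (not part of seqinr): start positions of every
# occurrence of a plain-letter motif, overlapping occurrences included.
# The zero-width lookahead "(?=...)" matches without consuming characters,
# so matches that overlap are all reported.
motif_positions <- function(motif, sequence) {
  hits <- gregexpr(paste0("(?=", motif, ")"), sequence, perl = TRUE)[[1]]
  if (hits[1] == -1L) integer(0) else as.integer(hits)
}

toy   <- "ggatatatatatcc"   # made-up sequence, for illustration only
motif <- "atatatat"

pos <- motif_positions(motif, toy)
pos   # 3 5

# Crude enrichment ratio, assuming independent bases: the expected count is
# the number of possible start positions times the product of the
# frequencies of the motif's bases in the sequence.
base_freq <- table(strsplit(toy, "")[[1]]) / nchar(toy)
expected  <- (nchar(toy) - nchar(motif) + 1) *
  prod(base_freq[strsplit(motif, "")[[1]]])
ratio <- length(pos) / expected
ratio   # values well above 1 suggest over-representation
```

On a real genome you would replace toy with myseq. A ratio of this kind is only a first look; a proper enrichment test needs a better background model.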

HTH,

Jean
-- 
Jean R. Lobry            (lobry at biomserv.univ-lyon1.fr)
Laboratoire BBE-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
allo  : +33 472 43 27 56     fax    : +33 472 43 13 88
http://pbil.univ-lyon1.fr/members/lobry/