[BioC] problem importing a fasta file biostrings or seqinr ?

Steve Lianoglou mailinglist.honeypot at gmail.com
Mon Nov 9 16:45:47 CET 2009


Hi Moses,

On Nov 9, 2009, at 7:21 AM, m a wrote:

> Hello,
>
> I would like to compute simple statistics on a specific DNA sequence.
> In order to do that I need to import a sequence file with a .fasta
> extension.
>
> http://www.ncbi.nlm.nih.gov/nuccore/9626243?report=fasta&log$=seqview
>
> After downloading it, I ran the following code with the package seqinr:
>
> dnafile <- system.file("sequences/seqbac.fasta", package = "seqinr")
> cc<-read.fasta(file = dnafile)
>
> cc then gives me the following vector:
> ...
>
> [47764] "t" "c" "c" "c" "t"
> ......
>
> My problem is that I would now like to use that vector to compute basic
> statistics, e.g. GC content and base frequencies, and I hardly see
> how. For instance, a histogram of my vector like hist(cc) doesn't work.

It looks like the call through seqinr::read.fasta returns you a
character vector for the sequence? (I'm guessing, I haven't used it.)

If that's the case, one way to get frequencies would be via the table
function, e.g.:

R> fa <- c("t", "c", "c", "c", "t", "a", "g", "a", "a", "g")
R> table(fa)
fa
a c g t
3 3 2 2
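The same idea extends to relative frequencies and GC content with base R alone. This is a minimal sketch that assumes the sequence is already a character vector of single bases (as guessed above); the variable names are illustrative only.

```r
# Stand-in for the character vector read.fasta is assumed to return
# (same made-up bases as the table() example above)
fa <- c("t", "c", "c", "c", "t", "a", "g", "a", "a", "g")

# Absolute counts and relative frequencies of each base
counts <- table(fa)
freqs  <- prop.table(counts)

# GC content: fraction of positions that are "g" or "c";
# tolower() guards against upper-case input
gc.content <- mean(tolower(fa) %in% c("g", "c"))
```

hist() expects numeric data, which is why hist(cc) fails on a character vector; barplot(counts) draws the categorical equivalent of a histogram.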

Though, I'd probably prefer using Biostrings:

> My first intention, by the way, was to use the Biostrings package to
> import the fasta file, like readFASTA(" directory",strip.desc=TRUE). But
> how should I know under which directory I have to put the data? I've
> tried a few directories but it still does not find my data.

How is it that you don't know where to find your data? I'm not sure  
there's anything we can do to help you find it, so ... just find it :-)
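If it helps, you can look from inside R: relative paths are resolved against getwd(), and list.files() can search a directory tree. A minimal sketch; find_fasta_files is a hypothetical helper, and the start directory and the .fa/.fasta extension pattern are assumptions to adjust.

```r
# Relative paths passed to file-reading functions are resolved
# against the current working directory
getwd()

# Hypothetical helper: recursively list files ending in .fa or .fasta
find_fasta_files <- function(start.dir = getwd()) {
  list.files(start.dir, pattern = "\\.(fa|fasta)$", recursive = TRUE,
             full.names = TRUE, ignore.case = TRUE)
}

# Usage with a throw-away directory standing in for your data folder
demo.dir <- file.path(tempdir(), "fasta_demo")
dir.create(demo.dir, showWarnings = FALSE)
invisible(file.create(file.path(demo.dir, "sequence.fasta")))
hits <- find_fasta_files(demo.dir)
```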

Once you know where it is, you can pass the absolute path *of the  
file* to the readFASTA function. In your example above, it looks like  
you want to call "readFASTA" on a directory, which won't work.

For instance, on my computer (I'm using OS X), in order to read in  
some file on my HD, I'd do:

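library(Biostrings)
# readFASTA() takes the path of a file, not a directory
# (later Biostrings releases replace it with readDNAStringSet())
my.fasta <- readFASTA('/Users/stavros/Data/YeastPromoters.fa')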
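If you only need the bases as a character vector (to feed table() and friends), a small base-R reader is enough for a single-record file like the NCBI download. This is a sketch under stated assumptions (one record per file, header lines start with ">"); read_simple_fasta is a hypothetical helper, not part of seqinr or Biostrings.

```r
# Hypothetical helper: read a single-record FASTA file into a
# character vector of lower-case bases
read_simple_fasta <- function(path) {
  if (!file.exists(path)) stop("file not found: ", path)
  lines <- readLines(path)
  # Drop header lines; keep only sequence lines
  seq.lines <- lines[!startsWith(lines, ">")]
  # Join wrapped sequence lines and split into single bases
  strsplit(tolower(paste(seq.lines, collapse = "")), "")[[1]]
}

# Usage with a throw-away file standing in for the downloaded record
tmp <- tempfile(fileext = ".fasta")
writeLines(c(">example record", "TCCCTAGAAG", "GC"), tmp)
bases <- read_simple_fasta(normalizePath(tmp))
```

Multi-record files would be concatenated into one vector by this sketch, so use seqinr or Biostrings for those.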

Does that help?

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
   |  Memorial Sloan-Kettering Cancer Center
   |  Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


