[BioC] problem importing a fasta file biostrings or seqinr ?

Hervé Pagès hpages at fhcrc.org
Mon Nov 9 19:42:59 CET 2009


Hi Moses,

Once you've figured out where your FASTA file is located, you can
do:

   library(Biostrings)
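   ## If R can't find the file: relative paths are resolved against
   ## the working directory, so check where that is and whether R
   ## can see the file (substitute your own path for the placeholder):
   getwd()
   file.exists("path/to/your/fasta_file.fa")
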
   myseqs <- readDNAStringSet("path/to/your/fasta_file.fa", format = "fasta")
   myseqs
   myseq <- myseqs[[1]]

   ## For base frequencies:
   alphabetFrequency(myseq)

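   ## For GC content (proportion of G plus C):
   letterFrequency(myseq, letters = "GC", as.prob = TRUE)

   ## For dinucleotide frequencies (counts, then proportions):
   dinucleotideFrequency(myseq)
   dinucleotideFrequency(myseq, as.prob = TRUE)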

Biostrings also has trinucleotideFrequency(),
oligonucleotideFrequency(), and much more (see man pages
for more info about those functions).
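If you'd rather stay with seqinr: read.fasta() returns a list with one
character vector of bases per sequence, which is why hist(cc) fails
(hist() expects numeric data). Here is a minimal base-R sketch; x is a
short made-up vector standing in for cc[[1]], assuming lowercase bases
as in your output:

   ## x stands in for cc[[1]] (made-up example bases)
   x <- c("a", "t", "g", "c", "c", "g", "t", "a", "g", "c")

   ## Base frequencies: counts per letter, then proportions
   counts <- table(factor(x, levels = c("a", "c", "g", "t")))
   counts
   prop.table(counts)

   ## GC content: fraction of positions that are g or c
   gc <- mean(x %in% c("g", "c"))
   gc

   ## Bases are categorical, so a bar chart fits better than hist()
   barplot(counts)

seqinr itself also has GC() and count() (e.g. GC(cc[[1]]) and
count(cc[[1]], wordsize = 1)), which should give the same numbers for a
sequence made only of a, c, g and t. Note that mean() above puts every
position, including ambiguous letters such as "n", in the denominator.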

Cheers,
H.


m a wrote:
> Hello,
> 
> I would like to compute some simple statistics on a specific DNA sequence.
> To do that I need to import a sequence from a FASTA file.
> 
> http://www.ncbi.nlm.nih.gov/nuccore/9626243?report=fasta&log$=seqview
> 
> After downloading it, I ran the following code with the seqinr package:
> 
> library(seqinr)
> ## system.file() looks the file up inside the installed seqinr package
> dnafile <- system.file("sequences/seqbac.fasta", package = "seqinr")
> cc <- read.fasta(file = dnafile)
> 
> cc then gives me the following vector:
> ...
> 
> [47764] "t" "c" "c" "c" "t"
> ......
> 
> My problem is that I would now like to use that vector to compute basic
> statistics, e.g. GC content and base frequencies, but I don't see how.
> For instance, a histogram of my vector with hist(cc) doesn't work.
> 
> 
> By the way, my first intention was to use the Biostrings package to import
> the FASTA file, with something like readFASTA("directory", strip.desc=TRUE).
> But how should I know which directory I have to put the data in? I've tried
> a few directories, but it still doesn't find my data.
> 
> 
> 
> Thanks in advance,
> 
> 
> Moses
> Student in biostatistics
> 
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


