[BioC] problem importing a fasta file biostrings or seqinr ?
Hervé Pagès
hpages at fhcrc.org
Mon Nov 9 19:42:59 CET 2009
Hi Moses,
Once you've figured out where your FASTA file is located, you can
do:
library(Biostrings)
myseqs <- read.DNAStringSet("path/to/your/fasta_file.fa", "fasta")
myseqs
myseq <- myseqs[[1]]
## For base frequencies:
alphabetFrequency(myseq)
## For dinucleotide frequencies (GC content follows from the base counts above):
dinucleotideFrequency(myseq)
dinucleotideFrequency(myseq, as.prob=TRUE)
Biostrings also has trinucleotideFrequency(),
oligonucleotideFrequency(), and much more (see man pages
for more info about those functions).
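
Since GC content is just the share of G and C among the counted bases,
here is a minimal base-R sketch of that arithmetic. It assumes the
sequence is available as a character vector of single letters (as in
the seqinr output quoted below); the toy vector "bases" is made-up data
for illustration, not taken from the real file.

```r
# Hypothetical toy sequence standing in for one FASTA record.
bases <- c("a", "t", "g", "c", "c", "g", "t", "a", "g", "c")

# Count each base. Fixing the factor levels keeps absent bases as zero
# counts; any other symbol (e.g. "n") becomes NA and is left out.
counts <- table(factor(toupper(bases), levels = c("A", "C", "G", "T")))

# Base frequencies as proportions of the counted A/C/G/T bases.
freqs <- counts / sum(counts)

# GC content: (G + C) divided by the number of counted bases.
gc_content <- sum(counts[c("G", "C")]) / sum(counts)

print(counts)
print(freqs)
print(gc_content)
```

The same arithmetic should carry over to the named counts returned by
alphabetFrequency(myseq), taking the "G" and "C" entries over the sum
of the "A", "C", "G" and "T" entries. For a plot, barplot(counts) is
likely more useful than hist(), which expects numeric data.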
Cheers,
H.
m a wrote:
> Hello,
>
> I would like to compute simple statistics on a specific DNA sequence. To
> do that, I need to import a sequence from a file with a .fasta extension.
>
> http://www.ncbi.nlm.nih.gov/nuccore/9626243?report=fasta&log$=seqview
>
> After downloading it, I ran the following code with the seqinr package:
>
> dnafile <- system.file("sequences/seqbac.fasta", package = "seqinr")
> cc<-read.fasta(file = dnafile)
>
> cc then gives me the following vector:
> ...
>
> [47764] "t" "c" "c" "c" "t"
> ......
>
> My problem is that I would now like to use that vector to compute basic
> statistics, e.g. GC content and base frequencies, and I don't see how.
> For instance, a histogram of my vector, like hist(cc), doesn't work.
>
>
> My first intention, by the way, was to use the Biostrings package to import
> the fasta file, like readFASTA(" directory",strip.desc=TRUE). But how should
> I know which directory I have to put the data in? I have tried a few
> directories, but it still does not find my data.
>
>
>
> Thanks in advance,
>
>
> Moses
> Student in biostatistics
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319