[BioC] read sequences from fasta file starting with > sign and until next > sign

Jack [guest] guest at bioconductor.org
Fri Sep 14 15:11:08 CEST 2012


Hi:

I am trying to read sequences from a FASTA file one record at a time, starting at a > sign and continuing up to the next > sign:

library(ShortRead)
setwd("fastafolder")
con <- file("somefastafile.fa", "r")  # open the file for reading in text mode
pattern <- "TACC"
# read one line per iteration until the end of the file
while (length(res <- readLines(con, n = 1)) > 0) {
  # do something
}
close(con)

With this while statement I can read a single line from the FASTA file on each iteration. What I want instead is to read a chunk of lines on each iteration, starting at a > sign and ending just before the next > sign.

Example:
>AAATTT
TAGGCT
ATTTGC
>CGATTT

I want to read the following in the first iteration of the while loop:
>AAATTT
TAGGCT
ATTTGC
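
The behaviour described above could be sketched in base R roughly as follows. This is only an illustration under stated assumptions, not a ShortRead or Biostrings function: read_fasta_record is a hypothetical helper name, and the temporary file simply reproduces the example lines. It reads line by line and uses pushBack() to return the next > line to the connection so that it starts the following chunk.

```r
# Hypothetical helper (not part of ShortRead or Biostrings): returns the next
# record from an open text-mode connection as a character vector, or
# character(0) once the end of the file is reached.
read_fasta_record <- function(con) {
  record <- character(0)
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break                 # end of file
    if (substr(line, 1, 1) == ">" && length(record) > 0) {
      pushBack(line, con)                        # this > line starts the next record
      break
    }
    record <- c(record, line)
  }
  record
}

# Illustration only: write the example lines to a temporary file.
tmp <- tempfile(fileext = ".fa")
writeLines(c(">AAATTT", "TAGGCT", "ATTTGC", ">CGATTT"), tmp)

con <- file(tmp, "r")
records <- list()
# each iteration now yields one whole chunk instead of one line
while (length(rec <- read_fasta_record(con)) > 0) {
  records[[length(records) + 1]] <- rec
}
close(con)
```

With this loop, the first iteration would hold the three lines from >AAATTT through ATTTGC, and the second would start at >CGATTT.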

Thanks for your help.

Regards:
Jack


 -- output of sessionInfo(): 

> sessionInfo() 
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] CHNOSZ_0.9-7         ShortRead_1.14.4     latticeExtra_0.6-24  RColorBrewer_1.0-5   Rsamtools_1.8.6      lattice_0.20-10      Biostrings_2.24.1   
 [8] GenomicRanges_1.8.13 IRanges_1.14.4       BiocGenerics_0.2.0  

loaded via a namespace (and not attached):
[1] Biobase_2.16.0 bitops_1.0-4.1 grid_2.15.1    hwriter_1.3    stats4_2.15.1  tools_2.15.1   zlibbioc_1.2.0

--
Sent via the guest posting facility at bioconductor.org.


