[R] how to load only lines that start with a particular symbol

William Dunlap wdunlap at tibco.com
Tue Sep 15 23:44:35 CEST 2009


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of J Chen
> Sent: Tuesday, September 15, 2009 2:00 PM
> To: r-help at r-project.org
> Subject: [R] how to load only lines that start with a particular symbol
> 
> 
> Dear all,
> 
> I have DNA sequence data which are fasta-formatted as
> 
> >gene A;.....
> AAAAACCCC
> TTTTTGGGG
> CCCTTTTTT
> >gene B;....
> CCCCCAAAA
> GGGGGTTTT
> 
> I want to load only the lines that start with ">", where the annotation
> information for the gene is contained. In principle, I can remove the
> sequences before loading or after loading all the lines. I just wonder if
> there's a way to load only lines with a particular pattern. The skip
> argument in read.table() doesn't work for my purpose.

You could use pipe() to call an external program such as grep
or perl to filter the lines of interest from the file, so that R's
input routine only has to allocate space for those lines.  E.g., the
following makes a sample file, and the readLines(pipe(...))
call reads only the lines starting with ">> " from it.  (The example
does not assume that grep is on your PATH, so it gives the full path
to where grep is installed on my Windows machine.)

  > tfile <- tempfile()
  > cat(file=tfile, sep="\n", c(">> Date", ">> Author", "columnA columnB", "1 2", "3 4"))

  > readLines(tfile)
  [1] ">> Date"         ">> Author"       "columnA columnB" "1 2"
  [5] "3 4"
  > readLines(pipe(paste("e:/cygwin/bin/grep \"^>> \" ", tfile)))
  [1] ">> Date"   ">> Author"
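
Applied to the FASTA layout in the question, the same idea might look
like the sketch below.  It assumes a grep executable is on the PATH
(unlike the example above, which spells out the full path).  The
variable names fasta, cmd, con and headers are only for illustration,
and shQuote() is used so that the pattern and the file name are passed
through the shell unchanged.

```r
# Write a small FASTA-style sample file (contents copied from the question).
fasta <- tempfile(fileext = ".fa")
writeLines(c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
             ">gene B;....", "CCCCCAAAA", "GGGGGTTTT"), fasta)

# Assumption: 'grep' can be found on the PATH.
# "^>" keeps only the lines whose first character is ">".
cmd <- paste("grep", shQuote("^>"), shQuote(fasta))

con <- pipe(cmd)
headers <- readLines(con)  # R only ever sees the lines grep lets through
close(con)                 # release the connection explicitly

headers
```

With the sample file above, headers should hold just the two
annotation lines; the sequence lines never reach R.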

perl can do more complicated processing and filtering than grep.
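
As a rough sketch of the kind of extra processing perl allows, the
following keeps only the header lines and also trims each one down to
the text between ">" and the first ";".  It assumes perl is on the
PATH; the file contents are the sample from the question, and the
names fasta, script, cmd, con and genes are hypothetical.

```r
# Write a small FASTA-style sample file (contents copied from the question).
fasta <- tempfile(fileext = ".fa")
writeLines(c(">gene A;.....", "AAAAACCCC", "TTTTTGGGG", "CCCTTTTTT",
             ">gene B;....", "CCCCCAAAA", "GGGGGTTTT"), fasta)

# Perl one-liner: on lines starting with ">", replace the line with the
# text before the first ";" and print it; all other lines are skipped.
script <- "print if s/^>([^;]*).*/$1/"

# Assumption: 'perl' can be found on the PATH.
cmd <- paste("perl -ne", shQuote(script), shQuote(fasta))

con <- pipe(cmd)
genes <- readLines(con)  # only the trimmed header text reaches R
close(con)

genes
```

The script deliberately contains no quote characters, which keeps the
shQuote() quoting simple on both Unix-like shells and Windows.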

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com  

> 
> Thanks in advance,
> Jimmy
> 



