[R] Search within a file

Seth Falcon sfalcon at fhcrc.org
Fri Nov 4 07:43:40 CET 2005


On  3 Nov 2005, JAROSLAW.W.TUSZYNSKI at saic.com wrote:
> I am looking for a way to search a file for position of some
> expression, from within R. My current code:
>
> sha1Pos = gregexpr("<sha1>", readChar(filename,
> file.info(filename)$size))[[1]]
>
> Works fine for small files, but the text files I will be working with
> might get into the GB range, so I was trying to accomplish the same
> without loading the whole file into R.

I would think you could use readLines to read in a batch of lines, run
(g)regexpr, and keep track of matches and position.

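Create a connection to the file first with file(filename, "r").  The
connection has to be opened; otherwise each readLines call reopens the
file and reads from the top again.  With an open connection,
subsequent calls to readLines will start where you left off.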

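But you will need to adjust the position indices returned by gregexpr
by how far into the file you are.  gregexpr reports positions within
each line, and readLines drops the line terminators, so the offset for
a line is the total length of all preceding lines plus their newlines.
A match that spans a line break will not be found this way.  Seems
very doable.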
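
Something along these lines might do it.  This is only a rough sketch,
not tested on anything large: find_in_file and its arguments batch and
eol_len are names I made up for illustration, and it assumes the
pattern is a fixed string that never crosses a line break and that
every line ends in a single "\n" (use eol_len = 2 for "\r\n" files if
you want the offsets to agree with readChar on the raw file).

```r
# Hypothetical helper: 1-based character offsets of a fixed string in a
# text file, reading `batch` lines at a time instead of the whole file.
# Assumptions: the pattern never spans a line break, and each line is
# terminated by `eol_len` characters (1 for "\n").
find_in_file <- function(filename, pattern, batch = 10000L, eol_len = 1L) {
  con <- file(filename, open = "r")  # opened, so successive reads resume
  on.exit(close(con))
  offset <- 0                        # characters consumed so far (double)
  hits <- list()
  repeat {
    lines <- readLines(con, n = batch, warn = FALSE)
    if (length(lines) == 0L) break
    # length of each line including its terminator
    len <- nchar(lines, type = "chars") + eol_len
    # number of characters in the file before each line of this batch
    starts <- offset + cumsum(c(0, len[-length(len)]))
    m <- gregexpr(pattern, lines, fixed = TRUE)
    matched <- which(vapply(m, function(x) x[1L] > 0L, logical(1)))
    for (i in matched) {
      # shift the within-line positions by the line's offset in the file
      hits[[length(hits) + 1L]] <- starts[i] + as.vector(m[[i]])
    }
    offset <- offset + sum(len)
  }
  as.numeric(unlist(hits))
}

# Small usage example on a temporary file
tmp <- tempfile()
writeLines(c("abc <sha1>x</sha1>", "no match", "<sha1>y <sha1>z"), tmp, sep = "\n")
pos <- find_in_file(tmp, "<sha1>", batch = 2L)
```

The offsets are kept as doubles rather than integers so they do not
overflow once the file goes past 2^31 characters.  For a regular
expression rather than a fixed string, dropping fixed = TRUE should
work the same way.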

+ seth



