[R] extracting data using strings as delimiters

Gabor Grothendieck ggrothendieck at gmail.com
Wed Sep 26 00:59:57 CEST 2007


Perhaps you could clarify what the general rule is, but assuming
that what you want is any word after a colon, it can be done with
strapply in the gsubfn package like this:

Lines <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
"Total Rooms:  6 Living Area:  578 sq ft")

library(gsubfn)
strapply(Lines, ": *(\\w+)", backref = -1)

# or, if each line has the same number of returned words
strapply(Lines, ": *(\\w+)", backref = -1, simplify = rbind)

This matches a colon (:) followed by zero or more spaces ( *)
followed by a word ((\\w+)), and backref = -1 causes it to return
only the first backreference (i.e. the portion within parentheses)
but not the match itself.
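
If you would rather not depend on gsubfn, roughly the same result can
be had in base R. The following is only a sketch under assumptions:
the helper name between() and its arguments are made up for
illustration, and its start and end arguments are treated as regular
expressions, so any metacharacters in your delimiters would need
escaping. The first part mirrors the word-after-a-colon rule above;
the second pulls out the text between two delimiter strings such as
"Built:" and "Gross".

```r
Lines <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
           "Total Rooms:  6 Living Area:  578 sq ft")

# Find every "colon, optional spaces, word" match in each line, then
# strip the leading colon and spaces so that only the word remains.
m <- regmatches(Lines, gregexpr(": *\\w+", Lines))
words <- lapply(m, function(x) sub("^: *", "", x))

# Hypothetical helper: return the text between two delimiters,
# or NA for lines in which the delimiters do not both occur.
# 'start' and 'end' are used as regular expressions, not fixed strings.
between <- function(x, start, end) {
  pat <- paste0(".*?", start, "\\s*(.*?)\\s*", end, ".*")
  ifelse(grepl(pat, x, perl = TRUE),
         sub(pat, "\\1", x, perl = TRUE),
         NA_character_)
}

year_built <- between(Lines, "Built:", "Gross")
```

Here words is a list with one character vector per line, and
year_built has one element per line, with NA where a line lacks
either delimiter. Since the captured group is not restricted to
digits, this should also cope with non-numeric values.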

On 9/25/07, lucy b <lucy.lists at gmail.com> wrote:
> Dear List,
>
> I have an ascii text file with data I'd like to extract. Example:
>
> Year Built:  1873 Gross Building Area:  578 sq ft
> Total Rooms:  6 Living Area:  578 sq ft
>
> There is a lot of data I'd like to ignore in each record, so I'm
> hoping there is a way to use strings as delimiters to get the data I
> want (e.g. tell R to take data between "Built:" and "Gross" -
> incidentally, not always numeric). I think an ugly way would be to
> start at the end of each record and use a substitution expression to
> chip away at it, but I'm afraid it will take forever to run. Is there
> a way to use strings as delimiters in an expression?
>
> Thanks in advance for ideas.
>
> LB
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


