[R] extracting data using strings as delimiters

lucy b lucy.lists at gmail.com
Wed Sep 26 15:50:00 CEST 2007


All great ideas. I tried strsplit first and it worked, but thanks everyone!
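
For anyone reading the archive later, here is a minimal sketch of how a strsplit approach might look on the sample records. The exact call used is not shown in this thread, so the delimiter pattern below is an assumption built from the labels in the example data.

```r
# Sample records taken from the original question
Lines <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
           "Total Rooms:  6 Living Area:  578 sq ft")

# Split the first record on its labels (assumed pattern); the " *" on
# either side also swallows the spaces around each label
parts <- strsplit(Lines[1], " *(Year Built:|Gross Building Area:) *")[[1]]

# A delimiter at the very start yields an empty first piece, so drop it
parts <- parts[nzchar(parts)]
parts
```

The labels act as the delimiters here, so whatever sits between them is kept whether or not it is numeric.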

Best-
LB

On 9/25/07, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:
> Perhaps you could clarify what the general rule is, but assuming
> that what you want is any word after a colon, it can be done with
> strapply in the gsubfn package like this:
>
> Lines <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
> "Total Rooms:  6 Living Area:  578 sq ft")
>
> library(gsubfn)
> strapply(Lines, ": *(\\w+)", backref = -1)
>
> # or if each line has same number of returned words
> strapply(Lines, ": *(\\w+)", backref = -1, simplify = rbind)
>
> This matches a colon (:) followed by zero or more spaces ( *)
> followed by a word ((\\w+)), and backref = -1 causes it to return
> only the first backreference (i.e. the portion within parentheses)
> but not the match itself.
>
> On 9/25/07, lucy b <lucy.lists at gmail.com> wrote:
> > Dear List,
> >
> > I have an ascii text file with data I'd like to extract. Example:
> >
> > Year Built:  1873 Gross Building Area:  578 sq ft
> > Total Rooms:  6 Living Area:  578 sq ft
> >
> > There is a lot of data I'd like to ignore in each record, so I'm
> > hoping there is a way to use strings as delimiters to get the data I
> > want (e.g. tell R to take data between "Built:" and "Gross" -
> > incidentally, not always numeric). I think an ugly way would be to
> > start at the end of each record and use a substitution expression to
> > chip away at it, but I'm afraid it will take forever to run. Is there
> > a way to use strings as delimiters in an expression?
> >
> > Thanks in advance for ideas.
> >
> > LB
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
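
The original question asked for the text between two strings such as "Built:" and "Gross". One way this might be done in base R is sketched below. The helper name between() is hypothetical and not from the thread, and it assumes the delimiter strings contain no regular-expression metacharacters.

```r
# Sample records taken from the original question
Lines <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
           "Total Rooms:  6 Living Area:  578 sq ft")

# Hypothetical helper (an assumption, not part of any package): return
# the text found between `left` and `right`, or NA when a record does
# not contain both delimiters
between <- function(x, left, right) {
  # (.*?) is a lazy capture, so it stops at the first `right` it can reach
  pattern <- paste0(".*", left, " *(.*?) *", right, ".*")
  ifelse(grepl(pattern, x, perl = TRUE),
         sub(pattern, "\\1", x, perl = TRUE),
         NA_character_)
}

built <- between(Lines, "Built:", "Gross")
rooms <- between(Lines, "Rooms:", "Living")
```

Because the capture is not restricted to \\w+, this also picks up values that contain spaces or are not numeric.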


