[R] extracting data using strings as delimiters

jim holtman jholtman at gmail.com
Tue Sep 25 23:25:59 CEST 2007


Here is one way.  You can set up a list of the patterns to match
against and then apply each one to the lines of the file.  I am not sure
what the rest of the text file looks like, but this will return all the
values that match.
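
For a single field, the idea comes down to one sub() call with a
capture group: the text before and after the parentheses acts as the
delimiters.  A minimal sketch on one line of the sample data (the names
`line` and `year_raw` are only illustrative):

```r
# One line of the sample data from the question
line <- "Year Built:  1873 Gross Building Area:  578 sq ft"

# The parentheses capture whatever sits between "Year Built:" and "Gross";
# replacing the whole match with "\\1" leaves only that captured piece.
year_raw <- sub(".*Year Built:(.*)Gross.*", "\\1", line)
year_raw
```

The captured piece keeps the spaces around the value, since they also
lie between the two delimiters.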

> x <- readLines(textConnection("Year Built:  1873 Gross Building Area:  578 sq ft
+ Total Rooms:  6 Living Area:  578 sq ft
+ Year Built:  1873 Gross Building Area:  578 sq ft
+ Total Rooms:  6 Living Area:  578 sq ft"))
>
> # list for pattern matches
> m.list <- list(year=".*Year Built:(.*)Gross.*",
+     Buildarea=".*Building Area:(.*)sq ft.*",
+     rooms=".*Rooms:(.*)Liv.*",
+     Livingarea=".*Living Area:(.*)sq ft.*")
>
> # use lapply to process the patterns and return a list with the name of the
> # pattern and its value
> lapply(names(m.list), function(.pat){
+     # see which lines have the desired patterns
+     whichLines <- grep(m.list[[.pat]], x)
+     if (length(whichLines) > 0){
+         return(list(pattern=.pat, values=sub(m.list[[.pat]], "\\1",
+             x[whichLines])))
+     }
+     else return(NULL)
+ })
[[1]]
[[1]]$pattern
[1] "year"

[[1]]$values
[1] "  1873 " "  1873 "


[[2]]
[[2]]$pattern
[1] "Buildarea"

[[2]]$values
[1] "  578 " "  578 "


[[3]]
[[3]]$pattern
[1] "rooms"

[[3]]$values
[1] "  6 " "  6 "


[[4]]
[[4]]$pattern
[1] "Livingarea"

[[4]]$values
[1] "  578 " "  578 "
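
If the padding around the values is unwanted, or a list named by field
is easier to work with, a variation along these lines may help.  This
is only a sketch under the same assumptions about the file layout; the
names `hits` and `res` are illustrative, and the two sample lines
stand in for the real file.

```r
# Two sample lines standing in for the real file
x <- c("Year Built:  1873 Gross Building Area:  578 sq ft",
       "Total Rooms:  6 Living Area:  578 sq ft")

# Same patterns as above
m.list <- list(year=".*Year Built:(.*)Gross.*",
    Buildarea=".*Building Area:(.*)sq ft.*",
    rooms=".*Rooms:(.*)Liv.*",
    Livingarea=".*Living Area:(.*)sq ft.*")

# sapply over a character vector with simplify=FALSE returns a list
# that carries the pattern names
res <- sapply(names(m.list), function(.pat){
    # keep only the lines that contain this pattern
    hits <- grep(m.list[[.pat]], x, value=TRUE)
    # pull out the captured text, then strip leading/trailing whitespace
    gsub("^[[:space:]]+|[[:space:]]+$", "", sub(m.list[[.pat]], "\\1", hits))
}, simplify=FALSE)

res$year
res$rooms
```

A pattern that matches no line gives character(0) here rather than
NULL, so every field name is still present in the result.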




On 9/25/07, lucy b <lucy.lists at gmail.com> wrote:
> Dear List,
>
> I have an ascii text file with data I'd like to extract. Example:
>
> Year Built:  1873 Gross Building Area:  578 sq ft
> Total Rooms:  6 Living Area:  578 sq ft
>
> There is a lot of data I'd like to ignore in each record, so I'm
> hoping there is a way to use strings as delimiters to get the data I
> want (e.g. tell R to take data between "Built:" and "Gross" -
> incidentally, not always numeric). I think an ugly way would be to
> start at the end of each record and use a substitution expression to
> chip away at it, but I'm afraid it will take forever to run. Is there
> a way to use strings as delimiters in an expression?
>
> Thanks in advance for ideas.
>
> LB
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


