[R] grep or other complex string matching approach to capture necessary information...

Tony Plate tplate at acm.org
Fri Sep 25 20:13:22 CEST 2009


You could use grep, but it's probably easier to use %in% (see also is.element()), e.g.:

> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), ]
   water_evaluation.water_evaluation_selection. house_number
6                           water pipes damaged          489
8                           water pipes damaged          512
11                          water pipes damaged          597
19                                 Water damage          478
21                          water pipes damaged          373
23                                 Water damage          465
....
> house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]
 [1] 489 512 597 478 373 465 337 362 234 535 551 351 415 495 220 216 317 443 346 577 585 268 463 441 225 200 304 486 390 476 485 247
[33] 399 504 262 551 575 359 538
> sort(unique(house_info[ house_info[,1] %in% c("Water damage", "water pipes damaged", "leaking water"), 2]))
 [1] 200 216 220 225 234 247 262 268 304 317 337 346 351 359 362 373 390 399 415 441 443 463 465 476 478 485 486 489 495 504 512 535
[33] 538 551 575 577 585 597
> 
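For comparison, here is a rough sketch of how a grep-style version might look next to %in%. The column names (evaluation, house_number), the seed, and the use of sample() to build the data are assumptions for illustration, not part of the original post. Note that a loose pattern such as "water damage" also matches "No water damage", which is one reason exact matching with %in% is the simpler choice here.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible

water_evaluation <- c("No water damage", "Water damage", "Water On",
                      "Water off", "water pipes damaged", "leaking water")

# Hypothetical rebuild of the example data with explicit column names
house_info <- data.frame(
  evaluation   = sample(water_evaluation, size = 100, replace = TRUE),
  house_number = sample(200:599, size = 100, replace = TRUE),
  stringsAsFactors = FALSE
)

negative <- c("Water damage", "water pipes damaged", "leaking water")

# Exact matching with %in%
by_in <- house_info$house_number[house_info$evaluation %in% negative]

# Equivalent grepl() call: the pattern must be anchored with ^ and $
pattern <- "^(Water damage|water pipes damaged|leaking water)$"
by_grep <- house_info$house_number[grepl(pattern, house_info$evaluation)]

identical(by_in, by_grep)

# An unanchored, case-insensitive pattern also picks up "No water damage"
loose_hits <- water_evaluation[grepl("water damage", water_evaluation,
                                     ignore.case = TRUE)]
loose_hits
```

Wrapping by_in in sort(unique(...)) gives the de-duplicated house numbers, as in the transcript above.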


Also, an easier way to generate random integers is sample(), e.g.:
> sample(1:3, size=5, rep=T)
[1] 3 1 2 1 1
> 
(This is more straightforward and less error-prone: for example, floor(runif(100, 1, 6)) never generates a 6, so "leaking water" is never selected in your example data. Do be careful of one gotcha, though: sample(2:3, ...) draws from 2 and 3, while sample(3, ...) draws from 1, 2, and 3.)
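A small sketch of both points, with an arbitrary seed and sample sizes chosen only for illustration. The resample() helper is a hypothetical name here (it follows the pattern suggested in the examples of ?sample) and is one possible way to sidestep the single-number gotcha.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# floor(runif(n, 1, 6)) yields only 1..5, because runif() draws stay below 6
x <- floor(runif(1000, 1, 6))
max(x) <= 5

# sample() over the explicit range can return every value, including 6
y <- sample(1:6, size = 1000, replace = TRUE)
all(y %in% 1:6)

# Gotcha: a single number n >= 1 is treated as 1:n
z <- sample(3, size = 1000, replace = TRUE)
all(z %in% 1:3)

# Hypothetical helper: always samples from the elements of x itself,
# even when x has length 1
resample <- function(x, ...) x[sample.int(length(x), ...)]
w <- resample(3, size = 5, replace = TRUE)
all(w == 3)
```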

-- Tony Plate

Jason Rupert wrote:
> Say I have the following data:
> 
> 
> house_number<-floor(runif(100, 200, 600))
> water_evaluation<-c("No water damage", "Water damage", "Water On", "Water off", "water pipes damaged", "leaking water")
> water_evaluation_selection<-floor(runif(100, 1,6))
> house_info<-data.frame(water_evaluation[water_evaluation_selection],
>                        house_number) 
> 
> And, that I only want to pull out the ones with negative water evaluations, i.e. Water damage, water pipes damaged, and leaking water. 
> 
> Should/could I use grep in order to pull the house numbers out of house_info with those negative water evaluations?  
> 
> I guess I want to know the house numbers from house_info where the water evaluation is negative.  Is there a way to use grep or another R function in order to acquire that information? 
> 
> Thank you again in advance for any insights.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



