[R] Partial LookUP

PIKAL Petr petr@pik@l @ending from prechez@@cz
Thu Nov 22 08:27:15 CET 2018


Hi

I did not see any reply, so I will try to offer one.
It seems to me that your second attempt was quite close.

If passengerid were numeric, the following code would probably give you the required result.

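library(stringr)
res <- rep(NA, nrow(df1))
for (i in seq_len(nrow(df1))) {
  if (df1$HusbandName[i] == "") next  # no husband recorded: leave NA
  # rows whose Name contains the husband's name as a literal string
  sel <- which(str_detect(df1$Name, coll(df1$HusbandName[i])))
  if (length(sel) > 0) res[i] <- df1$passengerid[sel[1]]  # keep the first match
}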

res should then contain the passengerid for each row with a match and NA where there is none. You can simply add it to your data frame as a new column.
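
As a minimal sketch of that last step, assuming passengerid is already numeric and swapping in base R's grepl(fixed = TRUE) for stringr (my substitution, so that nothing needs to be installed), the lookup and the new column could look like this. The data and the column name Husbandid come from the original question; find_id and its arguments are hypothetical names used only for illustration.

```r
# Toy data from the original question; passengerid is kept numeric here.
df1 <- data.frame(
  passengerid = c(908, 9883, 7767, 3302),
  Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
           "Backstrom, Mr. Karl Alfred John",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Cumings, Mr. John Bradley"),
  HusbandName = c("Backstrom, Mr. Karl Alfred", "",
                  "Cumings, Mr. John Bradley", ""),
  stringsAsFactors = FALSE
)

# find_id is a hypothetical helper: it returns the passengerid of the first
# row whose Name contains `husband` as a literal substring, or NA otherwise.
find_id <- function(husband, all_names, all_ids) {
  if (is.na(husband) || husband == "") return(NA_real_)
  sel <- which(grepl(husband, all_names, fixed = TRUE))
  if (length(sel) == 0) NA_real_ else all_ids[sel[1]]
}

res <- vapply(df1$HusbandName, find_id, numeric(1),
              all_names = df1$Name, all_ids = df1$passengerid,
              USE.NAMES = FALSE)

df1$Husbandid <- res  # add the result as a new column
```

With this data, res comes out as 9883, NA, 3302, NA: "Backstrom, Mr. Karl Alfred" is found inside "Backstrom, Mr. Karl Alfred John" even though the second name is missing, and rows without a husband stay NA.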

The problem is that although you provided "a kind of" example, the HTML format probably scrambled it somehow. It is better to use dput for sending test data and to post in plain text rather than HTML.

This is the data frame I got from your mail.

> dput(df1)
structure(list(passengerid = structure(c(3L, 4L, 2L, 1L), .Label = c("3302",
"7767", "908", "9883"), class = "factor"), Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
"Backstrom, Mr. Karl Alfred John", "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
"Cumings, Mr. John Bradley"), HusbandName = c("Backstrom, Mr. Karl Alfred",
"", "Cumings, Mr. John\nBradley", "")), row.names = c(NA, -4L
), class = "data.frame")
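
The structure above shows two things that would get in the way of the loop: passengerid arrived as a factor, and one HusbandName contains a line break ("\n") where a space should be. A minimal sketch of how both could be repaired before matching, assuming a data frame equivalent to the one shown above; the cleaning steps are my suggestion and not something from the original question:

```r
# A data frame equivalent to the dput output above
# (passengerid is a factor, one HusbandName holds a "\n").
df1 <- data.frame(
  passengerid = factor(c("908", "9883", "7767", "3302")),
  Name = c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
           "Backstrom, Mr. Karl Alfred John",
           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
           "Cumings, Mr. John Bradley"),
  HusbandName = c("Backstrom, Mr. Karl Alfred", "",
                  "Cumings, Mr. John\nBradley", ""),
  stringsAsFactors = FALSE
)

# Convert the factor via character; as.numeric() on the factor itself
# would return the internal level codes (3, 4, 2, 1) instead of the ids.
df1$passengerid <- as.numeric(as.character(df1$passengerid))

# Replace the stray line break with a single space.
df1$HusbandName <- gsub("\n", " ", df1$HusbandName, fixed = TRUE)
```

After these two steps passengerid is numeric and the third HusbandName reads "Cumings, Mr. John Bradley", so it can match the corresponding Name.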

Cheers
Petr

> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of gary chimuzinga
> Sent: Tuesday, November 20, 2018 5:06 PM
> To: r-help using r-project.org
> Subject: [R] Partial LookUP
>
> I am working in R, using RStudio.
> I have a data frame with 4 columns. Column A contains the passenger ID, B contains
> the passenger name, C contains the husband's name.
> I am attempting to create a new column which looks to see if the husband's name
> in column C is listed in any of the records in column B. If so, it should then
> return the passenger ID of the husband from column A.
> To make things more complicated, in some cases (as in the first example) the
> husband's name given in column C might not include his second name, which
> would be included in column B.
>
> Reproducible Example
> library(stringr)
> rm(list=ls())
> passengerid <- c(0908,9883,7767,3302)
>
> Name<- c("Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson)",
>           "Backstrom, Mr. Karl Alfred John",
>           "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
>           "Cumings, Mr. John Bradley")
>
> HusbandName <- c("Backstrom, Mr. Karl Alfred","","Cumings, Mr. John
> Bradley","")
>
>
>
> df1<- data.frame(cbind(passengerid,Name,HusbandName))
> df1$Name <- as.character(df1$Name)
> df1$HusbandName <- as.character(df1$HusbandName)
>
> I have tried using stringr, but I am facing problems because 1) I need the code to look
> at only 1 element of the vector HusbandName and search for it in the whole
> vector Name, and 2) I found it difficult to use regular expressions given that the
> pattern I am looking for is vectorised (as HusbandName).
> This is what I have tried so far:
>
> Attempt 1 - only finds exact matches & doesn't return the passengerID &
> doesn't add column to df
> df1$Husbandid < - for (i in 1:NROW(df1$HusbandName)) {
> print(HusbandName[i] %in% Name)}
>
>
> Attempt 2 - finds partial matches, but does not ignore blanks & does not tell
> me passenger id & doesn't add column to df
> df1$Husbandid <- for (i in 1:NROW(df1$HusbandName)) {
> print(which(str_detect(df1$Name,df1$HusbandName[i])))}
>
>
> #Attempt 3 - almost works, but the printed results are different from those
> added into the data frame as a new column. How can I correct this?
> Ultimately I need the ones in the df to be correct. The error is that those
> without husbands are showing a Husbandid when this should be blank or NA. Can
> this be corrected, or is there a way to convert the output of the for loop into a
> vector we can add to the df?
> for (i in 1:NROW(df1$HusbandName)) {
>      if (df1$HusbandName[i] =="") {
>       print("Man") & next()
>       }
>     FoundHusbandNames<-
> c(which(str_detect(df1$Name,df1$HusbandName[i])))
>     print(df1$passengerid[FoundHusbandNames]) -> df1$Husbandid[i] }
>
>