[R] help with merging two dataframes function of "egrep"-like formulas
Bogdan Tanasa
Thu Jul 19 04:50:56 CEST 2018
It looks great. Thank you very much, Jeff, for your time and kind help!
> The traditional (SQL) way to attack this problem is to make the data
> structure simpler so that faster comparisons can be utilized:
> ################
> A <- data.frame(z=c("a*b", "c*d", "d*e", "e*f"), t =c(1, 2, 3, 4))
> B <- data.frame(z=c("a*b::x*y", "c", "", "g*h"), t =c(1, 2, 3, 4))
> library(dplyr)
> library(tidyr)
> Bx <- ( B
> %>% mutate( z_B = as.character( z ) )
> %>% rename( t_B = t )
> %>% separate_rows( z, sep="::" )
> )
> Bx
> #>     z t_B      z_B
> #> 1 a*b   1 a*b::x*y
> #> 2 x*y   1 a*b::x*y
> #> 3   c   2        c
> #> 4       3
> #> 5 g*h   4      g*h
> result <- ( A
> %>% mutate( z = as.character( z ) )
> %>% rename( t_A = t )
> %>% inner_join( Bx, by="z" )
> )
> result
> #>     z t_A t_B      z_B
> #> 1 a*b   1   1 a*b::x*y
> Note that it is preferable to avoid ever creating the complex data z in B
> in the first place; in any case, Bx is much more flexible and less
> error-prone than B.
> (Especially if you don't have to create B$z_B at all, but have some other
> unique identifier(s) for the groupings represented by each row in B.)
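
For readers without dplyr/tidyr, the split-then-join idea described above can be sketched in base R. This is only an illustration under assumptions: the names `parts`, `n`, and `A2` are made up here, and unlike `separate_rows()` this version drops the row of B whose z is empty, which does not affect an inner join.

```r
# Same toy data as in the thread; stringsAsFactors = FALSE keeps z as character.
A <- data.frame(z = c("a*b", "c*d", "d*e", "e*f"), t = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)
B <- data.frame(z = c("a*b::x*y", "c", "", "g*h"), t = c(1, 2, 3, 4),
                stringsAsFactors = FALSE)

# Split each B$z on the literal "::" separator (one list element per row of B).
parts <- strsplit(B$z, "::", fixed = TRUE)
n <- lengths(parts)  # number of pieces per row of B

# Long form of B: one row per piece, keeping the original t and z alongside.
Bx <- data.frame(z   = unlist(parts),
                 t_B = rep(B$t, n),
                 z_B = rep(B$z, n),
                 stringsAsFactors = FALSE)

# Rename A's t column so the two t columns stay distinguishable after the join.
A2 <- setNames(A, c("z", "t_A"))

# merge() with default arguments is an inner join on exact equality of z.
result <- merge(A2, Bx, by = "z")
result
```

Because the join is on exact string equality, characters such as `*` in z are compared literally and never interpreted as pattern syntax.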
>
> Thanks a lot! It looks like I am getting the same results with:
>> B %>% regex_left_join(A, by = c(z = 'z'))
>>> This may be what you are looking for:
>>>
>>> library(fuzzyjoin)
>>>
>>> The inner join returns just the one row where the string matches.
>>> B %>%
>>> regex_inner_join(A, by = c(z = 'z'))
>>>
>>> The full join, by contrast, returns NAs where the string does not match.
>>> B %>%
>>> regex_full_join(A, by = c(z = 'z'))
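
One caveat with the regex joins above, assuming the values of A$z are interpreted as regular expressions (as the `regex_` prefix suggests): `*` is a quantifier in a regex, so "a*b" means "zero or more a, then b" and not the literal text a*b. The base R sketch below illustrates the difference with `grepl()`; the `targets` vector is invented for the illustration and is not data from the thread.

```r
pattern <- "a*b"
targets <- c("a*b::x*y", "b", "xyz")  # hypothetical strings to search in

# As a regular expression, "a*b" matches any string containing a "b".
as_regex <- grepl(pattern, targets)
as_regex
#> [1]  TRUE  TRUE FALSE

# With fixed = TRUE the pattern is treated as the literal text "a*b".
as_literal <- grepl(pattern, targets, fixed = TRUE)
as_literal
#> [1]  TRUE FALSE FALSE
```

On the toy data in this thread both interpretations happen to select the same single row, but on real data a regex match may be broader than intended unless the metacharacters are escaped or a literal comparison is used.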
