[R] merge dataframes with conditions formulated as logical expressions

Wolfram Fischer wolfram at fischer-zim.ch
Thu Jun 15 07:06:09 CEST 2006


--- Reply to: ---
>Date:    14.06.06 16:17 (+0000)
>From:    Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>Subject: Re: [R] merge dataframes with conditions formulated as logical expressions
>
> You have discontinuity between your MIN.VAL and MAX.VAL for a given
> group. If this is true in practise, then you may want to check and
> report when VAL is in the discontinuous region.

Your solution without concerning discontinuity is better
because it is more general.

> Here is my solution that ignores that (and only uses MIN.VAL and
> completely disrespecting MAX.VAL). Not very elegant but should do
> the trick.
> 
> 
>  df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
>  dp <- data.frame( GRP=c( "A", "A", "B", "B" ), MIN.VAL=c( 1, 50, 1,
> 70 ), MAX.VAL=c( 49, 999, 59, 999 ),  VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> 
>  x <- split(df, df$GRP)
>  y <- split(dp, dp$GRP)
> 
>  out <- NULL
>  for(g in names(x)){
> 
>    xx <- x[[g]]
>    yy <- y[[g]]
> 
>    w   <- cut(xx$VAL, breaks=c(yy$MIN.VAL, Inf), labels=F)
>    tmp <- cbind(xx, yy[w, "VAL2"])
>    colnames(tmp) <- c("GRP", "VAL", "VAL2")
>    out <- rbind(out, tmp)
>  } 
>  out
> 
> Regards, Adai

Thanks for this solution.

I did not yet try to program a conventional solution
because I thought there would be a nice shortcut in R
to solve the problem comparably elegantly as in SQL:
	select df.*, dp.VAL2
	from df, dp
	where df.GRP = dp.GRP
	  and df.VAL > dp.MIN_VAL
	  and df.VAL <= dp.MAX_VAL

Wolfram

> On Wed, 2006-06-14 at 16:55 +0200, Wolfram Fischer wrote:
> > I have a data.frame df containing two variables:
> >     GRP: Factor
> >     VAL: num
> > 
> > I have a data.frame dp containing:
> >     GRP: Factor
> >     MIN.VAL: num
> >     MAX.VAL: num
> >     VAL2: num
> > with several rows per "GRP"
> > where dp[i-1, "MAX.VAL"] < dp[i, "MIN.VAL"]
> > within the same "GRP".
> > 
> > I want to create df[i, "VAL2"] <- dpp[z, "VAL2"] 
> > with    i along df 
> > and     dpp <- subset( dp, GRP = df[i, "GRP"] )
> > so that it is true for each i:
> >         df[i, "VAL"] > dpp[z, "MIN.VAL"]
> >    and  df[i, "VAL"] <= dpp[z, "MAX.VAL"]
> > 
> > Is there an easy/efficient way to do that?
> > 
> > Example:
> > df <- data.frame( GRP=c( "A", "A", "B" ), VAL=c( 10, 100, 200 ) )
> > dp <- data.frame( GRP=c( "A", "A", "B", "B" ),
> >     MIN.VAL=c( 1, 50, 1, 70 ), MAX.VAL=c( 49, 999, 59, 999 ), 
> >     VAL2=c( 1.1, 2.2, 3.3, 4.4 ) )
> > 
> > The result should be:
> >     df$VAL2 <- c( 1.1, 2.2, 4.4 )
> > 
> > Thanks - Wolfram
> > 
> > ______________________________________________



More information about the R-help mailing list