[R] merging and obtaining the nearest value

William Dunlap wdunlap at tibco.com
Sun Aug 19 18:43:06 CEST 2012


How many different types are there?  Just a handful or many thousands?
For this sort of problem it is often handy to write a function which generates
datasets of the sort you are thinking of but parameterized by the
number of rows, levels, etc., so you can see how the execution time
varies with these things.

If there are just a few types, try looping over types and using findInterval
to see where A$Date fits into the sequence of B$Special_Date.


Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Francesco
> Sent: Sunday, August 19, 2012 4:01 AM
> To: r-help at r-project.org
> Subject: Re: [R] merging and obtaining the nearest value
> 
> Dear Riu, Many thanks for your suggestion
> 
> However these are just simplified examples... in reality the dataset A
> contains millions of observations and B several thousands of rows...
> Could I still use a modified form of your suggestion?
> 
> Thanks
> 
> On 19 August 2012 12:51, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> > Hello,
> >
> > Try the following.
> >
> >
> > A <- read.table(text="
> >
> > TYPE   DATE
> > A            2
> > A            5
> > A            20
> > B            10
> > B            2
> > ", header = TRUE)
> >
> >
> > B <- read.table(text="
> >
> > TYPE  Special_Date
> > A              2
> > A              6
> > A              20
> > A              22
> > B              5
> > B              6
> > ", header = TRUE)
> >
> > result <- do.call( rbind, lapply(split(merge(A, B), list(m$DATE, m$TYPE)),
> > function(x){
> >         a <- abs(x$DATE - x$Special_Date)
> >         if(nrow(x)) x[which(min(a) == a), ] }) )
> > result$Difference <- result$DATE - result$Special_Date
> > result$Special_Date <- NULL
> > rownames(result) <- seq_len(nrow(result))
> > result
> >
> >
> > Also, it's a good practice to post data examples using dput(). For instance,
> >
> > dput(A)
> > structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
> > "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names = c("TYPE",
> > "DATE"), class = "data.frame", row.names = c(NA, -5L))
> >
> > Now all we have to do is run the statement A <- structure(... etc...) to
> > have an exact copy of the data example.
> > Anyway, your example with input and the wanted result was very welcome.
> >
> > Hope this helps,
> >
> > Rui Barradas
> >
> > Em 19-08-2012 11:10, Francesco escreveu:
> >>
> >> Dear R-help
> >>
> >> Î would like to know if there is a short solution in R for this
> >> merging problem...
> >>
> >> Let say I have a dataset A as:
> >>
> >> TYPE   DATE
> >> A            2
> >> A            5
> >> A            20
> >> B            10
> >> B            2
> >>
> >> (there can be duplicates for the same type and date)
> >>
> >> and I have another dataset B as :
> >>
> >> TYPE  Special_Date
> >> A              2
> >> A              6
> >> A              20
> >> A              22
> >> B              5
> >> B              6
> >>
> >> The question is : I would like to obtain the difference between the
> >> date of each observation in A and the closest special date in B with
> >> the same type. In case of ties I would take the latest date of the
> >> two.
> >>
> >> For example I would obtain here
> >>
> >> TYPE   DATE   Difference
> >> A            2            0=2-2
> >> A            5            -1=5-6
> >> A            20            0=20-20
> >> B            10           +4=10-6
> >> B            2             -3=2-5
> >>
> >> Do you know how to (simply?) obtain this in R?
> >>
> >> Many thanks!
> >> Best Regards
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >
> >
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.




More information about the R-help mailing list