[R] merging and obtaining the nearest value

Rui Barradas ruipbarradas at sapo.pt
Sun Aug 19 13:49:53 CEST 2012


Hello,

Yes, you can. If you run into memory problems, say so and we'll deal
with them then. In the meantime, there is one thing you should change:
several special dates can tie for the minimum distance, but only one
row should be returned per combination of TYPE and DATE.

Replace this

x[which(min(a) == a), ]

with this

x[which.min(a), ]
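
To see the difference, here is a small sketch with made-up values (the
data frame x below is hypothetical, not taken from your data). Note that
which.min() returns the first minimum in row order, so if you want the
latest special date on ties, one option is to sort by descending
Special_Date beforehand. That sorting step is my assumption about how
you would want ties handled.

```r
# Hypothetical group: one DATE, four candidate special dates.
# Special dates 4 and 6 are both at distance 1 from DATE = 5 (a tie).
x <- data.frame(DATE = 5, Special_Date = c(2, 4, 20, 6))
a <- abs(x$DATE - x$Special_Date)

# Returns every row at the minimum distance (two rows here)
all_min <- x[which(min(a) == a), ]

# Returns only the first row at the minimum distance (Special_Date 4)
first_min <- x[which.min(a), ]

# To keep the latest special date on ties, sort descending first,
# so that the first minimum is also the latest one (Special_Date 6)
xs <- x[order(-x$Special_Date), ]
latest_min <- xs[which.min(abs(xs$DATE - xs$Special_Date)), ]
```

The same sorting could be applied inside the function passed to
lapply(), before the call to which.min().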

Rui Barradas

On 19-08-2012 12:00, Francesco wrote:
> Dear Rui, many thanks for your suggestion.
>
> However these are just simplified examples... in reality the dataset A
> contains millions of observations and B several thousands of rows...
> Could I still use a modified form of your suggestion?
>
> Thanks
>
> On 19 August 2012 12:51, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> Try the following.
>>
>>
>> A <- read.table(text="
>>
>> TYPE   DATE
>> A            2
>> A            5
>> A            20
>> B            10
>> B            2
>> ", header = TRUE)
>>
>>
>> B <- read.table(text="
>>
>> TYPE  Special_Date
>> A              2
>> A              6
>> A              20
>> A              22
>> B              5
>> B              6
>> ", header = TRUE)
>>
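>> # merge() joins on the common column TYPE; keep the result in m
>> # so that the split() factors below refer to an existing object
>> m <- merge(A, B)
>> result <- do.call(rbind, lapply(split(m, list(m$DATE, m$TYPE)),
>>     function(x) {
>>         a <- abs(x$DATE - x$Special_Date)
>>         if (nrow(x)) x[which(min(a) == a), ] }))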
>> result$Difference <- result$DATE - result$Special_Date
>> result$Special_Date <- NULL
>> rownames(result) <- seq_len(nrow(result))
>> result
>>
>>
>> Also, it's good practice to post data examples using dput(). For instance,
>>
>> dput(A)
>> structure(list(TYPE = structure(c(1L, 1L, 1L, 2L, 2L), .Label = c("A",
>> "B"), class = "factor"), DATE = c(2L, 5L, 20L, 10L, 2L)), .Names = c("TYPE",
>> "DATE"), class = "data.frame", row.names = c(NA, -5L))
>>
>> Now all we have to do is run the statement A <- structure(... etc...) to
>> have an exact copy of the data example.
>> Anyway, your example with input and the wanted result was very welcome.
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 19-08-2012 11:10, Francesco wrote:
>>> Dear R-help
>>>
>>> I would like to know if there is a short solution in R for this
>>> merging problem...
>>>
>>> Let's say I have a dataset A as:
>>>
>>> TYPE   DATE
>>> A            2
>>> A            5
>>> A            20
>>> B            10
>>> B            2
>>>
>>> (there can be duplicates for the same type and date)
>>>
>>> and I have another dataset B as :
>>>
>>> TYPE  Special_Date
>>> A              2
>>> A              6
>>> A              20
>>> A              22
>>> B              5
>>> B              6
>>>
>>> The question is : I would like to obtain the difference between the
>>> date of each observation in A and the closest special date in B with
>>> the same type. In case of ties I would take the latest date of the
>>> two.
>>>
>>> For example I would obtain here
>>>
>>> TYPE   DATE   Difference
>>> A            2            0=2-2
>>> A            5            -1=5-6
>>> A            20            0=20-20
>>> B            10           +4=10-6
>>> B            2             -3=2-5
>>>
>>> Do you know how to (simply?) obtain this in R?
>>>
>>> Many thanks!
>>> Best Regards
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>