[R] Subsetting dataframe by the nearest values of a vector elements

Harun Rashid mhrashidbau at yahoo.com
Tue Nov 10 09:39:12 CET 2015


Hi Jean,
Here is part of my data. As you can see, I have cross-section points and 
the corresponding elevations of a river. Now I want to select cross-section 
points at 50 m intervals. But the real cross-section data might not have 
points at exactly 0, 50, 100, … and so on. Therefore, I need to take the 
points closest to those values.

         cross_section elevation
      1:         5.608    12.765
      2:        11.694    10.919
      3:        14.784    10.274
      4:        20.437     7.949
      5:        22.406     7.180
    101:       594.255     7.710
    102:       595.957     7.717
    103:       597.144     7.495
    104:       615.925     7.513
    105:       615.890     7.751

I looked at some suggestions (particularly this one: 
<http://stackoverflow.com/questions/20133344/find-closest-value-in-a-vector-with-binary-search>) 
and finally did it like this.

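    library(data.table)
    intervals <- c(5,50,100,150,200,250,300,350,400,450,500,550,600)
    # keep the original positions in real.val; the join overwrites cross_section
    dt <- data.table(real.val = w$cross_section, w)
    # marks dt as keyed without sorting it; setkey(dt, cross_section) sorts as well
    setattr(dt, 'sorted', 'cross_section')
    dt[J(intervals), roll = "nearest"]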

And it gave me what I wanted.

    dt[J(intervals), roll = "nearest"]
        cross_section real.val elevation
     1:             5    5.608    12.765
     2:            50   49.535     6.744
     3:           100  115.614     8.026
     4:           150  152.029     7.206
     5:           200  198.201     6.417
     6:           250  247.855     4.497
     7:           300  298.450    11.299
     8:           350  352.473    11.534
     9:           400  401.287    10.550
    10:           450  447.768     9.371
    11:           500  501.284     8.984
    12:           550  550.650    16.488
    13:           600  597.144     7.495
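
For comparison, here is a minimal base-R sketch of the same nearest-point selection that does not need data.table. The data frame `w` below is toy data built from a few of the rows quoted above (not the full data set), and `nearest_rows` is a hypothetical helper name introduced only for illustration. It scans the whole column once per target, so it is meant as a cross-check rather than a faster alternative.

```r
# Toy data: a few of the cross-section points quoted above (assumption:
# the real `w` has the same two columns but many more rows).
w <- data.frame(
  cross_section = c(5.608, 11.694, 14.784, 20.437, 22.406,
                    594.255, 595.957, 597.144),
  elevation     = c(12.765, 10.919, 10.274, 7.949, 7.180,
                    7.710, 7.717, 7.495)
)

# Hypothetical helper: for each target, pick the row whose value in
# column `col` is closest to it (ties go to the first such row).
nearest_rows <- function(df, col, targets) {
  idx <- vapply(targets,
                function(t) which.min(abs(df[[col]] - t)),
                integer(1))
  cbind(target = targets, df[idx, , drop = FALSE])
}

res <- nearest_rows(w, "cross_section", c(5, 20, 600))
print(res)
```

Unlike the keyed join, this does not require the data to be sorted, which may be convenient when the cross-section column is not strictly increasing.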

I don’t know whether there is a smarter way to accomplish this!
Thanks in advance.
Regards,
Harun

On 11/10/15 11:17 AM, David Winsemius wrote:

>> On Nov 9, 2015, at 9:19 AM, Adams, Jean <jvadams at usgs.gov> wrote:
>>
>> Harun,
>>
>> Can you give a simple example?
>>
>> If your cross_section looked like this
>> c(144, 179, 214, 39, 284, 109, 74, 4, 249)
>> and your other vector looked like this
>> c(0, 50, 100, 150, 200, 250, 300, 350)
>> what would you want your subset to look like?
>>
>> Jean
>>
>> On Mon, Nov 9, 2015 at 7:26 AM, Harun Rashid via R-help <
>> r-help at r-project.org> wrote:
>>
>>> Hello,
>>> I have a dataset with two columns 1. cross_section (range: 0~635), and
>>> 2. elevation. The dataset has more than 100 rows. Now I want to make a
>>> subset on the condition that the 'cross_section' column will pick up the
>>> nearest cell from another vector (say 0, 50,100,150,200,.....,650).
>>> How can I do this? I would really appreciate a solution.
> If you want the "other vector" to define the “cell” boundaries, then, using Jean’s example, it is a simple application of `findInterval`:
>
> > inp <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
> > mids <- c(0, 50, 100, 150, 200, 250, 300, 350)
> > findInterval( inp, c(mids) )
> [1] 3 4 5 1 6 3 2 1 5
>
> On the other hand ...
>
> To find the index of the closest point, this might help:
>
> > findInterval(inp, c( mids[1]-.001, head(mids,-1)+diff(mids)/2, tail(mids,1)+.001 ) )
> [1] 4 5 5 2 7 3 2 1 6
>
> David Winsemius
> Alameda, CA, USA
>
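
The midpoint trick quoted above returns, for each element of `inp`, the index of the nearest element of `mids`. A sketch of the reverse direction (for each target in `mids`, the nearest data value) is given below, using Jean's example vectors. `nearest_value` is a hypothetical helper name, not something from the thread; it assumes the data can be sorted and sends exact midpoint ties to the larger neighbour.

```r
# Jean's example vectors from the thread.
inp  <- c(144, 179, 214, 39, 284, 109, 74, 4, 249)
mids <- c(0, 50, 100, 150, 200, 250, 300, 350)

# Hypothetical helper: for each target, return the closest element of x.
nearest_value <- function(x, targets) {
  xs <- sort(x)
  # Midpoints between consecutive sorted values act as cell boundaries.
  breaks <- head(xs, -1) + diff(xs) / 2
  # findInterval() counts the boundaries at or below each target (0..n-1),
  # so adding 1 gives the position of the nearest value in xs.
  xs[findInterval(targets, breaks) + 1]
}

picked <- nearest_value(inp, mids)
print(picked)
```

With these inputs the selected values are 4, 39, 109, 144, 214, 249, 284 and 284. Targets beyond the data range (here 300 and 350) simply fall back to the largest value, so no padding of the boundaries is needed.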