[R] Selecting a minimum value of an attribute associated with point values neighboring a given point and assigning it as a new attribute

Eric Berger ericjberger at gmail.com
Sun Nov 6 16:27:46 CET 2022


Whoops ... left out a line in Part 2. Resending with the correction.

## PART 2: You can use this code on the real data with f() defined appropriately
library(dplyr)  ## for rename() and filter()

A <- matrix(0, N, N)
v <- 1:N
## get the indices (j,k) where j < k (as columns in a data.frame)
idx <- expand.grid(v, v) |> rename(j = Var1, k = Var2) |> filter(j < k)
## fill the upper triangle of A with f(j,k); sapply() is used only for
## its side effect of assigning into A
u <- sapply(1:nrow(idx),
           \(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<- f(j,k,myData) })
B <- A + t(A) + diag(N)  ## symmetric 0/1 neighbor matrix; each point is its own neighbor
C <- diag(myData$Conc)   ## concentrations on the diagonal
D <- B %*% C             ## D[i,j] = Conc[j] if j is a neighbor of i, 0 otherwise
D[D == 0] <- NA          ## drop non-neighbors before taking row minima
myData$Conc_min <- apply(D, MARGIN = 1, \(v){ min(v, na.rm = TRUE) })
print(head(myData))
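
For the real data, the toy Euclidean f() from Part 1 below needs to be replaced by a distance in metres computed from the LAT/LON columns. Here is a minimal sketch of one possible f(), assuming the column names LAT and LON from Tiffany's example and a plain haversine great-circle distance (any geodesic or projected distance would work equally well):

MAX_DISTANCE <- 50   ## metres

## possible f() for unprojected lat/lon data: haversine distance in metres
## (column names LAT and LON are assumed from the example data)
f <- function(i, j, a) {
  rad  <- pi / 180
  lat1 <- a$LAT[i] * rad; lon1 <- a$LON[i] * rad
  lat2 <- a$LAT[j] * rad; lon2 <- a$LON[j] * rad
  h    <- sin((lat2 - lat1) / 2)^2 +
          cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2
  dist <- 2 * 6371000 * asin(sqrt(h))   ## mean Earth radius ~ 6371 km
  as.integer(dist < MAX_DISTANCE)
}

With N <- nrow(myData) (where myData is the real data set), the Part 2 block above should then run unchanged.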

On Sun, Nov 6, 2022 at 5:19 PM Eric Berger <ericjberger using gmail.com> wrote:
>
> Hi Tiffany,
> Here is some code that might help with your problem. I solve a "toy"
> problem that is conceptually the same.
> Part 1 sets up my toy problem. You would have to replace Part 1 with
> your real case. The main point is to define
> a function f(i, j, data) which returns 0 or 1 depending on whether the
> observations in rows i and j in your dataset 'data'
> are within your cutoff distance (i.e. 50m).
>
> You can then use Part 2 almost without changes (except for changing
> 'myData' to the correct name of your data).
>
> I hope this helps,
> Eric
>
> library(dplyr)
>
> ## PART 1: create fake data for minimal example
> set.seed(123) ## for reproducibility
> N <- 5       ## replace by number of locations (approx 9000 in your case)
> MAX_DISTANCE <- 2  ## 50 in your case
> myData <- data.frame(x=rnorm(N),y=rnorm(N),Conc=sample(1:N,N))
>
> ## The function which you must re-define for your actual case.
> f <- function(i,j,a) {
>  dist <- sqrt(sum((a[i,1:2] - a[j,1:2])^2)) ## Euclidean distance
>  as.integer(dist < MAX_DISTANCE)
> }
>
> ## PART 2: You can use this code on the real data with f() defined appropriately
> A <- matrix(0,N,N)
> ## get the indices (j,k) where j < k (as columns in a data.frame)
> idx <- expand.grid(v,v) |> rename(j=Var1,k=Var2) |> filter(j < k)
> u <- sapply(1:nrow(idx),\(i){ j <- idx$j[i]; k <- idx$k[i]; A[j,k] <<-
> f(j,k,myData) })
> B <- A + t(A) + diag(N)
> C <- diag(myData$Conc)
> D <- B %*% C
> D[D==0] <- NA
> myData$Conc_min <- apply(D,MAR=1,\(v){min(v,na.rm=TRUE)})
> print(head(myData))
>
>
> On Sat, Nov 5, 2022 at 5:14 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:
> >
> > Probably better posted on R-sig-geo.
> >
> > -- Bert
> >
> > On Sat, Nov 5, 2022 at 12:36 AM Duhl, Tiffany R. <Tiffany.Duhl using tufts.edu>
> > wrote:
> >
> > > Hello,
> > >
> > > I have sets of spatial points with LAT, LON coords (unprojected, WGS84
> > > datum) and several value attributes associated with each point, from
> > > numerous csv files (with an average of 6,000-9,000 points in each file) as
> > > shown in the following example:
> > >
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > > data
> > >    ID      Date     Time      LAT       LON  Conc Leg.Speed    CO2  H2O BC61 Hr Min Sec
> > > 1  76 4/19/2021 21:25:38 42.40066 -70.98802 99300   0.0 mph 428.39 9.57  578 21  25  38
> > > 2  77 4/19/2021 21:25:39 42.40066 -70.98802 96730   0.0 mph 428.04 9.57  617 21  25  39
> > > 3  79 4/19/2021 21:25:41 42.40066 -70.98802 98800   0.2 mph 427.10 9.57 1027 21  25  41
> > > 4  80 4/19/2021 21:25:42 42.40066 -70.98802 96510     2 mph 427.99 9.58 1381 21  25  42
> > > 5  81 4/19/2021 21:25:43 42.40067 -70.98801 95540     3 mph 427.99 9.58 1271 21  25  43
> > > 6  82 4/19/2021 21:25:44 42.40068 -70.98799 94720     4 mph 427.20 9.57  910 21  25  44
> > > 7  83 4/19/2021 21:25:45 42.40069 -70.98797 94040     5 mph 427.18 9.57  652 21  25  45
> > > 8  84 4/19/2021 21:25:46 42.40072 -70.98795 95710     7 mph 427.07 9.57  943 21  25  46
> > > 9  85 4/19/2021 21:25:47 42.40074 -70.98792 96200     8 mph 427.44 9.56  650 21  25  47
> > > 10 86 4/19/2021 21:25:48 42.40078 -70.98789 93750    10 mph 428.76 9.57  761 21  25  48
> > > 11 87 4/19/2021 21:25:49 42.40081 -70.98785 93360    11 mph 429.25 9.56 1158 21  25  49
> > > 12 88 4/19/2021 21:25:50 42.40084 -70.98781 94340    12 mph 429.56 9.57  107 21  25  50
> > > 13 89 4/19/2021 21:25:51 42.40087 -70.98775 92780    12 mph 428.62 9.56  720 21  25  51
> > >
> > >
> > > What I want to do is, for each point, identify all points within 50 m of
> > > that point, find the minimum value of the "Conc" attribute over that set of
> > > nearby points (including the original point), and then assign this minimum
> > > value to a new variable ("Conc_min") added to "data".
> > >
> > > So far, I have the following code:
> > >
> > > library(spdep)
> > > library(sf)
> > >
> > > setwd("C:\\mydirectory\\")
> > > data<- read.csv("R_find_pts_testdata.csv")
> > >
> > > #make sure the data is a data frame
> > > pts <- data.frame(data)
> > >
> > > #create spatial data frame and define projection
> > > pts_coords <- cbind(pts$LON, pts$LAT)
> > > data_pts <- SpatialPointsDataFrame(coords= pts_coords,
> > > data=pts, proj4string = CRS("+proj=longlat +datum=WGS84"))
> > >
> > > #Re-project to WGS 84 / UTM zone 18N, so the analysis is in units of m
> > > ptsUTM  <- sf::st_as_sf(data_pts, coords = c("LAT", "LON"), remove = F)%>%
> > > st_transform(32618)
> > >
> > > # create 50 m buffer around each point, then intersect with points and
> > > # finally find neighbors within the buffers
> > > pts_buf <- sf::st_buffer(ptsUTM, 50)
> > > coords  <- sf::st_coordinates(ptsUTM)
> > > int <- sf::st_intersects(pts_buf, ptsUTM)
> > > x   <- spdep::dnearneigh(coords, 0, 50)
> > >
> > > Now at this point, I'm not sure what to do with either the "int" (an sgbp
> > > list) or "x" (an nb object) objects, or even whether I need them both:
> > >
> > > > int
> > > Sparse geometry binary predicate list of length 974, where the predicate
> > > was `intersects'
> > > first 10 elements:
> > >  1: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  2: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  3: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  4: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  5: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  6: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  7: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  8: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >  9: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...
> > >
> > > > x
> > > Neighbour list object:
> > > Number of regions: 974
> > > Number of nonzero links: 34802
> > > Percentage nonzero weights: 3.668481
> > > Average number of links: 35.73101
> > >
> > > One thought is that maybe I don't need the dnearneigh function and can
> > > instead convert "int" into a dataframe and somehow merge or associate
> > > (perhaps with an inner join) the ID fields of the buffered and intersecting
> > > points and then compute the minimum value of "Conc" grouping by ID:
> > >
> > > > as.data.frame(int)
> > >     row.id col.id
> > > 1        1      1
> > > 2        1      2
> > > 3        1      3
> > > 4        1      4
> > > 5        1      5
> > > 6        1      6
> > > 7        1      7
> > > 8        1      8
> > > 9        1      9
> > > 10       1     10
> > > 11       1     11
> > > 12       1     12
> > > 13       1     13
> > > 14       1     14
> > > 15       1     15
> > > 16       1     16
> > > 17       1     17
> > > 18       1     18
> > > 19       2      1
> > > 20       2      2
> > > 21       2      3
> > > 22       2      4
> > > 23       2      5
> > > 24       2      6
> > > 25       2      7
> > > 26       2      8
> > > 27       2      9
> > > 28       2     10
> > >
> > >
> > > So in the above example I'd like to take the minimum of "Conc" among the
> > > col.id points grouped with row.id 1 (i.e., col.ids 1-18) and assign the
> > > minimum value of this group as a new variable in data (Data$Conc_min), and
> > > do the same for row.id 2 and all the rest of the rows.
> > >
> > > I'm just not sure how to do this and I appreciate any help folks might
> > > have on this matter!
> > >
> > > Many thanks,
> > > -Tiffany
> > >
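
For completeness, the sgbp list 'int' that Tiffany already computes can be finished off directly along the lines she sketches, without building the N x N matrices. A minimal sketch, assuming the objects 'int' and 'data' (with its Conc column) exactly as defined in her code above, and using dplyr:

library(dplyr)

## each row of nbrs pairs a point (row.id) with one of its neighbors
## within 50 m (col.id); every point intersects its own buffer, so each
## row.id appears at least once
nbrs <- as.data.frame(int)

data$Conc_min <- nbrs |>
  mutate(Conc = data$Conc[col.id]) |>   ## Conc of each neighboring point
  group_by(row.id) |>
  summarise(Conc_min = min(Conc)) |>    ## minimum over each neighborhood
  arrange(row.id) |>
  pull(Conc_min)

For files with roughly 9,000 points this keeps only the pairs within 50 m in memory rather than a dense 9,000 x 9,000 matrix.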


