[R] Moran I for very large data set

Roger Bivand Roger.Bivand at nhh.no
Tue Nov 30 13:51:10 CET 2010


First, 14,000 points is not a large data set, unless you are trying to create a
dense matrix, which will probably tax your computer and is not necessary.
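
(As a rough aside, not in the original message: a dense 14000 x 14000 matrix of
doubles by itself needs on the order of 1.5 GB, which is easy to check in R.)

n <- 14000
n * n * 8 / 1024^3   # roughly 1.46 GiB for a dense n x n matrix of doubles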

Second, you haven't indicated how you are doing this by quoting the salient
parts of the code you are using - it may well be that your approach is
flawed, but nobody can see over your shoulder on the list. For instance, if
you are using dnearneigh() in spdep and have set a maximum distance large
enough to include all the observations, you will likely run out of memory
(note that with longlat=TRUE the distance is in km). Just re-running a script
is not a robust way to proceed; you need to run it line by line to see where
the bottleneck is. It may be that projecting the data will solve your problem
if it is the Great Circle computations that are burdensome (see the sketch
below).
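
If you go the projection route, a minimal sketch (not from the original reply)
might look like the following; it assumes the coordinates are in a two-column
longitude/latitude matrix crds and uses EPSG:32630 purely as a hypothetical
planar CRS - pick one suited to your study area, and note that the distance
threshold is then in map units (metres):

library(sp)
library(rgdal)
library(spdep)
pts_ll <- SpatialPoints(crds, proj4string=CRS("+proj=longlat +datum=WGS84"))
pts_utm <- spTransform(pts_ll, CRS("+init=epsg:32630"))  # hypothetical UTM zone
xy <- coordinates(pts_utm)                               # planar coordinates in metres
dnb_p <- dnearneigh(xy, 0, 18000)                        # no Great Circle computations needed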

For me:

> library(spdep)
> set.seed(1)
> crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))
> k1 <- knn2nb(knearneigh(crds, 1, longlat=TRUE))  # first nearest neighbour of each point
> k1d <- nbdists(k1, crds, longlat=TRUE)           # Great Circle distances to those neighbours, in km
> max(unlist(k1d))
[1] 18.54627
> system.time(dnb <- dnearneigh(crds, 0, 18, longlat=TRUE))  # 0-18 km distance band
   user  system elapsed 
 53.864   0.019  55.418 
> system.time(lw <- nb2listw(dnb, zero.policy=TRUE))  # row-standardised weights; zero.policy allows no-neighbour points
   user  system elapsed 
  0.909   0.008   0.918 
> system.time(mt <- moran.test(rnorm(14000), lw, zero.policy=TRUE))  # Moran's I on random data, for timing
   user  system elapsed 
  8.610   0.006   8.801 

with the R process using at most 140 MB.
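
As an aside not in the original reply: if a fixed distance band is not
essential, k-nearest-neighbour weights built the same way as k1 above (with a
larger k, here k=5 purely for illustration) feed straight into moran.test()
and skip the comparatively expensive dnearneigh() search:

k5  <- knn2nb(knearneigh(crds, 5, longlat=TRUE))  # 5 nearest neighbours per point
lwk <- nb2listw(k5)                               # row-standardised weights (knn sets need not be symmetric)
mtk <- moran.test(rnorm(14000), lwk)              # Moran's I with the knn-based weights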

Third, you should consider using the R-sig-geo list, where a follow-up would
have been forthcoming more quickly.

Hope this helps,

Roger


Watmough G. wrote:
> 
> Hi
> 
> Are there any more efficient ways of calculating the neighbourhood object
> for large datasets?
> 
> I am trying to compute Moran I statistics for a very large data set (over
> 14,000 points).  I have been using moran.test from the spdep package and
> everything works fine for a small data set (200 points).  However,
> applying the same script to the whole dataset is taking days to compute
> (it has been running for 5 days so far and still no results).  This is no
> surprise given the number of computations required.
> 
> I have found that calculating planar distances is much quicker, but
> Great Circle distances are required.
> 
> Thanks
> 
> Gary Watmough
> 
> 


-----
Roger Bivand
Economic Geography Section
Department of Economics
Norwegian School of Economics and Business Administration
Helleveien 30
N-5045 Bergen, Norway



