[R] Moran I for very large data set

Mike Marchywka marchywka at hotmail.com
Tue Nov 30 14:53:46 CET 2010

----------------------------------------
> Date: Tue, 30 Nov 2010 04:51:10 -0800
> From: Roger.Bivand at nhh.no
> To: r-help at r-project.org
> Subject: Re: [R] Moran I for very large data set
>
>
> First, 14000 is not a large data set, unless you are trying to create a dense
> matrix, which will probably tax your computer, and is not necessary.
>
> Second, you haven't indicated how you are doing this, by quoting the salient
> parts of the code you are using - it may well be that your approach is
> flawed, but nobody can see over your shoulder on the list. For instance, if
> you are using dnearneigh() in spdep, and have set a maximum distance to
> include all the observations, you will likely run out of memory (note that
> the distance is in km). Just re-running a script is not a robust way to
> proceed, you need to run it line by line to see where the bottleneck is. It
> may be that projecting the data will solve your problem if it is the Great
> Circle computations that are burdensome.


I guess I'd add to the posting guidelines that if you are reporting a performance
issue, make a token effort to determine and post which resource is actually
limiting your performance (CPU, page faults, IO, etc.); either that, or FedEx
us your machine. Note that on Windows the CPU usage reported by Task Manager
will drop to almost zero while the process is blocked on IO (page faults are IO),
and while Windows 7 has greatly expanded Task Manager, there still doesn't seem
to be a way to dump its output as text for easy sharing.
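From inside R, a minimal sketch of the sort of numbers worth posting (the point
count and profile file name below are just placeholders, and only the spdep
calls already quoted in this thread are assumed):

library(spdep)

set.seed(1)
crds <- cbind(runif(1000, 0, 10), runif(1000, 0, 10))   # placeholder data

# a large gap between "elapsed" and "user" time suggests the process is
# waiting on something other than the CPU (paging, IO, ...)
print(system.time(dnb <- dnearneigh(crds, 0, 18, longlat = TRUE)))

# memory actually held by the R process after the step
print(gc())

# line-level profiling to see where the time really goes
Rprof("moran-profile.out")
lw <- nb2listw(dnb, zero.policy = TRUE)
mt <- moran.test(rnorm(1000), lw, zero.policy = TRUE)
Rprof(NULL)
head(summaryRprof("moran-profile.out")$by.self)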

>
> For me:
>
> > set.seed(1)
> > crds <- cbind(runif(14000, 0, 10), runif(14000, 0, 10))
> > k1 <- knn2nb(knearneigh(crds, 1, longlat=TRUE))
> > k1d <- nbdists(k1, crds, longlat=TRUE)
> > max(unlist(k1d))
> [1] 18.54627
> > system.time(dnb <- dnearneigh(crds, 0, 18, longlat=TRUE))
> user system elapsed
> 53.864 0.019 55.418
> > system.time(lw <- nb2listw(dnb, zero.policy=TRUE))
> user system elapsed
> 0.909 0.008 0.918
> > system.time(mt <- moran.test(rnorm(14000), lw, zero.policy=TRUE))
> user system elapsed
> 8.610 0.006 8.801
>
> with the R process using at most 140MB.
>
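
On the projection suggestion: if it really is the Great Circle computations that
hurt, one way to follow it up (only a sketch; it assumes the sp and rgdal
packages, and the UTM zone below is just a placeholder, pick one that suits the
study area) is to project the points once and then work with planar distances
in metres:

library(sp)
library(rgdal)
library(spdep)

# hypothetical long/lat points; substitute the real coordinates
set.seed(1)
pts <- SpatialPoints(cbind(runif(14000, 0, 10), runif(14000, 0, 10)),
                     proj4string = CRS("+proj=longlat +datum=WGS84"))

# project once to a planar CRS (UTM zone 31N is only a placeholder);
# dnearneigh() then uses plain Euclidean distances in metres
pts_utm <- spTransform(pts, CRS("+proj=utm +zone=31 +datum=WGS84"))

dnb <- dnearneigh(coordinates(pts_utm), 0, 18000)   # 18 km threshold
lw  <- nb2listw(dnb, zero.policy = TRUE)
mt  <- moran.test(rnorm(14000), lw, zero.policy = TRUE)
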
> Third, you should consider using the R-sig-geo list, where a follow-up would
> have been forthcoming more quickly.
>
> Hope this helps,
>
> Roger
>
>
> Watmough G. wrote:
> >
> > Hi
> >
> > Are there any more efficient ways of calculating the neighbourhood object
> > for large datasets?
> >
> > I am trying to compute Moran I statistics for a very large data set (over
> > 14,000 points). I have been using moran.test from the spdep package and
> > everything works fine for a small data set (200 points). However,
> > applying the same script to the whole dataset is taking days to compute
> > (it so far has been going for 5 days and still no results). This is no
> > surprise due to the number of computations required.
> >
> > I have found that calculating planar distances works much quicker, but
> > Great Circle distances are required.
> >
> > Thanks
> >
> > Gary Watmough
> >
> >
>
>
> -----
> Roger Bivand
> Economic Geography Section
> Department of Economics
> Norwegian School of Economics and Business Administration
> Helleveien 30
> N-5045 Bergen, Norway
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Moran-I-for-very-large-data-set-tp3063474p3065310.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.