[R] glmmPQL and spatial correlation

Ben Bolker bbolker at gmail.com
Thu Oct 11 04:33:09 CEST 2012


Luis Huckstadt <lahuckst <at> ucsc.edu> writes:

> I'm running into some computer issues when trying to run a binomial model
> for spatially correlated data using glmmPQL and was wondering if anyone
> could help me out.
> My whole dataset consists of about 300,000 points for which I have a suite
> of environmental variables (I'm trying to come up with a habitat model for
> a species of seal, using real (presence) and simulated dives (absence) as
> my response variable).
> Since my dataset is so large, I split it into thirds and ran the model
> without spatial correlation. However, when checking my results, I did get a
> spatial correlation in the residuals, which I'm trying to incorporate using
> a variogram (spherical). The problem is that  when I run it  for my 1/3 of
> my data (about 70k points), the calculatations go on forever. I was running
> the first model for over a week and was still not getting past the first
> iteration.
> 
> This is the model I used
> 
> M1.f.Spa <- glmmPQL(presence ~ sst + tmax100 + tbot + sss + ssu + tmax100d
> + sbot, random = ~1|fPTT, family = binomial, correlation =
> corSpher(c(91323.53,0.4279603), form =~ x+y, nugget = TRUE), data =
> sample.df1)
> 
> This week, I tried a different approach and split the 70k subset into
> smaller datasets of 10,000 points, and now the model runs much faster (just
> a couple of hours tops), Yet the output of these models changes with
> regards to the significance of one variable, which makes me think that I
> need a larger dataset than 10,000 points.

  That's a bit surprising.  When you say the significance of one variable
changes -- does that mean it changes from (e.g.) 0.045 to 0.055, which
crosses the magic line at p=0.05 but is not otherwise meaningful?
> 
> Does someone have a suggestion on how to improve/run this code with the
> spatial correlation for a larger dataset than 10,000 which wouldn't take
> weeks to run?
> 
> I'm working with an Intel Core i7, 12 Gb RAM (plus a couple of 100 Gbs in
> virtual memory), in Windows  7 64-bit.

  This is not going to be easy in any case.  I would look into the
INLA package and AD Model Builder ... (I would also be mildly nervous
about using PQL for binary data -- it might be OK, but this is known
to be more or less the worst case scenario for PQL ...)

  Ben Bolker




More information about the R-help mailing list