[R] package mgcv - predict with bam: Error in X[ind, ] : subscript out of bounds

Simon Wood s.wood at bath.ac.uk
Mon Feb 3 12:42:23 CET 2014


Hi Katharina,

Thanks for sending this.

The problem is that 'site' in the prediction data contains levels that
are not present in the (usable, non-NA) data used for fitting:

 > levels(m$model$site)
[1] "KRB"     "NP.FOR"  "WKS.FRE" "WKS.KRE" "WKS.RIE" "WKS.WUE"
 > levels(gapData$site)
[1] "KRB"     "NP.FOR"  "RIE.2"   "WKS.BBR" "WKS.FRE" "WKS.HOE" "WKS.KRE"
[8] "WKS.RIE" "WKS.WUE"
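
A quick way to spot this kind of mismatch is to compare the levels that
actually occur in the prediction data with the levels in the model frame.
A minimal sketch with made-up factors (fit_site and new_site are
hypothetical stand-ins for m$model$site and gapData$site, not the real
data):

```r
# Hypothetical stand-ins for m$model$site and gapData$site
fit_site <- factor(c("KRB", "NP.FOR", "WKS.FRE"))
new_site <- factor(c("KRB", "RIE.2", "WKS.BBR", "WKS.FRE"))

# Levels that occur in the prediction data but were never seen in fitting
new_levels <- setdiff(unique(as.character(new_site)), levels(fit_site))
new_levels
```

Comparing the values that occur, rather than levels(new_site), avoids
flagging levels that are declared in the factor but unused in the rows
being predicted.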

predict.lm has a check for this, and so fails with a rather more
informative error message, e.g.

m0 <- lm(sensor1 ~ sensor2 + site + site:NthSampling,
         data=xylemRohWeekXnn2011, na.action=na.omit)
predict(m0, gapData)
... factor site has new levels RIE.2, WKS.BBR

I'll add a better check to predict.gam.

best,
Simon

PS: if you want predictions with the random effects for site set to zero,
one trick is to use terms like s(site,bs="re",by=dum) in fitting, with
dum set to 1. Then in prediction you can set 'site' to any existing
level and dum to zero, which gives a prediction for the missing level
with the 'site' effect set to zero.
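
A minimal sketch of that trick on simulated data, in case it helps. The
data frame dat, the site labels A to D, the covariate x and the effect
sizes are all invented for illustration; only the s(site,bs="re",by=dum)
construction comes from the note above, and gam() is used here rather
than bam() simply to keep the example small.

```r
library(mgcv)

set.seed(1)
n <- 200
# Simulated data: four sites, one covariate (all values are made up)
dat <- data.frame(
  site = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE)),
  x    = runif(n)
)
site_eff <- c(A = -1, B = 0.5, C = 1, D = -0.5)
dat$y   <- 2 + 3 * dat$x + unname(site_eff[as.character(dat$site)]) +
           rnorm(n, sd = 0.3)
dat$dum <- 1  # dummy 'by' variable: 1 for every row during fitting

m <- gam(y ~ x + s(site, bs = "re", by = dum), data = dat, method = "REML")

# Predicting for a site that was not in the fit: borrow any existing
# level for 'site' and set dum to 0, which zeroes the random-effect
# contribution for that row.
nd <- data.frame(x    = 0.5,
                 site = factor("A", levels = levels(dat$site)),
                 dum  = 0)
p0 <- predict(m, nd)
p0
```

With dum set to 0 the prediction reduces to the fixed-effects part,
i.e. the intercept plus the coefficient of x times 0.5.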


On 02/02/14 17:52, Katharina May wrote:
> Hi Simon,
>
> thank you for your reply, I really appreciate any help to understand
> the problem here...
> Unfortunately the package upgrade didn't help with this issue.
> An example reproducing the error, and the current sessionInfo() output,
> can be found below.
>
> Many thanks once again,
>
>         Katharina
>
>
> R Code Example
> <snip>
>   library(RCurl)
>   library(mgcv)
>   #retrieve xylemRohWeekXnn2011 test data frame
>   eval( expr =         parse( text =
> getURL("https://webdisk.ads.mwn.de/Handlers/AnonymousDownload.ashx?folder=1a7cbaa4&path=xylemRohWeekXnn2011.R")
> ))
>
>   xylemRohWeekXnn.fit.bam  <- bam(sensor1 ~ sensor2 + s(site, bs="re")
> + s(site, NthSampling, bs="re") ,  data=xylemRohWeekXnn2011,
> na.action=na.omit)
>
>   #subset data containing gaps for predicting
>   gapData <- xylemRohWeekXnn2011[is.na(xylemRohWeekXnn2011[,2]) &
> !is.na(xylemRohWeekXnn2011[,11]),c(2:3,6:7, 11)]
>
>   xylemRohWeekXnnSite.fit <-
> predict.gam(xylemRohWeekXnn.fit.bam, gapData, type="response", se.fit=FALSE)
> </snap>
>
>
>
> My current Session Information (sessionInfo() Output - also confirming
> that the problem exists on both Windows and Mac OS X):
> <snip>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] mgcv_1.7-28    nlme_3.1-113   RCurl_1.95-4.1 bitops_1.0-6
>
> loaded via a namespace (and not attached):
> [1] grid_3.0.2      lattice_0.20-24 Matrix_1.1-2    tools_3.0.2
> </snap>
>
>
>
>
> On 31/01/14 12:57, Simon Wood wrote:
>>
>> Hi Katharina,
>>
>> Could you try upgrading to mgcv_1.7-28, please? There was an occasional
>> problem to do with matching factor levels, which is fixed, but I'm not
>> very confident that is what is going on.
>>
>> If upgrading doesn't work, is there any chance you could send me a small
>> example dataset and code that produces the error, and I'll look at it?
>>
>> best,
>> Simon
>>
>> --
>> Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
>> +44 (0)1225 386603               http://people.bath.ac.uk/sw283


-- 
Simon Wood, Mathematical Science, University of Bath BA2 7AY UK
+44 (0)1225 386603               http://people.bath.ac.uk/sw283



