[R] High concurvity/ collinearity between time and temperature in GAM predicting deaths but low ACF. Does this matter?

j@de@shod@@ m@iii@g oii googiem@ii@com j@de@shod@@ m@iii@g oii googiem@ii@com
Fri Jun 10 11:30:39 CEST 2022


Hi Simon,

I don't think my reply to your email came through to the list, so am
resending (see below). Problems with subscription have now hopefully
been resolved. Apologies if this is a double posting!

On Thu, 9 Jun 2022 at 16:57, jade.shodan using googlemail.com
<jade.shodan using googlemail.com> wrote:
>
> Hi Simon,
>
> (Sorry, replies and answers are out of sync due to my problems posting
> to the list/ messages being held for moderation)
>
> > Did it actually fail, or simply generate warnings?
> The model was computed (you're right, not a model failure), but
> resulted in warnings (as per previous post) which, frankly, I didn't
> understand.
>
> > if I understand your model right there is one free parameter for each observation falling on the 15th
> That's right, one observation on each 15th day of the month.
>
> Thank you for the suggestion about the random effects! I had been
> wondering about how I could model this heaping with a smooth!
>
> Quick question:
>
> You proposed:
>
> aheap <- heap!="0"
> heap + s(heap,bs="re",by=heap) ## fixed mean effect of being a heap
> day + a r.e. for each heap day - no effect on non-heap days
>
> Is there a typo in the code above? I don't see the newly created
> variable aheap in the model?
> Should it read "by = aheap" as follows:
>
> heap + s(heap,bs="re",by=aheap)   ?
>
> So a full model might then look like the one below?
>
>  bam0 <- bam(deaths~te(year, month, week, weekday,
> bs=c("cr","cc","cc","cc"), k = c(4,5,5,5)) + heap +
> s(heap,bs="re",by=aheap)
>                                      te(temp_max, lag, k=c(8, 3)) +
>                                      te(precip_daily_total, lag, k=c(8, 3)),
>                                      data = dat, family = nb, method =
> 'fREML', select = TRUE, discrete = TRUE,
>                                    knots = list(month = c(0.5, 12.5),
> week = c(0.5, 52.5), weekday = c(0, 6.5)))
>
> One, hopefully final (!) question:
>
> Is it actually useful at all to keep these observations on the 15th
> day of each month (which are huge errors), or am I better off removing
> them from the data set (or replacing them with e.g. median values)?
> For temperature-mortality modelling it is the day-to-day variation in
> deaths and temperature that is of interest. So is modelling heaping
> actually useful at all, given that this variable changes on a monthly
> basis? (I think you are alluding to this, but I just want to make
> sure).
>
> If I take them out altogether, would I be best off removing all data
> for these dates, so that the time series jumps from day 14 to day 16?
> Or would this create problems with e.g. the distributed lag model?
>
> Sorry for all these questions! Have been struggling with this for
> months (posted on Cross Validated about the heaping issue too), and
> feel I am finally getting somewhere with your help!
>
> Jade
>
> On Thu, 9 Jun 2022 at 15:24, Simon Wood <simon.wood using bath.edu> wrote:
> >
> >
> > On 09/06/2022 12:30, jade.shodan using googlemail.com wrote:
> > > Hi Simon,
> > >
> > > Thanks so much for this!! (Apologies if this is a double posting. I
> > > seem to have a problem getting messages through to the list).
> > >
> > > I have two follow up questions, if you don't mind.
> > >
> > > 1. Does including an autoregressive term not adjust away part of the
> > > effect of the response in a distributed lag model (where the outcome
> > > accumulates over time)?
> >
> > - the hope is that it approximately deals with short timescale stuff
> > without interfering with the longer timescales to the same extent as a
> > high rank smooth would.
> >
> > > 2. I've tried to fit a model using bam (just a first attempt without
> > > AR term), but including the factor variable heap creates errors:
> > >
> > > bam0 <- bam(deaths~te(year, month, week, weekday,
> > > bs=c("cr","cc","cc","cc"), k = c(4,5,5,5)) + heap +
> > >                                       te(temp_max, lag, k=c(8, 3)) +
> > >                                       te(precip_daily_total, lag, k=c(8, 3)),
> > >                                      data = dat, family = nb, method =
> > > 'fREML', select = TRUE, discrete = TRUE,
> > >                                      knots = list(month = c(0.5, 12.5),
> > > week = c(0.5, 52.5), weekday = c(0, 6.5)))
> > >
> > > This model results in errors:
> > >
> > > Warning in estimate.theta(theta, family, y, mu, scale = scale1, wt = G$w,  :
> > >    step failure in theta estimation
> > > Warning in sqrt(family$dev.resids(object$y, object$fitted.values,
> > > object$prior.weights)) :
> > >    NaNs produced
> > >
> > Did it actually fail, or simply generate warnings?
> >
> > 'bam' handles factors, but if I understand your model right there is one
> > free parameter for each observation falling on the 15th, so that you
> > will fit data for those days exactly (and might just as well have
> > dropped them, for all the information they contribute to the rest of the
> > model). If you want a structure like this, I'd be inclined to make the
> > heap variable random, something like...
> >
> > aheap <- heap!="0"
> >
> > heap + s(heap,bs="re",by=heap) ## fixed mean effect of being a heap day
> > + a r.e. for each heap day - no effect on non-heap days
> >
> > best,
> >
> > Simon
> >
> > > Including heap as as.numeric(heap) runs the model without error
> > > messages or warnings, but model diagnostics look terrible, and it also
> > > doesn't make sense (to me) to make heap a numeric. The factor variable
> > > heap (with 169 levels) codes the fact that all deaths for which no
> > > date was known, were registered on the 15th day of each month. I've
> > > coded all non-heaping days as 0. All heaping days were coded as a
> > > value between 1-168. The time series spans 14 years, so a heaping day
> > > in each month results in 14*12 levels = 168, plus one level for
> > > non-heaping days.
> > >
> > > So my second question is: Does bam allow factor variables? And if not,
> > > how should I model this heaping on the 15th day of the month instead?
> > >
> > > With thanks,
> > >
> > > Jade
> > >
> > > On Wed, 8 Jun 2022 at 12:05, Simon Wood <simon.wood using bath.edu> wrote:
> > >> I would not worry too much about high concurvity between variables like
> > >> temperature and time. This just reflects the fact that temperature has a
> > >> strong temporal pattern.
> > >>
> > >> I would also not be too worried about the low p-values on the k check.
> > >> The check only looks for pattern in the residuals when they are ordered
> > >> with respect to the variables of a smooth. When you have time series
> > >> data and some smooths involve time then it's hard not to pick up some
> > >> degree of residual auto-correlation, which you often would not want to
> > >> model with a higher rank smoother.
> > >>
> > >> The NAs  for the distributed lag terms just reflect the fact that there
> > >> is no obvious way to order the residuals w.r.t. the covariates for such
> > >> terms, so the simple check for residual pattern is not really possible.
> > >>
> > >> One simple approach is to fit using bam(...,discrete=TRUE) which will
> > >> let you specify an AR1 parameter to mop up some of the residual
> > >> auto-correlation without resorting to a high rank smooth that then does
> > >> all the work of the covariates as well. The AR1 parameter can be set by
> > >> looking at the ACF of the residuals of the model without this. You need
> > >> to look at the ACF of suitably standardized residuals to check how well
> > >> this has worked.
> > >>
> > >> best,
> > >>
> > >> Simon
> > >>
> > >> On 05/06/2022 20:01, jade.shodan--- via R-help wrote:
> > >>> Hello everyone,
> > >>>
> > >>> A few days ago I asked a question about concurvity in a GAM (the
> > >>> anologue of collinearity in a GLM) implemented in mgcv. I think my
> > >>> question was a bit unfocussed, so I am retrying again, but with
> > >>> additional information included about the autocorrelation function. I
> > >>> have also posted about this on Cross Validated. Given all the model
> > >>> output, it might make for easier
> > >>> reading:https://stats.stackexchange.com/questions/577790/high-concurvity-collinearity-between-time-and-temperature-in-gam-predicting-dea
> > >>>
> > >>> As mentioned previously, I have problems with concurvity in my thesis
> > >>> research, and don't have access to a statistician who works with time
> > >>> series, GAMs or R. I'd be very grateful for any (partial) answer,
> > >>> however short. I'll gladly return the favour where I can! For really
> > >>> helpful input I'd be more than happy to offer co-authorship on
> > >>> publication. Deadlines are very close, and I'm heading towards having
> > >>> no results at all if I can't solve this concurvity issue :(
> > >>>
> > >>> I'm using GAMs to try to understand the relationship between deaths
> > >>> and heat-related variables (e.g. temperature and humidity), using
> > >>> daily time series over a 14-year period from a tropical, low-income
> > >>> country. My aim is to understand the relationship between these
> > >>> variables and deaths, rather than pure prediction performance.
> > >>>
> > >>> The GAMs include distributed lag models (set up as 7-column matrices,
> > >>> see code at bottom of post), since deaths may occur over several days
> > >>> following exposure.
> > >>>
> > >>> Simple GAMs with just time, lagged temperature and lagged
> > >>> precipitation (a potential confounder) show very high concurvity
> > >>> between lagged temperature and time, regardless of the many different
> > >>> ways I have tried to decompose time. The autocorrelation functions
> > >>> (ACF) however, shows values close to zero, only just about breaching
> > >>> the 'significance line' in a few instances. It does show patterning
> > >>> though, although the regularity is difficult to define.
> > >>>
> > >>> My questions are:
> > >>> 1) Should I be worried about the high concurvity, or can I ignore it
> > >>> given the mostly non-significant ACF? I've read dozens of
> > >>> heat-mortality modelling studies and none report on concurvity between
> > >>> weather variables and time (though one 2012 paper discussed
> > >>> autocorrelation).
> > >>>
> > >>> 2) If I cannot ignore it, what should I do to resolve it? Would
> > >>> including an autoregressive term be appropriate, and if so, where can
> > >>> I find a coded example of how to do this? I've also come across
> > >>> sequential regression][1]. Would this be more or less appropriate? If
> > >>> appropriate, a pointer to an example would be really appreciated!
> > >>>
> > >>> Some example GAMs are specified as follows:
> > >>> ```r
> > >>> conc38b <- gam(deaths~te(year, month, week, weekday,
> > >>> bs=c("cr","cc","cc","cc")) + heap +
> > >>>                         te(temp_max, lag, k=c(10, 3)) +
> > >>>                         te(precip_daily_total, lag, k=c(10, 3)),
> > >>>                         data = dat, family = nb, method = 'REML', select = TRUE,
> > >>>                         knots = list(month = c(0.5, 12.5), week = c(0.5,
> > >>> 52.5), weekday = c(0, 6.5)))
> > >>> ```
> > >>> Concurvity for the above model between (temp_max, lag) and (year,
> > >>> month, week, weekday) is 0.91:
> > >>>
> > >>> ```r
> > >>> $worst
> > >>>                                       para te(year,month,week,weekday)
> > >>> te(temp_max,lag) te(precip_daily_total,lag)
> > >>> para                        1.000000e+00                1.125625e-29
> > >>>        0.3150073                  0.6666348
> > >>> te(year,month,week,weekday) 1.400648e-29                1.000000e+00
> > >>>        0.9060552                  0.6652313
> > >>> te(temp_max,lag)            3.152795e-01                8.998113e-01
> > >>>        1.0000000                  0.5781015
> > >>> te(precip_daily_total,lag)  6.666348e-01                6.652313e-01
> > >>>        0.5805159                  1.0000000
> > >>> ```
> > >>>
> > >>> Output from ```gam.check()```:
> > >>> ```r
> > >>> Method: REML   Optimizer: outer newton
> > >>> full convergence after 16 iterations.
> > >>> Gradient range [-0.01467332,0.003096643]
> > >>> (score 8915.994 & scale 1).
> > >>> Hessian positive definite, eigenvalue range [5.048053e-05,26.50175].
> > >>> Model rank =  544 / 544
> > >>>
> > >>> Basis dimension (k) checking results. Low p-value (k-index<1) may
> > >>> indicate that k is too low, especially if edf is close to k'.
> > >>>
> > >>>                                     k'      edf k-index p-value
> > >>> te(year,month,week,weekday) 319.0000  26.6531    0.96    0.06 .
> > >>> te(temp_max,lag)             29.0000   3.3681      NA      NA
> > >>> te(precip_daily_total,lag)   27.0000   0.0051      NA      NA
> > >>> ---
> > >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>> ```
> > >>>
> > >>> Some output from ```summary(conc38b)```:
> > >>> ```r
> > >>> Approximate significance of smooth terms:
> > >>>                                     edf Ref.df  Chi.sq p-value
> > >>> te(year,month,week,weekday) 26.653127    319 166.803 < 2e-16 ***
> > >>> te(temp_max,lag)             3.368129     27  11.130 0.00145 **
> > >>> te(precip_daily_total,lag)   0.005104     27   0.002 0.69636
> > >>> ---
> > >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>>
> > >>> R-sq.(adj) =  0.839   Deviance explained = 53.3%
> > >>> -REML =   8916  Scale est. = 1         n = 5107
> > >>> ```
> > >>>
> > >>>
> > >>> Below are the ACF plots (note limit y-axis = 0.1 for clarity of
> > >>> pattern). They show peaks at 5 and 15, and then there seems to be a
> > >>> recurring pattern at multiples of approx. 30 (suggesting month is not
> > >>> modelled adequately?). Not sure what would cause the spikes at 5 and
> > >>> 15. There is heaping of deaths on the 15th day of each month, to which
> > >>> deaths with unknown date were allocated. This heaping was modelled
> > >>> with categorical variable/ factor ```heap``` with 169 levels (0 for
> > >>> all non-heaping days and 1-168 (i.e. 14 * 12 for each subsequent
> > >>> heaping day over the 14-year period):
> > >>>
> > >>>     [2]: https://i.stack.imgur.com/FzKyM.png
> > >>>     [3]: https://i.stack.imgur.com/fE3aL.png
> > >>>
> > >>>
> > >>> I get an identical looking ACF when I decompose time into (year,
> > >>> month, monthday) as in model conc39 below, although concurvity between
> > >>> (temp_max, lag) and the time term has now dropped somewhat to 0.83:
> > >>>
> > >>> ```r
> > >>> conc39 <- gam(deaths~te(year, month, monthday, bs=c("cr","cc","cr")) + heap +
> > >>>                        te(temp_max, lag, k=c(10, 4)) +
> > >>>                        te(precip_daily_total, lag, k=c(10, 4)),
> > >>>                        data = dat, family = nb, method = 'REML', select = TRUE,
> > >>>                        knots = list(month = c(0.5, 12.5)))
> > >>> ```
> > >>> ```r
> > >>>
> > >>> Method: REML   Optimizer: outer newton
> > >>> full convergence after 14 iterations.
> > >>> Gradient range [-0.001578187,6.155096e-05]
> > >>> (score 8915.763 & scale 1).
> > >>> Hessian positive definite, eigenvalue range [1.894391e-05,26.99215].
> > >>> Model rank =  323 / 323
> > >>>
> > >>> Basis dimension (k) checking results. Low p-value (k-index<1) may
> > >>> indicate that k is too low, especially if edf is close to k'.
> > >>>
> > >>>                                   k'     edf k-index p-value
> > >>> te(year,month,monthday)    79.0000 25.0437    0.93  <2e-16 ***
> > >>> te(temp_max,lag)           39.0000  4.0875      NA      NA
> > >>> te(precip_daily_total,lag) 36.0000  0.0107      NA      NA
> > >>> ---
> > >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>> ```
> > >>> Some output from ```summary(conc39)```:
> > >>> ```r
> > >>> Approximate significance of smooth terms:
> > >>>                                   edf Ref.df  Chi.sq  p-value
> > >>> te(year,month,monthday)    38.75573     99 187.231  < 2e-16 ***
> > >>> te(temp_max,lag)            4.06437     37  25.927 1.66e-06 ***
> > >>> te(precip_daily_total,lag)  0.01173     36   0.008    0.557
> > >>> ---
> > >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>>
> > >>> R-sq.(adj) =  0.839   Deviance explained = 53.8%
> > >>> -REML =   8915  Scale est. = 1         n = 5107
> > >>> ```
> > >>>
> > >>>
> > >>> ```r
> > >>> $worst
> > >>>                                      para te(year,month,monthday)
> > >>> te(temp_max,lag) te(precip_daily_total,lag)
> > >>> para                       1.000000e+00            3.261007e-31
> > >>> 0.3313549                  0.6666532
> > >>> te(year,month,monthday)    3.060763e-31            1.000000e+00
> > >>> 0.8266086                  0.5670777
> > >>> te(temp_max,lag)           3.331014e-01            8.225942e-01
> > >>> 1.0000000                  0.5840875
> > >>> te(precip_daily_total,lag) 6.666532e-01            5.670777e-01
> > >>> 0.5939380                  1.0000000
> > >>> ```
> > >>>
> > >>> Modelling time as ```te(year, doy)``` with a cyclic spline for doy and
> > >>> various choices for k or as ```s(time)``` with various k does not
> > >>> reduce concurvity either.
> > >>>
> > >>>
> > >>> The default approach in time series studies of heat-mortality is to
> > >>> model time with fixed df, generally between 7-10 df per year of data.
> > >>> I am, however, apprehensive about this approach because a) mortality
> > >>> profiles vary with locality due to sociodemographic and environmental
> > >>> characteristics and b) the choice of df is based on higher income
> > >>> countries (where nearly all these studies have been done) with
> > >>> different mortality profiles and so may not be appropriate for
> > >>> tropical, low-income countries.
> > >>>
> > >>> Although the approach of fixing (high) df does remove more temporal
> > >>> patterns from the ACF (see model and output below), concurvity between
> > >>> time and lagged temperature has now risen to 0.99! Moreover,
> > >>> temperature (which has been a consistent, highly significant predictor
> > >>> in every model of the tens (hundreds?) I have run, has now turned
> > >>> non-significant. I am guessing this is because time is now a very
> > >>> wiggly function that not only models/ removes seasonal variation, but
> > >>> also some of the day-to-day variation that is needed for the
> > >>> temperature smooth  :
> > >>>
> > >>> ```r
> > >>> conc20a <- gam(deaths~s(time, k=112, fx=TRUE) + heap +
> > >>>                         te(temp_max, lag, k=c(10,3)) +
> > >>>                         te(precip_daily_total, lag, k=c(10,3)),
> > >>>                         data = dat, family = nb, method = 'REML', select = TRUE)
> > >>> ```
> > >>> Output from ```gam.check(conc20a, rep = 1000)```:
> > >>>
> > >>> ```r
> > >>> Method: REML   Optimizer: outer newton
> > >>> full convergence after 9 iterations.
> > >>> Gradient range [-0.0008983099,9.546022e-05]
> > >>> (score 8750.13 & scale 1).
> > >>> Hessian positive definite, eigenvalue range [0.0001420112,15.40832].
> > >>> Model rank =  336 / 336
> > >>>
> > >>> Basis dimension (k) checking results. Low p-value (k-index<1) may
> > >>> indicate that k is too low, especially if edf is close to k'.
> > >>>
> > >>>                                    k'      edf k-index p-value
> > >>> s(time)                    111.0000 111.0000    0.98    0.56
> > >>> te(temp_max,lag)            29.0000   0.6548      NA      NA
> > >>> te(precip_daily_total,lag)  27.0000   0.0046      NA      NA
> > >>> ```
> > >>> Output from ```concurvity(conc20a, full=FALSE)$worst```:
> > >>>
> > >>> ```r
> > >>>                                      para      s(time) te(temp_max,lag)
> > >>> te(precip_daily_total,lag)
> > >>> para                       1.000000e+00 2.462064e-19        0.3165236
> > >>>                   0.6666348
> > >>> s(time)                    2.462398e-19 1.000000e+00        0.9930674
> > >>>                   0.6879284
> > >>> te(temp_max,lag)           3.170844e-01 9.356384e-01        1.0000000
> > >>>                   0.5788711
> > >>> te(precip_daily_total,lag) 6.666348e-01 6.879284e-01        0.5788381
> > >>>                   1.0000000
> > >>>
> > >>> ```
> > >>>
> > >>> Some output from ```summary(conc20a)```:
> > >>> ```r
> > >>> Approximate significance of smooth terms:
> > >>>                                    edf Ref.df  Chi.sq p-value
> > >>> s(time)                    1.110e+02    111 419.375  <2e-16 ***
> > >>> te(temp_max,lag)           6.548e-01     27   0.895   0.249
> > >>> te(precip_daily_total,lag) 4.598e-03     27   0.002   0.868
> > >>> ---
> > >>> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> > >>>
> > >>> R-sq.(adj) =  0.843   Deviance explained = 56.1%
> > >>> -REML = 8750.1  Scale est. = 1         n = 5107
> > >>> ```
> > >>>
> > >>> ACF functions:
> > >>>
> > >>> [4]: https://i.stack.imgur.com/7nbXS.png
> > >>> [5]: https://i.stack.imgur.com/pNnZU.png
> > >>>
> > >>> Data can be found on my [GitHub][6] site in the file
> > >>> [data_cross_validated_post2.rds][7]. A csv version is also available.
> > >>> This is my code:
> > >>>
> > >>> ```r
> > >>> library(readr)
> > >>> library(mgcv)
> > >>>
> > >>> df <- read_rds("data_crossvalidated_post2.rds")
> > >>>
> > >>> # Create matrices for lagged weather variables (6 day lags) based on
> > >>> example by Simon Wood
> > >>> # in his 2017 book ("Generalized additive models: an introduction with
> > >>> R", p. 349) and
> > >>> # gamair package documentation
> > >>> (https://cran.r-project.org/web/packages/gamair/gamair.pdf, p. 54)
> > >>>
> > >>> lagard <- function(x,n.lag=7) {
> > >>> n <- length(x); X <- matrix(NA,n,n.lag)
> > >>> for (i in 1:n.lag) X[i:n,i] <- x[i:n-i+1]
> > >>> X
> > >>> }
> > >>>
> > >>> dat <- list(lag=matrix(0:6,nrow(df),7,byrow=TRUE),
> > >>> deaths=df$deaths_total,doy=df$doy, year = df$year, month = df$month,
> > >>> weekday = df$weekday, week = df$week, monthday = df$monthday, time =
> > >>> df$time, heap=df$heap, heap_bin = df$heap_bin, precip_hourly_dailysum
> > >>> = df$precip_hourly_dailysum)
> > >>> dat$temp_max <- lagard(df$temp_max)
> > >>> dat$temp_min <- lagard(df$temp_min)
> > >>> dat$temp_mean <- lagard(df$temp_mean)
> > >>> dat$wbgt_max <- lagard(df$wbgt_max)
> > >>> dat$wbgt_mean <- lagard(df$wbgt_mean)
> > >>> dat$wbgt_min <- lagard(df$wbgt_min)
> > >>> dat$temp_wb_nasa_max <- lagard(df$temp_wb_nasa_max)
> > >>> dat$sh_mean <- lagard(df$sh_mean)
> > >>> dat$solar_mean <- lagard(df$solar_mean)
> > >>> dat$wind2m_mean <- lagard(df$wind2m_mean)
> > >>> dat$sh_max <- lagard(df$sh_max)
> > >>> dat$solar_max <- lagard(df$solar_max)
> > >>> dat$wind2m_max <- lagard(df$wind2m_max)
> > >>> dat$temp_wb_nasa_mean <- lagard(df$temp_wb_nasa_mean)
> > >>> dat$precip_hourly_dailysum <- lagard(df$precip_hourly_dailysum)
> > >>> dat$precip_hourly <- lagard(df$precip_hourly)
> > >>> dat$precip_daily_total <- lagard( df$precip_daily_total)
> > >>> dat$temp <- lagard(df$temp)
> > >>> dat$sh <- lagard(df$sh)
> > >>> dat$rh <- lagard(df$rh)
> > >>> dat$solar <- lagard(df$solar)
> > >>> dat$wind2m <- lagard(df$wind2m)
> > >>>
> > >>>
> > >>> conc38b <- gam(deaths~te(year, month, week, weekday,
> > >>> bs=c("cr","cc","cc","cc")) + heap +
> > >>>                         te(temp_max, lag, k=c(10, 3)) +
> > >>>                         te(precip_daily_total, lag, k=c(10, 3)),
> > >>>                         data = dat, family = nb, method = 'REML', select = TRUE,
> > >>>                         knots = list(month = c(0.5, 12.5), week = c(0.5,
> > >>> 52.5), weekday = c(0, 6.5)))
> > >>>
> > >>> conc39 <- gam(deaths~te(year, month, monthday, bs=c("cr","cc","cr")) + heap +
> > >>>                        te(temp_max, lag, k=c(10, 4)) +
> > >>>                        te(precip_daily_total, lag, k=c(10, 4)),
> > >>>                        data = dat, family = nb, method = 'REML', select = TRUE,
> > >>>                        knots = list(month = c(0.5, 12.5)))
> > >>>
> > >>> conc20a <- gam(deaths~s(time, k=112, fx=TRUE) + heap +
> > >>>                         te(temp_max, lag, k=c(10,3)) +
> > >>>                         te(precip_daily_total, lag, k=c(10,3)),
> > >>>                         data = dat, family = nb, method = 'REML', select = TRUE)
> > >>>
> > >>> ```
> > >>> Thank you if you've read this far!! :-))
> > >>>
> > >>>     [1]: https://scholar.google.co.uk/scholar?output=instlink&q=info:PKdjq7ZwozEJ:scholar.google.com/&hl=en&as_sdt=0,5&scillfp=17865929886710916120&oi=lle
> > >>>     [2]: https://i.stack.imgur.com/FzKyM.png
> > >>>     [3]: https://i.stack.imgur.com/fE3aL.png
> > >>>     [4]: https://i.stack.imgur.com/7nbXS.png
> > >>>     [5]: https://i.stack.imgur.com/pNnZU.png
> > >>>     [6]: https://github.com/JadeShodan/heat-mortality
> > >>>     [7]: https://github.com/JadeShodan/heat-mortality/blob/main/data_cross_validated_post2.rds
> > >>>
> > >>> ______________________________________________
> > >>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >>> and provide commented, minimal, self-contained, reproducible code.
> > >> --
> > >> Simon Wood, School of Mathematics, University of Edinburgh,
> > >> https://www.maths.ed.ac.uk/~swood34/
> > >>
> > --
> > Simon Wood, School of Mathematics, University of Edinburgh,
> > https://www.maths.ed.ac.uk/~swood34/
> >



More information about the R-help mailing list