[R] Meta-analysis of prevalence at the country level with mgcv/gamm4

Wed Apr 9 19:39:30 CEST 2014

At 09:31 08/04/2014, Julien Riou wrote:
>Dear R community,

Comments in line below

>I'm working on a meta-analysis of prevalence data. The aim is to get
>estimates of prevalence at the country level. The main issue is that the
>disease is highly correlated with age, and the sample ages of included
>studies are highly heterogeneous. Only median age is available for most
>studies, so I can't use SMR-like tricks. I figured I could use
>meta-regression to solve this, including age as a fixed-effect and
>introducing study-level and country-level random-effects.
>
>The idea (that I took from Fowkes et
>al<http://www.thelancet.com/journals/lancet/article/PIIS0140-6736%2813%2961249-0/abstract>)
>was to use this model to make country-specific predictions of prevalence
>for each 5-year age group from 15 to 60 (using the median age of the
>group), and to apply these predictions to the actual population size of
>each of those groups in the selected country, in order to obtain total
>infected population and to calculate age-adjusted prevalence in the 15-60
>population from that.
>
>I tried several ways to do this using R with packages meta and mgcv. I got
>some satisfying results, but I'm not that confident with my results and
>would appreciate some feedback.
>
>First is some simulated data, then the description of my different
>approaches:
>
>data<-data.frame(id_study=c("UK1","UK2","UK3","FRA1","FRA2","BEL1","GER1","GER2","GER3"),
>
>country=c("UK","UK","UK","FRANCE","FRANCE","BELGIUM","GERMANY","GERMANY","GERMANY"),
>                  n_events=c(91,49,18,10,50,6,9,10,22),
>                  n_total=c(3041,580,252,480,887,256,400,206,300),
>                  study_median_age=c(25,50,58,30,42,26,27,28,36))

So the first UK study with a median age of 25 is going to be used to 
estimate prevalence over a range of ages? You are going to have to 
make some very strong assumptions here which I personally would not 
want to make.

Is there any possibility that in the real dataset you can fit your 
model to those studies which do provide age-specific prevalences and 
then use that to impute?

You do not say when these studies were published but I would ask the 
authors of the primary studies if they can make the information 
available to you. You may have already done that of course. I referee 
quite a few papers on systematic reviews and my impression is that 
some authors are amenable to doing the work for you. You mileage may 
vary of course.

>*Standard random-effect meta-analysis* with package meta.
>
>I used metaprop() to get a first estimate of the prevalence in each country
>without taking age into account, and to obtain weights. As expected,
>heterogeneity was very high, so I used weights from the random-effects
>model.

Which will be nearly equal and so hardly worth using in my opinion 
but again your mileage may vary.

>  meta <- 
> metaprop(event=n_events,n=n_total,byvar=country,sm="PLOGIT",method.tau="REML",data=data)
>  summary(meta)
>  data$weight<-meta$w.random
>
>I used meta to get a first estimate of the prevalence without taking age
>into account, and to obtain weights. As expected, heterogeneity was very
>high, so I used weights from the random-effects model.
>
>*Generalized additive model* to include age with package mgcv.
>
>The gam() model parameters (k and sp) were chosen using BIC and GCV number
>(not shown here).
>
>  model <- gam( cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4,sp=2) + s(country,bs="re"),
>weights=weight, data=data, family="binomial"(link=logit),
>method="REML")
>  plot(model,pages=1,residuals=T, all.terms=T, shade=T)
>
>Predictions for each age group were obtained from this model as explained
>earlier. CI were obtained directly using predict.gam(), that uses the
>Bayesian posterior covariance matrix of the parameters. For exemple
>considering UK:
>
>  newdat<-data.frame(country="UK",study_median_age=seq(17,57,5))
>  link<-predict(model,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model,newdat,type="link",se.fit=T)$se
>  newdat$prev<-model$family$linkinv(link)
>  newdat$CIinf<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>The results were satisfying, representing the augmentation of the
>prevalence with advanced age, with coherent confidence intervals. I
>obtained a total prevalence for the country using the country population
>structure (not shown, I hope it is clear enough).
>
>However, I figured I needed to include study-level random-effects since
>there was a high heterogeneity (even though I did not calculate
>heterogeneity after the meta-regression).
>
>*Introducing study-level random-effect* with package gamm4.
>
>Since mgcv models can't handle that much random-effect parameters, I had to
>switch to gamm4.
>
>  model2 <- gamm4(cbind(n_events,n_total-n_events) ~
>s(study_median_age,bs="cr",k=4) + s(country,bs="re"),
>random=~(1|id_study), data=data, weights=weight,
>family="binomial"(link=logit))
>  plot(model2$gam,pages=1,residuals=T, all.terms=T, shade=T)
>
>  link<-predict(model2$gam,newdat,type="link",se.fit=T)$fit
>  linkse<-predict(model2$gam,newdat,type="link",se.fit=T)$se
>  newdat$prev2<-model$family$linkinv(link)
>  newdat$CIinf2<-model$family$linkinv(link-1.96*linkse)
>  newdat$CIsup2<-model$family$linkinv(link+1.96*linkse)
>  plot(newdat$prev2~newdat$study_median_age, 
> type="l",col="red",ylim=c(0,0.11))
>  lines(newdat$CIinf2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$CIsup2~newdat$study_median_age, lty=2,col="red")
>  lines(newdat$prev~newdat$study_median_age, type="l",ylim=c(0,.12))
>  lines(newdat$CIinf~newdat$study_median_age, lty=2)
>  lines(newdat$CIsup~newdat$study_median_age, lty=2)
>
>Since the study-level random effect was in the mer part of the fit, I
>didn't have to handle it.
>
>As you can see, I obtain rather different results, with a much smoother
>relation between age and prevalence, and quite different confidence
>intervals. It is even more different in the full-data analysis, where the
>CI are much wider in the model including study-level RE, to the point it is
>sometimes almost uninformative (prevalence between 0 and 15%, but if it is
>the way it is...). Moreover, the study-level RE model seems to be more
>stable when outliers are excluded.
>
>*So, my questions are:*
>
>    - Did I properly extract the weights from the metaprop() function and
>    used them further?
>    - Did I properly built my gam() and gamm4() models? I read a lot about
>    this, but I'm not used to this king of models.
>    - Which of these models should I use?
>
>I would really appreciate some help, since neither my teachers nor my
>colleagues could. It was a really harsh to conduct the systematic review,
>and very frustrating to struggle with the analysis... Thank you in advance!

I am afraid that is the way with systematic reviews, you can only 
synthesise what you find, not what you would like to have found. 
Anyone who has done a review will sympathise with you, not that that 
is any consolation.

>Julien
>
>         [[alternative HTML version deleted]]

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html