[R] dummy encoding in metafor

Wed Jan 23 10:22:58 CET 2013

At 08:30 23/01/2013, Alma Wilflinger wrote:
>Dear Wolfgang and Michael,
>
>thank you very much for your help!
>
>Concerning the Variance: I took the variance I used for CMA (which 
>is always 1), so I think it should be the right one.

It seems unlikely to me that the variance from each study would be 
the same although I suppose it could be possible. Are you sure you 
are supplying the right values to CMA?

>Thank you for noticing and mentioning though :)
>
>I really appreciate how helpful you both are.
>
>best,
>Alma
>
>
>
>From: Viechtbauer Wolfgang (STAT) 
><wolfgang.viechtbauer at maastrichtuniversity.nl>
>To: Michael Dewey <info at aghmed.fsnet.co.uk>; Alma Wilflinger 
><alma_anima at yahoo.com>; "r-help at r-project.org" <r-help at r-project.org>
>Sent: Monday, January 21, 2013 11:10 AM
>Subject: RE: [R] dummy encoding in metafor
>
>As Michael already mentioned, the error:
>
>Error in qr.solve(wX, diag(k)) : singular matrix 'a' in solve
>
>indeed indicates that your design matrix is not of full rank (i.e., 
>there are linear dependencies among your predictors). With this many 
>factors in the same model, this is not surprising if k is "only" 94 
>(which is actually quite large for a meta-analysis). One option is 
>to leave out some of the predictors. You can also try collapsing 
>some of the levels of the factors. Of course, you lose some 
>"details" that way, but apparently you don't have enough data in the 
>first place to carry out such a detailed analysis.
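
A quick way to check for this kind of linear dependence before calling rma() is to build the design matrix yourself and compare its rank with its number of columns. The sketch below uses base R only and a made-up toy data set (the names dat, block and trials are hypothetical, not the poster's data); it assumes, purely for illustration, that the number of trials is fully determined by the block, which is one possible way a model that works could become singular after adding one more predictor.

```r
# Hypothetical toy data: 'trials' is fully determined by 'block',
# so adding it to the model introduces a linear dependence.
dat <- data.frame(
  block  = factor(c("a", "a", "b", "b", "c", "c")),
  trials = c(10, 10, 20, 20, 30, 30)
)

# Design matrix that the formula would generate (intercept + dummies + trials)
X <- model.matrix(~ block + trials, data = dat)

n_coef <- ncol(X)      # number of coefficients the model asks for
x_rank <- qr(X)$rank   # number of coefficients the data can identify

# A rank smaller than the number of columns signals linear dependence.
rank_deficient <- x_rank < n_coef
rank_deficient
```

If rank_deficient comes out TRUE for the real data, dropping or collapsing predictors until the rank equals the number of columns should make the error go away.
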
>
>One other thing I noticed. You wrote:
>
>rma(yi=Mean, vi=Variance, ni=N.1, ...)
>
>I suspect that your variable "Variance" is actually the variance of 
>the raw scores. However, the vi argument is used to pass the 
>sampling variances of the yi values to the function -- not the 
>variance of raw scores. The (estimated) sampling variance of a mean 
>is s^2 / n, so if I am not mistaken, you really want to use:
>
>rma(yi=Mean, vi=Variance/N.1, ...)
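
To make the difference concrete, here is a small sketch with invented numbers (the values are hypothetical and only the column names Mean, Variance and N.1 are taken from the call above). Even when the raw-score variance is 1 in every study, the sampling variances of the means differ as soon as the sample sizes differ.

```r
# Hypothetical study-level summaries (not the poster's data)
Mean     <- c(0.42, 0.35, 0.51)
Variance <- c(1, 1, 1)       # variance of the raw scores, s^2
N.1      <- c(20, 50, 200)   # sample size of each study

# Estimated sampling variance of each mean: s^2 / n
vi <- Variance / N.1
vi
```

Larger studies end up with smaller sampling variances and therefore get more weight in the model, which is lost if the raw-score variance is passed as vi.
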
>
>Best,
>Wolfgang
>
>--
>Wolfgang Viechtbauer, Ph.D., Statistician
>Department of Psychiatry and Psychology
>School for Mental Health and Neuroscience
>Faculty of Health, Medicine, and Life Sciences
>Maastricht University, P.O. Box 616 (VIJV1)
>6200 MD Maastricht, The Netherlands
>+31 (43) 388-4170 | http://www.wvbauer.com
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> > On Behalf Of Michael Dewey
> > Sent: Monday, January 21, 2013 10:40
> > To: Alma Wilflinger; Michael Dewey; r-help at r-project.org
> > Subject: Re: [R] dummy encoding in metafor
> >
> > At 14:48 20/01/2013, Alma Wilflinger wrote:
> > >Hi,
> > >
> > >thank you very much for your kind answer.
> > >
> > > >If you look a bit further down the manual page you will see
> > > >### using a model formula to specify the same model
> > > >rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
> > > >btt=c(2,3))
> > >
> > > >which is much easier.
> > >
> > >I have seen the possibility of using a model formula for dummy
> > >encoding and you are right it is much easier than doing it by hand.
> > >Thing is that if I include some moderator variables into the
> > >parameters I get the error:
> > >
> > >Error in qr.solve(wX, diag(k)) : singular matrix 'a' in solve
> >
> > I suspect that you have a linear dependence between your moderator
> > variables. Depending on how many levels there are for country,
> > sample, and so on you do have a lot of predictors (you presumably
> > know that a factor counts as levels-1 for this purpose?)
> >
> >
> > >For example this call works:
> > >result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) +
> > >relevel(factor(Sample), ref="Students") + Gender + Age +
> > >factor(Category) + relevel(factor(Block), ref="c")+
> > >relevel(factor(order), ref="x"), data=csvDataCmaAll, method="REML")
> > >
> > >If I add the trials which is of type INT:
> > >result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) +
> > >relevel(factor(Sample), ref="Students") + Gender + Age +
> > >factor(Category) + relevel(factor(Block), ref="c")+
> > >relevel(factor(order), ref="x") + trials, data=csvDataCmaAll,
> > >method="REML")
> > >
> > >I get the error. Because I was not able to find a definite reason for
> > >this error or how to solve it, I wanted to try doing it manually.
> > >I think I have found out that it somehow relates to the
> > >
> > > >If you code them yourself R does not know. You know.
> > >
> > >Regarding this I think my question was not clear enough. If R does
> > >the dummy encoding automatically via a model formula it leaves out
> > >one of the factor levels and uses it as a baseline automatically. If I do
> > >it by hand R is still able to execute the function but the baseline
> > >is missing because I do not define it via a parameter.
> >
> > You perhaps would benefit from rereading some of the introductory
> > material about formulas. Also look for anything about the model
> > matrix (also called the design matrix)
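
As a rough illustration of what the model matrix looks like, the sketch below compares R's automatic coding with the manual coding from the manual page. It uses base R only and a made-up data frame (dat and its values are hypothetical); the column of ones added with cbind() stands in for the intercept, which the manual's comment says is the first coefficient.

```r
# Hypothetical data with the three allocation levels from the manual example
dat <- data.frame(
  alloc = c("random", "alternate", "systematic", "random", "alternate"),
  stringsAsFactors = FALSE
)

# Automatic coding: the first factor level (alphabetical by default)
# becomes the reference level and gets no column of its own.
X_auto <- model.matrix(~ factor(alloc), data = dat)
levels(factor(dat$alloc))
colnames(X_auto)

# Manual coding: simply not creating a dummy for "alternate" makes it
# the baseline; nothing else has to be declared.
alloc.random     <- ifelse(dat$alloc == "random", 1, 0)
alloc.systematic <- ifelse(dat$alloc == "systematic", 1, 0)
X_manual <- cbind(1, alloc.random, alloc.systematic)

# Both approaches produce the same numbers.
same_design <- all(X_auto == X_manual)
same_design

# relevel() selects a different baseline for the automatic coding.
X_relevel <- model.matrix(~ relevel(factor(alloc), ref = "random"), data = dat)
colnames(X_relevel)
```

So the baseline is never passed as a parameter: it is whichever level has no dummy column, whether R left it out or you did.
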
> >
> > >I simply want to know how R is handling this and what I have to do
> > >by hand to get the correct results. Sorry, this may be a beginner's
> > >question, but as stated I am new to this field.
> > >
> > > >You say you have seven moderator variables. Unless you have a shed
> > > >load of studies you will not be able to look at them simultaneously.
> > > >Apologies if you already knew that.
> > >
> > >No, I did not know that. In total I have about 94 studies and want
> > >to test different sets of moderators. Do you think this is
> > >sufficient or do you suggest another approach?
> >
> > The truthful but perhaps unhelpful answer is that you need to collect
> > more data or use fewer moderators.
> >
> > >I started in CMA (comprehensive meta analysis) but one of the
> > >benefits of R is that I am able to test multiple moderators at once
> > >- at least as I was told.
> > >
> > >kind regards,
> > >Alma
> > >
> > >
> > >From: Michael Dewey <info at aghmed.fsnet.co.uk>
> > >To: Alma Wilflinger <alma_anima at yahoo.com>; "r-help at r-project.org"
> > ><r-help at r-project.org>
> > >Sent: Sunday, January 20, 2013 12:52 PM
> > >Subject: Re: [R] dummy encoding in metafor
> > >
> > >At 17:14 19/01/2013, Alma Wilflinger wrote:
> > > >Hi,
> > > >
> > > >I am quite new to R and in need of some advice. I am trying to
> > > >conduct a meta-regression over some studies with about 7 moderator
> > > >variables which I have to dummy encode.
> > >
> > >Alma, although you can generate your own dummy variables by hand you
> > >do not have to as R will do it for you. See below for more comments.
> > >
> > >
> > > >I have found the following piece of code in the manual for the
> > > >metafor library:
> > > >
> > > >### manual dummy coding of the allocation factor
> > > >alloc.random <- ifelse(dat$alloc == "random", 1, 0)
> > > >alloc.alternate <- ifelse(dat$alloc == "alternate", 1, 0)
> > > >alloc.systematic <- ifelse(dat$alloc == "systematic", 1, 0)
> > >
> > >If you look a bit further down the manual page you will see
> > >### using a model formula to specify the same model
> > >rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
> > >btt=c(2,3))
> > >
> > >which is much easier.
> > >
> > > >### test the allocation factor (in the presence of the other moderators)
> > > >### note: "alternate" is the reference level of the allocation factor
> > > >### note: the intercept is the first coefficient, so btt=c(2,3)
> > > >rma(yi, vi, mods=cbind(alloc.random, alloc.systematic, year, ablat),
> > > >data=dat, method="REML", btt=c(2,3))
> > > >
> > > >What I do not understand is the following:
> > > >How does R know which columns in my data.frame are related to the
> > > >dummy encoded variables?
> > >
> > >If you code them yourself R does not know. You know.
> > >
> > >
> > > >It is clear that in the call of cbind I just do not use the
> > > >reference variable as a parameter, but I do not understand how R knows
> > > >that alloc.random and alloc.systematic refer to the column alloc in
> > > >the data frame.
> > > >
> > > >Thank you very much in advance for your help,
> > > >
> > >
> > >You say you have seven moderator variables. Unless you have a shed
> > >load of studies you will not be able to look at them simultaneously.
> > >Apologies if you already knew that.
> > >
> > > >kind regards,
> > > >Alma
> > >
> > >Michael Dewey
> > >info at aghmed.fsnet.co.uk
> > >http://www.aghmed.fsnet.co.uk/home.html
> > >
> > >
> >
> > Michael Dewey
> > info at aghmed.fsnet.co.uk
> > http://www.aghmed.fsnet.co.uk/home.html
> >
>

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html