[R] dummy encoding in metafor

Mon Jan 21 11:10:09 CET 2013

As Michael already mentioned, the error:

Error in qr.solve(wX, diag(k)) : singular matrix 'a' in solve

indeed indicates that your design matrix is not of full rank (i.e., there are linear dependencies among your predictors). With this many factors in the same model, this is not surprising, even though k = 94 is actually quite large for a meta-analysis. One option is to leave out some of the predictors. You can also try collapsing some of the levels of the factors. Of course, you lose some detail that way, but apparently you don't have enough data in the first place to carry out such a detailed analysis.
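
To see what "not of full rank" means in practice, here is a minimal sketch in base R. The data frame and its values are made up for illustration (only the column names Block, Age, and trials echo your call); in this toy example, trials is completely determined by Block, so adding it creates a linear dependency. Your actual data may be deficient for a different reason.

```r
# Hypothetical toy data: 'trials' takes one fixed value per level of 'Block',
# so it is a linear combination of the intercept and the Block dummies.
dat <- data.frame(
  Block  = factor(c("a", "a", "b", "b", "c", "c")),
  Age    = c(21, 25, 30, 34, 41, 45),
  trials = c(10, 10, 20, 20, 30, 30)
)

# Design matrix that the model formula would produce
X <- model.matrix(~ Block + Age + trials, data = dat)

n_coef <- ncol(X)   # number of coefficients the model tries to estimate
qrX    <- qr(X)     # QR decomposition; qrX$rank is the numerical rank
n_coef
qrX$rank

# If the rank is smaller than the number of columns, the columns that the
# pivoting moved to the end are the ones involved in the dependency.
if (qrX$rank < n_coef) {
  print(colnames(X)[qrX$pivot[-seq_len(qrX$rank)]])
}
```

Running the same check on model.matrix() with your own formula and data should show which predictor is responsible.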

One other thing I noticed. You wrote:

rma(yi=Mean, vi=Variance, ni=N.1, ...)

I suspect that your variable "Variance" is actually the variance of the raw scores. However, the vi argument is used to pass the sampling variances of the yi values to the function -- not the variance of raw scores. The (estimated) sampling variance of a mean is s^2 / n, so if I am not mistaken, you really want to use:

rma(yi=Mean, vi=Variance/N.1, ...)
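
As a small worked illustration of that conversion, assuming that "Variance" really holds the raw-score variances (s^2) and "N.1" the sample sizes, one could compute the sampling variances up front. The numbers below are invented; only the column names come from your call.

```r
# Hypothetical study-level summaries (values invented for illustration)
dat <- data.frame(
  Mean     = c(3.2, 2.9, 3.6),
  Variance = c(1.44, 0.81, 2.25),  # assumed: variance of the raw scores, s^2
  N.1      = c(36, 81, 25)         # assumed: sample size of each study
)

# Estimated sampling variance of each mean: s^2 / n
dat$vi <- dat$Variance / dat$N.1

# Corresponding standard errors of the means
dat$sei <- sqrt(dat$vi)

dat
```

The new column could then be passed directly, e.g., rma(yi=Mean, vi=vi, data=dat, ...), which is equivalent to the call above.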

Best,
Wolfgang

--   
Wolfgang Viechtbauer, Ph.D., Statistician   
Department of Psychiatry and Psychology   
School for Mental Health and Neuroscience   
Faculty of Health, Medicine, and Life Sciences   
Maastricht University, P.O. Box 616 (VIJV1)   
6200 MD Maastricht, The Netherlands   
+31 (43) 388-4170 | http://www.wvbauer.com   

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Michael Dewey
> Sent: Monday, January 21, 2013 10:40
> To: Alma Wilflinger; Michael Dewey; r-help at r-project.org
> Subject: Re: [R] dummy encoding in metafor
> 
> At 14:48 20/01/2013, Alma Wilflinger wrote:
> >Hi,
> >
> >thank you very much for your kind answer.
> >
> > >If you look a bit further down the manual page you will see
> > >### using a model formula to specify the same model
> > >rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
> > >btt=c(2,3))
> >
> > >which is much easier.
> >
> >I have seen the possibility of using a model formula for dummy
> >encoding and you are right it is much easier than doing it by hand.
> >Thing is that if I include some moderator variables into the
> >parameters I get the error:
> >
> >Error in qr.solve(wX, diag(k)) : singular matrix 'a' in solve
> 
> I suspect that you have a linear dependence between your moderator
> variables. Depending on how many levels there are for country,
> sample, and so on you do have a lot of predictors (you presumably
> know that a factor counts as levels-1 for this purpose?)
> 
> 
> >For example this call works:
> >result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) +
> >relevel(factor(Sample), ref="Students") + Gender + Age +
> >factor(Category) + relevel(factor(Block), ref="c")+
> >relevel(factor(order), ref="x"), data=csvDataCmaAll, method="REML")
> >
> >If I add the trials which is of type INT:
> >result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) +
> >relevel(factor(Sample), ref="Students") + Gender + Age +
> >factor(Category) + relevel(factor(Block), ref="c")+
> >relevel(factor(order), ref="x") + trials, data=csvDataCmaAll,
> >method="REML")
> >
> >I get the error. I was not able to find a definite reason for
> >this error or how to solve it, so I wanted to try doing it manually.
> >I think I have found out that it somehow relates to the
> >
> > >If you code them yourself R does not know. You know.
> >
> >Regarding this I think my question was not clear enough. If R does
> >the dummy encoding automatically via a model formula it leaves out
> >one of the factors and uses it as a baseline automatically. If I do
> >it by hand R is still able to execute the function but the baseline
> >is missing because I do not define it via a parameter.
> 
> You perhaps would benefit from rereading some of the introductory
> material about formulas. Also look for anything about the model
> matrix (also called the design matrix)
> 
> >I simply want to know how R is handling this and what I have to do
> >by hand to get the correct results. Sorry, this may be a beginner's
> >question, but as stated I am new to this field.
> >
> > >You say you have seven moderator variables. Unless you have a shed
> > >load of studies you will not be able to look at them simultaneously.
> > >Apologies if you already knew that.
> >
> >No, I did not know that. In total I have about 94 studies and want
> >to test different sets of moderators. Do you think this is
> >sufficient or do you suggest another approach?
> 
> The truthful but perhaps unhelpful answer is that you need to collect
> more data or use fewer moderators.
> 
> >I started in CMA (comprehensive meta analysis) but one of the
> >benefits of R is that I am able to test multiple moderators at once
> >- at least as I was told.
> >
> >kind regards,
> >Alma
> >
> >
> >From: Michael Dewey <info at aghmed.fsnet.co.uk>
> >To: Alma Wilflinger <alma_anima at yahoo.com>; "r-help at r-project.org"
> ><r-help at r-project.org>
> >Sent: Sunday, January 20, 2013 12:52 PM
> >Subject: Re: [R] dummy encoding in metafor
> >
> >At 17:14 19/01/2013, Alma Wilflinger wrote:
> > >Hi,
> > >
> > >I am quite new to R and in need of some advice. I am trying to
> > >conduct a meta-regression over some studies with about 7 moderator
> > >variables which I have to dummy encode.
> >
> >Alma, although you can generate your own dummy variables by hand you
> >do not have to as R will do it for you. See below for more comments.
> >
> >
> > >I have found the following piece of code in the manual for the
> > >metafor library:
> > >
> > >### manual dummy coding of the allocation factor
> > >alloc.random <- ifelse(dat$alloc == "random", 1, 0)
> > >alloc.alternate <- ifelse(dat$alloc == "alternate", 1, 0)
> > >alloc.systematic <- ifelse(dat$alloc == "systematic", 1, 0)
> >
> >If you look a bit further down the manual page you will see
> >### using a model formula to specify the same model
> >rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
> >btt=c(2,3))
> >
> >which is much easier.
> >
> > >### test the allocation factor (in the presence of the other moderators)
> > >### note: "alternate" is the reference level of the allocation factor
> > >### note: the intercept is the first coefficient, so btt=c(2,3)
> > >rma(yi, vi, mods=cbind(alloc.random, alloc.systematic, year, ablat),
> > >data=dat, method="REML", btt=c(2,3))
> > >
> > >What I do not understand is the following:
> > >How does R know which columns in my data.frame are related to the
> > >dummy encoded variables?
> >
> >If you code them yourself R does not know. You know.
> >
> >
> > >It is clear that in the call of cbind I just do not use the
> > >reference variable as a parameter but I do not get it how R knows
> > >that alloc.random and alloc.systematic refer to the column alloc in
> > >the data frame.
> > >
> > >Thank you very much in advance for your help,
> > >
> >
> >You say you have seven moderator variables. Unless you have a shed
> >load of studies you will not be able to look at them simultaneously.
> >Apologies if you already knew that.
> >
> > >kind regards,
> > >Alma
> >
> >Michael Dewey
> >info at aghmed.fsnet.co.uk
> >http://www.aghmed.fsnet.co.uk/home.html
> >
> >
> 
> Michael Dewey
> info at aghmed.fsnet.co.uk
> http://www.aghmed.fsnet.co.uk/home.html
> 