[R] dummy encoding in metafor

Mon Jan 21 10:40:01 CET 2013

At 14:48 20/01/2013, Alma Wilflinger wrote:
>Hi,
>
>thank you very much for your kind answer.
>
> >If you look a bit further down the manual page you will see
> >### using a model formula to specify the same model
> >rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
> >btt=c(2,3))
>
> >which is much easier.
>
>I have seen the possibility of using a model formula for dummy
>encoding, and you are right, it is much easier than doing it by hand.
>The thing is that if I include certain moderator variables in the
>call I get the error:
>
>Error in qr.solve(wX, diag(k)) : singular matrix 'a' in solve

I suspect that you have a linear dependence between your moderator
variables. Depending on how many levels there are for country,
sample, and so on, you do have a lot of predictors (you presumably
know that a factor with k levels counts as k-1 predictors for this
purpose?)
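
One way to check for such a dependence is to build the design matrix
yourself and compare its rank with its number of columns. The sketch
below uses base R only and a made-up data frame (dat, Block, Age and
trials are hypothetical stand-ins, not your data) in which trials is
completely determined by Block. I do not know that this is what is
happening in your data, it is only the kind of pattern to look for.

```r
# Toy data (hypothetical): 'trials' is fully determined by 'Block',
# the kind of dependence that can make the design matrix singular.
dat <- data.frame(
  Block  = factor(c("a", "a", "b", "b", "c", "c")),
  Age    = c(21, 35, 28, 40, 33, 25),
  trials = c(10, 10, 20, 20, 40, 40)
)

# Build the design matrix that a model formula would produce
X <- model.matrix(~ Block + Age + trials, data = dat)

# Compare the numerical rank with the number of columns
q      <- qr(X)
rank_X <- q$rank
ncol_X <- ncol(X)
rank_deficient <- rank_X < ncol_X

# Columns that are linear combinations of the columns before them
aliased <- colnames(X)[q$pivot[-seq_len(q$rank)]]

print(c(rank = rank_X, columns = ncol_X))
print(aliased)
```

If the rank is smaller than the number of columns, the columns listed
in aliased are the ones to drop or recode. A cross-tabulation such as
table(dat$Block, dat$trials) often shows the reason directly.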

>For example this call works:
>result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) + 
>relevel(factor(Sample), ref="Students") + Gender + Age + 
>factor(Category) + relevel(factor(Block), ref="c")+ 
>relevel(factor(order), ref="x"), data=csvDataCmaAll, method="REML")
>
>If I add the trials which is of type INT:
>result = rma(yi=Mean, vi=Variance, ni=N.1, mods=~factor(Country) + 
>relevel(factor(Sample), ref="Students") + Gender + Age + 
>factor(Category) + relevel(factor(Block), ref="c")+ 
>relevel(factor(order), ref="x") + trials, data=csvDataCmaAll, method="REML")
>
>I get the error. As I was not able to find a definite reason for
>this error or how to solve it, I wanted to try doing it manually.
>I think I have found out that it somehow relates to the
>
> >If you code them yourself R does not know. You know.
>
>Regarding this I think my question was not clear enough. If R does 
>the dummy encoding automatically via a model formula it leaves out 
>one of the factors and uses it as a baseline automatically. If I do 
>it by hand R is still able to execute the function but the baseline 
>is missing because I do not define it via a parameter.

You would perhaps benefit from rereading some of the introductory
material about formulas. Also look for anything about the model
matrix (also called the design matrix).
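
To make that concrete, here is a small sketch in base R comparing the
design matrix built from a formula with one coded by hand. The alloc
vector is a made-up example using the three levels from the manual
page. The baseline is simply the level whose column is left out, and
its effect is carried by the intercept.

```r
# Hypothetical allocation variable with the three levels from the manual
alloc <- c("random", "alternate", "systematic", "random", "alternate")

# Formula route: factor() sorts the levels, so "alternate" comes first
# and becomes the baseline (it gets no column of its own)
X_formula <- model.matrix(~ factor(alloc))

# Manual route: one 0/1 column for each level except the baseline
alloc.random     <- ifelse(alloc == "random", 1, 0)
alloc.systematic <- ifelse(alloc == "systematic", 1, 0)
X_manual <- cbind(intercept = 1, alloc.random, alloc.systematic)

# The two matrices hold the same numbers, only the column names differ
same <- all(unname(X_formula) == unname(X_manual))

# relevel() changes which level is left out and so acts as the baseline
X_releveled <- model.matrix(~ relevel(factor(alloc), ref = "random"))

print(colnames(X_formula))
print(same)
print(colnames(X_releveled))
```

So when you code the dummies by hand the baseline is not missing. It
is whichever level you did not create a column for, and every other
coefficient is a contrast with that level.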

>I simply want to know how R is handling this and what I have to do 
>by hand to get the correct results. Sorry, this may be a beginner's
>question, but as stated I am new to this field.
>
> >You say you have seven moderator variables. Unless you have a shed
> >load of studies you will not be able to look at them simultaneously.
> >Apologies if you already knew that.
>
>No, I did not know that. In total I have about 94 studies and want
>to test different sets of moderators. Do you think this is 
>sufficient or do you suggest another approach?

The truthful but perhaps unhelpful answer is that you need to collect 
more data or use fewer moderators.

>I started in CMA (comprehensive meta analysis) but one of the 
>benefits of R is that I am able to test multiple moderators at once 
>- at least as I was told.
>
>kind regards,
>Alma
>
>
>From: Michael Dewey <info at aghmed.fsnet.co.uk>
>To: Alma Wilflinger <alma_anima at yahoo.com>; "r-help at r-project.org" 
><r-help at r-project.org>
>Sent: Sunday, January 20, 2013 12:52 PM
>Subject: Re: [R] dummy encoding in metafor
>
>At 17:14 19/01/2013, Alma Wilflinger wrote:
> >Hi,
> >
> >I am quite new to R and in need of some advice. I am trying to
> >conduct a meta-regression over some studies with about 7 moderator
> >variables which I have to dummy encode.
>
>Alma, although you can generate your own dummy variables by hand you
>do not have to as R will do it for you. See below for more comments.
>
>
> >I have found the following piece of code in the manual for the
> >metafor library:
> >
> >### manual dummy coding of the allocation factor
> >alloc.random <- ifelse(dat$alloc == "random", 1, 0)
> >alloc.alternate <- ifelse(dat$alloc == "alternate", 1, 0)
> >alloc.systematic <- ifelse(dat$alloc == "systematic", 1, 0)
>
>If you look a bit further down the manual page you will see
>### using a model formula to specify the same model
>rma(yi, vi, mods=~factor(alloc)+year+ablat, data=dat, method="REML",
>btt=c(2,3))
>
>which is much easier.
>
> >### test the allocation factor (in the presence of the other moderators)
> >### note: "alternate" is the reference level of the allocation factor
> >### note: the intercept is the first coefficient, so btt=c(2,3)
> >rma(yi, vi, mods=cbind(alloc.random, alloc.systematic, year, ablat),
> >data=dat, method="REML", btt=c(2,3))
> >
> >What I do not understand is the following:
> >How does R know which columns in my data.frame are related to the
> >dummy encoded variables?
>
>If you code them yourself R does not know. You know.
>
>
> >It is clear that in the call of cbind I just do not use the
> >reference variable as a parameter but I do not get it how R knows
> >that alloc.random and alloc.systematic refer to the column alloc in
> >the data frame.
> >
> >Thank you very much in advance for your help,
> >
>
>You say you have seven moderator variables. Unless you have a shed
>load of studies you will not be able to look at them simultaneously.
>Apologies if you already knew that.
>
> >kind regards,
> >Alma
>
>Michael Dewey
>info at aghmed.fsnet.co.uk
>http://www.aghmed.fsnet.co.uk/home.html
>
>

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html