[R] speeding up regressions using ddply

Alison Macalady ali at kmhome.org
Wed Sep 22 13:05:12 CEST 2010


I have a data set that I'd like to run logistic regressions on, using  
ddply to speed up the computation of many models with different  
combinations of variables.  I would like to run regressions on every  
unique two-variable combination in a portion of my data set, but I  
can't quite figure out how to do it using ddply.  The data set looks like  
this, with "status" as the binary dependent variable and V1:V8 as  
potential independent variables in the logistic regression:

m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D','L'), each = 6), 3)), m)

I used melt to put my data frame into a more workable format:

library(reshape)  # provides melt(); reshape2 should work as well
xm <- melt(x, id = 'status')

Here is the basic shape of the function I'd like to apply to every  
combination of variables in the dataset:

h <- function(df) {
  # What I can't figure out is how to specify 2 different variables from
  # xm to include in the model (value1 and value2 are placeholders).
  log.glm <- glm(status ~ value1 + value2, data = df,
                 family = binomial(link = logit), na.action = na.omit)
  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)
  coefs <- coef(glm.summary)
  # Rows 2 and 3 are the two predictors; column 1 holds the estimates.
  list(Est1 = coefs[2, 1], Est2 = coefs[3, 1], AIC = aic[2])  # or whatever other output here
}

And then I'd like to use ddply to speed up the computations.

output <- ddply(xm, .(variable), as.data.frame.function(h))

I can easily do this using ddply when I only want to use 1 variable in  
the model, but can't figure out how to do it with two variables.
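
For illustration, here is a minimal sketch of one way the two-variable case might be expressed. It is not a confirmed solution. It assumes the wide data frame x (rebuilt here so the example stands alone) rather than the melted xm, because each melted row carries only one value. It uses base R's combn() to enumerate the pairs. The helper name fit_pair and the set.seed() call are made up for the example.

```r
# Sketch: fit status ~ Vi + Vj for every unique pair of predictors.
set.seed(1)  # assumption: added only so the example is reproducible
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D', 'L'), each = 6), 3)), m)

# All unique two-variable combinations, one pair per column (2 x 28).
pairs <- combn(colnames(m), 2)

# fit_pair is a hypothetical helper, not part of plyr or base R.
fit_pair <- function(vars) {
  f <- reformulate(vars, response = 'status')  # e.g. status ~ V1 + V2
  fit <- glm(f, data = x, family = binomial(link = logit),
             na.action = na.omit)
  est <- coef(summary(fit))
  # Rows 2 and 3 of the coefficient table are the two predictors.
  data.frame(var1 = vars[1], var2 = vars[2],
             Est1 = est[2, 'Estimate'], Est2 = est[3, 'Estimate'],
             AIC = extractAIC(fit)[2])
}

# One row of output per pair, stacked into a single data frame.
output <- do.call(rbind, lapply(seq_len(ncol(pairs)),
                                function(i) fit_pair(pairs[, i])))
head(output)
```

With plyr loaded, the last step could probably be written as ldply(combn(colnames(m), 2, simplify = FALSE), fit_pair) instead. Either way, the model fitting itself is likely to dominate the run time, so plyr mainly helps with the bookkeeping rather than the speed.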

Many thanks for any hints!


Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
& Laboratory of Tree Ring Research
