[R] speeding up regressions using ddply

Alison Macalady ali at kmhome.org
Wed Sep 22 13:05:12 CEST 2010



Hi,

I have a data set that I'd like to run logistic regressions on, using
ddply to speed up the computation of many models with different
combinations of variables. I would like to run a regression on every
unique two-variable combination in a portion of my data set, but I
can't quite figure out how to do it using ddply. The data set looks like
this, with "status" as the binary dependent variable and V1:V8 as
potential independent variables in the logistic regression:

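set.seed(1)                         # make the random example reproducible
m <- matrix(rnorm(288), nrow = 36)  # 36 observations x 8 predictors
colnames(m) <- paste('V', 1:8, sep = '')
x <- data.frame(status = factor(rep(rep(c('D','L'), each = 6), 3)),  # binary response
                as.data.frame(m))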
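With eight candidate predictors there are choose(8, 2) = 28 unique pairs. As a minimal sketch (base R only, not part of the original post), `combn()` can enumerate them; the name `pairs` is just illustrative:

```r
# Candidate predictor names, matching the V1..V8 columns of x
vars <- paste0("V", 1:8)

# combn() returns a 2 x 28 character matrix: each column is one unordered pair
pairs <- combn(vars, 2)

ncol(pairs)      # 28 two-variable models to fit
pairs[, 1:3]     # first three pairs: V1/V2, V1/V3, V1/V4
```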

I used melt to put my data frame into a more workable (long) format:

require(reshape)
xm <- melt(x, id = 'status')  # columns: status, variable, value
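To see why splitting the long data by `variable` leaves only one predictor per piece, it can help to look at the layout directly. The sketch below assumes only base R: `stack()` stands in for `reshape::melt()` and produces the same three columns (the name `xm_base` is hypothetical). Each observation's V1 and V2 values end up on different rows, so no single piece carries two predictors side by side.

```r
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste0("V", 1:8)
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# Base-R analogue of melt(x, id = "status"): stack() places the V1..V8 columns
# one after another, so status is repeated once per predictor column
s <- stack(x[, -1])
xm_base <- data.frame(status   = rep(x$status, times = 8),
                      variable = s$ind,
                      value    = s$values)

dim(xm_base)                               # 288 rows (36 x 8), 3 columns
head(xm_base[xm_base$variable == "V1", ])  # the V1 piece has a single 'value' column
```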

Here is the basic shape of the function I'd like to apply to every  
combination of variables in the dataset:

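h <- function(df)
{
  # Fit on the piece ddply passes in (data = df rather than attach(df),
  # which was never detached). What I can't figure out is how to specify
  # 2 different variables from xm to include in the model; value1 and
  # value2 are placeholders for them.
  log.glm <- glm(status ~ value1 + value2, family = binomial(link = "logit"),
                 data = df, na.action = na.omit)

  glm.summary <- summary(log.glm)
  aic <- extractAIC(log.glm)    # c(equivalent d.f., AIC)
  coefs <- coef(glm.summary)    # columns: Estimate, Std. Error, z value, Pr(>|z|)
  # Estimates for the two predictors (row 1 is the intercept),
  # or whatever other output here
  list(Est1 = coefs[2, "Estimate"], Est2 = coefs[3, "Estimate"], AIC = aic[2])
}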
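One way to name the two variables in the formula, sketched here under the assumption that the wide data frame `x` is used instead of `xm`: build the formula from a pair of column names with `reformulate()`. The helper `fit_pair()` and the result columns `var1`/`var2` are hypothetical, and this version loops over the pairs with base R `lapply()` rather than ddply.

```r
set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste0("V", 1:8)
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# Hypothetical helper: fit status ~ v1 + v2 on the wide data frame,
# building the formula from two column names with reformulate()
fit_pair <- function(df, v1, v2) {
  fit <- glm(reformulate(c(v1, v2), response = "status"),
             family = binomial(link = "logit"), data = df, na.action = na.omit)
  cf <- coef(summary(fit))  # rows: (Intercept), v1, v2
  data.frame(var1 = v1, var2 = v2,
             Est1 = cf[v1, "Estimate"], Est2 = cf[v2, "Estimate"],
             AIC  = extractAIC(fit)[2])
}

# All 28 unordered pairs of predictors, one model per pair, combined by row
pairs <- combn(setdiff(names(x), "status"), 2)
res <- do.call(rbind, lapply(seq_len(ncol(pairs)),
                             function(j) fit_pair(x, pairs[1, j], pairs[2, j])))
head(res)
```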

And then I'd like to use ddply to speed up the computations:

require(plyr)
# splits xm into one piece per level of 'variable', i.e. one predictor per piece
output <- ddply(xm, .(variable), as.data.frame.function(h))
output


I can easily do this with ddply when I only want one variable in the
model, but can't figure out how to do it with two variables.
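A possible way to keep ddply in the picture, offered as a sketch rather than a tested answer: build a long data frame with one block of rows per pair, whose two predictor columns are literally named `value1` and `value2`, and split on a pair label instead of on `variable`. The names `xp` and `pair` are hypothetical; `ddply()` and `.()` come from plyr, the rest is base R.

```r
library(plyr)

set.seed(1)
m <- matrix(rnorm(288), nrow = 36)
colnames(m) <- paste0("V", 1:8)
x <- data.frame(status = factor(rep(rep(c("D", "L"), each = 6), 3)),
                as.data.frame(m))

# One block of 36 rows per pair, with the pair's columns renamed value1/value2
pairs <- combn(paste0("V", 1:8), 2)
xp <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(j) {
  data.frame(status = x$status,
             pair   = paste(pairs[1, j], pairs[2, j], sep = "_"),
             value1 = x[[pairs[1, j]]],
             value2 = x[[pairs[2, j]]])
}))

# A version of the model function h(), with the placeholders now matching real columns
h <- function(df) {
  fit <- glm(status ~ value1 + value2, family = binomial(link = "logit"),
             data = df, na.action = na.omit)
  coefs <- coef(summary(fit))
  data.frame(Est1 = coefs["value1", "Estimate"],
             Est2 = coefs["value2", "Estimate"],
             AIC  = extractAIC(fit)[2])
}

# Split on the pair label: each piece now holds two predictors
output <- ddply(xp, .(pair), h)
head(output)
```

The design choice is to move the pairing into the data rather than into the function, so the same `h()` works for every piece. ddply here mainly organizes the split-apply-combine step; it is not expected to run the 28 fits faster than a plain loop over the same pairs.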

Many thanks for any hints!

Ali



--------------------
Alison Macalady
Ph.D. Candidate
University of Arizona
School of Geography and Development
& Laboratory of Tree Ring Research



More information about the R-help mailing list