[R] Building and scoring multiple models with dplyr functions

Andrew Agrimson jagrimsasl at gmail.com
Mon Aug 18 21:58:57 CEST 2014

Hello All,

I have a question regarding building multiple models and then scoring new
data with these models. I have been tasked with converting existing SAS
modeling code into equivalent R code, but unfortunately I rarely use R, so
I'm in unfamiliar territory. I think I've found a good way to build the
models using dplyr functions; however, I'm having difficulty figuring out
how to use the models to score new data.

The SAS code I'm converting builds multiple binomial models using the "BY"
statement in the GLIMMIX procedure. The results of the model fitting
process are stored using the "STORE" statement.

proc glimmix data = model_data;
  by group;
  model y/n = var1-var10 / dist=bin;
  store model;
run;


The next step is to score a new data set using the PLM procedure. The new
data set is also grouped by "group", and PLM matches each "by" value to the
model fitted for that value and applies it.

proc plm restore=model;
  score data=new_data out=scored predicted=p / ilink;
run;


In R I've been able to reproduce the first model building step using dplyr
functions and it seems to work quite well. In fact it's much faster than my
SAS implementation.

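library(dplyr)
by_group <- group_by(model_data, group)

# glm() expects cbind(successes, failures), so the SAS events/trials
# response y/n becomes cbind(y, n - y).
models <- by_group %>%
  do(mod = glm(cbind(y, n - y) ~ var1 + var2 + var3 + var4 + var5 +
                 var6 + var7 + var8 + var9 + var10,
               family = binomial, data = .))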

As stated above, I cannot figure out how to apply these models to new
data. I've scoured the internet and the documentation for an example, but
so far without luck. I want to extract the model objects from the data
frame "models" and apply the "predict" function to the matching rows of the
new data, but my novice knowledge of R, and of dplyr specifically, is
making this very difficult.
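
For illustration only, here is a minimal base-R sketch of the scoring step
described above. It is one possible approach, not a confirmed solution. The
toy data, the helper make_data(), the two-predictor formula and the names
fits, pieces and scored are all assumptions made for the example; only
model_data, new_data, group, y, n, var1 and var2 come from the description
above. It also assumes y counts events out of n trials, as in the SAS
events/trials syntax.

```r
# Toy data standing in for model_data / new_data (hypothetical values).
set.seed(1)
make_data <- function(rows_per_group) {
  d <- data.frame(
    group = rep(c("a", "b"), each = rows_per_group),
    var1  = rnorm(2 * rows_per_group),
    var2  = rnorm(2 * rows_per_group),
    n     = 20L
  )
  # y = number of events out of n trials
  d$y <- rbinom(nrow(d), size = d$n,
                prob = plogis(0.3 * d$var1 - 0.2 * d$var2))
  d
}
model_data <- make_data(50)
new_data   <- make_data(5)

# One binomial GLM per group, kept in a list named by group value.
# With the dplyr result, an equivalent named list could be built with
# setNames(models$mod, models$group).
fits <- lapply(split(model_data, model_data$group), function(d) {
  glm(cbind(y, n - y) ~ var1 + var2, family = binomial, data = d)
})

# Score each group's new rows with the model fitted for that group.
# type = "response" returns probabilities (the inverse-link scale),
# comparable to the ILINK option in PROC PLM.
pieces <- split(new_data, new_data$group)
scored <- do.call(rbind, lapply(names(pieces), function(g) {
  d <- pieces[[g]]
  d$p <- predict(fits[[g]], newdata = d, type = "response")
  d
}))
```

The sketch assumes every group value in new_data also has a fitted model;
a group with no match would make the predict() call fail, so such rows
would need to be filtered out or handled separately.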

Any help or advice would be greatly appreciated.


