[R] Building and scoring multiple models with dplyr functions

Tue Aug 19 03:09:02 CEST 2014

At the risk of being old-fashioned, I suggest doing this in a
for-loop. Why struggle to fit this into the dplyr framework when a
straightforward loop will do the trick?

This is untested in the absence of example data, but something along
the lines of

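models <- list()
predictions <- list()
for(g in unique(model_data$group)) {
    key <- as.character(g)  # name list entries by group, even if group is numeric
    ## glm() wants cbind(successes, failures); SAS y/n is events/trials,
    ## hence n - y in the second column
    models[[key]] <- glm(cbind(y, n - y) ~ var1 + var2 +
                             var3 + var4 + var5 +
                             var6 + var7 + var8 +
                             var9 + var10,
                         family = binomial,
                         data = subset(model_data, group == g)
                         )
    ## type = "response" returns probabilities, matching ilink in PROC PLM
    predictions[[key]] <- predict(models[[key]],
                                  newdata = subset(new_data, group == g),
                                  type = "response")
}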

should do it.
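
Since there is no example data in the thread, here is a small
self-contained sketch of the same loop on simulated data. The data,
the make_data() helper, the group labels and the reduction to two
predictors are all made up for illustration. It assumes y is the
number of events out of n trials (as in the SAS y/n syntax), so the
response is cbind(y, n - y), and it writes the predicted probabilities
back into new_data as a column p, roughly like the PLM "predicted=p"
option.

```r
## Simulated stand-in data: column names mirror the thread, values are made up.
set.seed(1)
make_data <- function(rows_per_group) {
    total <- 3 * rows_per_group
    d <- data.frame(group = rep(c("a", "b", "c"), each = rows_per_group),
                    var1 = rnorm(total),
                    var2 = rnorm(total),
                    n = 20)
    ## y = number of events out of n trials
    d$y <- rbinom(total, size = d$n,
                  prob = plogis(0.5 * d$var1 - 0.3 * d$var2))
    d
}
model_data <- make_data(50)
new_data <- make_data(5)

models <- list()
new_data$p <- NA_real_   # stays NA for groups that have no fitted model
for (g in unique(model_data$group)) {
    key <- as.character(g)
    ## one binomial model per group: cbind(successes, failures)
    models[[key]] <- glm(cbind(y, n - y) ~ var1 + var2,
                         family = binomial,
                         data = model_data[model_data$group == g, ])
    ## score only the rows of new_data that belong to this group
    rows <- new_data$group == g
    new_data$p[rows] <- predict(models[[key]],
                                newdata = new_data[rows, ],
                                type = "response")
}
head(new_data)
```

If the models were already fitted with do() as in the quoted message,
the fitted objects should sit in the list column "mod", so
models$mod[[i]] together with models$group[i] could probably replace
the glm() call above. I have not tested that part.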

Best,
Ista

On Mon, Aug 18, 2014 at 3:58 PM, Andrew Agrimson <jagrimsasl at gmail.com> wrote:
> Hello All,
>
> I have a question regarding building multiple models and then scoring new
> data with these models. I have been tasked with converting existing SAS
> modeling code into equivalent R code, but unfortunately I rarely use R so
> I'm in unfamiliar territory. I think I've found a good way to build the
> models using dplyr functions, however I'm having difficulty figuring out
> how to use the models to score new data.
>
>
> The SAS code I'm converting builds multiple binomial models using the "BY"
> statement in the GLIMMIX procedure. The results of the model fitting
> process are stored using the "STORE" statement.
>
> proc glimmix data = model_data;
>
> by group;
>
> model y/n = var1-var10/dist=bin;
>
> store model;
>
> quit;
>
>
> The next step is to score a new data set using the PLM procedure. The "New
> Data" is also grouped by "group" and PLM is able to match and apply the
> appropriate model with the appropriate "by" value.
>
> proc plm restore=model;
>
> score data=new_data out=scored predicted=p/ilink;
>
> run;
>
>
> In R I've been able to reproduce the first model building step using dplyr
> functions and it seems to work quite well. In fact it's much faster than my
> SAS implementation.
>
> by_group <- group_by(model_data, group)
>
> models <- by_group %>% do(mod = glm(cbind(y,n) ~ var1 + var2 + var3 + var4
> + var5 + var6 + var7 + var8 + var9 + var10,
>                                 family = binomial, data = .))
>
>
> As stated above, I cannot figure out how to apply these models to new
> data.  I've scoured the internet and the documentation for an example but
> so far no luck. I want to extract the model objects out of the data frame
> "models" and apply the "predict" function, but my novice knowledge of R and
> dplyr specifically is making this very difficult.
>
> Any help or advice would be greatly appreciated.
>
>
> Thanks,
>
> Andy