[Rd] informal conventions/checklist for new predictive modeling packages
pauljohn32 at gmail.com
Thu Jan 5 21:44:41 CET 2012
I agree with almost all of your list, except the last point. Having
participated in some wheel-reinvention lately, I agree with the bulk of
your comment. I don't think the fix is as easy as you suspect, though:
RSiteSearch won't help me find a function I need when I don't know the
magic words. Some R functions have such unexpected names that only a
fastidious source-code reader would find them ("pretty", for example).
Still, I share your concern.
As for the last point, however, I think you are mistaken.
On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn <mxkuhn at gmail.com> wrote:
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency). Note that glm() does not do this, but most others use the
> first level.
When the DV is thought of as 0 and 1, where 1 is an "event," "success," or
"win" and 0 is a "non-event," "failure," or "loss," and there is to be a
single predicted probability, I want it to be the probability of the
event, that is, of a 1. glm does what I want, and I don't know of other
software that goes the other way, except PROC LOGISTIC in SAS, which by
default models the probability of the lowest-ordered response value. That
has a long history of causing confusion and despair.
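To make the glm() convention concrete, here is a minimal base-R sketch with made-up toy data (the names x, y, fit, fit01 and fit_rev are illustrative only). For a two-level factor response, glm() treats the first level as "failure" and models the probability of the other level; moving the event level to the front with relevel() turns every fitted probability into its complement, which is exactly the kind of silent flip that makes the two conventions confusing.

```r
# Made-up toy data: a two-level factor outcome whose first level is "loss".
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- factor(c("loss", "loss", "win", "loss", "win", "loss", "win", "win"),
            levels = c("loss", "win"))

# glm() treats the first factor level ("loss") as failure, so the fitted
# values estimate P(y == "win").
fit <- glm(y ~ x, family = binomial)

# The same model written with an explicit 0/1 response (1 = "win").
fit01 <- glm(as.integer(y == "win") ~ x, family = binomial)

# Making "win" the first level reverses the event: the fitted values now
# estimate P(y == "loss"), the complement of the fit above.
fit_rev <- glm(relevel(y, ref = "win") ~ x, family = binomial)

# Compare the three sets of fitted probabilities.
all.equal(unname(fitted(fit)), unname(fitted(fit01)), tolerance = 1e-6)
all.equal(unname(fitted(fit_rev)), unname(1 - fitted(fit)), tolerance = 1e-6)
```

Either coding works; what matters is that the level order, and therefore the meaning of the predicted probability, is documented.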
I'd like to suggest adding one thing to your list, though. I have
wished (on this list and elsewhere) that there were a more standard
approach for constructing the "newdata" objects that are passed to
predict(). Many packages have reinvented this (datadist in rms, the
effects package), and almost nobody here agreed with my wish for a more
standard approach. But if there were one, it would be much easier to hold
up R as an alternative to Stata when users pop up with "marginal
effects tables" from Stata that are very difficult to reproduce with R.
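A minimal base-R sketch of the kind of helper I mean, assuming that "typical value" means the mean for numeric predictors and the most common level for factors (one choice among several; rms and effects make their own). The function make_newdata() and the toy data frame d are hypothetical, not part of any package: the helper varies one focal predictor over chosen values and holds every other predictor fixed, producing a frame that can go straight into predict().

```r
# Hypothetical helper (not from any package): build a "newdata" frame that
# varies one focal predictor and holds the others at a typical value
# (assumed here: mean for numeric columns, most common level for factors).
make_newdata <- function(data, focal, values) {
  typical <- lapply(data, function(col) {
    if (is.numeric(col)) {
      mean(col)
    } else {
      f <- factor(col)
      # Keep all original levels so predict() sees a compatible factor.
      factor(names(which.max(table(f))), levels = levels(f))
    }
  })
  typical[[focal]] <- values
  # expand.grid() accepts a single list and crosses its elements.
  expand.grid(typical, stringsAsFactors = FALSE)
}

# Made-up toy data for illustration.
d <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  group = factor(c("a", "b", "a", "b", "a", "a", "b", "a")),
  y = c(0, 0, 1, 0, 1, 0, 1, 1)
)
fit_g <- glm(y ~ x + group, family = binomial, data = d)

# Vary x, hold group at its most common level, and predict P(y == 1).
nd <- make_newdata(d[, c("x", "group")], focal = "x", values = c(2, 4, 6))
pred_table <- cbind(nd, prob = predict(fit_g, newdata = nd, type = "response"))
pred_table
```

Whether to hold covariates at means, medians, or modal levels, or to average predictions over the observed data instead, is exactly the kind of choice a shared convention would need to settle.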
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas