[Rd] informal conventions/checklist for new predictive modeling packages

Thu Jan 5 21:44:41 CET 2012

I agree with almost all of the list, except the last point. Since I
have participated in wheel-reinvention lately, I agree with the bulk of
your comment. I don't think the fix is as easy as you suspect, though:
RSiteSearch won't help me find a function I need when I don't know the
magic words.  Some R functions have such unexpected names that only a
fastidious source-code reader would find them ("pretty", for example).
But I agree with your concern.

But, as far as the last one is concerned, I think you are mistaken.
Explanation below.

On Wed, Jan 4, 2012 at 8:19 AM, Max Kuhn <mxkuhn at gmail.com> wrote:
>
> (14) [OCD] For binary classification models, model the probability of
> the first level of a factor as the event of interest (again, for
> consistency) Note that glm() does not do this but most others use the
> first level.
>
When the DV is thought of as 0 and 1, where 1 is an "event", "success",
or "win" and 0 is a "non-event", "failure", or "loss", and there is to
be a single predicted probability, I want it to be the probability of
the higher outcome.

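glm() is doing the thing I want: with a factor response it treats the
first level as "failure" and models the probability of the other
level.  I don't know of others that go the other way, except PROC
LOGISTIC in SAS, and that has a long history of causing confusion and
despair.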
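
The glm() behaviour is easy to check. This is a minimal sketch with
toy data invented for illustration (the names x, y and fit are mine,
not from any package):

```r
# Toy data, invented for illustration: 4 losses and 6 wins.
x <- 1:10
y <- factor(c("loss", "loss", "win", "loss", "win",
              "loss", "win", "win", "win", "win"),
            levels = c("loss", "win"))

fit <- glm(y ~ x, family = binomial)

# Fitted values are P(y == "win"), the second level, not P(y == "loss").
p <- fitted(fit)
mean(p)            # matches the share of wins, mean(y == "win")
coef(fit)[["x"]]   # positive: larger x goes with "win"
```

If the fitted values were probabilities of the first level, their
mean would track the share of losses instead.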

I'd like to consider adding one thing to your list, though.  I have
wished (on this list and elsewhere) that there were a more regular
approach for calculating the "newdata" objects that are used in predict().
Many packages have re-invented this (datadist in rms, effects), and
almost nobody here agreed with my wish for a more standard approach.
But if there were a standard approach, it would be much easier to hold
up R as an alternative to Stata when users pop up with "marginal
effects tables" from Stata that are very difficult to reproduce with
R.
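
To make the "newdata" wish concrete, here is a rough sketch of the
kind of helper I have in mind. The function typical_newdata() is
hypothetical, written only for this message; it is not how datadist
or effects do it. It varies one predictor and holds the others at a
"typical" value (the mean for numeric columns, the most frequent
level for everything else), using only base R and the built-in
mtcars data:

```r
# Hypothetical helper, not part of any package.
# data:   data frame holding the predictors only
# vary:   name of the predictor to vary
# values: values that predictor should take
typical_newdata <- function(data, vary, values) {
  others <- setdiff(names(data), vary)
  typical <- lapply(data[others], function(col) {
    if (is.numeric(col)) {
      mean(col, na.rm = TRUE)
    } else {
      # most frequent level, kept as a factor with the observed levels
      tab <- table(col)
      factor(names(tab)[which.max(tab)], levels = levels(factor(col)))
    }
  })
  grid <- c(setNames(list(values), vary), typical)
  expand.grid(grid, KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
}

# Example: vary wt, hold hp at its mean and cyl at its modal level.
dat <- transform(mtcars[c("mpg", "wt", "hp", "cyl")], cyl = factor(cyl))
fit <- lm(mpg ~ wt + hp + cyl, data = dat)

nd <- typical_newdata(dat[c("wt", "hp", "cyl")],
                      vary = "wt", values = c(2, 3, 4))
pred <- predict(fit, newdata = nd)
cbind(nd, pred)
```

What counts as "typical" (mean, median, mode, a reference level) is
exactly the choice each package currently makes on its own, which is
why a shared convention would help.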

Regards,
pj

-- 
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas