[R] logistic regression with constrained coefficients?

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Dec 9 00:02:30 CET 2005

```
On Fri, 9 Dec 2005, Richard A. O'Keefe wrote:

> I am trying to automatically construct a distance function from
> a training set in order to use it to cluster another data set.
> The variables are nominal.  One variable is a "class" variable
> having two values; it is kept separate from the others.
>
> I have a method which constructs a distance matrix for the levels
> of a nominal variable in the context of the other variables.
>
> I want to construct a linear combination of these which gives me
> a distance between whole cases that is well associated with the
> class variable, in that
>    "combined distance between two cases large =>
>     they most likely belong to different classes."
>
> So from my training set I construct a set of
>    (d1(x1,y1), ..., dn(xn,yn), x_class != y_class)
> rows bound together as a data frame (actually I construct it by
> columns), and then the obvious thing to try was
>
>    glm(different.class ~ ., family = binomial(), data = distance.frame)
>
> The thing is that this gives me both positive and negative coefficients,
> whereas the linear combination is only guaranteed to be a metric if the
> coefficients are all non-negative.
>
> There are four fairly obvious ways to deal with that:
> (1) just force the negative coefficients to 0 and hope.
>    This turns out to work rather well, but still...
> (2) keep all the coefficients but take max(0, linear combination of distances).
>    This turns out to work rather well, but still...
> (3) Drop the variables with negative coefficients from the model,
>    refit, and iterate until no negative coefficients remain.
>    This can hardly be said to work; sometimes nearly all the variables
>    are dropped.
> (4) Use a version of glm() that will let me constrain the coefficients
>    to be non-negative.
>
> I *have* searched the R-help archives, and I see that the question about
> logistic regression with constrained coefficients has come up before, but
> it didn't really get a satisfactory answer.  I've also searched the
> documentation of more contributed packages than I could possibly understand.
>
> There is obviously some way to do this using R's general non-linear
> optimisation functions.  However, I don't know how to formulate logistic
> regression that way.

There is a worked example in MASS (the book) p.445, including adding
constraints.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

```
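For readers without the book to hand, option (4) can be sketched with R's general-purpose optimiser: write the logistic negative log-likelihood yourself and minimise it with `optim(method = "L-BFGS-B")`, which accepts box constraints. This is only an illustrative sketch under stated assumptions, not a reproduction of the MASS example: the function names (`negloglik`, `neggrad`, `fit_nonneg_logit`) and the simulated distance columns `d1`..`d3` are hypothetical, and the intercept is deliberately left unconstrained while the distance weights are bounded below by 0.

```r
# Negative log-likelihood of a logistic regression.
# X must already contain an intercept column; y is 0/1.
negloglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  # log(1 + exp(eta)) written in a numerically stable form
  sum(pmax(eta, 0) + log1p(exp(-abs(eta))) - y * eta)
}

# Gradient of the negative log-likelihood: t(X) %*% (p - y)
neggrad <- function(beta, X, y) {
  p <- plogis(drop(X %*% beta))
  drop(crossprod(X, p - y))
}

# Hypothetical helper: fit with all slope coefficients constrained >= 0.
fit_nonneg_logit <- function(X, y) {
  X1 <- cbind(Intercept = 1, X)
  k <- ncol(X1)
  lower <- c(-Inf, rep(0, k - 1))   # intercept free, weights non-negative
  fit <- optim(par = rep(0, k), fn = negloglik, gr = neggrad,
               X = X1, y = y, method = "L-BFGS-B", lower = lower)
  setNames(fit$par, colnames(X1))
}

# Simulated stand-in for distance.frame (assumption: three distance columns,
# the third with a negative true effect so the constraint has something to do).
set.seed(1)
n <- 500
D <- matrix(runif(n * 3), n, 3, dimnames = list(NULL, c("d1", "d2", "d3")))
y <- rbinom(n, 1, plogis(-1 + 2 * D[, 1] + 1 * D[, 2] - 0.5 * D[, 3]))

coefs <- fit_nonneg_logit(D, y)
print(coefs)
```

With real data one would pass `as.matrix()` of the distance columns and the 0/1 `different.class` indicator. Unlike approach (1), the remaining coefficients are re-estimated jointly with the constraint active, so a weight that ends up at exactly 0 is a boundary solution rather than a truncated unconstrained estimate. Standard errors from the usual glm machinery do not apply at the boundary.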