# [R] logit and polytomous data

Prof Brian Ripley ripley at stats.ox.ac.uk
Fri Mar 10 13:15:42 CET 2000

```> Date: Fri, 10 Mar 2000 12:18:14 +0200 (EET)
> From: Kari Ruohonen <toimelat at saunalahti.fi>
>
> I am new to generalized linear models and studying
> McCullagh & Nelder (1989). Especially, I have a problem
> resembling the \"cheese taste\" example (5.3.1. p. 109) of
> the book.

You confused me. It is on page 109 of McCullagh & Nelder (1983), the
first edition. That example is not using a glm: see later.

> I tried to analyse the cheese example with R but
> failed to do so because R allowed me to use logit link
> function only with binary family that supposes 0 <= y <= 1.
> Do I need to scale the y\'s or is there another way?

You want the *binomial* family.  From Venables & Ripley
(1999, pp. 218-9)

1. If the response is a numeric vector it is assumed to hold
the data in ratio form, \$y_i = s_i/a_i\$, in which case the
\$a_i\$s must be given as a vector of weights using the
\sfn{weights} argument.  (If the \$a_i\$ are all one the default
\sfn{weights} suffices.)

2. If the response is a logical vector or a two-level factor
it is treated as a 0/1 numeric vector and handled as previously.

If the response is a multi-level factor, the first level is treated
as 0 (failure) and all others as 1 (success).

3. If the response is a two-column matrix it is assumed that the
first column holds the number of successes, \$s_i\$, and the second
holds the number of failures, \$a_i - s_i\$, for each trial.  In
this case no \sfn{weights} argument is required.

The less-intuitive third form allows the fitting function to select a
better starting value, so we tend to favour it.

This is using the standard notation for a GLM family: for a binomial
s_i is the number of successes out of a_i trial.

HOWEVER, the analysis in McCullagh and Nelder (1983) is via an ordinal
logistic regression.

Here's an R analysis:

library(MASS)
counts <- scan()
0  0  1  7  8  8 19  8  1
6  9 12 11  7  6  1  0  0
1  1  6  8 23  7  5  1  0
0  0  0  1  3  7 14 16 11

resp <- ordered(gl(9, 1, 36))
cheese <- gl(4, 9, 36, labels=LETTERS[1:4])
cheese <- relevel(cheese, "D")

fit <- polr(resp ~ cheese, weights=counts, Hess=TRUE)
fit

Call:
polr(formula = resp ~ cheese, weights = counts, Hess=TRUE)

Coefficients:
cheeseA   cheeseB   cheeseC
-1.612792 -4.964633 -3.322695

Intercepts:
1|2        2|3        3|4        4|5        5|6        6|7        7|8
-7.0801140 -6.0249573 -4.9253989 -3.8568048 -2.5205479 -1.5685582 -0.0669085
8|9
1.4929198

Residual Deviance: 711.35
AIC: 733.35

summary(fit)
...
Coefficients:
Value Std. Error    t value
cheeseA -1.612792  0.3805424  -4.238139
cheeseB -4.964633  0.4767185 -10.414182
cheeseC -3.322695  0.4218289  -7.876879
...
vcov(fit)
...

(Summary on polr should be able to give the correlations, but the presnet
version has a small bug.)

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

```