[R] glm

Bill Venables William.Venables at cmis.CSIRO.AU
Mon Nov 6 03:14:46 CET 2000

Hi Antonio,

> Hi to all,
> So I'm also a new user. I downloaded the program last week and I think
> it's great. Thanks to those who have developed R.
> I have a special interest in GLM as a tool to describe fisheries and its
> variables and I'm just beginning to study it.

I'm in fisheries research as well.

> As I understand it, there are two types of GLM sum of squares: in "type
> I" the factors are added in sequence, and in "type III" this sequence
> doesn't matter. In R I always get the message: "Terms added sequentially
> (first to last)".
> Is there any way to do the so-called Type III procedure?

I'm really curious to know why the "two types" of sum of squares are
called "Type I" and "Type III"!  This is a very common misconception,
particularly among SAS users who have been fed this nonsense quite
often for all their professional lives.  Fortunately the reality is
much simpler. There is, by any sensible reckoning, only ONE type of
sum of squares, and it always represents an improvement sum of squares
of the outer (or alternative) model over the inner (or null
hypothesis) model.  What the highly dubious SAS classification of sums
of squares does is to encourage users to concentrate on the null
hypothesis model and to forget about the alternative.  This is always
a very bad idea and not surprisingly it can lead to nonsensical tests,
as in the test it provides for main effects "even in the presence of
interactions", something which beggars definition, let alone belief.
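
To make the "improvement sum of squares" idea concrete, here is a
minimal sketch in R.  The data frame d, the factors A and B and the
response y are all invented for illustration (a small unbalanced
two-factor layout); they are assumptions, not real fisheries data.

```r
# Hypothetical unbalanced two-factor data, made up for illustration only
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(7, 5))),
  B = factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2",
               "b1", "b2", "b2", "b2", "b2"))
)
d$y <- rnorm(nrow(d)) + as.numeric(d$A) + 0.5 * as.numeric(d$B)

inner_fit <- lm(y ~ A,     data = d)   # inner (null hypothesis) model
outer_fit <- lm(y ~ A + B, data = d)   # outer (alternative) model

# The improvement sum of squares is the drop in residual sum of squares
# in going from the inner model to the outer model
improvement <- deviance(inner_fit) - deviance(outer_fit)

# anova() on the two fits reports the same quantity as "Sum of Sq"
cmp <- anova(inner_fit, outer_fit)
print(cmp)
```

Both models have to be stated before the sum of squares means
anything: change either one and you are asking a different question.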

It is quite reasonable to talk about testing strategies and to
consider sequential strategies (which yield a single analysis of
variance table) and non-sequential strategies (which logically do NOT
lead to a single analysis of variance table).  The nonsense really
starts when you see a non-sequential analysis of variance table (for a
non-orthogonal design) where the sums of squares don't add up.  This
is a clear indication that all is not as it seems.
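
A small sketch of the sequential case, again on invented data (the
names d, A, B and y are hypothetical).  In a non-orthogonal design each
order of fitting gives its own sequential table, and within each table
the sums of squares do add up to the total.

```r
# Hypothetical unbalanced two-factor data, made up for illustration only
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(7, 5))),
  B = factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2",
               "b1", "b2", "b2", "b2", "b2"))
)
d$y <- rnorm(nrow(d)) + as.numeric(d$A) + 0.5 * as.numeric(d$B)

# Two sequential strategies: A then B, and B then A
tab_AB <- anova(lm(y ~ A + B, data = d))
tab_BA <- anova(lm(y ~ B + A, data = d))
print(tab_AB)
print(tab_BA)

# Total (corrected) sum of squares of the response
total_ss <- sum((d$y - mean(d$y))^2)

# Each sequential table, residual line included, adds up to the total
sum_AB <- sum(tab_AB[["Sum Sq"]])
sum_BA <- sum(tab_BA[["Sum Sq"]])

# The line for A differs between the tables: A ignoring B in the first,
# A after B in the second
ss_A_first <- tab_AB["A", "Sum Sq"]
ss_A_last  <- tab_BA["A", "Sum Sq"]
```

Picking the "A after B" and "B after A" lines out of the two tables and
stacking them gives exactly the kind of table whose entries no longer
add up.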

The basic tool for investigating hypotheses in a non-sequential way is
to use drop1() on the fitted model object.  This is what you get,
effectively, in the coefficients table from a regression model,
anyway.  drop1() does not produce a table, thank goodness.  It does
allow you to look at each *non-marginal* term separately, though,
which is really what you are getting at.  The fact that it does not
give anything for *marginal* terms should be seen as a very sensible
*feature*, mostly lacking in other systems....
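
As a rough sketch of how this looks in practice, using invented data
once more (d, A, B and y are hypothetical names):

```r
# Hypothetical unbalanced two-factor data, made up for illustration only
set.seed(1)
d <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(7, 5))),
  B = factor(c("b1", "b1", "b1", "b1", "b2", "b2", "b2",
               "b1", "b2", "b2", "b2", "b2"))
)
d$y <- rnorm(nrow(d)) + as.numeric(d$A) + 0.5 * as.numeric(d$B)

# Additive model: both A and B are non-marginal, so each can be dropped
fit_add  <- lm(y ~ A + B, data = d)
drop_add <- drop1(fit_add, test = "F")
print(drop_add)

# Model with interaction: A and B are marginal to A:B, so only the
# interaction is offered for dropping
fit_int  <- lm(y ~ A * B, data = d)
drop_int <- drop1(fit_int, test = "F")
print(drop_int)
```

The same call works on a glm fit, where test = "Chisq" is the usual
choice.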

As you might have gathered, this is something that tends to get me
rather excited.  I consider the way this material is presented in SAS
and elsewhere to be at least irresponsible and something that actively
encourages bad and rather slovenly practice.  Furthermore the way this
flawed view of the subject and the terminology that goes with it is
infiltrating the profession and its software systems is alarming and
should be resisted.  I regret to say that our commercial sister system
has already succumbed to the pressures rather arrogantly, if subtly,
exerted on it to conform (and not entirely by SAS, actually).  May R
remain ever free of this particular canker.

Bill Venables.

> Thanks!
> Antonio Olinto
> Fisheries Institute
> Sao Paulo - Brazil

Bill Venables,      Statistician,     CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld,  AUSTRALIA.   4163
Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au    
Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/

