[R] lm diagnostics and qr (fwd)

John Fox jfox at mcmail.cis.mcmaster.ca
Thu Jun 26 17:24:17 CEST 2003


Dear Jean,

On Thu, 26 Jun 2003, Jean Eid wrote:
. . .

> My other question is on the regression diagnostics, particularly plotting
> Cook's distance. What is the rule to decide on outliers? If I read the
> plot correctly, the labeled distances (vertical lines) are outliers. But I
> have computed Cook's distances and compared them to qf(0, p, n-p) (the
> median of the F distribution with parameters p = number of variables in
> the design and n-p, where n = number of obs.), but this does not give the
> same answer.

I presume you mean qf(0.5, p, n-p)?
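
The distinction matters because the 0 quantile of any F distribution is 0,
so every positive Cook's distance would exceed it, whereas the median is
the 0.5 quantile. A quick check, with p and n made up purely for
illustration:

```r
# Hypothetical values, chosen only for illustration
p <- 3    # number of coefficients in the model
n <- 50   # number of observations

q0   <- qf(0,   p, n - p)  # 0 quantile: the lower bound of the F distribution
qmed <- qf(0.5, p, n - p)  # 0.5 quantile: the median of the F distribution

q0
qmed
```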

>
. . .

Except to give some sense of scale, it's not sensible to treat Cook's
distances as F-values. The use of an F statistic in this context is really
just a kind of trick to obtain a scale-invariant measure of the distance
between the coefficient vector for all of the data and the coefficient
vector with an observation deleted. There is a rule-of-thumb cutoff for
noteworthy Cook's distances, 4/(n - p), but I wouldn't place too much stock
in it. It's better simply to look for values of Cook's D that stand out from
the others. Finally, Cook's D isn't really an outlier diagnostic, but an
influence diagnostic: it reflects both the size of the residual and the
leverage of the observation. A low-leverage regression outlier, for
example, can have a small Cook's D.
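
As a rough sketch of the last point, the following applies the same
vertical shift to one observation near the mean of x (low leverage) and to
one at the edge of the x range (high leverage), then compares the results.
The data, the size of the shift, and the object names (y_mid, y_edge, and
so on) are all made up for illustration; only lm(), cooks.distance(),
rstudent(), and hatvalues() are standard R functions.

```r
# Made-up data: a clean linear trend with small alternating noise
x <- 1:20
y <- 2 + 0.5 * x + rep(c(0.2, -0.2), 10)

# Same vertical shift applied at a low-leverage point (x = 10, near the
# mean of x) and at a high-leverage point (x = 20, at the edge)
y_mid  <- y; y_mid[10]  <- y_mid[10]  + 3
y_edge <- y; y_edge[20] <- y_edge[20] + 3

fit_mid  <- lm(y_mid  ~ x)
fit_edge <- lm(y_edge ~ x)

d_mid  <- cooks.distance(fit_mid)
d_edge <- cooks.distance(fit_edge)

# Rule-of-thumb cutoff, taking p as the number of estimated coefficients
n <- length(x)
p <- length(coef(fit_mid))
cutoff <- 4 / (n - p)

# Both shifted points are clear regression outliers ...
rstudent(fit_mid)[10]
rstudent(fit_edge)[20]

# ... but their influence on the coefficients differs with leverage
hatvalues(fit_mid)[10]
hatvalues(fit_edge)[20]
d_mid[10]
d_edge[20]
cutoff
```

Calling plot(fit_edge, which = 4) draws the Cook's distance plot for a
fit, which makes it easy to see which values stand out from the rest
without leaning on the cutoff.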

I hope that this helps,
 John



