[Rd] Enhanced version of plot.lm()

Thu Apr 28 02:54:18 CEST 2005

On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote:

>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>     on 27 Apr 2005 16:54:02 +0200 writes:
>
>     PD> Martin Maechler <maechler at stat.math.ethz.ch> writes:
>>> I'm about to commit the current proposal(s) to R-devel,
>>> **INCLUDING** changing the default from
>>> 'which = 1:4' to 'which = c(1:3,5)'
>>>
>>> and elicit feedback starting from there.
>>>
>>> One thing I think I would like is to use color for the Cook's
>>> contours in the new 4th plot.
>
>     PD> Hmm. First try running example(plot.lm) with the modified function and
>     PD> tell me which observation has the largest Cook's D. With the suggested
>     PD> new 4th plot it is very hard to tell whether obs #49 is potentially or
>     PD> actually influential. Plots #1 and #3 are very close to conveying the
>     PD> same information though...
>
> I shouldn't be teaching here, and I know that I'm getting into contested
> territory (regression diagnostics; robustness; "The" Truth, etc., etc.),
> but I believe there is no unique way to define "actually influential"
> (hence I don't believe that it's extremely useful to know
> exactly which Cook's D is largest).
>
> This is partly because many statistics can be derived from a
> multiple regression fit, all of which are influenced in some way.
> AFAIK, all observation-influence measures g(i) are functions of
> (r_i, h_{ii}) and the latter are the quantities that "regression
> users" should really know {without consulting a textbook} and
> that are generalizable {e.g. to "linear smoothers" such as
> gam()s (for "non-estimated" smoothing parameter)}.
>
> Martin

I agree with Martin.  I like the idea of using color (red?) for
the new Cook's contours.  People who want (fairly) precise
comparisons of the Cook's statistics can still use the present
plot #4, perhaps as a follow-up to the new plot #5.
It would be possible to label the points with the most extreme
Cook's statistics with their actual values (to perhaps 2 significant
figures, i.e., labeling on both sides of such points), but this would
add what I consider unnecessary clutter to the graph.
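
For anyone who wants the exact values alongside the new plot, here is a
minimal sketch. It assumes that the fit is the lm.SR model from
example(plot.lm) (LifeCycleSavings data) and that r_i means the
standardized residual returned by rstandard(); neither is spelled out
above. Under those assumptions Cook's distance is
D_i = r_i^2 * h_ii / (p * (1 - h_ii)), with p the number of coefficients,
so it depends on the data only through (r_i, h_ii).

```r
# Assumption: this is the fit used by example(plot.lm) (LifeCycleSavings data).
lm.SR <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

r <- rstandard(lm.SR)      # standardized residuals r_i
h <- hatvalues(lm.SR)      # leverages h_ii
p <- length(coef(lm.SR))   # number of estimated coefficients

# Cook's distance written as a function of (r_i, h_ii) only
D <- r^2 * h / (p * (1 - h))

# Check against the built-in computation
stopifnot(isTRUE(all.equal(D, cooks.distance(lm.SR))))

# The three largest values, rounded to 2 significant figures
print(signif(sort(D, decreasing = TRUE)[1:3], 2))

# The present Cook's distance plot (which = 4) next to the
# residuals-vs-leverage plot with Cook's contours (which = 5);
# this needs a plot.lm() that already supports which = 5.
op <- par(mfrow = c(1, 2))
plot(lm.SR, which = c(4, 5))
par(op)
```

Printing the few largest values this way keeps the numbers available
without adding labels to the graph itself.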

John.

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.