[Rd] Enhanced version of plot.lm()

Fri Apr 29 01:46:59 CEST 2005

NB also the mention of a possible addition to stats: vif()

Dear John -
I think users can cope with six plots offered by one function,
with four of them given by default, and the two remaining
plots alternative ways of presenting the information in the
final default plot.  The idea of plot.lm() was to provide a
set of plots that would serve most basic purposes.
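
As a rough illustration of how the six plots are selected, here is a
minimal sketch. It assumes the 'which' numbering found in current
releases of R (the numbering was still under discussion in this thread),
and the mtcars model is only a placeholder example.

```r
# Placeholder model on a built-in data set; any lm fit would do.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# The four default plots on one page.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# The two remaining plots, requested explicitly.
# Assumption: 4 = Cook's distances, 6 = Cook's distance against leverage.
plot(fit, which = 4)
plot(fit, which = 6)
```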

It may be reasonable to have a suite of plots for
examining residuals and influence.  I'd suggest
following the same syntax and labeling conventions
as plot.lm(), unless these seem inappropriate.

While on such matters, there is a function vif() in DAAG,
and a more comprehensive function vif() in car.  One of
these, probably yours if you are willing, should go into
stats.  There's one addition that I'd make: allow a model
matrix as a parameter, as an optional alternative to giving
the model object.
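
To make the suggestion concrete, here is a minimal sketch of a function
that takes either a fitted model or a model matrix.  The name
vif_sketch() is hypothetical; this is not the code of vif() in DAAG or
car.  It assumes every non-intercept column is a separate one-df term,
so it does not give the generalized VIFs that car reports for multi-df
terms.

```r
# Hypothetical helper, for illustration only.
vif_sketch <- function(x) {
  # Accept a model matrix directly, or extract one from a model object.
  X <- if (is.matrix(x)) x else model.matrix(x)
  # Drop the intercept column, if the matrix has a named one.
  if (!is.null(colnames(X))) {
    X <- X[, colnames(X) != "(Intercept)", drop = FALSE]
  }
  # The VIFs are the diagonal of the inverse correlation matrix
  # of the predictors.
  v <- diag(solve(cor(X)))
  names(v) <- colnames(X)
  v
}

fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
vif_sketch(fit)                # from the model object
vif_sketch(model.matrix(fit))  # from the model matrix
```

Both calls should return the same values, since the model object is
used only to obtain its model matrix.
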
Regards
John M.

On 28 Apr 2005, at 10:39 PM, John Fox wrote:

> Dear John et al.,
>
> Curiously, Georges Monette (at York University in Toronto) and I were
> just talking last week about influence-statistic contours, and I wrote
> a couple of functions to show these for Cook's D and for covratio as
> functions of hat-values and studentized residuals. These differ a bit
> from the ones previously discussed here in that they show rule-of-thumb
> cut-offs for D and covratio, along with Bonferroni critical values for
> studentized residuals.
>
> I've attached a file with these functions, even though they're not that
> polished.
>
> More generally, I wonder whether it's not best to supply plots like
> these as separate functions rather than as a do-it-all plot method for
> lm objects.
>
> Regards,
>  John
>
> --------------------------------
> John Fox
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>> -----Original Message-----
>> From: r-devel-bounces at stat.math.ethz.ch
>> [mailto:r-devel-bounces at stat.math.ethz.ch] On Behalf Of John
>> Maindonald
>> Sent: Wednesday, April 27, 2005 7:54 PM
>> To: Martin Maechler
>> Cc: David Firth; Werner Stahel; r-devel at stat.math.ethz.ch;
>> Peter Dalgaard
>> Subject: Re: [Rd] Enhanced version of plot.lm()
>>
>>
>> On 28 Apr 2005, at 1:30 AM, Martin Maechler wrote:
>>
>>>>>>>> "PD" == Peter Dalgaard <p.dalgaard at biostat.ku.dk>
>>>>>>>>     on 27 Apr 2005 16:54:02 +0200 writes:
>>>
>>>     PD> Martin Maechler <maechler at stat.math.ethz.ch> writes:
>>>>> I'm about to commit the current proposal(s) to R-devel,
>>>>> **INCLUDING** changing the default from 'which = 1:4' to
>>>>> 'which = c(1:3,5)'
>>>>>
>>>>> and elicit feedback starting from there.
>>>>>
>>>>> One thing I think I would like is to use color for the Cook's
>>>>> contours in the new 4th plot.
>>>
>>>     PD> Hmm. First try running example(plot.lm) with the modified
>>>     PD> function and tell me which observation has the largest
>>>     PD> Cook's D. With the suggested new 4th plot it is very hard to
>>>     PD> tell whether obs #49 is potentially or actually influential.
>>>     PD> Plots #1 and #3 are very close to conveying the same
>>>     PD> information though...
>>>
>>> I shouldn't be teaching here, and I know that I'm getting into
>>> contested territory (regression diagnostics; robustness; "The" Truth,
>>> etc., etc.) but I believe there is no unique way to define "actually
>>> influential" (hence I don't believe that it's extremely useful to
>>> know exactly which Cook's D is largest).
>>>
>>> Partly because there are many statistics that can be derived from a
>>> multiple regression fit all of which are influenced in some way.
>>> AFAIK, all observation-influence measures g(i) are functions of
>>> (r_i, h_{ii}) and the latter are the quantities that "regression
>>> users" should really know {without consulting a text book} and that
>>> are generalizable {e.g. to "linear smoothers" such as gam()s (for
>>> "non-estimated" smoothing parameter)}.
>>>
>>> Martin
>>
>> I agree with Martin.  I like the idea of using color (red?)
>> for the new Cook's contours.  People who want (fairly)
>> precise comparisons of the Cook's statistics can still use
>> the present plot #4, perhaps as a follow-up to the new plot #5.
>> It would be possible to label the Cookwise most extreme
>> points with the actual values (to perhaps 2 significant figures,
>> i.e., labeling on both sides of such points), but this would add
>> what I consider unnecessary clutter to the graph.
>>
>> John.
>>
>> John Maindonald             email: john.maindonald at anu.edu.au
>> phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
>> Centre for Bioinformation Science, Room 1194, John Dedman
>> Mathematical Sciences Building (Building 27) Australian
>> National University, Canberra ACT 0200.
>>
>> ______________________________________________
>> R-devel at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
> <influence-plots.R>
John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Bioinformation Science, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.