[Rd] plot(<lm>): new behavior in R-2.2.0 alpha

Werner Stahel stahel at stat.math.ethz.ch
Fri Sep 16 09:37:02 CEST 2005


Dear Martin, dear Johns

Thanks for including me in your discussion.

I am a strong supporter of the "Residuals vs. Hii" plot.

>> One remaining problem I'd like to address is the "balanced AOV"
>> situation, ...

In order to keep the plots consistent, I suggest drawing a
histogram. Other alternatives will or can be interesting in the
general case and are therefore not a suitable substitute for
this plot.

A plot to be developed may be the following:
Define a distance in the subspace of x-space that is in some way
orthogonal (eg, with respect to the covariance matrix of the x's)
to the fit. Then plot residuals vs. this distance, with
different symbols for small, medium and large fit.
... but this is still a project.

A related project: Daniel (and Wood) introduced a term WSSD, a
distance in x-space. He then studied, for pairs of points, the
difference in residuals as a function of WSSD. If the function
increases, this indicates a lack of fit.

Back to currently available methods:

John Maindonald discusses different contours. I like the
implementation I get currently in R-devel: contours of Cook's
distances, since they are popular and we can then argue that the
plot of D_i vs. i is no longer needed.
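
As a small illustration of why such contours can be drawn at all:
Cook's distance depends only on the standardized residual r_i and
the leverage h_i, so each level of D_i is a curve in the
(leverage, residual) plane. The sketch below uses the built-in
`cars` data purely as an assumed example; it checks the identity
numerically and then draws the plot in question.

```r
# Illustrative fit on a built-in dataset (an assumption, not from the message)
fit <- lm(dist ~ speed, data = cars)

h <- hatvalues(fit)        # leverages H_ii
r <- rstandard(fit)        # internally standardized residuals
p <- length(coef(fit))     # number of estimated coefficients
D <- cooks.distance(fit)

# Cook's distance as a function of r_i and h_i only:
#   D_i = r_i^2 * h_i / (p * (1 - h_i))
D_from_rh <- r^2 * h / (p * (1 - h))
print(all.equal(unname(D), unname(D_from_rh)))

# Residuals vs. leverage with Cook's distance contours
plot(fit, which = 5)
```

Since the contours already carry D_i, the separate index plot
(`which = 4`) adds little once this panel is shown.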

For most plots, I like to see a smoother along with the points.
I suggest adding the option to include smoothers, not only as an
argument to plot.lm, but even as an option().
I have heard of the intense discussions about options().
With Martin, we arrived at the conclusion that options() should
never influence calculations and results, but is suitable to
adjust outputs (numerical: digits=, graphical: smooth=) to the
user's taste.
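
A minimal sketch of how such a preference could be wired up. The
option name "smooth" is taken from the paragraph above and is
hypothetical, as is the helper `plot_resid()`; neither is an
existing part of plot.lm. The option only changes what is drawn,
never what is computed.

```r
# plot_resid() is a hypothetical helper; "smooth" is a hypothetical option name.
# An explicit argument overrides the option, and the option defaults to TRUE.
plot_resid <- function(fit, smooth = getOption("smooth", TRUE)) {
  f <- fitted(fit)
  r <- residuals(fit)
  plot(f, r, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 3)
  if (isTRUE(smooth)) {
    lines(lowess(f, r), col = "red")   # purely graphical addition
  }
  invisible(smooth)                     # report which setting was used
}

fit <- lm(dist ~ speed, data = cars)
plot_resid(fit)                  # smoother drawn (default)
old <- options(smooth = FALSE)   # user-level taste
plot_resid(fit)                  # no smoother
options(old)                     # restore the previous setting
```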

>> (4) Are there other diagnostics that ought to be included in
>> stats? (perhaps in a function other than plot.lm(), which risks
>> being overloaded).  One strong claiment is vif() (variance
>> inflation factor),

I clearly support adding either vif or -- equivalent and more
intuitive to me -- R^2_j, the coefficient of determination of
lm(X_j~.). However,
-- this should be included in the coefficient table of print.lm
-- this adds another useless and misleading quantity for dummy
x-variables.
It is therefore quite a different question.
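
To make the equivalence concrete: VIF_j = 1 / (1 - R^2_j). The
sketch below computes R^2_j by regressing each predictor on the
others, using base R only. The `mtcars` model is an assumed
example with continuous predictors; as noted above, the result is
not meaningful column by column for dummy variables.

```r
# Illustrative model with continuous predictors only (an assumption)
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
X <- model.matrix(fit)[, -1, drop = FALSE]   # design matrix without intercept

# R^2_j: coefficient of determination of X_j regressed on the other columns
r2j <- sapply(colnames(X), function(j) {
  others <- X[, colnames(X) != j, drop = FALSE]
  summary(lm(X[, j] ~ others))$r.squared
})

vif <- 1 / (1 - r2j)   # variance inflation factors
print(round(cbind(R2_j = r2j, VIF = vif), 3))
```

The same values appear on the diagonal of the inverse correlation
matrix of the predictors, `diag(solve(cor(X)))`, which gives a quick
cross-check.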

I have my own version of print for my own version of a function
regr(...) that calls lm, glm and other regression functions. 
If you are interested, I can send these functions within a few weeks.

>> (5) termplot() provides partial residual (component + residual)
>> plots, which I think extraordinarily useful.  They deserve to be
>> widely used.
>> Should partial regression plots also be available?

The plot method for my regr objects includes termplots.
I prefer residuals without component effects, but add a
reference line that allows for assessing the component effects.

>> (6) It should be fairly easy to construct a function that would
>> examine the distribution of statistics of interest under repeated
>> bootstrap sampling or simulation.  This can be useful when
>> with small samples, when it is easy to over-interpret diagnostic
>> statistics.

As we focus on plots, my plot method includes the option
(default) to add smooths for 20 simulated datasets (according to
the fitted model). 
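
The idea can be sketched with base R as follows. This is not the
plot method mentioned above, only an assumed reconstruction of the
principle: simulate responses from the fitted model, refit, and
overlay the smooths of the simulated residuals so that the smooth
of the observed residuals can be judged against them. The `cars`
data and the use of lowess() are illustrative choices.

```r
set.seed(1)                                  # reproducible simulation
fit <- lm(dist ~ speed, data = cars)
f <- fitted(fit)

plot(f, residuals(fit), xlab = "Fitted values", ylab = "Residuals")

# 20 response vectors drawn from the fitted model (one column each)
ysim <- simulate(fit, nsim = 20)

for (k in seq_along(ysim)) {
  # refit the same model to the simulated response
  rk <- residuals(lm(ysim[[k]] ~ speed, data = cars))
  lines(lowess(f, rk), col = "grey")         # smooth under the model
}

# smooth of the observed residuals, drawn last so it stays visible
lines(lowess(f, residuals(fit)), col = "red", lwd = 2)
```

If the red curve stays within the band of grey curves, its bends
are of a size that the model itself produces by chance.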

>> (8) Are there special issues that require attention for large
>> datasets? [I'm sure there are, but regression diagnostics may
>> not be the best point of entry into the discussion.]

A cynical remark that I like to make about the state of
statistics: 
There is no program that is able to produce a scatterplot of two
variables adequately. 
The functions that I have seen work only for textbook
situations.
Large sample is one situation where they fail, others being
-- multiple points (due to rounding or classification)
-- outliers

This seems to be enough for one message ...

Cheers,

Werner 
----------------- This message was sent by ---------------------------
Werner Stahel                              http://stat.ethz.ch/~stahel
Seminar fuer Statistik                     phone  :    +41 1 632 34 30
ETH-Zentrum, LEO D8                        fax    :    +41 1 632 12 28
CH-8092 Zurich, Switzerland                meet me: Leonhardstr.27, D8


