[R] different results with plot.lm vs. plot.lm(which=c(2))

Thu Nov 13 14:43:56 CET 2008

This was AFAICS a bug introduced in 2.7.1 by

     o	plot(<glm>, which=5) uses more correct Cook's distance contours;
 	(fix to fix to PR#9316).

Thanks to Greg for the diagnosis.
Will be fixed in R-patched later today.

On Wed, 12 Nov 2008, Greg Snow wrote:

> Just a clarification on one of my statements below.  I realize on rereading my statement on R-core paying attention that it could be interpreted as a possible criticism.  That is not how it was intended, rather that I have seen cases in the past where a discussion confirms that something is a bug and a member of R-core already has the fix started or in place before a formal bug report could be sent and so we should not send a bug report on this unless it was clear that one was needed (which it is not, I received an e-mail off line that Prof. Ripley has started looking into it).
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> Statistical Data Center
> Intermountain Healthcare
> greg.snow at imail.org
> 801.408.8111
>
>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>> project.org] On Behalf Of Greg Snow
>> Sent: Wednesday, November 12, 2008 1:07 PM
>> To: Effie Greathouse; r-help at r-project.org
>> Subject: Re: [R] different results with plot.lm vs. plot.lm(which=c(2))
>>
>> The same thing is affecting plot 2 as well.  Basically in the code
>> there is line early on:
>>
>> r <- residuals(x)
>>
>> which gets the residuals and by default for glm models those are the
>> deviance residuals.
>>
>> A bit latter is the code (after some optional modifications to r):
>>
>>     if (any(show[2:3])) {
>>         ylab23 <- if (isGlm)
>>             "Std. deviance resid."
>>         else "Standardized residuals"
>>         r.w <- if (is.null(w))
>>             r
>>         else sqrt(w) * r
>>     }
>>
>> Which results in r.w being the residuals (or resids times the square
>> root of w) if either plot 2 or 3 is requested.
>>
>> The next batch of code is:
>>
>>     if (show[5]) {
>>         ylab5 <- if (isGlm)
>>             "Std. Pearson resid."
>>         else "Standardized residuals"
>>         r.w <- residuals(x, "pearson")
>>         if (!is.null(w))
>>             r.w <- r.w[wind]
>>     }
>>
>> Which changes r.w to the pearson residuls if plot 5 is requested, it
>> also sets the label to use for plot 5, but does not change the label
>> (ylab23) to be used for plots 2 and 3 and so they still are labeled as
>> deviance residuals.
>>
>> The r.w variable is used latter in the code for plots 2, 3, and 5 (and
>> maybe others).  Unless there is something else in between the above
>> code and where r.w is used that fixes this (I did not see anything, but
>> did not do more than skim the code) then there is a clear bug in that
>> the residuals being used is inconsistent and mis-labeled in some cases.
>>
>> If anyone on R-core is paying attention to this thread, then the
>> problem may already be in the process of being fixed (or it may already
>> be fixed, check r-patched, the above is based on R 2.8.0 (2008-10-20)),
>> I've had a couple of cases where the fix was in place before I could
>> submit a bug report.  If we don't see any indication of it being fixed
>> in the next couple of days, then one of us should submit a bug report.
>>
>>
>> --
>> Gregory (Greg) L. Snow Ph.D.
>> Statistical Data Center
>> Intermountain Healthcare
>> greg.snow at imail.org
>> 801.408.8111
>>
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>> project.org] On Behalf Of Effie Greathouse
>>> Sent: Wednesday, November 12, 2008 12:15 PM
>>> To: r-help at r-project.org
>>> Subject: Re: [R] different results with plot.lm vs.
>> plot.lm(which=c(2))
>>>
>>> There's also a big difference in plot 2 (Normal Q-Q) in my real data,
>>> but I
>>> don't see a real difference in plot 2 for the example I sent, except
>>> that
>>> some of the outlier labels are different.  Would the residuals being
>>> plotted
>>> likely be the cause of the difference in plot 2 as well?  Dr. Snow,
>>> would
>>> you want to be able to see the plot 2 graphs for my real data?  I
>> don't
>>> know
>>> how to save plot 2 when I'm clicking through the plot(model) graphs,
>> so
>>> I
>>> can't just send it to you.  Thanks for checking this out Dr. Snow!
>>>
>>> On Wed, Nov 12, 2008 at 11:05 AM, Greg Snow <Greg.Snow at imail.org>
>>> wrote:
>>>
>>>> From a quick look at the code it looks like when you ask for plot
>>> number 5
>>>> (included in default when 'which' is not specified), then the
>>> deviance
>>>> residuals are replaced by the pearson residuals to be used in later
>>>> computations.  So the difference that you are seeing is that one of
>>> the
>>>> plots is based on deviance residuals and the other on pearson
>>> residuls.
>>>>
>>>> It seems that there is a bug here in that, at a minimum, the label
>>> should
>>>> be changed to indicate which residuals were actually used, or the
>>> code
>>>> changed to continue to use the deviance residuals for plot 3 even
>>> when plot
>>>> 5 is requested.
>>>>
>>>> Does anyone else see something that I missed in how the residuals
>> are
>>>> replaced and used?
>>>>
>>>> --
>>>> Gregory (Greg) L. Snow Ph.D.
>>>> Statistical Data Center
>>>> Intermountain Healthcare
>>>> greg.snow at imail.org
>>>> 801.408.8111
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
>>>>> project.org] On Behalf Of Effie Greathouse
>>>>> Sent: Wednesday, November 12, 2008 11:16 AM
>>>>> To: r-help at r-project.org
>>>>> Subject: Re: [R] different results with plot.lm vs.
>>> plot.lm(which=c(2))
>>>>>
>>>>> Hi Dr. Ripley--Sorry for the repost everybody.  The original
>>> message I
>>>>> sent
>>>>> never showed up in my inbox, so I thought it didn't get sent to
>> the
>>>>> list.
>>>>>
>>>>> I'm running R 2.8.0, installed from a pre-compiled version, on
>>> Windows
>>>>> XP.
>>>>> When I type Sys.getlocale() at the R prompt, it returns:
>>>>>  "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
>>>>> States.1252;LC_MONETARY=English_United
>>>>> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
>>>>>
>>>>> Here's an example:
>>>>> bob <- seq(1:100)
>>>>> bob2 <- rgamma(100, 2, 1)*10+bob
>>>>> model<-glm(bob2 ~ bob, family=Gamma)
>>>>>
>>>>> Then enter:
>>>>> plot(model, which=c(3))
>>>>> to get the Scale-Location graph
>>>>>
>>>>> Then compare it to the Scale-Location graph when you run the
>>> following
>>>>> command and page through to the 3rd graph:
>>>>> plot(model)
>>>>>
>>>>> When I do this, I get different results -- some of the high
>> values
>>> are
>>>>> different on each plot.  On my real data the difference is more
>>> severe
>>>>> than
>>>>> in this randomly generated example.  I'd be happy to supply my
>> real
>>>>> data and
>>>>> R code if this smaller example isn't sufficient.  Thank you for
>> any
>>>>> help!!
>>>>>
>>>>>
>>>>> On Wed, Nov 12, 2008 at 9:43 AM, Prof Brian Ripley
>>>>> <ripley at stats.ox.ac.uk>wrote:
>>>>>
>>>>>> Instead of re-posting the same message, please study the
>> posting
>>>>> guide and
>>>>>> supply the information asked for, including a reproducible
>>> example.
>>>>> There is
>>>>>> no way we can help you unless you help us to help you.
>>>>>>
>>>>>>
>>>>>> On Wed, 12 Nov 2008, Effie Greathouse wrote:
>>>>>>
>>>>>>   I am running GLM models using the gamma family.  For example:
>>>>>>> model <-glm(y ~ x, family=Gamma(link="identity"))
>>>>>>>
>>>>>>> I am getting different results for the normal Q-Q plot and the
>>>>>>> Scale-Location plot if I run the diagnostic plots without
>>> specifying
>>>>> the
>>>>>>> plot vs. if I specify the plot ... e.g., "plot(model)" gives
>> me
>>> a
>>>>>>> different
>>>>>>> Normal Q-Q graph than "plot(model, which=c(2))".  The former
>>> gives
>>>>> data
>>>>>>> points distributed in a quadratic pattern, while the latter
>>> gives
>>>>> data
>>>>>>> points more or less along the 1:1 line.  Shouldn't these two
>>>>> commands be
>>>>>>> giving me the same exact graphs?  I have read the
>> documentation
>>> on
>>>>> plot.lm
>>>>>>> and searched the help archives, but I am still learning GLM's
>>> and
>>>>> I'm not
>>>>>>> very familiar with understanding diagnostic plots for GLM's,
>> so
>>> any
>>>>> help
>>>>>>> would be much appreciated!
>>>>>>>
>>>>>>>        [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help at r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide
>>>>>>> http://www.R-project.org/posting-guide.html<http://www.r-
>>> project.org/posting-guide.html>
>>>> <http://www.r-
>>>> > project.org/posting-guide.html>
>>>>>>> and provide commented, minimal, self-contained, reproducible
>>> code.
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Brian D. Ripley,                  ripley at stats.ox.ac.uk
>>>>>> Professor of Applied Statistics,
>>> http://www.stats.ox.ac.uk/~ripley/
>>>>>> University of Oxford,             Tel:  +44 1865 272861 (self)
>>>>>> 1 South Parks Road,                     +44 1865 272866 (PA)
>>>>>> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>>>>>>
>>>>>
>>>>>         [[alternative HTML version deleted]]
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-
>> project.org/posting-
>>> <http://www.r-project.org/posting->
>>>>> guide.html
>>>>> and provide commented, minimal, self-contained, reproducible
>> code.
>>>>
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-
>>> guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-
>> guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595