[R] R.squared in summary.lm with weights

Murray Efford murray.efford at otago.ac.nz
Mon Apr 11 00:39:47 CEST 2016


Among the 6547 matches for 'PRESS' in an sos search I find 7 packages (asbio, DAAG, qpcR, CombMSC, rknn, MPV, mixlm) with a relevant 'press' or 'PRESS' function. Of these, only qpcR (PRESS), mixlm (R2pred), and rknn (rsqp) attempt to calculate PRESS R^2, as far as I can tell. None of them explicitly addresses weights.

By way of example:

## generate a simple dataset and fit weighted regression
x <- 1:20
df <- data.frame(x = x, y = 2*x + 10* rnorm(20), wt = runif(20) * 10)
fitwt <- lm(y~x, data = df, weights = wt)
fitnowt <- lm(y~x, data = df)

## apply PRESS R^2 methods from 3 packages
otherpressR2 <- function (fit) {
    require(mixlm)
    require(qpcR)
    require(rknn)
    c(qpcR = qpcR::PRESS(fit, verbose = FALSE)$P.square,
      mixlm = mixlm::R2pred(fit),
      rknn = rknn::rsqp(fit))
}
otherpressR2(fitwt)
#    qpcR   mixlm    rknn 
# 0.42865 0.56124 0.56124 
# There were 21 warnings (use warnings() to see them)

otherpressR2(fitnowt)
#    qpcR   mixlm    rknn 
# 0.59391 0.59391 0.59391 
# There were 21 warnings (use warnings() to see them)

(The warnings from qpcR are not material.)

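Two different versions of PRESS-R^2 are implemented: all three functions agree for the unweighted fit, but with weights qpcR gives one value and mixlm and rknn give another. I see no reason to trust either version in the case of weighted regression. The key issue is the appropriate calculation of MSS or TSS, as indicated before.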
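
For readers who want to experiment, here is a minimal sketch of the quantities under discussion. It is not a recommended definition of weighted PRESS-R^2. It assumes strictly positive weights, uses an arbitrary seed (so the numbers will not match the output above), and takes the weighted TSS about the weighted mean of y, which for a model with an intercept equals the MSS + RSS that summary.lm uses. Whether that TSS is the right denominator for PRESS is exactly the open question. The names loo_shortcut, loo_refit, press, tss and press_r2 are illustrative only.

```r
## Sketch only: leave-one-out (PRESS) residuals from a weighted lm fit,
## plus ONE candidate weighted TSS (an assumption, not an established answer).
set.seed(1)                      # arbitrary seed, not from the original post
x  <- 1:20
df <- data.frame(x = x, y = 2*x + 10*rnorm(20), wt = runif(20) * 10)
fitwt <- lm(y ~ x, data = df, weights = wt)

## Leave-one-out prediction errors without refitting: e_i / (1 - h_ii).
## residuals() gives unweighted residuals; hatvalues() is the diagonal of
## the hat matrix of the weighted problem.
loo_shortcut <- residuals(fitwt) / (1 - hatvalues(fitwt))

## The same errors by brute force, refitting with each row dropped in turn.
loo_refit <- sapply(seq_len(nrow(df)), function(i) {
    fit_i <- lm(y ~ x, data = df[-i, ], weights = wt)
    df$y[i] - predict(fit_i, newdata = df[i, ])
})

## Weighted PRESS, and a weighted TSS about the weighted mean of y
## (assumed choice; other denominators are conceivable).
w        <- weights(fitwt)
press    <- sum(w * loo_shortcut^2)
tss      <- sum(w * (df$y - weighted.mean(df$y, w))^2)
press_r2 <- 1 - press / tss
```

Comparing loo_shortcut with loo_refit (e.g. with all.equal after unname) is a quick check that the hat-value shortcut still holds when weights are supplied. Swapping in a different tss shows how strongly the result depends on that choice.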

Murray Efford
________________________________________
From: David Winsemius <dwinsemius at comcast.net>
Sent: Monday, 11 April 2016 5:31 a.m.
To: Murray Efford
Cc: r-help at r-project.org; Martin Maechler; peter dalgaard
Subject: Re: [R] R.squared in summary.lm with weights

> On Apr 10, 2016, at 9:38 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>>
>> On Apr 10, 2016, at 3:11 AM, Murray Efford <murray.efford at otago.ac.nz> wrote:
>>
>> Martin -
>> Thanks, but although hatvalues() is useful for calculating PRESS, I can't find anything directly relevant to my question in the influence help pages. After some burrowing in the literature I'm doubting there is an answer out there (PRESS R^2 is always presented in a fairly ad hoc way).
>> This is a new topic, as you say, and perhaps better handled on a statistics list.
>> Murray Efford
>>
>> [BTW
>> stats ::: influence.lm
>> just gets me
>> function (model, do.coef = TRUE, ...)
>> lm.influence(model, do.coef = do.coef, ...)
>> <bytecode: 0x00000000081023b8>
>> <environment: namespace:stats>
>> which is not very helpful]
>
> influence.lm is just saying you should be looking at lm.influence
>
> #Try typing:
> lm.influence
>
> Admittedly the meat of that function is probably encapsulated in C with the results delivered by:
>
>      res <- .Call(C_influence, mqr, do.coef, e, tol)
>
> Perhaps looking at:
>
> https://svn.r-project.org/R/trunk/src/library/stats/src/influence.c
>
>
> I haven't been following the rest of the thread so this is just commenting on your difficulties reading R code.
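
As a small illustration of the point about lm.influence, the sketch below inspects what it returns at the R level for a weighted fit, without reading the C code. The data and seed are made up for illustration, and the object names fit and infl are hypothetical.

```r
## Sketch: what lm.influence() returns at the R level for a weighted fit.
set.seed(1)                      # arbitrary seed and data, for illustration only
df  <- data.frame(x = 1:20, y = 2*(1:20) + 10*rnorm(20), wt = runif(20) * 10)
fit <- lm(y ~ x, data = df, weights = wt)

infl <- lm.influence(fit)
names(infl)                      # components include "hat" and "wt.res"

## hatvalues() returns the same values as the "hat" component.
all.equal(unname(infl$hat), unname(hatvalues(fit)))

## For a weighted fit, "wt.res" holds sqrt(w) * residuals.
all.equal(unname(infl$wt.res), unname(sqrt(df$wt) * residuals(fit)))
```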

When I try to fill in the gaps in my knowledge of PRESS, MarkMail reminds me that this question came up 5-6 years ago and that I went looking for an answer then:

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+PRESS#query:list%3Aorg.r-project.r-help%20PRESS+page:1+mid:k2mbz5sov5eo5ejw+state:results

I also see that other packages have implemented PRESS at least as reported by others:

Subject: [R] I need help computing PRESS statistics (qpcR package) of...:
From:   Francisco Goes (xico... at hotmail.com)
Date:   Jun 4, 2014 4:05:03 pm

I also tried a fresh search, although "PRESS" is a term shared with unrelated topics, which complicates the process. Even after excluding the matches related to "the Press" and "protein residues", I counted 9 packages with PRESS functions when I searched with:

sos::findFn("PRESS")

--
David.



>
> --
>
> David.
>
>
>>
>> ________________________________________
>> From: Martin Maechler <maechler at stat.math.ethz.ch>
>> Sent: Sunday, 10 April 2016 4:07 a.m.
>> To: Murray Efford
>> Cc: peter dalgaard; Duncan Murdoch; r-help at r-project.org
>> Subject: Re: [R] R.squared in summary.lm with weights
>>
>>>>>>> Murray Efford <murray.efford at otago.ac.nz>
>>>>>>>   on Fri, 8 Apr 2016 18:45:33 +0000 writes:
>>
>>> Thanks for these perfectly consistent replies - I didn't
>>> understand the purpose of m = sum(w * f/sum(w)) and saw it
>>> merely as a weighted average of the fitted values.  My
>>> ultimate concern is how to compute an appropriate weighted
>>> TSS (or equivalently, MSS) for PRESS-R^2 = 1 - PRESS/TSS =
>>> 1 - PRESS/ (MSS + PRESS). Do you think it then makes sense
>>> to substitute the vector of leave-one-out fitted values
>>> for f here?
>>
>> --> A new topic really.
>>
>> I think you should find the answer on the help pages (and in the
>> source) of
>>
>>    ? influence.measures  (which documents a host of such functions)
>>   and
>>    ? influence
>>
>> Note that influence is S3 generic and
>>
>>  methods(influence)
>>
>> indicates that the 'lm' and 'glm' methods are hidden.
>> Of course I do recommend reading the real R source code (which
>>  also contains the comments and has some logical order in all the
>>  function definitions),
>> but you can use   stats ::: influence.lm
>> to show a version of the function that looks not too different
>> from the source.
>>
>> Martin Maechler, ETH Zurich
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>

David Winsemius
Alameda, CA, USA



