[Rd] lm() gives different results to lm.ridge() and SPSS

Simon Bonner sbonner6 at uwo.ca
Thu May 4 19:07:33 CEST 2017


Hi Nick,

I think the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that the coef component is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." In other words, $coef holds coefficients for the centred and scaled predictors that lm.ridge uses internally, and the coef() method is what puts them back on the original scale.

I ran a small test with simulated data (code copied below), and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm.

I hope that helps.

Cheers,

Simon

## Load packages
library(MASS)

## Set seed 
set.seed(8888)

## Set parameters
n <- 100
beta <- c(1,0,1)
sigma <- .5
rho <- .75

## Simulate correlated covariates
Sigma <- matrix(c(1,rho,rho,1),ncol=2)
X <- mvrnorm(n,c(0,0),Sigma=Sigma)

## Simulate data
mu <- beta[1] + X %*% beta[-1]
y <- rnorm(n,mu,sigma)

## Fit model with lm()
fit1 <- lm(y ~ X)

## Fit model with lm.ridge(); lambda = 0 (the default) means no ridge penalty
fit2 <- lm.ridge(y ~ X, lambda = 0)

## Compare coefficients: lm(), the raw $coef component (which has no
## intercept row, hence the NA), and the coef() method
cbind(fit1$coefficients, c(NA, fit2$coef), coefficients(fit2))

                   [,1]        [,2]        [,3]
(Intercept)  0.99276001          NA  0.99276001
X1          -0.03980772 -0.04282391 -0.03980772
X2           1.11167179  1.06200476  1.11167179
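
Incidentally, the back-transformation that coefficients() performs can be
reproduced by hand. This is only a sketch based on my reading of the MASS
sources, so do check with str(fit2) that the $scales, $xm and $ym components
are really what I assume they are (the scale factors, column means of X, and
mean of y):

## Undo lm.ridge's internal centring and scaling by hand
## (assumes a single value of lambda, as in fit2 above)
slopes <- fit2$coef / fit2$scales             # slopes on the original scale
intercept <- fit2$ym - sum(slopes * fit2$xm)  # recover the intercept
c(intercept, slopes)                          # matches coefficients(fit2)

That also makes it clear why $coef looks "wrong": it is only meaningful
relative to the standardised predictors that lm.ridge works with internally.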

--

Simon Bonner
Assistant Professor of Environmetrics / Director MMASc
Department of Statistical and Actuarial Sciences / Department of Biology
University of Western Ontario

Office: Western Science Centre rm 276

Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813
Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/

> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick Brown
> Sent: May 4, 2017 10:29 AM
> To: r-devel at r-project.org
> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
> 
> Hallo,
> 
> I hope I am posting to the right place. I was advised to try this list by Ben Bolker
> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this
> question to StackOverflow
> (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
> I am a relative newcomer to R, but I wrote my first program in 1975 and have been
> paid to program in about 15 different languages, so I have some general
> background knowledge.
> 
> 
> I have a regression from which I extract the coefficients like this:
> lm(y ~ x1 * x2, data=ds)$coef
> That gives: x1=0.40, x2=0.37, x1*x2=0.09
> 
> When I do the same regression in SPSS, I get:
> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
> So the main effects are in agreement, but there is quite a difference in the
> coefficient for the interaction.
> 
> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my
> idea, but it got published), so there is quite possibly something going on with
> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where
> the problems are occurring.
> 
> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and
> check we get the same results as with lm():
> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
> x1=0.40, x2=0.37, x1*x2=0.14
> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the
> default, so it can be omitted; I can alternate between including or deleting
> ".ridge" in the function call, and watch the coefficient for the interaction
> change.)
> 
> What seems slightly strange to me here is that I assumed that lm.ridge() just
> piggybacks on lm() anyway, so in the specific case where lambda=0 and there
> is no "ridging" to do, I'd expect exactly the same results.
> 
> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will
> not be easy to make, but I can share the data via Dropbox or something if that
> would help.
> 
> I appreciate that when there is strong collinearity then all bets are off in terms
> of what the betas mean, but I would really expect lm() and lm.ridge() to give
> the same results. (I would be happy to ignore SPSS, but for the moment it's
> part of the majority!)
> 
> Thanks for reading,
> Nick