[R] Different Lambdas and Coefficients between cv.glmnet and intercept = FALSE

Tue Feb 23 08:21:44 CET 2021

Hello,

I'm currently reviewing how to use `glmnet` correctly and am having a hard time understanding why the results differ between `intercept = TRUE` and `intercept = FALSE`. I thought the option would simply drop the intercept from the model, but it seems to behave differently and I'm not sure how.

For a given lambda, if both `x` and `y` are scaled, the two fits appear to give the same coefficients:
```
library(glmnet)
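data(QuickStartExample)
# glmnet >= 4.1 ships the data as a list; older versions load x and y directly
if (exists("QuickStartExample")) {
  x <- QuickStartExample$x
  y <- QuickStartExample$y
}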
lambda_grid <- 10 ^ seq(10, -2, length = 100)
With_Intercept<-glmnet(scale(x),c(scale(y)))
Without_Intercept<-glmnet(scale(x),c(scale(y)), intercept=FALSE)
# Extract coefficients at a single value of lambda
cbind(coef(With_Intercept,s=0.01), coef(Without_Intercept,s=0.01))[-1,]
```
While this is good, it's not clear to me how to put these coefficients back onto their original scale. Further, this only holds for one fixed value of lambda. With `cv.glmnet`, I'd like to identify the optimal lambda instead:
```
With_Intercept <- cv.glmnet(scale(x),c(scale(y)), lambda = lambda_grid)
Without_Intercept <- cv.glmnet(scale(x), c(scale(y)), lambda = lambda_grid, intercept=FALSE)
cbind(coef(With_Intercept, s=With_Intercept$lambda.min, exact = TRUE, x = scale(x), y = scale(y)),
      coef(Without_Intercept, s=Without_Intercept$lambda.min, exact = TRUE, x = scale(x), y = scale(y)))[-1,]
```
Here the two fits can select different values of `lambda.min`, and so different coefficients. If I use `With_Intercept$lambda.min` for the `Without_Intercept` model as well, I get the same coefficients, but this doesn't give me confidence about which model is the right one to use. Further, I'm still not sure how to put the coefficients back onto the original scale.
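For what it's worth, my current understanding of the back-transformation is the sketch below. It assumes the model was fitted on `scale(x)` and `scale(y)`, so the centres and scales can be read from the `scaled:center` and `scaled:scale` attributes. `unscale_coefs` is a hypothetical helper I wrote for illustration, not part of `glmnet`, and the check uses plain `lm` on simulated data rather than a penalised fit. The algebra only re-expresses the same fitted linear predictor in the original units; it says nothing about which lambda that corresponds to on the original scale.

```r
# Hypothetical helper (not part of glmnet): re-express coefficients fitted on
# scale(x) and scale(y) in the original units of x and y.
unscale_coefs <- function(a_scaled, b_scaled, x_scaled, y_scaled) {
  x_center <- attr(x_scaled, "scaled:center")
  x_scale  <- attr(x_scaled, "scaled:scale")
  y_center <- attr(y_scaled, "scaled:center")
  y_scale  <- attr(y_scaled, "scaled:scale")

  # Slope for x_j in original units: sd(y) * b_j / sd(x_j)
  slopes <- y_scale * unname(b_scaled) / x_scale
  # Intercept in original units: mean(y) + sd(y) * a - sum_j slope_j * mean(x_j)
  intercept <- y_center + y_scale * unname(a_scaled) - sum(slopes * x_center)

  c("(Intercept)" = intercept, slopes)
}

# Sanity check on simulated data, using ordinary least squares
set.seed(42)
n <- 50
x_sim <- cbind(x1 = rnorm(n, 5, 2), x2 = rnorm(n, -3, 10))
y_sim <- 1 + 0.5 * x_sim[, "x1"] - 0.2 * x_sim[, "x2"] + rnorm(n)

x_scaled <- scale(x_sim)
y_scaled <- scale(y_sim)

fit_scaled <- lm(c(y_scaled) ~ x_scaled)  # fit on standardised data
fit_raw    <- lm(y_sim ~ x_sim)           # fit on original data

back <- unscale_coefs(coef(fit_scaled)[1], coef(fit_scaled)[-1],
                      x_scaled, y_scaled)

# The back-transformed coefficients should agree with the raw-scale fit
check <- all.equal(unname(back), unname(coef(fit_raw)))
print(check)
```

If this is right, the same helper could presumably be applied to the intercept and slopes extracted from `coef()` of a `glmnet` fit on `scale(x)` and `c(scale(y))`, with `a_scaled = 0` when `intercept = FALSE`, but I have not verified that this is the recommended approach.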

I've tried to compare all of the possible combinations between standardising, scaling, and leaving the variables as they are, but I'm still struggling with the best method and how to ensure I'm implementing `glmnet` correctly.

If anyone has advice on how to proceed, how to interpret these methods, or how to get consistent results, I would appreciate it. I've been reading An Introduction to Statistical Learning, The Elements of Statistical Learning, and Statistical Learning with Sparsity, as well as the `glmnet` vignette, but am still a bit unclear.
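One thing I have not yet ruled out is that the two `cv.glmnet` calls draw different random folds. A minimal sketch of holding the folds fixed through the `foldid` argument is below. It uses simulated data, and the names `x_sim`, `y_sim`, `cv_with` and `cv_without` are just for illustration. I don't expect this to make the two fits identical, since the training folds of scaled data are not exactly centred, but it should remove one source of difference.

```r
library(glmnet)

set.seed(1)
n <- 100
p <- 20
x_sim <- matrix(rnorm(n * p), n, p)
y_sim <- as.numeric(x_sim[, 1:3] %*% c(2, -1, 0.5) + rnorm(n))

xs <- scale(x_sim)
ys <- c(scale(y_sim))

# Assign each observation to one of 10 folds once, and reuse the assignment
foldid <- sample(rep(1:10, length.out = n))
lambda_grid <- 10^seq(1, -3, length.out = 50)

cv_with <- cv.glmnet(xs, ys, lambda = lambda_grid, foldid = foldid)
cv_without <- cv.glmnet(xs, ys, lambda = lambda_grid, foldid = foldid,
                        intercept = FALSE)

# Compare the selected lambdas now that both fits share the same folds
c(with = cv_with$lambda.min, without = cv_without$lambda.min)
```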

Thanks,

Kevin