[R] glmnet() vs. lars()

Wed Mar 21 15:26:03 CET 2012

On 03/21/2012 06:30 AM, Vito Muggeo (UniPa) wrote:
> It appears that glmnet(), when "selecting" the covariates entering the
> model, skips from K covariates, say, to K+2 or K+3. Thus 2 or 3
> variables are "added" at the same time and it is not possible to obtain
> a ranking of the covariates according to their importance in the model.
> On the other hand lars() "adds" the covariates one at a time.
> My question is: is it possible to obtain a similar output of lars (in
> terms of order of the variables entering the model) using glmnet()?

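glmnet() fits the model by iterative coordinate descent at each value on 
a grid of lambda values, so it reports the solution only at those grid 
points; LARS is a more elegant algorithm and computes the exact solution 
path, including every point at which a variable enters.  When two or 
three variables enter between consecutive grid points, glmnet() shows 
them as entering together.  You can get your glmnet solutions to have 
higher resolution by using a finer grid.  In your example: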

> library(glmnet)
> set.seed(123)
> x=matrix(rnorm(100*20),100,20)
> y=rnorm(100)
> fit1=glmnet(x,y)
> fit1$df
  [1]  0  2  4  4 ...

The default is a grid of 100 lambda values.  If we use 300 values, the 
resolution is finer and we can see the variables enter one at a time:

> fit1=glmnet(x,y,nlambda=300)
> fit1$df
  [1]  0  1  1  2  3  3  4  ...

However, it is impossible to know in advance how fine the grid must be 
in order to ensure that only one variable enters the model between any 
two consecutive lambda values.
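
One way to check a given grid after the fact is to record, for each 
variable, the first grid point at which its coefficient is nonzero, and 
then look for ties.  The following is only a rough sketch: entry_step() 
is a hypothetical helper name of mine, not a glmnet function, and it 
assumes the coefficient matrix has one row per variable and one column 
per lambda value, as fit$beta does.  It records first entry only, so a 
variable that leaves the model and re-enters later is not flagged.

```r
# Hypothetical helper (not part of glmnet): for each variable, return the
# index of the first lambda at which its coefficient is nonzero, or NA if
# the variable never enters the model.
entry_step <- function(beta) {
  beta <- as.matrix(beta)   # rows = variables, columns = lambda values
  apply(beta != 0, 1, function(r) if (any(r)) which(r)[1] else NA_integer_)
}

# Usage with the simulated data from this thread (runs only if glmnet
# is installed).
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(123)
  x <- matrix(rnorm(100 * 20), 100, 20)
  y <- rnorm(100)
  fit <- glmnet::glmnet(x, y, nlambda = 300)

  step <- entry_step(fit$beta)
  print(order(step))   # variables in order of entry; never-entered ones last

  # TRUE here means at least two variables entered at the same grid point,
  # i.e. the grid is still too coarse to rank them.
  print(any(duplicated(step[!is.na(step)])))
}
```

If the last line reports ties, you can refit with a larger nlambda and 
check again; the exact entry order is available from lars() in the 
actions component of the fitted object.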

-- 
Patrick Breheny
Assistant Professor
Department of Biostatistics
Department of Statistics
University of Kentucky