[R] seeking help on using LARS package

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed May 25 02:11:46 CEST 2011


Hi,

On Tue, May 24, 2011 at 12:50 PM, Vishal Thapar <vishalthapar at gmail.com> wrote:
> Hi,
>
> I am writing to seek some guidance regarding using Lasso regression with the
> R package LARS. I have introductory statistics background but I am trying to
> learn more. Right now I am trying to duplicate the results in a paper for
> shRNA prediction "An accurate and interpretable model for siRNA efficacy
> prediction, Jean-Philippe Vert et al., Bioinformatics" for a Bioinformatics
> project that we are working on. I know that the authors of the paper are
> using Lasso regression and so far looking at their paper this is what I have
> gotten to.

I'm not going to comment on your code -- I'll just give you a bird's-eye view.

First off, use glmnet. You get the lasso by setting alpha to 1. You
might get better results, though, if you use it as an "elastic net"
and set alpha to something like .95 -- you'll have to play with it.
This gives a mix of the L1 + L2 penalty. What you get is a sparse
model (from the L1 penalty), but it now also has the tendency to give
similar coefficients to correlated features instead of just dropping
one of them and putting full weight on the other. That's a good thing.
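
To make the alpha point concrete, here is a rough sketch on made-up
data (the simulated x and y, the variable names, and the choice of
s = 0.5 are all just assumptions for illustration, not anything from
the paper). It fits a pure lasso and an elastic net and looks at the
coefficients of two deliberately correlated features:

```r
library(glmnet)

set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), n, p)
# Make feature 2 a noisy copy of feature 1 so the two are highly correlated
x[, 2] <- x[, 1] + rnorm(n, sd = 0.05)
y <- 2 * x[, 1] + 2 * x[, 2] + rnorm(n)

fit_lasso <- glmnet(x, y, alpha = 1)     # pure L1 penalty (lasso)
fit_enet  <- glmnet(x, y, alpha = 0.95)  # mostly L1 with a little L2

# Coefficients of the two correlated features at one (arbitrary) lambda.
# Row 1 of the coefficient matrix is the intercept, so rows 2:3 are
# features 1 and 2.
s <- 0.5
as.matrix(coef(fit_lasso, s = s))[2:3, ]
as.matrix(coef(fit_enet,  s = s))[2:3, ]
```

How different the two fits look will depend on your data and on the
lambda you compare them at, so treat this as something to play with
rather than a demonstration of a guaranteed effect.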

Now that you are using the glmnet package, use `cv.glmnet`.

Say you did:

R> cvg <- cv.glmnet(... whatever ..., alpha=0.95)

The idea is that you want to use the value of lambda that performs best
under the cross validation scenario. Look at cvg$cvm, which holds the
mean cross-validated error for each value of lambda that was tried.

Now -- people like "sparser models," which you get by cranking up the
lambda value, but you don't want lambda to be so high that it makes
your model perform worse in the CV scenario.

cvg$lambda.min has the value of lambda that gives the minimum cv error.

But maybe you want the model sparser than that ... maybe it's OK if
its cv error is within 1 standard error of the best performance seen
across your cv. That's why you have cvg$lambda.1se: it is the largest
lambda whose cv error is still within 1 standard error of the minimum.
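
Roughly, the cross validation step looks like this. The data here is
simulated and the names (x, y, cv_table) are made up for the example;
swap in your own feature matrix and response:

```r
library(glmnet)

set.seed(42)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)

# 10-fold cross validation over glmnet's default lambda sequence
cvg <- cv.glmnet(x, y, alpha = 0.95, nfolds = 10)

# One row per lambda: mean cv error (cvm) and its standard error (cvsd)
cv_table <- cbind(lambda = cvg$lambda, cvm = cvg$cvm, cvsd = cvg$cvsd)
head(cv_table)

cvg$lambda.min  # lambda with the smallest mean cv error
cvg$lambda.1se  # largest lambda within 1 standard error of that minimum
```

Calling plot(cvg) draws the cv error curve with both of those lambda
values marked, which is usually the quickest way to see the tradeoff.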

So:

cvg$glmnet.fit will have the model fit on all the data.

You have to extract the coefficients from that model at a value of
lambda you deem appropriate; picking that value is why you do the
cross validation.

This might be the value of lambda in cvg$lambda.min or cvg$lambda.1se
-- you make the call.
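
As a sketch of that last step (again on simulated data, with made-up
names like b_min and b_1se), you can pull the coefficients out of the
full-data fit at either lambda and see which features survive:

```r
library(glmnet)

set.seed(42)
x <- matrix(rnorm(100 * 20), 100, 20)
y <- x[, 1] - 2 * x[, 2] + rnorm(100)

cvg <- cv.glmnet(x, y, alpha = 0.95)

# Coefficients from the model fit on all the data, at each candidate lambda.
# Row 1 is the intercept; the remaining rows are the features.
b_min <- as.matrix(coef(cvg$glmnet.fit, s = cvg$lambda.min))
b_1se <- as.matrix(coef(cvg$glmnet.fit, s = cvg$lambda.1se))

# Number of features with a nonzero coefficient under each choice
sum(b_min[-1, 1] != 0)
sum(b_1se[-1, 1] != 0)

# Which features are kept in the sparser (lambda.1se) model
which(b_1se[-1, 1] != 0)
```

coef(cvg, s = "lambda.1se") is a shortcut that goes through the same
full-data fit, if you'd rather not touch cvg$glmnet.fit directly.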

Makes sense?

-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


