[R] Bootstrap Methods for Confidence Intervals -- glmnet

Bert Gunter bgunter.4567 at gmail.com
Thu May 12 17:02:13 CEST 2016


Lorenzo:

This is a complicated and subtle question that I believe is mostly
about statistical methodology, not R. I would suggest that you post
your query to stats.stackexchange.com rather than here in order to
determine *what* you should do. Then, if necessary, you can come back
here to ask about *how* to do it in R (with code from your failed
attempts, etc.).

Better yet, you might wish to have this discussion with a local
expert, if you can find one.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Thu, May 12, 2016 at 7:49 AM, Lorenzo Isella
<lorenzo.isella at gmail.com> wrote:
> Dear All,
> Please have a look at the code at the end of the email.
> It is just an example of a glmnet regression on some artificial
> data.
> My question is how I can evaluate the uncertainty of the prediction
> yhat.
>
> It looks like there are good reasons why glmnet does not provide
> standard-error estimates; see e.g.
>
> http://stackoverflow.com/questions/12937331/how-to-get-statistical-summary-information-from-glmnet-model
> and
> https://www.reddit.com/r/statistics/comments/1vg8k0/standard_errors_in_glmnet/
>
> However, from what I read in this thesis
>
> https://air.unimi.it/retrieve/handle/2434/153099/133417/phd_unimi_R07738.pdf
>
> (see sections 3.2 and 3.3)
>
> and in the papers it cites
>
> http://www.stat.cmu.edu/~fienberg/Statistics36-756/Efron1979.pdf
> and
> http://www.ams.org/journals/proc/2010-138-12/S0002-9939-2010-10474-4/S0002-9939-2010-10474-4.pdf
>
> there are bootstrap methods that are quite general and applicable
> well beyond the case of glmnet.
> Is there anything already implemented that could help me? Is anybody
> aware of such an implementation?
> Cheers
>
> Lorenzo
>
> #########################################################################
>
> library(glmnet)
>
> # Generate data
> set.seed(19875)  # Set seed for reproducibility
> n <- 1000  # Number of observations
> p <- 5000  # Number of predictors included in model
> real_p <- 15  # Number of true predictors
> x <- matrix(rnorm(n*p), nrow=n, ncol=p)
> y <- apply(x[,1:real_p], 1, sum) + rnorm(n)
>
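> # Split data into train (about 2/3) and test (about 1/3) sets
> train_rows <- sample(1:n, .66*n)
> x.train <- x[train_rows, ]
> x.test <- x[-train_rows, ]
>
> y.train <- y[train_rows]
> y.test <- y[-train_rows]
>
> # Elastic-net fit (alpha = 0.5) over glmnet's default lambda path
> fit.elnet <- glmnet(x.train, y.train, family="gaussian", alpha=.5)
>
> # Predictions for every lambda on the path: a matrix with one row per
> # test observation and one column per lambda value
> yhat <- predict(fit.elnet, s=fit.elnet$lambda, newx=x.test)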
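The bootstrap idea raised in the question (resample the data, refit, and look at the spread of the refitted predictions) can be sketched in R as below. This is a minimal sketch under stated assumptions, not an existing package function: boot_predict(), lm_fit(), lm_pred(), en_fit(), en_pred() and the small simulated data set are hypothetical illustrations. The sketch assumes the nonparametric "pairs" bootstrap in the spirit of Efron (1979), a simple percentile interval, and a lambda fixed once by cv.glmnet. Whether any of these choices is statistically appropriate for an elastic net is exactly the methodological question Bert suggests settling first.

```r
# Minimal sketch (assumption-laden): nonparametric "pairs" bootstrap in the
# spirit of Efron (1979) for the variability of predictions at new points.
# boot_predict() is a hypothetical helper, not part of glmnet or boot.
boot_predict <- function(x, y, newx, fit_fun, pred_fun, B = 200, level = 0.95) {
  n <- nrow(x)
  preds <- matrix(NA_real_, nrow = nrow(newx), ncol = B)
  for (b in seq_len(B)) {
    idx <- sample.int(n, n, replace = TRUE)         # resample (x_i, y_i) rows
    fit <- fit_fun(x[idx, , drop = FALSE], y[idx])  # refit on the resample
    preds[, b] <- pred_fun(fit, newx)               # predict every row of newx
  }
  a <- (1 - level) / 2
  # Percentile interval: one (lower, upper) pair per row of newx
  ci <- t(apply(preds, 1, quantile, probs = c(a, 1 - a), names = FALSE))
  colnames(ci) <- c("lower", "upper")
  list(boot = preds, ci = ci)
}

# Small simulated example (hypothetical data, much smaller than in the email)
set.seed(1)
n <- 200; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(1, 1, 0, 0, 0)) + rnorm(n)
newx <- matrix(rnorm(10 * p), 10, p)

# Base-R stand-in model (ordinary least squares) so the sketch runs without glmnet
lm_fit  <- function(x, y) lm.fit(cbind(1, x), y)$coefficients
lm_pred <- function(beta, newx) drop(cbind(1, newx) %*% beta)
res <- boot_predict(x, y, newx, lm_fit, lm_pred, B = 200)
print(head(res$ci))

# Same helper with an elastic net, if glmnet is installed. lambda is fixed once
# (an assumption: by cross-validation on the full data) so all replicates share it.
if (requireNamespace("glmnet", quietly = TRUE)) {
  lam <- glmnet::cv.glmnet(x, y, alpha = 0.5)$lambda.min
  en_fit  <- function(x, y) glmnet::glmnet(x, y, family = "gaussian", alpha = 0.5)
  en_pred <- function(fit, newx) drop(predict(fit, newx = newx, s = lam))
  res_en <- boot_predict(x, y, newx, en_fit, en_pred, B = 50)
  print(head(res_en$ci))
}
```

The intervals above describe only the sampling variability of the predicted mean at a fixed lambda. They are not prediction intervals for new responses (those would also need the noise variance), they do not correct the shrinkage bias of the penalized fit, and the naive bootstrap is known to be unreliable for lasso-type estimators in some settings. On the data in the email one would pass x.train, y.train and x.test; with p = 5000 each refit is costly, so a small B is sensible for a first try. The boot package's boot() function could replace the hand-written loop if its interval types (via boot.ci()) are preferred.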
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


