[R] (no subject)

Bert Gunter bgunter.4567 at gmail.com
Mon Jul 30 22:06:12 CEST 2018


How can one possibly answer this without knowing the structure of your
dataset?
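
For example, posting the output of str() on the data frame (here `df` is a hypothetical stand-in for the actual dataset) would give responders the structure they need:

```r
# Hypothetical data frame standing in for the poster's dataset
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

str(df)            # column names, types, and dimensions
dput(head(df, 3))  # a paste-able reproduction of the first rows
```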

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Jul 30, 2018 at 8:24 AM, Baojun Sun <bsun1 at students.towson.edu>
wrote:

> The book "Introduction to Statistical Learning" gives R scripts for its
> labs. I found a script for ridge regression that works on the dataset the
> book uses but is unusable on other datasets I own unless I clean the data.
>
>
> I'm trying to learn ridge regression, and I'm stuck on the data cleaning
> the script requires. It runs fine on the book's Hitters dataset, but when
> I substitute my own data set I get errors unless the data is formatted
> the same way. How do I change this script, or clean an arbitrary dataset,
> so that this ridge regression script is usable on any dataset?
>
>
>     library(ISLR)
>
>     fix(Hitters)
>
>     names(Hitters)
>
>     dim(Hitters)
>
>     sum(is.na(Hitters$Salary))
>
>     Hitters=na.omit(Hitters)  # drop rows with missing values
>
>     dim(Hitters)
>
>     sum(is.na(Hitters))
>
>     library(leaps)
>
>
>
>     x=model.matrix(Salary~.,Hitters)[,-1]  # dummy-code factors, drop intercept column
>
>     y=Hitters$Salary
>
>
>
>     # Ridge Regression
>
>
>
>     library(glmnet)
>
>     grid=10^seq(10,-2,length=100)  # lambda grid from 10^10 down to 10^-2
>
>     ridge.mod=glmnet(x,y,alpha=0,lambda=grid)
>
>     dim(coef(ridge.mod))
>
>     ridge.mod$lambda[50]
>
>     coef(ridge.mod)[,50]
>
>     sqrt(sum(coef(ridge.mod)[-1,50]^2))
>
>     ridge.mod$lambda[60]
>
>     coef(ridge.mod)[,60]
>
>     sqrt(sum(coef(ridge.mod)[-1,60]^2))
>
>     predict(ridge.mod,s=50,type="coefficients")[1:20,]
>
>     set.seed(1)
>
>     train=sample(1:nrow(x), nrow(x)/2)  # random half of the rows for training
>
>     test=(-train)
>
>     y.test=y[test]
>
>     ridge.mod=glmnet(x[train,],y[train],alpha=0,lambda=grid, thresh=1e-12)
>
>     ridge.pred=predict(ridge.mod,s=4,newx=x[test,])
>
>     mean((ridge.pred-y.test)^2)
>
>     mean((mean(y[train])-y.test)^2)
>
>     ridge.pred=predict(ridge.mod,s=1e10,newx=x[test,])
>
>     mean((ridge.pred-y.test)^2)
>
>     ridge.pred=predict(ridge.mod,s=0,newx=x[test,],exact=T)
>
>     mean((ridge.pred-y.test)^2)
>
>     lm(y~x, subset=train)
>
>     predict(ridge.mod,s=0,exact=T,type="coefficients")[1:20,]
>
>     set.seed(1)
>
>     cv.out=cv.glmnet(x[train,],y[train],alpha=0)  # cross-validation to choose lambda
>
>     plot(cv.out)
>
>     bestlam=cv.out$lambda.min
>
>     bestlam
>
>     ridge.pred=predict(ridge.mod,s=bestlam,newx=x[test,])
>
>     mean((ridge.pred-y.test)^2)
>
>     out=glmnet(x,y,alpha=0)
>
>     predict(out,type="coefficients",s=bestlam)[1:20,]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
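
That said, the generic shape of the cleaning step is fairly standard: drop (or impute) rows with missing values, then let model.matrix() convert the data frame into the numeric matrix glmnet() requires. A self-contained sketch, with a made-up data frame standing in for your data and `Salary` as a placeholder response name:

```r
library(glmnet)

# Hypothetical data frame: one numeric response, one numeric predictor,
# one factor predictor, with a couple of missing responses.
set.seed(1)
df <- data.frame(
  Salary = rnorm(100, 500, 100),
  Hits   = rpois(100, 80),
  League = factor(sample(c("A", "N"), 100, replace = TRUE))
)
df$Salary[c(3, 7)] <- NA

df <- na.omit(df)                          # drop incomplete rows
x  <- model.matrix(Salary ~ ., df)[, -1]   # dummy-code factors, drop intercept
y  <- df$Salary

fit <- glmnet(x, y, alpha = 0)             # alpha = 0 gives ridge regression
dim(coef(fit))
```

Any data frame whose response column is numeric and whose predictors are numeric or factor can be fed through the same two lines (na.omit, then model.matrix); the rest of the book's script only depends on x and y.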
