# [R] Cross-validation for logistic regression with lasso2

francogrex francogrex at mail.com
Fri May 18 13:44:35 CEST 2007

```Hello, I am trying to shrink the coefficients of a logistic regression fitted
to a sparse dataset. I am using the lasso (package lasso2) and want to choose
the shrinkage factor by cross-validation. Could some of the experts here
please tell me whether I am doing this correctly? Below are my dataset and
the functions I use.

w=
a	b	c	d	e	P	A
0	0	0	0	0	1	879
1	0	0	0	0	1	3
0	1	0	0	0	7	7
0	0	1	0	0	230	2
0	0	0	1	0	450	7
0	0	0	0	1	4

#The GLM output shows that the coefficients c and d are larger than 10:
resp=cbind(w$P,w$A)
summary(glm(resp~a+b+c+d+e,data=w,family=binomial))
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -6.779      1.001  -6.775 1.24e-11 ***
a              5.680      1.528   3.718 0.000201 ***
b              6.779      1.134   5.976 2.29e-09 ***
c             11.524      1.227   9.392  < 2e-16 ***
d             10.942      1.071  10.220  < 2e-16 ***
e              3.688      1.124   3.282 0.001031 **

#So I wrote the loop below, using the lasso2 package, to choose the best
#shrinkage bound by generalized cross-validation (gcv):

for (i in seq(1,40,1)) {
  glmba=gl1ce(resp~a+b+c+d+e, data = w, family = binomial(), bound=i)
  ecco=round(gcv(glmba,type="Tibshirani",gen.inverse.diag =1e11),digits=3)
  print(ecco)
}
#The lowest gcv is obtained at bound = 21.
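
# A minimal sketch (not part of the original post) of keeping the score for
# every candidate bound and picking the minimum programmatically, rather than
# reading it off the printed output. pick_best_bound and score_fn are
# hypothetical names; score_fn is assumed to return one numeric score per bound.
pick_best_bound <- function(bounds, score_fn) {
  # evaluate the score at each candidate bound
  scores <- vapply(bounds, score_fn, numeric(1))
  # which.min gives the index of the first minimum
  list(bound = bounds[which.min(scores)], scores = scores)
}
# Toy check with a made-up score curve whose minimum is at 5:
toy <- pick_best_bound(1:10, function(b) (b - 5)^2)
toy$bound
# With lasso2, score_fn would wrap the gl1ce() and gcv() calls from the loop
# above and return the gcv entry of the result; where that entry sits in the
# gcv() output is an assumption to verify with str() before relying on it.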

#then I determine the shrunken coefficients:
> gl1ce(resp ~ a + b + c + d + e, data = w, family = binomial(), bound = 21)
Coefficients:
(Intercept)           a           b           c           d           e
  -4.749816    2.776215    4.342661    8.956583    8.661593    1.264660
Family:
Family: binomial