[R] logistic regression model + Cross-Validation

Weiwei Shi helprhelp at gmail.com
Tue Jan 23 02:30:31 CET 2007


Why not use lda{MASS}? It has a CV=TRUE option (note the capitals; R is
case-sensitive), though it only does leave-one-out ("loo") cross-validation.
Or use randomForest.
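
A minimal sketch of the lda route, in case it helps. The data frame d and its
columns (cy, x1, x2) are simulated stand-ins, not the original poster's data.
With CV=TRUE, lda returns leave-one-out results (a predicted class and
posterior probabilities per record) rather than a fitted model object.

```r
library(MASS)  # recommended package, ships with standard R installs

set.seed(1)
# Simulated stand-in data (an assumption): binary class cy, two predictors
d <- data.frame(x1 = rnorm(50), x2 = rnorm(50))
d$cy <- factor(rbinom(50, 1, plogis(d$x1 - d$x2)))

# CV = TRUE asks lda for leave-one-out cross-validated predictions
fit <- lda(cy ~ x1 + x2, data = d, CV = TRUE)

head(fit$posterior)      # per-record leave-one-out class probabilities
table(d$cy, fit$class)   # observed vs. leave-one-out predicted class
```

The posterior matrix has one row per record and one column per class, so it
can be inspected record by record.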

If you have to use lrm, then the following code might help:

n.fold <- 5                            # 5-fold cv
n.sample <- nrow(YOURWHOLEDATAFRAME)   # number of records, e.g. 50
# shuffled, balanced fold labels (one per record), so no fold ends up empty
s <- sample(rep(1:n.fold, length.out = n.sample))
for (i in 1:n.fold) {
  # create your training data and validation data for each fold
  trn <- YOURWHOLEDATAFRAME[s != i, ]
  val <- YOURWHOLEDATAFRAME[s == i, ]
  # now do your own modeling using lrm: fit on trn, predict on val
  # todo
}
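
One way the "todo" step could be filled in, as a sketch only. To keep it
self-contained I use glm() with family = binomial in place of lrm(), and the
data frame d is simulated (both are my assumptions; cy, x1, x2 follow the
names in the original question). Fold labels are drawn as a shuffled, balanced
vector, and each record's predicted probability comes from the model that did
not see that record.

```r
set.seed(1)
# Simulated stand-in for YOURWHOLEDATAFRAME (an assumption)
n.sample <- 50
d <- data.frame(x1 = rnorm(n.sample), x2 = rnorm(n.sample))
d$cy <- rbinom(n.sample, 1, plogis(d$x1 - d$x2))

n.fold <- 5
s <- sample(rep(1:n.fold, length.out = n.sample))  # balanced fold labels
pred <- rep(NA_real_, n.sample)                    # out-of-fold probabilities

for (i in 1:n.fold) {
  trn <- d[s != i, ]
  val <- d[s == i, ]
  # glm() swapped in for lrm() here to avoid a package dependency
  fit <- glm(cy ~ x1 + x2, family = binomial, data = trn)
  # predicted P(cy = 1) for the held-out records of fold i
  pred[s == i] <- predict(fit, newdata = val, type = "response")
}

# one row per record: observed class, fold, cross-validated probability
head(data.frame(cy = d$cy, fold = s, p.hat = pred))
```

A single 5-fold run like this is noisy, so for accuracy estimates one would
probably repeat the whole thing with fresh fold labels and average.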

HTH,

weiwei

On 1/21/07, nitin jindal <nitin.jindal at gmail.com> wrote:
> If validate.lrm does not have this option, does any other function have it?
> I will certainly look into your advice on cross-validation. Thanks.
>
> nitin
>
> On 1/21/07, Frank E Harrell Jr <f.harrell at vanderbilt.edu> wrote:
> >
> > nitin jindal wrote:
> > > Hi,
> > >
> > > I am trying to cross-validate a logistic regression model.
> > > I am using the logistic regression model function (lrm) from the
> > > Design package.
> > >
> > > f <- lrm( cy ~ x1 + x2, x=TRUE, y=TRUE)
> > > val <- validate.lrm(f, method="cross", B=5)
> >
> > val <- validate(f, ...)    # .lrm not needed
> >
> > >
> > > My class cy has values 0 and 1.
> > >
> > > The "val" variable will give me indicators like slope and AUC. But I
> > > also need the vector of predicted values of the class variable "cy"
> > > for each record during cross-validation, so that I can look at the
> > > results manually. Is there any way to get the probabilities assigned
> > > to each class?
> > >
> > > regards,
> > > Nitin
> >
> > No, validate.lrm does not have that option.  Manually looking at the
> > results will not be easy when you do enough cross-validations.  A single
> > 5-fold cross-validation does not provide accurate estimates.  Either use
> > the bootstrap or repeat k-fold cross-validation between 20 and 50 times.
> >   k is often 10 but the optimum value may not be 10.  Code for averaging
> > repeated cross-validations is in
> > http://biostat.mc.vanderbilt.edu/twiki/pub/Main/RmS/logistic.val.pdf
> > along with simulations of bootstrap vs. a few cross-validation methods
> > for binary logistic models.
> >
> > Frank
> > --
> > Frank E Harrell Jr   Professor and Chair           School of Medicine
> >                       Department of Biostatistics   Vanderbilt University
> >


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III


