[R] validation logistic regression

Frank E Harrell Jr f.harrell at Vanderbilt.Edu
Wed May 26 15:24:55 CEST 2010


Better would be 100 repeats of 10-fold cross-validation, or 
bootstrapping, as implemented in the rms package.
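
To make the first suggestion concrete, here is a minimal base-R sketch of 100 repeats of 10-fold cross-validation for a logistic regression. It is an illustration under assumptions, not the rms implementation: the data are simulated, the names d, x and y are hypothetical, and the Brier score is just one possible accuracy measure. In rms, the bootstrap alternative is, as far as I know, obtained by calling validate() on an lrm fit that was stored with x=TRUE, y=TRUE.

```r
# Sketch: repeated 10-fold cross-validation of a logistic regression (base R).
# The data are simulated stand-ins; 'd', 'x' and 'y' are hypothetical names.
set.seed(1)
n <- 123
d <- data.frame(x = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x))

repeats <- 100   # number of times the 10-fold split is redrawn
k <- 10          # number of folds
brier <- numeric(repeats)

for (r in seq_len(repeats)) {
  # Assign every row to one of k folds at random
  fold <- sample(rep(seq_len(k), length.out = n))
  pred <- numeric(n)
  for (j in seq_len(k)) {
    test <- fold == j
    # Fit on the other k-1 folds, predict the held-out fold
    fit <- glm(y ~ x, data = d[!test, ], family = binomial)
    pred[test] <- predict(fit, newdata = d[test, , drop = FALSE],
                          type = "response")
  }
  # Brier score: mean squared difference between probability and outcome
  brier[r] <- mean((pred - d$y)^2)
}

cv_brier <- mean(brier)  # average over all repeats
```

Averaging over the repeats reduces the noise that comes from any single random assignment of rows to folds.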

Frank

On 05/26/2010 08:21 AM, azam jaafari wrote:
>
> Hi
>
> Thank you for your reply.
>
> I'm new to R, so I'm slow.
>
> If I want to do leave-one-out cross-validation with these data (100), how do I tell R to omit the observations one at a time? Is validationsize=100?
>
> Thanks a lot
>
> Azam
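
On the leave-one-out question: in the script quoted further down, validationsize is assigned but never used afterwards, so changing it has no effect. Leave-one-out is usually written as a loop that drops row i with negative indexing, refits the model, and predicts the row that was left out. A minimal sketch, assuming simulated data that only borrow the variable names from this thread:

```r
# Sketch: leave-one-out cross-validation for a logistic regression.
# The values are simulated; only the column names follow the thread.
set.seed(1)
n <- 100
d <- data.frame(wetnessindex = rnorm(n))
d$salich <- rbinom(n, 1, plogis(0.8 * d$wetnessindex))

loo_pred <- numeric(n)
for (i in seq_len(n)) {
  # Fit on all rows except row i
  fit <- glm(salich ~ wetnessindex, data = d[-i, ], family = binomial)
  # Predict the probability for the single held-out row
  loo_pred[i] <- predict(fit, newdata = d[i, , drop = FALSE],
                         type = "response")
}

# Classify with a 0.5 cutoff and compare with the observed values
loo_class <- ifelse(loo_pred < 0.5, 0, 1)
loo_accuracy <- mean(loo_class == d$salich)
```

Each observation is predicted by a model that never saw it, so loo_pred can be compared with the observed values in the same way as predictions on a separate validation set.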
>
> --- On Wed, 5/26/10, Joris Meys<jorismeys at gmail.com>  wrote:
>
>
> From: Joris Meys<jorismeys at gmail.com>
> Subject: Re: [R] validation logistic regression
> To: "azam jaafari"<azamjaafari at yahoo.com>
> Cc: r-help at r-project.org
> Date: Wednesday, May 26, 2010, 5:00 AM
>
>
> Hi,
>
> First of all, you shouldn't back-transform your prediction; use the option type="response" instead:
>
> salichpred<-predict(salic.lr, newdata=profilevalidation,type="response")
>
> limit<- 0.5
> salichpredcat<- ifelse(salichpred<limit,0,1) # prediction of categories.
>
> Read up on sensitivity, specificity and ROC curves. By changing the limit, you can calculate the sensitivity and specificity at each cutoff and construct a ROC curve that tells you how good your predictions are. Which limit to choose depends on how much error you allow in the predictions.
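
The idea of varying the limit can be sketched as follows. This is an illustration under assumptions: obs and pred are simulated stand-ins for the observed 0/1 values and the predicted probabilities, and the cutoff grid is arbitrary. The classification rule matches the ifelse() call above, i.e. a case is predicted as 1 when its probability is at least the limit.

```r
# Sketch: sensitivity and specificity over a grid of cutoffs (base R).
# 'obs' and 'pred' are simulated, hypothetical stand-ins.
set.seed(1)
obs  <- rbinom(200, 1, 0.4)                      # observed 0/1 outcomes
pred <- plogis(rnorm(200, mean = 2 * obs - 1))   # predicted probabilities

limits <- seq(0, 1, by = 0.05)

# Sensitivity: share of observed 1s predicted as 1 (pred >= limit)
sens <- sapply(limits, function(l) mean(pred[obs == 1] >= l))
# Specificity: share of observed 0s predicted as 0 (pred < limit)
spec <- sapply(limits, function(l) mean(pred[obs == 0] < l))

roc <- data.frame(limit = limits, sensitivity = sens, specificity = spec)

# Area under the ROC curve via the rank (Mann-Whitney) formulation
n1 <- sum(obs == 1)
n0 <- sum(obs == 0)
auc <- (sum(rank(pred)[obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```

Plotting 1 - roc$specificity against roc$sensitivity, for example with plot(..., type = "l"), draws the ROC curve; an auc near 0.5 means the predictions are no better than chance, and values near 1 mean good discrimination.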
>
> Cheers
> Joris
>
>
>
> On Wed, May 26, 2010 at 10:04 AM, azam jaafari<azamjaafari at yahoo.com>  wrote:
>
> Hi
>
> I validated the predictions of a logistic regression as follows:
>
> validationsize<- 23
> set.seed(1)
> random<-runif(123)
> order(random)
> nrprofilesinsample<-sort(order(random)[1:100])
> profilesample<- data[nrprofilesinsample,]
> profilevalidation<- data[-nrprofilesinsample,]
> salich<-profilesample$SALIC.H.1
> salic.lr<-glm(salich~wetnessindex, profilesample, family=binomial('logit'))
> summary(salic.lr)
> salichpred<-predict(salic.lr, newdata=profilevalidation)
> expsalichpred<-exp(salichpred)
> salichprediction<-(expsalichpred/(1+expsalichpred))
>
> Then
>   table(salichprediction, profilevalidation$SALIC.H.1)
>
> gives:
> salichprediction      0 1
>    0.0408806327422231 1 0
>    0.094509645033899  1 0
>    0.118665480273383  1 0
>    0.129685441514168  1 0
>    0.13545295569511   1 0
>    0.137580612201769  1 0
>    0.197265822234215  1 0
>    0.199278585548248  0 1
>    0.202436276322278  1 0
>    0.211278767985746  1 0
>    0.261036846823867  1 0
>    0.283792703256058  1 0
>    0.362229486187581  0 1
>    0.362795636267779  1 0
>    0.409067386115694  1 0
>    0.410860613509484  0 1
>    0.423960962956254  1 0
>    0.428164288793652  1 0
>    0.448509687866763  0 1
>    0.538401659478058  0 1
>    0.557282539294224  1 0
>    0.603881788227797  0 1
>    0.63633478460736   0 1
>
> So I have salichprediction values between 0 and 1 and a binary variable (the observed values) that is 0 or 1. I want to compare the two, and I want to know whether this model (logistic regression) is OK for prediction or not.
>
> Please help me.
>
> Thanks a lot
>
> Azam
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


-- 
Frank E Harrell Jr   Professor and Chairman        School of Medicine
                      Department of Biostatistics   Vanderbilt University


