[R] Proper / Improper scoring Rules

Donald Catanzaro, PhD don.catanzaro.ccm at gmail.com
Fri Aug 7 18:02:50 CEST 2009


Hi All,

I am working on some ordinal logistic regressions using lrm in the 
Design package.  My response variable has three categories (1,2,3). 
After creating my model and calling predict to get some fitted 
values, I wanted to use a simple .5 cut-off to classify my 
probabilities into the categories.

I had two questions:

a)  First, I am having trouble directly accessing the probabilities, 
which may have more to do with my lack of experience with R.

For instance, my calls

 >ologit.three.NoPerFor <- lrm(Threshold.Three ~ TECI, data=CLD, na.action=na.pass)
 >CLD$Threshold.Predict.Three.NoPerFor <- predict(ologit.three.NoPerFor, newdata=CLD, type="fitted.ind")
 >CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Three.NoPerFor.Threshold.Three=1 > .5] <- 1
Error: unexpected '=' in 
"CLD$Threshold.Predict.Three.NoPerFor.Cats[CLD$Threshold.Predict.Three.NoPerFor.Threshold.Three="
 >
 >

produce an error message, and it seems that R does not like the equal sign 
at all.  So how does one access the probabilities so that I can classify 
them into the categories 1, 2, 3 and look at the performance of my model?
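
To make the question concrete, here is a minimal sketch in base R of the 
kind of access I am after.  It does not call lrm or predict: the matrix 
probs is a made-up stand-in for what predict(..., type="fitted.ind") 
returns, and the column names of the form "Threshold.Three=1" are an 
assumption based on the error message above.  A name containing "=" is 
not a valid R name, so the column is picked out as a quoted string 
inside [ , ] rather than after $.

```r
# Hypothetical stand-in for the matrix of per-category probabilities;
# the values and the column names are assumptions for illustration only.
probs <- matrix(c(0.70, 0.20, 0.10,
                  0.30, 0.45, 0.25,
                  0.10, 0.30, 0.60),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL, paste0("Threshold.Three=", 1:3)))

# Select the column by its quoted name instead of using "$".
p1 <- probs[, "Threshold.Three=1"]

# A .5 cut-off on category 1 only; rows below the cut-off stay NA.
cat.cut <- rep(NA_integer_, nrow(probs))
cat.cut[p1 > 0.5] <- 1L

# Alternative: assign each row to its most probable category.
cat.max <- max.col(probs, ties.method = "first")
```

With three categories a row may have no probability above .5 (the second 
row here), so the cut-off can leave cases unclassified, whereas taking 
the most probable category always assigns one.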

b)  Which leads me to my next question.  I thought that simply 
calculating the percent correct from my predictions would be 
sufficient to look at performance, but since my question is very much in 
line with this thread 
http://tolstoy.newcastle.edu.au/R/e4/help/08/04/8987.html I am not so 
sure anymore.  I am afraid I did not understand Frank Harrell's last 
suggestion regarding improper scoring rules.  Can someone point me to 
some internet resources that I might review to see why my 
approach would not be valid?
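
As far as I understand the distinction, percent correct is usually 
given as an example of an improper scoring rule, while the Brier score 
and the logarithmic score are proper ones.  The base R sketch below is 
only my attempt to illustrate that, not anything taken from the linked 
thread: the outcomes y, the two probability matrices, and the helper 
functions accuracy, brier and logscore are all made up for the example.

```r
# Hypothetical observed categories for four cases.
y <- c(1, 2, 3, 1)

# Two made-up sets of predicted probabilities (rows = cases, cols = categories).
sharp <- matrix(c(0.90, 0.05, 0.05,
                  0.05, 0.90, 0.05,
                  0.05, 0.05, 0.90,
                  0.90, 0.05, 0.05), ncol = 3, byrow = TRUE)
vague <- matrix(c(0.40, 0.30, 0.30,
                  0.30, 0.40, 0.30,
                  0.30, 0.30, 0.40,
                  0.40, 0.30, 0.30), ncol = 3, byrow = TRUE)

# Percent correct: classify to the most probable category, then compare.
accuracy <- function(p, y) mean(max.col(p, ties.method = "first") == y)

# Multi-category Brier score: mean squared distance to the 0/1 outcome rows.
brier <- function(p, y) {
  obs <- diag(ncol(p))[y, , drop = FALSE]
  mean(rowSums((p - obs)^2))
}

# Logarithmic score: minus the mean log probability given to the observed category.
logscore <- function(p, y) -mean(log(p[cbind(seq_along(y), y)]))

acc   <- c(sharp = accuracy(sharp, y), vague = accuracy(vague, y))
bs    <- c(sharp = brier(sharp, y),    vague = brier(vague, y))
logsc <- c(sharp = logscore(sharp, y), vague = logscore(vague, y))
```

Both sets of predictions classify every case correctly, so acc cannot 
tell them apart, while bs and logsc are both lower (better) for the 
sharp predictions.  If that is the point about improper scoring rules, 
then classifying at a cut-off throws away the information in the 
probabilities themselves.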


-- 
-Don 

Don Catanzaro, PhD                  
Landscape Ecologist
dgcatanzaro at gmail.com
16144 Sigmond Lane
Lowell, AR 72745
479-751-3616



