[R] ROCR data input

Claudia Beleites cbeleites at units.it
Tue Aug 17 15:12:27 CEST 2010


Anneley,

> Sorry, I'm new to R, and relatively new to statistics too so I'm still a bit
> unclear.
That's OK, everyone was new at some point.

However, it is really important to post a reproducible example here. If you are 
so new that you don't know exactly how to do that, you should probably say in 
your email that you tried but don't know how. That will probably increase your 
chances of getting an answer quite a bit.

Also, I'd suggest that you work thoroughly through an introduction to R. There's 
a lot available on CRAN, on the web and in many libraries.
E.g. a collection divided into documents with more or fewer than 100 pages:
http://cran.r-project.org/other-docs.html
r-project.org also has links to books, and to non-English material.

> The values in the post were only a sample of around 8400 rows. The
> label has 1 or 0 (I thought this was the two classes needed).
Yes.

> Each label row
> has an equivalent probability. This is the data that I output from the
> logistic regression analysis, but it is seemingly not the right format for
> ROC curve analysis.
It is the right format.

> There is a difference in how R displays the data, when I
> type ROCR.simple it is in the format:
> 
> $predictions
>   [1] 0.612547843 0.364270971 0.432136142.......
> $labels
>   [1] 1 1 0 0 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 0 0 0 0 ... etc.
> 
> whereas mine is in columns, e.g.
> 
> ID, labels, probs
> 8930     0 0.00070
> 8931     0 0.00036
> 8932     1 0.00000
> 8933     1 0.00002
> 8934     0 0.00001
> etc.
Look up the difference between list and data.frame.
Also: you can find out a lot about variables with class() and str(), and maybe 
summary().
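
As a small illustration of those functions, here is a sketch on made-up values. The names scores_list and scores_df are hypothetical and are not your data; they only mimic the two layouts quoted above.

```r
# Made-up values; the names scores_list and scores_df are only for illustration.
scores_list <- list(predictions = c(0.61, 0.36, 0.43),
                    labels      = c(1, 1, 0))
scores_df   <- data.frame(labels = c(0, 0, 1),
                          probs  = c(0.00070, 0.00036, 0.00000))

class(scores_list)   # "list"
class(scores_df)     # "data.frame"

# str() shows the structure: both consist of named components.
str(scores_list)
str(scores_df)

# A data.frame is itself a list of equal-length columns,
# so $ extracts a plain vector from either of them.
is.list(scores_df)            # TRUE
class(scores_list$labels)     # "numeric"
class(scores_df$probs)        # "numeric"

summary(scores_df$probs)
```

So the way the two objects are printed differs, but a single component pulled out with $ is an ordinary vector in both cases.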

> That is why I think it is a format issue, but being new to R, I'm not sure
> what I need to do to rectify it.
>  I have attached the text file if this helps.
No, we don't need it to reproduce your error - I think it's all more or less 
about typos:

 > prediction("prob$probabilities", "prob$label")
Error in prediction("prob$probabilities", "prob$label") :
   Number of classes is not equal to 2.
ROCR currently supports only evaluation of binary classification tasks.

Now, if you need to track down such an error, it is a really good idea to check 
what the arguments that you hand over actually are.

As many errors come from typos, copy and paste literally what you put into the 
function:
 > "prob$probabilities"
[1] "prob$probabilities"
 > "prob$label"
[1] "prob$label"

See the difference between what your argument evaluates to and
what you meant to hand over?
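
The same kind of check can be tried on a made-up data frame. This is only a sketch: d and its columns are hypothetical and just give you something to look at. Put exactly what you typed into class(), length() and str() and compare:

```r
# Hypothetical toy data, not the data from the original post.
d <- data.frame(label = c(0, 1, 1, 0),
                score = c(0.10, 0.80, 0.65, 0.30))

# Inspect literally what would be handed over to a function.
class("d$score")    # "character"
length("d$score")   # 1
str("d$score")

class(d$score)      # "numeric"
length(d$score)     # 4
str(d$score)
```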

Does this get you on the right track? I don't want to be nasty, but if you 
discover the mistakes yourself, you'll be much faster at finding such things 
next time.

So: try with these hints, and if it doesn't work, you can ask again.

HTH,

Claudia
-- 
Claudia Beleites
Dipartimento dei Materiali e delle Risorse Naturali
Università degli Studi di Trieste
Via Alfonso Valerio 6/a
I-34127 Trieste

phone: +39 0 40 5 58-37 68
email: cbeleites at units.it


