[R] calibration/validation sets

Kevin Wang Kevin.Wang at maths.anu.edu.au
Sun Aug 15 02:48:54 CEST 2004


Hi,

On Sat, 14 Aug 2004, Peyuco Porras Porras . wrote:

> Hi;
> Does anyone know how to create a calibration and validation set from a particular dataset? I have a data frame with nearly 20,000 rows, and I would like to randomly select a subset of the original dataset to use as the calibration set (I found out how to do that). However, I don't know how to remove this "calibration" set from the original data frame in order to get my "validation" set. Any hint will be greatly appreciated.

Here is a really quick way, supposing you want 30% of your dataset as the
validation set:
> iris.id <- sample(nrow(iris), round(nrow(iris) * 0.3))
> iris.valid <- iris[iris.id, ]
> iris.train <- iris[-iris.id, ]
> nrow(iris.valid)
[1] 45
> nrow(iris.train)
[1] 105

The first line draws a random sample of 30% of the row indices of the iris
data.  The second line subsets those rows -- the validation set.  The
third uses negative indices to drop those same rows and keep what's left --
the training (your "calibration") set.  This is perhaps not efficient and
the code can definitely be simplified...but it's Sunday morning and I
haven't had my morning coffee yet :D
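
For a data frame other than iris, the same idea could be wrapped in a small
function.  This is only a sketch under a few assumptions: the name
split_data and its arguments valid_frac and seed are made up for
illustration, and the optional seed is there only to make the random split
repeatable.  It uses a logical index instead of negative indices, so it
still behaves if zero validation rows happen to be drawn.

```r
# Hypothetical helper (not part of the original post): split a data frame
# into calibration and validation sets by sampling row indices.
split_data <- function(df, valid_frac = 0.3, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)        # optional: reproducible split
  n_valid  <- round(nrow(df) * valid_frac)  # number of validation rows
  valid_id <- sample.int(nrow(df), n_valid) # random row indices
  # Logical indexing avoids the pitfall that df[-integer(0), ] returns
  # zero rows when no validation rows were drawn.
  is_valid <- seq_len(nrow(df)) %in% valid_id
  list(calibration = df[!is_valid, , drop = FALSE],
       validation  = df[is_valid, , drop = FALSE])
}

parts <- split_data(iris, valid_frac = 0.3, seed = 1)
nrow(parts$validation)   # 45
nrow(parts$calibration)  # 105
```

The two pieces never share a row, and together they cover the whole data
frame, which is what the calibration/validation question asks for.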

Cheers,

Kevin


--------------------------------
Ko-Kang Kevin Wang
PhD Student
Centre for Mathematics and its Applications
Building 27, Room 1004
Mathematical Sciences Institute (MSI)
Australian National University
Canberra, ACT 0200
Australia
Homepage: http://wwwmaths.anu.edu.au/~wangk/
Ph (W): +61-2-6125-2431
Ph (H): +61-2-6125-7407
Ph (M): +61-40-451-8301



