[R] createDataPartition

Max Kuhn mxkuhn at gmail.com
Thu Sep 9 19:56:31 CEST 2010


Trafim,

You'll get more answers if you adhere to the posting guide and tell us
your version information and other necessary details. For example, this
function is in the caret package (but probably nobody but me knows
that =]).

The first argument, y, should be the vector of observed outcome values,
one per sample (not a list of the possible classes).

For the iris data, this means something like:

   createDataPartition(iris$Species)

if you were trying to predict the species. The function does
stratified splitting: the data are split into training and test sets
within each class, then the per-class results are pooled to give the
row indices of the entire training set. The single proportion p is
applied to every class alike, so setting a proportion per class won't
do anything.
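
To make the stratified idea concrete, here is a rough base R sketch of
splitting within each class and then pooling the indices. The helper
name stratified_split is made up for illustration; it is not part of
caret and is not how createDataPartition is actually implemented, only
a minimal version of the same idea under the assumption of a
categorical outcome.

```r
# Hypothetical helper, not part of caret: sample a fraction p of the row
# indices within each class of y, then pool the per-class samples.
stratified_split <- function(y, p = 0.7) {
  # One vector of row indices per class
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    n <- ceiling(length(idx) * p)
    # Index via sample.int so a class with a single row is handled safely
    idx[sample.int(length(idx), n)]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
train_idx <- stratified_split(iris$Species, p = 0.7)
training  <- iris[train_idx, ]
testing   <- iris[-train_idx, ]

# Each species should contribute about 70% of its rows to the training set
table(training$Species)
```

With caret loaded, the corresponding call would be something like
createDataPartition(iris$Species, p = .7, list = FALSE), and the
result is used to index the rows of iris in the same way.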

Look at the man page or the four package vignettes for examples.

Max

On Thu, Sep 9, 2010 at 7:52 AM, Trafim Vanishek <rdapamoga at gmail.com> wrote:
> Dear all,
>
> does anyone know how to define the structure of the required samples using
> function createDataPartition, meaning proportions of different types of
> variable in the partition?
> Something like this for the iris data:
>
> createDataPartition(y = c(setosa = .5, virginica = .3, versicolor = .2),
> times = 10, p = .7, list = FALSE)
>
> Thanks a lot for your help.
>
> Regards,
> Trafim


