[R] split data, but ensure each level of the factor is represented

Gabor Grothendieck ggrothendieck at gmail.com
Mon Oct 13 19:20:18 CEST 2008


Try this:

split(iris, factor(a, levels = 1:3))
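Below is a quick check of what that call returns. It assumes the same 10-row, 4-column subset of iris and the second `a` vector from the question. Because the levels are declared explicitly, level 1 is kept, and it comes back as a zero-row data frame. The helper `mysplit` is hypothetical and not part of base R. It assumes `f` holds integer codes 1..n, and it replaces each empty piece with a vector of zeros, one per column (four here), which is the output the question asks for.

```r
# Same subset as in the question: first 10 rows, 4 numeric columns
x <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)

# Declaring levels = 1:3 keeps level 1 even though it never occurs in a;
# split() returns a zero-row data frame for that level
s <- split(x, factor(a, levels = 1:3))
sapply(s, nrow)   # rows per level: 0, 4, 6

# Hypothetical helper (assumption: f contains integer codes 1..n).
# Empty pieces are replaced by rep(0, ncol(x)), as requested.
mysplit <- function(x, f, levels = seq_len(max(f))) {
  pieces <- split(x, factor(f, levels = levels))
  lapply(pieces, function(p) if (nrow(p) == 0) rep(0, ncol(x)) else p)
}

res <- mysplit(x, a)
res[["1"]]        # [1] 0 0 0 0
```

Keeping the zero-row data frame that plain `split()` returns is often easier to work with downstream, because every element then has the same columns. The zero-vector substitution is only needed if later code expects exactly that shape.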

On Mon, Oct 13, 2008 at 1:06 PM, Jay <wilcoxjay at gmail.com> wrote:
> Hello,
>
> I'll use part of the iris dataset for an example of what I want to
> do.
>
>> data(iris)
>> iris<-iris[1:10,1:4]
>> iris
>   Sepal.Length Sepal.Width Petal.Length Petal.Width
> 1           5.1         3.5          1.4         0.2
> 2           4.9         3.0          1.4         0.2
> 3           4.7         3.2          1.3         0.2
> 4           4.6         3.1          1.5         0.2
> 5           5.0         3.6          1.4         0.2
> 6           5.4         3.9          1.7         0.4
> 7           4.6         3.4          1.4         0.3
> 8           5.0         3.4          1.5         0.2
> 9           4.4         2.9          1.4         0.2
> 10          4.9         3.1          1.5         0.1
>
> Now if I want to split this data using the vector
>> a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
>> a
>  [1] 3 3 3 2 3 1 2 3 2 3
>
> Then the function split works fine
>> split(iris,a)
> $`1`
>  Sepal.Length Sepal.Width Petal.Length Petal.Width
> 6          5.4         3.9          1.7         0.4
>
> $`2`
>  Sepal.Length Sepal.Width Petal.Length Petal.Width
> 4          4.6         3.1          1.5         0.2
> 7          4.6         3.4          1.4         0.3
> 9          4.4         2.9          1.4         0.2
>
> $`3`
>   Sepal.Length Sepal.Width Petal.Length Petal.Width
> 1           5.1         3.5          1.4         0.2
> 2           4.9         3.0          1.4         0.2
> 3           4.7         3.2          1.3         0.2
> 5           5.0         3.6          1.4         0.2
> 8           5.0         3.4          1.5         0.2
> 10          4.9         3.1          1.5         0.1
>
>
> My problem is when the vector lacks one of the values from 1:n. For
> example if the vector is
>> a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
>> a
>  [1] 3 3 3 2 3 2 2 3 2 3
>
> then split will return a list without a $`1`. I would like to have the
> $`1` be a vector of 0's with the same length as the number of columns
> in the dataset. In other words I want to write a function that returns
>
>> mysplit(iris,a)
> $`1`
> [1] 0 0 0 0
>
> $`2`
>  Sepal.Length Sepal.Width Petal.Length Petal.Width
> 4          4.6         3.1          1.5         0.2
> 6          5.4         3.9          1.7         0.4
> 7          4.6         3.4          1.4         0.3
> 9          4.4         2.9          1.4         0.2
>
> $`3`
>   Sepal.Length Sepal.Width Petal.Length Petal.Width
> 1           5.1         3.5          1.4         0.2
> 2           4.9         3.0          1.4         0.2
> 3           4.7         3.2          1.3         0.2
> 5           5.0         3.6          1.4         0.2
> 8           5.0         3.4          1.5         0.2
> 10          4.9         3.1          1.5         0.1
>
> Thank you for your time,
>
> Jay
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
