[R] split data, but ensure each level of the factor is represented

Jay wilcoxjay at gmail.com
Mon Oct 13 19:18:36 CEST 2008

Thanks so much.

On Oct 13, 1:14 pm, "Henrique Dallazuanna" <www... at gmail.com> wrote:
> Try this:
>
> a<-factor(c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3), levels = 1:3)
> split(iris, a)
>
> lapply(split(iris, a), dim)
>
>
>
> On Mon, Oct 13, 2008 at 2:06 PM, Jay <wilcox... at gmail.com> wrote:
> > Hello,
>
> > I'll use part of the iris dataset for an example of what I want to
> > do.
>
> > > data(iris)
> > > iris<-iris[1:10,1:4]
> > > iris
> >   Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 1           5.1         3.5          1.4         0.2
> > 2           4.9         3.0          1.4         0.2
> > 3           4.7         3.2          1.3         0.2
> > 4           4.6         3.1          1.5         0.2
> > 5           5.0         3.6          1.4         0.2
> > 6           5.4         3.9          1.7         0.4
> > 7           4.6         3.4          1.4         0.3
> > 8           5.0         3.4          1.5         0.2
> > 9           4.4         2.9          1.4         0.2
> > 10          4.9         3.1          1.5         0.1
>
> > Now if I want to split this data using the vector
> > > a<-c(3, 3, 3, 2, 3, 1, 2, 3, 2, 3)
> > > a
> >  [1] 3 3 3 2 3 1 2 3 2 3
>
> > Then the function split works fine
> > > split(iris,a)
> > $`1`
> >  Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 6          5.4         3.9          1.7         0.4
>
> > $`2`
> >  Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 4          4.6         3.1          1.5         0.2
> > 7          4.6         3.4          1.4         0.3
> > 9          4.4         2.9          1.4         0.2
>
> > $`3`
> >   Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 1           5.1         3.5          1.4         0.2
> > 2           4.9         3.0          1.4         0.2
> > 3           4.7         3.2          1.3         0.2
> > 5           5.0         3.6          1.4         0.2
> > 8           5.0         3.4          1.5         0.2
> > 10          4.9         3.1          1.5         0.1
>
> > My problem is when the vector lacks one of the values from 1:n. For
> > example if the vector is
> > > a<-c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
> > > a
> >  [1] 3 3 3 2 3 2 2 3 2 3
>
> > then split will return a list without a $`1`. I would like to have the
> > $`1` be a vector of 0's with the same length as the number of columns
> > in the dataset. In other words I want to write a function that returns
>
> > > mysplit(iris,a)
> > $`1`
> > [1] 0 0 0 0
>
> > $`2`
> >  Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 4          4.6         3.1          1.5         0.2
> > 6          5.4         3.9          1.7         0.4
> > 7          4.6         3.4          1.4         0.3
> > 9          4.4         2.9          1.4         0.2
>
> > $`3`
> >   Sepal.Length Sepal.Width Petal.Length Petal.Width
> > 1           5.1         3.5          1.4         0.2
> > 2           4.9         3.0          1.4         0.2
> > 3           4.7         3.2          1.3         0.2
> > 5           5.0         3.6          1.4         0.2
> > 8           5.0         3.4          1.5         0.2
> > 10          4.9         3.1          1.5         0.1
>
> > Thank you for your time,
>
> > Jay
>
> > ______________________________________________
> > R-h... at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
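
For anyone finding this thread later: the factor approach in the reply keeps level 1 in the result, but as a data frame with zero rows rather than the vector of zeros asked for in the question. Below is a minimal sketch of a wrapper that swaps each empty piece for a zero vector. The name mysplit comes from the question; the n_levels argument and the variable dat are assumptions made here for illustration, and the zero vector is given one entry per column (four for this subset of iris), following the wording of the question.

```r
# Hypothetical helper, named after the function requested in the question.
# It is not part of base R; n_levels is an assumed argument.
mysplit <- function(df, groups, n_levels = max(groups)) {
  # Declaring every level up front makes split() keep the empty ones.
  f <- factor(groups, levels = seq_len(n_levels))
  parts <- split(df, f)
  # Replace each zero-row piece with a zero vector, one entry per column.
  lapply(parts, function(piece) {
    if (nrow(piece) == 0) numeric(ncol(piece)) else piece
  })
}

dat <- iris[1:10, 1:4]
a <- c(3, 3, 3, 2, 3, 2, 2, 3, 2, 3)
res <- mysplit(dat, a, n_levels = 3)
res
```

Passing n_levels explicitly matters when the largest group number itself is missing from the vector, since max(groups) would then undercount the levels.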