[R] levels of comma separated data

Fri May 25 15:33:43 CEST 2012

On May 25, 7:23 am, "analys... at hotmail.com" <analys... at hotmail.com>
wrote:
> On May 25, 4:46 am, Stefan <ste... at inizio.se> wrote:
>
> > analyst41 <at> hotmail.com <analyst41 <at> hotmail.com> writes:
>
> > > I have a data set that has some comma separated strings in each row.
> > > I'd like to create a vector consisting of all distinct strings that
> > > occur.  The number of strings in each row may vary.
>
> > > Thanks for any help.
>
> > #
> > #
> > # Some data:
> > d <- data.frame(id = 1:5,
> >   text = c('one,two',
> >     'two,three,three,four',
> >     'one,three,three,five',
> >     'five,five,five,five',
> >     'one,two,three'),
> >   stringsAsFactors = FALSE
> > )
> > #
> > #
> > # A function. I'm not a black belt at this, so there
> > # is probably a more efficient way of writing this.
> > fcn <- function(x){
> >   a <- strsplit(x, ',') # Split the string by comma
> >   unique(a[[1]]) # Uniquify the vector
> > }
>
> > #
> > #
> > # Use the function with sapply.
> > sapply(d[,2], fcn)
>
> Thanks - but this solves a slightly different problem - it outputs the
> unique values in each row.  I want a list of the unique values in the
> whole data frame.
>
> In this case the output should be a single vector =
>  c("one","two","three","four","five").
>

Actually I figured it out after I posted this:

> levels(as.factor(unlist(strsplit(d$text,','))))
[1] "five"  "four"  "one"   "three" "two"

Thanks for pointing me the right way.
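
A note on ordering: levels() returns the distinct values sorted alphabetically, which is why the output above differs in order from the c("one","two","three","four","five") asked for earlier in the thread. If order of first appearance matters, unique() on the flattened vector is one option. The following is a minimal sketch that reuses the example data frame d from the quoted reply; the names tokens, sorted_vals and ordered_vals are made up for illustration, and fixed = TRUE is an added assumption so the comma is treated as a literal separator.

```r
# Example data from the quoted reply above.
d <- data.frame(id = 1:5,
  text = c('one,two',
    'two,three,three,four',
    'one,three,three,five',
    'five,five,five,five',
    'one,two,three'),
  stringsAsFactors = FALSE
)

# Split every row on commas and flatten the result into one character vector.
tokens <- unlist(strsplit(d$text, ',', fixed = TRUE))

# Distinct values in sorted order (what levels() on a factor gives).
sorted_vals <- levels(as.factor(tokens))

# Distinct values in order of first appearance.
ordered_vals <- unique(tokens)

print(sorted_vals)
print(ordered_vals)
```

Both results hold the same five strings; only the order differs. sort(unique(tokens)) should match the levels() result without creating an intermediate factor.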