[R] levels of comma separated data

analyst41 at hotmail.com analyst41 at hotmail.com
Fri May 25 13:23:02 CEST 2012



On May 25, 4:46 am, Stefan <ste... at inizio.se> wrote:
> analyst41 <at> hotmail.com <analyst41 <at> hotmail.com> writes:
>
>
>
> > I have a data set that has some comma separated strings in each row.
> > I'd like to create a vector consisting of all distinct strings that
> > occur.  The number of strings in each row may vary.
>
> > Thanks for any help.
>
> #
> #
> # Some data:
> d <- data.frame(id = 1:5,
>   text = c('one,two',
>     'two,three,three,four',
>     'one,three,three,five',
>     'five,five,five,five',
>     'one,two,three'),
>   stringsAsFactors = FALSE
> )
> #
> #
> # A function. I'm not a black belt at this, so there
> # is probably a more efficient way of writing this.
> fcn <- function(x){
>   a <- strsplit(x, ',') # Split the string by comma
>   unique(a[[1]]) # Keep only the distinct values
> }
>
> #
> #
> # Use the function with sapply.
> sapply(d[,2], fcn)
>


Thanks - but this solves a slightly different problem - it outputs the
unique values in each row.  I want a list of the unique values in the
whole data frame.

In this case the output should be a single vector =
 c("one","two","three","four","five").
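
For reference, here is a minimal sketch of one way to get that single vector. It assumes the sample data frame d from the quoted message; the name all_values is only illustrative. The idea is to split every row, flatten the resulting list into one vector, and only then take the unique values.

```r
# Same sample data as in the quoted message above.
d <- data.frame(id = 1:5,
  text = c('one,two',
    'two,three,three,four',
    'one,three,three,five',
    'five,five,five,five',
    'one,two,three'),
  stringsAsFactors = FALSE
)

# strsplit() returns a list with one character vector per row;
# unlist() flattens that list into a single vector, and
# unique() then drops repeats, keeping first-appearance order.
all_values <- unique(unlist(strsplit(d$text, ",", fixed = TRUE)))
all_values
```

If the real data has spaces after the commas, the pieces would probably need trimming (for example with trimws()) before the call to unique().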


> ______________________________________________
> R-h... at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


