[R] Recoding factor labels that are lists into first element of list

Gabor Grothendieck ggrothendieck at gmail.com
Fri Dec 11 11:43:11 CET 2009


Or this, which removes the comma and everything thereafter in each
level that has a comma:

levels(x$a) <- sub(",.*", "", levels(x$a))
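
For illustration, a minimal sketch of that one-liner applied to Jim's example data from the message quoted below. It assumes the column is a factor, so `stringsAsFactors = TRUE` is passed explicitly (newer versions of R no longer convert strings to factors by default). The "cat, dog, bird" value is a hypothetical addition to show that lists with spaces are handled too.

```r
# Example data modelled on Jim's, plus one hypothetical list with spaces
x <- data.frame(a = c("cat", "cat,dog", "dog", "dog,cat", "cat, dog, bird"),
                stringsAsFactors = TRUE)

# Strip the first comma and everything after it from each level;
# levels that become identical are merged by `levels<-`
levels(x$a) <- sub(",.*", "", levels(x$a))

levels(x$a)        # [1] "cat" "dog"
as.character(x$a)  # [1] "cat" "cat" "dog" "dog" "cat"
```

Because only the levels are rewritten, the work scales with the number of distinct levels (a few hundred here) rather than with the number of rows.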

On Fri, Dec 11, 2009 at 5:21 AM, jim holtman <jholtman at gmail.com> wrote:
> try this:
>
>> x <- data.frame(a=c('cat', 'cat,dog', 'dog', 'dog,cat'), stringsAsFactors=TRUE)
>> x
>        a
> 1     cat
> 2 cat,dog
> 3     dog
> 4 dog,cat
>> levels(x$a)
> [1] "cat"     "cat,dog" "dog"     "dog,cat"
>> # change the factors
>> x$a <- factor(sapply(strsplit(as.character(x$a), ','), '[[', 1))
>> x
>    a
> 1 cat
> 2 cat
> 3 dog
> 4 dog
>> levels(x$a)
> [1] "cat" "dog"
>
>
> On Thu, Dec 10, 2009 at 10:53 PM, Jennifer Walsh <walshjen at umich.edu> wrote:
>
>> Hi all,
>>
>> I've Googled far and wide but don't think I know the correct terms to
>> search for to find an answer.
>>
>> I have a massive dataset where one of the factors is made up of both
>> individual items and lists of items (for example, "cat" and "cat, dog,
>> bird"). I would like to recode this factor somehow into only the first
>> element of the list (so every list starting with "cat," plus the
>> observations that were already just "cat" would all be set equal to "cat").
>> I would ideally like to do this in some simple way that does not require me
>> to write hundreds of different sets of code (since the lists probably start
>> with 300+ different items). Is this possible? Extremely complicated?
>>
>> Also, I am sure this is much simpler, but I cannot seem to get rid of
>> levels of a factor that have no observations. I have tried setting the
>> levels of the factor to only the ones with observations that I am interested
>> in, but every time I summarize the variable there are still 100+ labels all
>> with "0" as their count. This hasn't happened to me before; is there an
>> explanation for it?
>>
>> Thanks very much,
>> Jen
>>
>> ---
>> Jennifer Walsh
>> Graduate Student, Developmental Psychology
>> University of Michigan
>> 2020 East Hall, 530 Church St.
>> Ann Arbor, MI 48109-1043
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>
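
On the second question in the original post (levels that show a count of 0), which the replies above do not address directly: a factor keeps every level it was created with, even after subsetting, so empty levels still appear in summaries. A minimal sketch, with made-up data, of one common way to drop them by calling `factor()` again; later versions of R also provide `droplevels()` for the same purpose.

```r
# Hypothetical factor with three levels
f <- factor(c("cat", "dog", "bird", "cat"))

# Subsetting keeps all original levels, so "bird" remains with count 0
g <- f[f != "bird"]
table(g)

# Re-creating the factor keeps only the levels that actually occur
g <- factor(g)
levels(g)  # [1] "cat" "dog"
```

Jim's version calls `factor()` on its result, so it drops empty levels as a side effect; the `levels<-` one-liner merges duplicate levels but leaves any already-empty ones in place.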



