[R] List of Levels for all Factor variables

Tue Oct 16 18:12:03 CEST 2012

Perfect!
Thank you!
Dan

-----Original Message-----
From: Rui Barradas [mailto:ruipbarradas at sapo.pt] 
Sent: Tuesday, October 16, 2012 9:03 AM
To: Lopez, Dan
Cc: R. Michael Weylandt; R help (r-help at r-project.org)
Subject: Re: [R] List of Levels for all Factor variables

Hello,

The problem is with "clean"?

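dat <- data.frame(X = sample(letters[1:4], 100, TRUE),
             Y = sample(LETTERS[1:6], 100, TRUE),
             Z = factor(rep(1:5, 4)),
             # needed in R >= 4.0.0, where strings are no longer
             # converted to factors by default
             stringsAsFactors = TRUE)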

levs <- lapply(dat, levels)
clean <- lapply(seq_along(levs), function(i)
     paste(names(levs)[i], ":",  paste(levs[[i]], collapse = " ")))

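# writeLines() prints each line once, without quotes or index prefixes;
# sapply(clean, print) would also auto-print its return value at top level
writeLines(unlist(clean))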
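
If the data frame also has non-factor columns, as the real one in this thread does, levels() returns NULL for them and they show up as empty entries. A minimal sketch of one way around that, assuming a small made-up data frame (dat2, and the names levs2 and clean2, are only illustrative and are not from the original data):

```r
# Hypothetical mixed-type data frame standing in for the real 'mydata'
dat2 <- data.frame(ID     = 1:6,
                   GENDER = factor(c("F", "M", "M", "F", "M", "F")),
                   RETCD  = factor(c("TCP1", "TCP2", "TCP1", "TCP2", "TCP2", "TCP1")),
                   AGE    = c(62, 53, 46, 62, 55, 59))

# Keep only the factor columns, then collect their levels
levs2 <- lapply(Filter(is.factor, dat2), levels)

# One string per factor: "NAME : level1 level2 ..."
clean2 <- paste(names(levs2), ":",
                vapply(levs2, paste, character(1), collapse = " "))

# Print one factor per line, without quotes
writeLines(clean2)
```

Filter(is.factor, dat2) drops ID and AGE before levels() is applied, so only GENDER and RETCD are listed.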

Hope this helps,

Rui Barradas
On 16-10-2012 16:40, Lopez, Dan wrote:
> Using unlist() did not produce the result I wanted. I have a dataframe. I tried playing with the parameters of unlist but each time it just tried to return each observation.
>
> unlist(x, recursive = TRUE, use.names = TRUE)
>
> Dan
>
> -----Original Message-----
> From: R. Michael Weylandt [mailto:michael.weylandt at gmail.com]
> Sent: Tuesday, October 16, 2012 8:28 AM
> To: Lopez, Dan
> Cc: R help (r-help at r-project.org)
> Subject: Re: [R] List of Levels for all Factor variables
>
> On Tue, Oct 16, 2012 at 4:19 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
>> Hi,
>>
>> I want to get a clean succinct list of all levels for all my factor variables.
>>
>> I have a dataframe that's something like #1 below. This is just an example subset of my data and my actual dataset has 70 variables. I know how to narrow down my list of variables to just my factor variables by using #2 below (thanks to Bert Gunter). I can also get a list of all levels for all my factor variables using #3 below. But what I want to find out is if there is a way to get this list in a similar fashion to what the str function returns: without all the extra spacing and carriage returns. That's what I mean by "clean succinct list".
>>
>> BTW I also tried playing around with several of the parameters for the str function itself but could not find a way to accomplish what I want to accomplish.
>>
>>
>>
>> 1.       DATAFRAME
>>
>>> str(mydata)
>> 'data.frame':  11868 obs. of  26 variables:
>> $ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
>> $ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
>> $ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
>> $ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
>> $ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
>> $ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
>> $ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
>> $ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
>> $ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
>> $ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
>> $ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
>> $ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
>> $ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
>> $ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
>> $ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
>> $ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
>> $ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
>> $ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
>> $ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
>> $ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
>> $ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
>> $ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
>> $ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
>> $ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
>> $ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
>> $ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...
>>
>> 2. Create mydataF to only include factor variables (and exclude NAME which I am not interested in)
>>
>>> mydataF<-mydata[,sapply(mydata,function(x)is.factor(x))][,-1]
>> 3. Get a list of all levels
>>
>>> sapply(mydataF,function(x)levels(x))
> I think you want to unlist() the result of this call.
>
> RMW
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.