[R] List of Levels for all Factor variables

Wed Oct 17 17:29:23 CEST 2012

Given dat1 (from arun's example below), does this work?

> PrintLvls <- function(x) {
+   print(data.frame(Lvls = sapply(x[sapply(x, is.factor)], nlevels),
+                    Names = sapply(x[sapply(x, is.factor)],
+                                   function(y) paste0(levels(y), collapse = ", "))),
+         right = FALSE)
+ }
> PrintLvls(dat1)
     Lvls Names                          
col1 9    2, 6, 7, 10, 15, 16, 17, 23, 24
col2 7    b, c, d, e, g, h, j            
col3 5    1, 2, 3, 4, 5                  

It automatically extracts the columns that are factors, so it should work on
your original data.frame.
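A variant of the same idea, sketched here with a hypothetical helper `lvl_summary` (not from the thread), returns one "name : n levels : values" string per factor column instead of printing a data frame. That is closer to the one-line-per-variable layout of str() asked about below. It uses only base R; the small data frame is made up, and `stringsAsFactors = TRUE` is spelled out because R 4.0.0 and later no longer convert character columns to factors by default.

```r
# Hypothetical helper (not from the thread): one line per factor column,
# formatted as "name : n levels : lev1, lev2, ...".
lvl_summary <- function(df, sep = ", ") {
  facs <- Filter(is.factor, df)                  # keep only the factor columns
  n    <- vapply(facs, nlevels, integer(1))      # number of levels per column
  labs <- vapply(facs, function(f) paste(levels(f), collapse = sep),
                 character(1))                   # levels pasted into one string
  paste0(names(facs), " : ", n, " levels : ", labs)
}

# Made-up example data; stringsAsFactors = TRUE keeps g a factor on R >= 4.0.0.
df <- data.frame(g   = c("b", "a", "b"),
                 n   = 1:3,
                 grp = factor(c(2, 1, 2)),
                 stringsAsFactors = TRUE)

res <- lvl_summary(df)
writeLines(res)
# g : 2 levels : a, b
# grp : 2 levels : 1, 2
```

Because it returns a character vector rather than printing, the result can also be written to a file or trimmed with substr() when some variables have very many levels.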

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of arun
> Sent: Tuesday, October 16, 2012 12:09 PM
> To: Lopez, Dan
> Cc: R help
> Subject: Re: [R] List of Levels for all Factor variables
> 
> Hi,
> You can also try this:
> set.seed(1)
> dat1 <- data.frame(col1 = factor(sample(1:25, 10, replace = TRUE)),
>                    col2 = sample(letters[1:10], 10, replace = TRUE),
>                    col3 = factor(rep(1:5, each = 2)))
> 
> sapply(lapply(mapply(c, lapply(names(sapply(dat1, levels)), function(x) x),
>                      sapply(dat1, levels)),
>               function(x) paste(x[1], ":", paste(x[-1], collapse = " "))),
>        print)
> #[1] "col1 : 2 6 7 10 15 16 17 23 24"
> #[1] "col2 : b c d e g h j"
> #[1] "col3 : 1 2 3 4 5"
> #[1] "col1 : 2 6 7 10 15 16 17 23 24" "col2 : b c d e g h j"
> #[3] "col3 : 1 2 3 4 5"
> 
> A.K.
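For comparison, the nested sapply/lapply/mapply call above can probably be flattened. The sketch below restricts itself to factor columns with `Filter(is.factor, ...)`, so non-factor columns are skipped rather than printed with an empty level list, and pastes each set of levels with `vapply()`. `stringsAsFactors = TRUE` is added so that col2 is still a factor on R 4.0.0 and later. Note that on R 3.6.0 and later `sample()` defaults to a different algorithm (`sample.kind = "Rejection"`), so the col1 levels may not match the output shown above.

```r
# Same construction as arun's dat1, with stringsAsFactors made explicit.
set.seed(1)
dat1 <- data.frame(col1 = factor(sample(1:25, 10, replace = TRUE)),
                   col2 = sample(letters[1:10], 10, replace = TRUE),
                   col3 = factor(rep(1:5, each = 2)),
                   stringsAsFactors = TRUE)

# Named list of level vectors, factor columns only.
lv <- lapply(Filter(is.factor, dat1), levels)

# One "name : lev1 lev2 ..." string per factor column.
out <- paste(names(lv), ":", vapply(lv, paste, character(1), collapse = " "))
writeLines(out)
```

Unlike the sapply(..., print) version, this prints each line once and leaves the strings in `out` for further use.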
> 
> ----- Original Message -----
> From: "Lopez, Dan" <lopez235 at llnl.gov>
> To: "R help (r-help at r-project.org)" <r-help at r-project.org>
> Cc:
> Sent: Tuesday, October 16, 2012 11:19 AM
> Subject: [R] List of Levels for all Factor variables
> 
> Hi,
> 
> I want to get a clean, succinct list of all levels for all my factor
> variables.
> 
> I have a data frame that's something like #1 below. This is just an
> example subset of my data; my actual dataset has 70 variables. I
> know how to narrow down my list of variables to just my factor
> variables by using #2 below (thanks to Bert Gunter). I can also get a
> list of all levels for all my factor variables using #3 below. But
> what I want to find out is whether there is a way to get this list in a
> format similar to what the str function returns: without all the extra
> spacing and carriage returns. That's what I mean by "clean succinct
> list".
> 
> BTW, I also tried playing around with several of the parameters of the
> str function itself, but could not find a way to accomplish what I want
> to accomplish.
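One base-R route that stays close to str() output, offered as a sketch rather than as an answer given in the thread: call str() on the list of levels instead of on the data frame. Each variable then gets a single line, and str()'s `vec.len` argument controls how many values are shown before the line is cut off with "...". The two-column `mydataF` here is a made-up stand-in for the real data.

```r
# Made-up stand-in for mydataF (factor columns only).
mydataF <- data.frame(APPT_TYP_CD_LL = c("IN", "FX", "IP", "IN"),
                      GENDER         = c("M", "M", "F", "M"),
                      stringsAsFactors = TRUE)

# A list of character vectors, one per factor column.
lv <- lapply(mydataF, levels)

# str() on that list prints one compact line per variable;
# raise vec.len if long level sets are truncated with "...".
str(lv, vec.len = 20)
```

With 15 or 18 levels per variable the lines get long, so a smaller `vec.len` trades completeness for the terse look of str(mydata).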
>
> 1. DATAFRAME
> 
> > str(mydata)
> 'data.frame':  11868 obs. of  26 variables:
> $ EMPLID          : int  431108 32709 19730 10850 48786 2004 237628 558 3423 743175 ...
> $ NAME            : Factor w/ 6402 levels "Aaron Cathy E",..: 2777 242 161 104 336 4254 1595 1244 3669 4760 ...
> $ TRAIN           : int  1 1 1 1 1 1 1 1 1 1 ...
> $ TARGET          : int  0 0 0 0 0 0 0 0 0 0 ...
> $ APPT_TYP_CD_LL  : Factor w/ 3 levels "FX","IN","IP": 2 2 2 2 2 2 2 2 2 2 ...
> $ ORG_NAM_LL      : Factor w/ 18 levels "Business","Chief Financial Officer",..: 11 7 7 9 4 4 18 18 8 4 ...
> $ NEW_DISCIPLINE  : Factor w/ 15 levels "100s","300s",..: 14 6 4 1 11 11 14 2 1 1 ...
> $ SERIES          : Factor w/ 10 levels "100s","300s",..: 9 6 4 1 9 9 9 2 1 1 ...
> $ AGE             : int  62 53 46 62 55 59 50 36 34 53 ...
> $ SERVICE         : int  13 29 16 26 18 9 19 11 8 26 ...
> $ AGE_SERVICE     : int  75 82 62 87 73 69 69 47 42 79 ...
> $ HIEDUCLV        : Factor w/ 6 levels "Associate","Bachelor",..: 5 6 6 6 5 2 3 2 2 1 ...
> $ GENDER          : Factor w/ 2 levels "F","M": 2 2 2 1 2 2 2 2 2 1 ...
> $ RETCD           : Factor w/ 2 levels "TCP1","TCP2": 2 1 2 2 2 1 1 2 1 2 ...
> $ FLSASTATUS      : Factor w/ 2 levels "E","N": 1 2 2 1 1 1 1 1 1 1 ...
> $ MONTHLY_RT      : int  17640 6932 5845 9809 11473 8719 19190 8986 7231 6758 ...
> $ RETSTATUSDERIVED: Factor w/ 4 levels "401K","DOUBLE DIPPERS",..: 2 4 3 2 3 4 4 3 4 3 ...
> $ ETHNIC_GRP_CD   : Factor w/ 8 levels "AMIND","ASIAN",..: 8 8 8 8 8 8 8 8 8 8 ...
> $ COMMUTE_BIN     : Factor w/ 7 levels "","<15","15 - 24",..: 5 7 2 2 4 3 3 6 3 2 ...
> $ EEO_CLASS       : Factor w/ 4 levels "M","S1","S2",..: 1 2 4 4 4 4 1 2 4 2 ...
> $ WRK_SCHED       : Factor w/ 6 levels "12HR","4/10s",..: 3 3 3 3 3 3 3 3 4 4 ...
> $ FWT_MAR_STATUS  : Factor w/ 2 levels "M","S": 1 1 1 1 2 1 1 1 1 2 ...
> $ COVERED_DP      : int  2 2 4 0 1 3 1 2 0 0 ...
> $ YRS_IN_SERIES   : int  13 29 16 26 18 9 19 3 7 26 ...
> $ SAVINGS_PCT     : int  10 0 6 19 8 0 10 15 15 18 ...
> $ Generation      : Factor w/ 4 levels "Baby Boomers",..: 1 1 2 1 1 1 1 2 2 1 ...
> 
> 2. Create mydataF to include only the factor variables (and exclude NAME,
> which I am not interested in)
> 
> > mydataF <- mydata[, sapply(mydata, is.factor)][, -1]
> 
> 3. Get a list of all levels
> 
> > sapply(mydataF, levels)
> 
> $APPT_TYP_CD_LL
> [1] "FX" "IN" "IP"
> 
> $ORG_NAM_LL
>  [1] "Business"                        "Chief Financial Officer"         "Chief Information Office"        "Computation"                     "Engineering"                     "ESH and Quality"
>  [7] "Facilities and Infrastructure"   "Global Security"                 "NIF"          "NO"              "Office of the Director"          "Operations and Business Office"
> [13] "Physical and Life Sciences"      "Planning and Financial Services" "ST"   "Security Organization"           "Strategic Human Resources Mgmt"  "WCI"
> 
> $NEW_DISCIPLINE
>  [1] "100s"                       "300s"                       "400s"                       "500s"                       "600s"                       "800s"                       "900s"
>  [8] "Chem  Science"              "Engineering"                "Life Sciences"              "Math  Computer Science  IT" "Physics"                    "pre100s"                    "PSTS Other"
> [15] "Re"
> 
> $SERIES   ......
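Step 2 relies on NAME being the first factor column. It also has a subtler trap: `[, -1]` on a data frame that is left with a single column returns a bare vector rather than a data frame, because `[.data.frame` defaults to `drop = TRUE`. A minimal sketch that drops NAME by name and keeps a data frame throughout, using a small hypothetical stand-in for `mydata`:

```r
# Hypothetical stand-in for mydata: two factor columns (NAME, GENDER)
# plus two integer columns.
mydata <- data.frame(EMPLID = c(431108L, 32709L, 19730L),
                     NAME   = c("A", "B", "C"),
                     GENDER = c("M", "M", "F"),
                     AGE    = c(62L, 53L, 46L),
                     stringsAsFactors = TRUE)

# Keep only the factor columns; Filter() indexes list-style (no comma),
# so the result stays a data frame.
mydataF <- Filter(is.factor, mydata)

# Drop NAME by name rather than by position, again without a comma.
mydataF <- mydataF[setdiff(names(mydataF), "NAME")]

str(mydataF)
```

Here only GENDER is left, and `mydataF` is still a one-column data frame, so sapply(mydataF, levels) or any of the per-column listings keeps working.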
> 
> Daniel Lopez
> Workforce Analyst
> HRIM - Workforce Analytics & Metrics
> 
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.