[R] Having trouble converting a dataframe of character vectors to factors

Bert Gunter gunter.berton at gene.com
Thu Feb 21 01:24:50 CET 2013


Please re-read ?sapply and pay particular attention to the "simplify" argument.

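The following session should help explain the issue: with the default
simplify = TRUE, sapply() collapses the equal-length results into a
character matrix, so the factors are lost.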

> z <- data.frame(a=letters[1:3],b=letters[4:6],stringsAsFactors=FALSE)
> sapply(z,class)
          a           b
"character" "character"
> z1 <- sapply(z,as.factor)
> sapply(z1,class)
          a           b           c           d           e           f
"character" "character" "character" "character" "character" "character"
> z2 <- sapply(z,factor, simplify = FALSE)
> sapply(z2,class)
       a        b
"factor" "factor"
> z3 <- lapply(z,factor)
> sapply(z3,class)
       a        b
"factor" "factor"
> z3
$a
[1] a b c
Levels: a b c

$b
[1] d e f
Levels: d e f

## Note that both z2 and z3 are lists, and would have to be converted
## back to data frames.
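
A minimal sketch of that last conversion step, using the same toy data frame as above. The names z4 and z5 are made up here for illustration and are not part of the original session:

```r
# Same toy data frame as in the session above
z <- data.frame(a = letters[1:3], b = letters[4:6], stringsAsFactors = FALSE)

# Option 1: assign into z4[] so the data frame itself is kept
# (column names and row names stay as they were)
z4 <- z                      # z4 is a hypothetical name
z4[] <- lapply(z4, factor)

# Option 2: rebuild a data frame from the list that lapply() returns
z5 <- as.data.frame(lapply(z, factor))   # z5 is a hypothetical name

sapply(z4, class)
sapply(z5, class)
```

Option 1 modifies the columns in place, so it is probably the safer choice when the data frame carries row names that matter. Option 2 builds a new data frame, which gets default row names.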

-- Bert

On Wed, Feb 20, 2013 at 4:09 PM, Lopez, Dan <lopez235 at llnl.gov> wrote:
> R Experts,
>
> I have a dataframe made up of character vectors--these are results from survey questions. I need to convert them to factors.
>
> I tried the following which did not work:
> scs2<-sapply(scs2,as.factor)
> also this didn't work:
> scs2<-sapply(scs2,function(x) as.factor(x))
>
> After doing either of above I end up with
> > str(scs2)
> chr [1:10, 1:10] "very important" "very important" "very important" "very important" ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : NULL
>   ..$ : chr [1:10] "Q1_1" "Q1_2" "Q1_3" "Q1_4" ...
> > class(scs2)
> "matrix"
>
> But when I do it one at a time it works:
> scs2$Q1_1<-as.factor(scs2$Q1_1)
> scs2$Q1_2<- as.factor(scs2$Q1_2)
>
> What am I doing wrong?  How do I accomplish this with sapply or similar function?
>
> Data for reproducibility:
>
>
> scs2<-structure(list(Q1_1 = c("very important", "very important", "very important",
> "very important", "very important", "very important", "very important",
> "somewhat important", "important", "very important"), Q1_2 = c("important",
> "somewhat important", "very important", "important", "important",
> "very important", "somewhat important", "somewhat important",
> "very important", "very important"), Q1_3 = c("very important",
> "important", "very important", "very important", "important",
> "very important", "very important", "somewhat important", "not important",
> "important"), Q1_4 = c("very important", "important", "very important",
> "very important", "important", "important", "important", "very important",
> "somewhat important", "important"), Q1_5 = c("very important",
> "not important", "important", "very important", "not important",
> "important", "somewhat important", "important", "somewhat important",
> "not important"), Q1_6 = c("very important", "not important",
> "important", "very important", "somewhat important", "very important",
> "very important", "very important", "important", "important"),
>     Q1_7 = c("very important", "somewhat important", "important",
>     "somewhat important", "important", "important", "very important",
>     "very important", "somewhat important", "not important"),
>     Q2 = c("Somewhat", "Very Much", "Somewhat", "Very Much",
>     "Very Much", "Very Much", "Very Much", "Very Much", "Very Much",
>     "Very Much"), Q3 = c("yes", "yes", "yes", "yes", "yes", "yes",
>     "yes", "yes", "yes", "yes"), Q4 = c("None", "None", "None",
>     "None", "Confirmed Field of Study", "Confirmed Field of Study",
>     "Confirmed Field of Study", "None", "None", "None")), .Names = c("Q1_1",
> "Q1_2", "Q1_3", "Q1_4", "Q1_5", "Q1_6", "Q1_7", "Q2", "Q3", "Q4"
> ), row.names = c(78L, 46L, 80L, 196L, 188L, 197L, 39L, 195L,
> 172L, 110L), class = "data.frame")



-- 

Bert Gunter
Genentech Nonclinical Biostatistics



