[R] convert columns of dataframe to same factor levels

Duncan Murdoch murdoch@dunc@n @ending from gm@il@com
Wed Dec 19 12:19:06 CET 2018


On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
> Dear all,
> I have a data frame with character values where each character is a
> level; however, not all columns of the data frame have the same
> characters thus, when generating the data frame with stringsAsFactors
> = TRUE, the levels are different for each column.
> Is there a way to provide a single vector of levels and assign the
> characters so that they match such vector?
> Is there a way to do that not only when setting the data frame but
> also when reading data from a file with read.table()?
> 
> For instance, I have:
> column_1 = c("A", "B", "C", "D", "E")
> column_2 = c("B", "B", "C", "E", "E")
> column_3 = c("C", "C", "D", "D", "C")
> my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
>> str(my.data)
> 'data.frame': 5 obs. of  3 variables:
>   $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
>   $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
>   $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
> 
> Thank you
> 

I don't think read.table() can do it for you automatically.  To do it 
yourself, you need to get a vector of the levels.  If you know this, 
just assign it to a variable; if you don't know it, compute it as

   thelevels <- unique(unlist(lapply(my.data, levels)))
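
As a minimal sketch of that step on the example data from the question: the union of levels comes back in order of first appearance, so sorting it is an optional extra (an assumption here, not part of the original suggestion) that makes the result independent of column order.

```r
# Example data from the question
my.data <- data.frame(
  column_1 = c("A", "B", "C", "D", "E"),
  column_2 = c("B", "B", "C", "E", "E"),
  column_3 = c("C", "C", "D", "D", "C"),
  stringsAsFactors = TRUE
)

# Union of the levels of all columns, in order of first appearance
thelevels <- unique(unlist(lapply(my.data, levels)))

# Optional: sort so the result does not depend on column order
thelevels <- sort(thelevels)
print(thelevels)
```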

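Then rebuild each column as a factor with those levels.  Use factor() 
rather than levels(x) <- thelevels, since that assignment would relabel 
the existing levels by position instead of matching the values:

   my.data.new <- as.data.frame(lapply(my.data, function(x)
       factor(as.character(x), levels = thelevels)))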
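
For the read.table() part of the question, one possible sketch is to read the data first and re-map the columns afterwards. The helper name set_common_levels and the inline text standing in for a file are assumptions made for illustration. The sketch uses factor(x, levels = ...), which matches by value; levels(x) <- ... relabels the existing levels by position instead.

```r
# Hypothetical helper: re-map every column onto one shared set of levels
set_common_levels <- function(df, lv) {
  # df[] <- keeps the data frame structure while replacing each column
  df[] <- lapply(df, function(x) factor(as.character(x), levels = lv))
  df
}

# Stand-in for a file: read.table() accepts text= in place of a file name
txt <- "column_1 column_2 column_3
A B C
B B C
C C D
D E D
E E C"
my.data <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Shared levels, known in advance in this example
thelevels <- c("A", "B", "C", "D", "E")
my.data.new <- set_common_levels(my.data, thelevels)
str(my.data.new)

# The values are unchanged; only the level sets differ
stopifnot(identical(as.character(my.data.new$column_2),
                    as.character(my.data$column_2)))
```

Any value that does not appear in thelevels becomes NA under this approach, so it is worth checking the result with anyNA() when the levels are supplied by hand.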

Duncan Murdoch


