[R] convert columns of dataframe to same factor levels

Duncan Murdoch murdoch.duncan using gmail.com
Wed Dec 19 14:01:47 CET 2018


On 19/12/2018 6:48 AM, Luigi Marongiu wrote:
> Thank you,
> that worked fine for me.
> Best wishes of merry Christmas and happy new year,
> Luigi
> 

Actually it's wrong!  Sorry about that.

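If you look at my.data.new$column_2, you'll see that the values 
themselves have changed, not just the set of levels.  Assigning with 
levels(x) <- thelevels renames the existing levels by position; it does 
not match them by name: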

 > my.data
   column_1 column_2 column_3
1        A        B        A
2        B        B        A
3        C        C        B
4        D        E        B
5        E        E        A


 > my.data.new
   column_1 column_2 column_3
1        A        A        A
2        B        A        A
3        C        B        B
4        D        C        B
5        E        C        A

What you want is this instead:

my.data.new <- as.data.frame(lapply(my.data, function(x) {
    factor(x, levels = thelevels)
}))
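
For anyone who wants to check the difference, here is a minimal sketch 
that rebuilds the data from the original question and runs both 
versions side by side.  The names relabelled and releveled are only 
illustrative; they are not from the earlier messages.

```r
# Data as given in the original question
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of all levels found in any column
thelevels <- unique(unlist(lapply(my.data, levels)))

# Earlier (wrong) version: levels<- renames existing levels by position
relabelled <- as.data.frame(lapply(my.data, function(x) {
    levels(x) <- thelevels
    x
}))

# Corrected version: factor() matches the values against the new level set
releveled <- as.data.frame(lapply(my.data, function(x) {
    factor(x, levels = thelevels)
}))

as.character(relabelled$column_2)  # "A" "A" "B" "C" "C"  (values changed)
as.character(releveled$column_2)   # "B" "B" "C" "E" "E"  (values kept)
levels(releveled$column_2)         # "A" "B" "C" "D" "E"
```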

The last example in the ?levels help page changes the values in the 
same way.  I wonder if that is intentional?

levels> ## we can add levels this way:
levels> f <- factor(c("a","b"))

levels> levels(f) <- c("c", "a", "b")

levels> f
[1] c a
Levels: c a b

levels> f <- factor(c("a","b"))

levels> levels(f) <- list(C = "C", A = "a", B = "b")

levels> f
[1] A B
Levels: C A B
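
On the read.table() part of the original question, the same factor() 
call can be applied right after reading.  This is only a sketch: the 
inline text stands in for a real file, and reading the columns as 
character first (stringsAsFactors = FALSE) is my assumption about a 
convenient way to do it, not something read.table() does for you.

```r
# Hypothetical input; in practice this would be a file name passed to read.table()
txt <- "column_1 column_2 column_3
A B C
B B C
C C D
D E D
E E C"

# Read everything as character so no per-column levels are created yet
raw <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Either supply the levels directly or compute them from the data
thelevels <- sort(unique(unlist(raw)))

# Convert every column to a factor sharing the same level set
my.data <- as.data.frame(lapply(raw, factor, levels = thelevels))

str(my.data)
```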

Duncan Murdoch

> On Wed, Dec 19, 2018 at 12:19 PM Duncan Murdoch
> <murdoch.duncan using gmail.com> wrote:
>>
>> On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
>>> Dear all,
>>> I have a data frame with character values where each character is a
>>> level; however, not all columns of the data frame have the same
>>> characters thus, when generating the data frame with stringsAsFactors
>>> = TRUE, the levels are different for each column.
>>> Is there a way to provide a single vector of levels and assign the
>>> characters so that they match such vector?
>>> Is there a way to do that not only when setting the data frame but
>>> also when reading data from a file with read.table()?
>>>
>>> For instance, I have:
>>> column_1 = c("A", "B", "C", "D", "E")
>>> column_2 = c("B", "B", "C", "E", "E")
>>> column_3 = c("C", "C", "D", "D", "C")
>>> my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
>>>> str(my.data)
>>> 'data.frame': 5 obs. of  3 variables:
>>>    $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
>>>    $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
>>>    $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
>>>
>>> Thank you
>>>
>>
>> I don't think read.table() can do it for you automatically.  To do it
>> yourself, you need to get a vector of the levels.  If you know this,
>> just assign it to a variable; if you don't know it, compute it as
>>
>>     thelevels <- unique(unlist(lapply(my.data, levels)))
>>
>> Then set the levels of each column to thelevels:
>>
>>     my.data.new <- as.data.frame(lapply(my.data, function(x) {levels(x)
>> <- thelevels; x}))
>>
>> Duncan Murdoch
> 
> 
>