[R] group factor levels

baptiste auguie baptiste.auguie at googlemail.com
Wed Feb 3 14:52:02 CET 2010


Dear list,

I cannot find an elegant solution to this problem. I have a factor f
with several levels (5) and I wish to create a new factor of the
same length with fewer levels (2). This new factor should therefore
group together some levels of the original one. Ideally this grouping
would be random, i.e. I would not simply group together the first 2
levels of f, then the following 3, etc.

Below is a minimal example (my real problem has more levels, otherwise
I would do the operation manually...)

f <- factor(rep(sample(letters[1:5], 20, replace=TRUE), each=10))

# permute the levels in random order
disorder <- sample(levels(f))

# new levels, recycled to match the number of old ones
new.lev <- rep(LETTERS[1:2], length.out=length(disorder))

# associate old levels to new ones
groups <- split(disorder, new.lev)

# test each element of f for its new category
test <- lapply(groups, function(g) f %in% g)

# f2 is the new factor, initialized with f
f2 <- as.character(f)

# overwrite the entries of f2 with the name of their new group
for (g in names(test)) f2[test[[g]]] <- g

# make it a factor
f2 <- factor(f2)
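
One shorter variant I can sketch, assuming I read ?levels correctly and
the replacement form levels(f2) <- value accepts a named list that maps
each new level to the old levels it should absorb. The set.seed() call
is my addition, only there to make the sketch reproducible:

```r
set.seed(1)  # assumption: fixed seed, only for reproducibility
f <- factor(rep(sample(letters[1:5], 20, replace = TRUE), each = 10))

# randomly assign the old levels to two new groups, "A" and "B"
groups <- split(sample(levels(f)),
                rep(LETTERS[1:2], length.out = nlevels(f)))

# copy f, then merge its levels: the list names become the new levels,
# the list elements are the old levels collapsed into each of them
f2 <- f
levels(f2) <- groups

# cross-tabulate old against new levels to check the grouping
table(f, f2)
```

Each row of the table should have a single non-zero entry, since every
old level ends up in exactly one new group.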

Any suggestions are very welcome, I must have missed something more obvious!

Best regards,

baptiste


