[R] Refactor all factors in a data frame

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 5 16:22:02 CEST 2007


On Tue, 5 Jun 2007, John Fox wrote:

> Dear Hilmar,
>
> You could use something like
>
> DF <- as.data.frame(lapply(DF, function (x) if (is.factor(x)) factor(x) else x))
>
> Where DF is the data frame.

I think DF[] <- lapply(DF, "[", drop=TRUE) is more likely to be what is 
wanted.  That drops unused factor levels without reordering the remaining 
levels, and would appear to be harmless for other variables.  But if one 
prefers to touch only the factor columns:

ind <- sapply(DF, is.factor)
DF[ind] <- lapply(DF[ind], "[", drop=TRUE)

Note the use of DF[] <- to preserve other attributes of DF, notably row 
names.
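
As a rough illustration of the difference between the two suggestions, 
here is a minimal sketch on a made-up data frame (the names DF, sub, g, x 
and the row names are assumptions for the example, not from the original 
question).  It compares rebuilding the data frame from a list with 
assigning into DF[]:

```r
# Hypothetical data: 'g' is a factor, 'x' is numeric, row names are r1..r4.
DF <- data.frame(g = factor(c("a", "b", "c", "a")),
                 x = c(1.5, 2.5, 3.5, 4.5),
                 row.names = c("r1", "r2", "r3", "r4"))

# Row subsetting leaves level "b" in place although no row uses it.
sub <- DF[DF$g != "b", ]
levels(sub$g)                      # "a" "b" "c"

# Rebuilding from a list drops the unused level, but the result gets
# fresh automatic row names.
viaList <- as.data.frame(lapply(sub, function(x)
    if (is.factor(x)) factor(x) else x))
rownames(viaList)                  # "1" "2" "3"

# Assigning into [] replaces the columns and keeps the row names.
viaBracket <- sub
viaBracket[] <- lapply(viaBracket, "[", drop = TRUE)
levels(viaBracket$g)               # "a" "c"
rownames(viaBracket)               # "r1" "r3" "r4"

# Variant restricted to the factor columns.
ind <- sapply(sub, is.factor)
onlyFactors <- sub
onlyFactors[ind] <- lapply(onlyFactors[ind], "[", drop = TRUE)
```

In this sketch the numeric column comes through unchanged in all three 
results; only the treatment of row names differs.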

>
> I hope this helps,
> John
>
> --------------------------------
> John Fox, Professor
> Department of Sociology
> McMaster University
> Hamilton, Ontario
> Canada L8S 4M4
> 905-525-9140x23604
> http://socserv.mcmaster.ca/jfox
> --------------------------------
>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Hilmar Berger
>> Sent: Tuesday, June 05, 2007 8:20 AM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] Refactor all factors in a data frame
>>
>> Hi all,
>>
>> Assume I have a data frame with numerical and factor
>> variables that I got through merging various other data
>> frames and subsetting the resulting data frame afterwards.
>> The number of levels of the factors seems to be the same as in
>> the original data frames, probably because subset() calls
>> [.factor without drop = TRUE (that's what I gather from
>> scanning the mailing lists).
>>
>> I wonder if there is an easy way to refactor all factors in
>> the data frame at once. I noted that fix(data_frame) does the
>> trick, however, this needs user interaction, which I'd like
>> to avoid. Subsequent write.table / read.table would be
>> another option but I'm not sure if R can guess the
>> factor/char/numeric-type correctly when reading the table.
>>
>> So, is there any way to drop the unused factor levels from
>> *all* factors of a data frame without import/export?
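
The behaviour described in the question can be reproduced with a small 
sketch.  The data frame and the names site, count and kept are invented 
for illustration; the point is only that subset() keeps the full set of 
levels on a factor column:

```r
# Hypothetical data, not from the original poster.
DF <- data.frame(site = factor(c("north", "south", "east")),
                 count = c(10, 20, 30))

# Keep the rows with count above 15; the "north" row is removed.
kept <- subset(DF, count > 15)

levels(kept$site)                  # "east" "north" "south"
table(kept$site)                   # "north" is listed with a count of 0

# Dropping unused levels on the single column.
levels(kept$site[drop = TRUE])     # "east" "south"
```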

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595