[R] Refactor all factors in a data frame

Hilmar Berger hilmar.berger at imise.uni-leipzig.de
Tue Jun 5 15:01:09 CEST 2007


Hi,

The best solution I have found so far is (assuming <data> is your data.frame):

# identify all factor variables (is.factor also catches ordered factors)
factor.list = colnames(data)[sapply(data, is.factor)]

# use transform to apply factor() to all factor variables
# (the generated code assumes syntactically valid column names)
trans.vars = paste(factor.list, "=factor(", factor.list, ")", sep="", collapse=",")
data = eval(parse(text=paste("transform(data,", trans.vars, ")")))
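
A variant that avoids building code as text might look like the sketch below. It assumes that every factor column should simply be passed through factor() again. The toy data frame and its column names (id, grp, sex) are made up for illustration, and is.fac is just a helper name. In newer R versions, droplevels(data) should give the same result in one call.

```r
# Toy data frame (hypothetical); subsetting leaves unused levels behind
data <- data.frame(id  = 1:6,
                   grp = factor(c("a", "a", "b", "b", "c", "c")),
                   sex = factor(c("m", "f", "m", "f", "m", "f")))
data <- subset(data, grp != "c" & sex == "f")
nlevels(data$grp)   # still 3, although only "a" and "b" remain in the data

# Re-apply factor() to the factor columns only; assigning into
# data[is.fac] keeps the object a data frame and leaves other columns alone
is.fac <- sapply(data, is.factor)
data[is.fac] <- lapply(data[is.fac], factor)

levels(data$grp)
levels(data$sex)
```

Because no code is pasted together and parsed, this version does not depend on the column names being valid R identifiers.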

Regards,
Hilmar


Hilmar Berger wrote:
> Hi all,
> 
> Assume I have a data frame with numerical and factor variables that I 
> got through merging various other data frames and subsetting the 
> resulting data frame afterwards. The number of levels of the factors seems 
> to be the same as in the original data frames, probably because subset() 
> calls [.factor without drop = TRUE (that's what I gather from scanning 
> the mailing lists).
> 
> I wonder if there is an easy way to refactor all factors in the data 
> frame at once. I noted that fix(data_frame) does the trick, however, 
> this needs user interaction, which I'd like to avoid. Subsequent 
> write.table / read.table would be another option but I'm not sure if R 
> can guess the factor/char/numeric-type correctly when reading the table.
> 
> So, is there any way to drop the unused factor levels from *all* factors 
> of a data frame without import/export?
> 
> Thanks in advance,
> Hilmar
> 
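
For reference, the behaviour described in the quoted message can be reproduced with a small made-up example. The data frames df1 and df2 and all column names are hypothetical and not taken from the original data. The sketch only shows that subset() keeps the old levels and that indexing a single factor with drop = TRUE removes them.

```r
# Hypothetical example data, not from the original post
df1 <- data.frame(id = 1:4, grp = factor(c("a", "b", "c", "d")))
df2 <- data.frame(id = 1:4, value = c(1.5, 2.5, 3.5, 4.5))
merged <- merge(df1, df2, by = "id")

# Keep only the rows with ids 1 and 2
small <- subset(merged, value < 3)
table(small$grp)      # "c" and "d" are still listed, each with count 0

# For a single factor, drop = TRUE discards the unused levels
levels(small$grp[, drop = TRUE])
```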


-- 

Hilmar Berger
Studienkoordinator
Institut für medizinische Informatik, Statistik und Epidemiologie
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig

Tel. +49 341 97 16 101
Fax. +49 341 97 16 109
email: hilmar.berger at imise.uni-leipzig.de


