[R] convert columns of dataframe to same factor levels

William Dunlap wdunlap at tibco.com
Wed Dec 19 18:50:35 CET 2018


You can abuse the S4 class system to do this.

setClass("Size") # no representation, no prototype
setAs(from="character", to="Size", # nothing but a coercion method
  function(from){
    ret <- factor(from, levels=c("Small","Medium","Large"), ordered=TRUE)
    class(ret) <- c("Size", class(ret))
    ret
  })
z <- read.table(colClasses=c("integer", "Size"),
                text="7 Medium\n5 Large\n3 Large")
dput(z)
#structure(list(V1 = c(7L, 5L, 3L), V2 = structure(c(2L, 3L, 3L
#), .Label = c("Small", "Medium", "Large"), class = c("Size",
#"ordered", "factor"))), class = "data.frame", row.names = c(NA,
#-3L))

I wonder if this behavior is intended or if there is a more sanctioned way
to get read.table(colClasses=...) to make a factor with a specified set of
levels.
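
For comparison, here is a minimal sketch of the plainer route, assuming a
conversion step after reading is acceptable: read the column as character
and call factor() with the desired levels. The name size_levels is only
illustrative, and this is not a claim about what read.table() is meant to
support.

```r
# Illustrative level set (hypothetical name, same values as in the example above)
size_levels <- c("Small", "Medium", "Large")

# Read the second column as plain character so no levels are inferred from the data
z2 <- read.table(colClasses = c("integer", "character"),
                 text = "7 Medium\n5 Large\n3 Large")

# Convert explicitly; values are matched to the levels by label
z2$V2 <- factor(z2$V2, levels = size_levels, ordered = TRUE)

str(z2)
```

For several columns the same conversion could be applied with lapply() over
the relevant columns. Unlike the setAs() approach, the result carries no
extra "Size" class.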

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Wed, Dec 19, 2018 at 3:19 AM Duncan Murdoch <murdoch.duncan using gmail.com>
wrote:

> On 19/12/2018 5:58 AM, Luigi Marongiu wrote:
> > Dear all,
> > I have a data frame of character values in which each distinct value is
> > a level. However, not all columns contain the same values, so when I
> > generate the data frame with stringsAsFactors = TRUE, the levels are
> > different for each column.
> > Is there a way to provide a single vector of levels and assign the
> > characters so that they match such vector?
> > Is there a way to do that not only when setting the data frame but
> > also when reading data from a file with read.table()?
> >
> > For instance, I have:
> > column_1 = c("A", "B", "C", "D", "E")
> > column_2 = c("B", "B", "C", "E", "E")
> > column_3 = c("C", "C", "D", "D", "C")
> > my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)
> >> str(my.data)
> > 'data.frame': 5 obs. of  3 variables:
> >   $ column_1: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
> >   $ column_2: Factor w/ 3 levels "B","C","E": 1 1 2 3 3
> >   $ column_3: Factor w/ 2 levels "C","D": 1 1 2 2 1
> >
> > Thank you
> >
>
> I don't think read.table() can do it for you automatically.  To do it
> yourself, you need to get a vector of the levels.  If you know this,
> just assign it to a variable; if you don't know it, compute it as
>
>    thelevels <- unique(unlist(lapply(my.data, levels)))
>
> Then set the levels of each column to thelevels:
>
>    # factor() matches values by label; levels(x) <- would relabel by position
>    my.data.new <- as.data.frame(lapply(my.data, function(x)
>        factor(x, levels = thelevels)))
>
> Duncan Murdoch
>
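
A note on the re-levelling step in the quoted reply, as a sketch using the
example data from the question: assigning with levels(x) <- value relabels
the existing levels by position, whereas factor(x, levels = ...) matches
values by label and only widens the level set. The object name relabelled is
illustrative.

```r
column_1 <- c("A", "B", "C", "D", "E")
column_2 <- c("B", "B", "C", "E", "E")
column_3 <- c("C", "C", "D", "D", "C")
my.data <- data.frame(column_1, column_2, column_3, stringsAsFactors = TRUE)

# Union of the levels found in all columns
thelevels <- unique(unlist(lapply(my.data, levels)))

# Positional relabelling: the first existing level of column_2 ("B")
# is renamed to the first element of thelevels ("A"), and so on
relabelled <- my.data$column_2
levels(relabelled) <- thelevels
as.character(relabelled)

# Matching by label keeps the original values in every column
my.data.new <- as.data.frame(lapply(my.data, factor, levels = thelevels))
str(my.data.new)
```

After the second form, every column has the same levels and the same values
as before, so columns can be compared or tabulated against a common set.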
