[R] Apply as.factor (or as.numeric etc) to multiple columns

Gabor Grothendieck ggrothendieck at gmail.com
Thu Jun 25 13:41:25 CEST 2009


That's quite nice. Three comments:

- colClasses() in R.utils is similar to expandClasses() here;
the two differ only in the particular codes and classes they
support.

- not sure if this is important, but if as() were the last
possibility tried rather than the first, then in most
cases (in fact, all cases handled by expandClasses())
the methods package would not be used at all.

- paste("as", ...) handles all the common cases, including
every case handled by expandClasses() except NA_character_,
and could be used as a poor man's doCoerce().
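
To make the last two points concrete, here is a minimal sketch. The name coerceByName is made up for illustration and is not part of Enrique's code or of any package. It looks up as.<class> first and only falls back to as() from the methods package when no such function exists:

```r
# Hypothetical helper: try as.<class>(x) first, use methods::as() last
coerceByName <- function(x, class) {
    f <- get0(paste("as", class, sep = "."), mode = "function")
    if (!is.null(f)) f(x) else methods::as(x, class)
}

x <- data.frame(X = "2008-01-01", Y = c(1.1, 2.1, 3.1), Z = letters[1:3],
                stringsAsFactors = FALSE)
classes <- c("Date", "integer", "factor")

# One target class per column, applied column by column
y <- data.frame(mapply(coerceByName, x, classes, SIMPLIFY = FALSE),
                stringsAsFactors = FALSE)
sapply(y, class)
```

With this ordering, as() is reached only for classes that have no as.<class> function.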

On Thu, Jun 25, 2009 at 3:43 AM, Bengoechea Bartolomé Enrique (SIES
73)<enrique.bengoechea at credit-suisse.com> wrote:
> Hi Mark,
>
> I frequently need to do that when importing data. This one-liner works:
>
>> data.frame(mapply(as, x, c("integer", "character", "factor"), SIMPLIFY=FALSE), stringsAsFactors=FALSE);
>
> but it has two problems:
>
> 1) as() belongs to the S4 methods package and fails when no coercion method is defined for the requested class
> 2) writing the vector of classes for 60 variables is rather tedious.
>
> Both issues can be solved with the following two helper functions. The first function tries as(x, class); if that doesn't work, it tries as.<class>(x); if that still doesn't work, it tries <class>(x). The second function transforms a single string into a character vector of classes by mapping each letter in the string to a class name (e.g. "D" becomes "Date", "i" becomes "integer"), so that writing 60 classes is fast.
>
> doCoerce <- function(x, class) {
>     # Prefer S4 coercion when the methods package can do it
>     if (canCoerce(x, class))
>         as(x, class)
>     else {
>         # Otherwise try as.<class>(x), then fall back to <class>(x)
>         result <- try(match.fun(paste("as", class, sep="."))(x), silent=TRUE)
>         if (inherits(result, "try-error"))
>             result <- match.fun(class)(x)
>         result
>     }
> }
>
> expandClasses <- function (x) {
>     unknowns <- character(0)
>     result <- lapply(strsplit(as.character(x), NULL, fixed = TRUE),
>         function(y) {
>             sapply(y, function(z) switch(z,
>                 i = "integer", n = "numeric",
>                 l = "logical", c = "character", x = "complex",
>                 r = "raw", f = "factor", D = "Date", P = "POSIXct",
>                 t = "POSIXlt", N = NA_character_, {
>                     # Unrecognized code: remember it and map it to NA
>                     unknowns <<- c(unknowns, z)
>                     NA_character_
>                 }), USE.NAMES = FALSE)
>         })
>     if (length(unknowns)) {
>         unknowns <- unique(unknowns)
>         # dqMsg() is not defined in this message; dQuote() is used here instead
>         warning(sprintf(ngettext(length(unknowns), "code %s not recognized",
>             "codes %s not recognized"), paste(dQuote(unknowns), collapse = ", ")))
>     }
>     result
> }
>
> An example:
>
>> x <- data.frame(X="2008-01-01", Y=1.1:3.1, Z=letters[1:3])
>> data.frame(mapply(doCoerce, x, expandClasses("Dif")[[1L]], SIMPLIFY=FALSE), stringsAsFactors=FALSE);
>
> Regards,
>
> Enrique
>
>
> ------------------------------
>
> Message: 99
> Date: Tue, 23 Jun 2009 15:23:54 -0600
> From: Mark Na <mtb954 at gmail.com>
> Subject: [R] Apply as.factor (or as.numeric etc) to multiple columns
> To: r-help at r-project.org
> Message-ID:
>        <e40d78ce0906231423m4c3da14i2f6270f92463c943 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi R-helpers,
>
> I have a dataframe with 60 columns and I would like to convert several
> columns to factor, others to numeric, and yet others to dates. Rather
> than having 60 lines like this:
>
> data$Var1<-as.factor(data$Var1)
>
> I wonder if it's possible to write one line of code (per data type,
> e.g. factor) that would apply a function (e.g., as.factor) to several
> (non-contiguous) columns. So, I could then use 3 or 4 lines of code
> (for 3 or 4 data types) instead of 60.
>
> I have tried writing an apply function, but it failed.
>
> Thanks for any help you might be able to provide.
>
> Mark Na
>
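
For the original question (one line per data type, applied to several non-contiguous columns), a minimal sketch using lapply() on a column subset might look like this. The data frame and its column names are invented for illustration:

```r
# Hypothetical data standing in for the 60-column data frame
dat <- data.frame(Var1 = c("a", "b", "a"),
                  Var2 = c("1.5", "2.5", "3.5"),
                  Var3 = c("2009-06-23", "2009-06-24", "2009-06-25"),
                  Var4 = c("x", "y", "y"),
                  stringsAsFactors = FALSE)

# Columns grouped by target type; they need not be adjacent
factorCols  <- c("Var1", "Var4")
numericCols <- "Var2"
dateCols    <- "Var3"

# One line per data type
dat[factorCols]  <- lapply(dat[factorCols], as.factor)
dat[numericCols] <- lapply(dat[numericCols], as.numeric)
dat[dateCols]    <- lapply(dat[dateCols], as.Date)

str(dat)
```

Assigning back into dat[cols] keeps the result a data frame and leaves the other columns untouched.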



