[R] Convert list to data frame while controlling column types

Alexander Shenkin ashenkin at ufl.edu
Fri Aug 21 21:41:43 CEST 2009


Thanks, everyone, for your replies, both on- and off-list.  I should
clarify, since I left out some important information.  My original
data frame has some numeric columns, which gsub turns into character
when I replace blank (empty or whitespace-only) strings with NA.  So when
I rebuild the data frame, those now-character columns get converted to
factors.  I recently added stringsAsFactors = FALSE so that they at least
stay character, which makes things a bit easier.  I wrote the
column-type reset function below, but it feels kludgey, so I was
wondering whether there is some other way to tell as.data.frame how to
handle each column.

str(final_dataf)
'data.frame':   1127 obs. of  43 variables:
 $ block      : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
 $ treatment  : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 1 1 1 1 ...
 $ transect   : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
 $ tag        : chr  NA "121AL" "122AL" "123AL" ...
...
 $ h1         : num  NA NA NA NA NA NA NA NA NA NA ...
...
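A minimal sketch reproducing the type loss described above, using a small hypothetical data frame in place of final_dataf (the column names tag and h1 are borrowed from the str() output; the values are made up). It shows that gsub() returns character even for a numeric column with no matches, and that stringsAsFactors decides whether those columns then come back as character or factor.

```r
# Hypothetical stand-in for final_dataf
df <- data.frame(tag = c(" ", "121AL", "122AL"),
                 h1  = c(1.5, 2, 3),
                 stringsAsFactors = FALSE)

# gsub() converts its input with as.character(), so h1 comes back as
# character even though none of its values matched the pattern
blanked <- lapply(df, function(x) gsub("^\\s*$", NA, x))
sapply(blanked, class)   # tag and h1 are both "character"

# With stringsAsFactors = FALSE the columns stay character;
# with TRUE they would be turned into factors instead
rebuilt <- as.data.frame(blanked, stringsAsFactors = FALSE)
str(rebuilt)
```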

reset_col_types <- function (df, col_types) {
    # Reset the column types of a data frame.  col_types is a character
    # vector with one class name per column, which can be built from
    # lapply(df, class) as shown below.

    # coercion function to use for each supported class name
    coerce_fun = list (
        "character"   = as.character,
        "factor"      = as.factor,
        "numeric"     = as.numeric,
        "integer"     = as.integer,
        "POSIXct"     = as.POSIXct,
        "logical"     = as.logical )

    for (i in seq_along(df)) {
        # look up the coercion function for column i's class and apply it
        df[, i] = coerce_fun[[ col_types[i] ]]( df[, i] )
    }
    return(df)
}
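A possible loop-free variant, sketched under the assumption that the same class-name lookup is wanted: Map() walks the columns and the class names in parallel, and assigning into df[] keeps the data frame's names and row names. The function name reset_col_types_map and the toy data are hypothetical. The round trip also shows why class(x)[1] matters: for date-time columns it yields "POSIXct" rather than "POSIXt".

```r
# Hypothetical Map()-based variant of reset_col_types()
reset_col_types_map <- function(df, col_types) {
    coerce_fun <- list(character = as.character, factor  = as.factor,
                       numeric   = as.numeric,   integer = as.integer,
                       POSIXct   = as.POSIXct,   logical = as.logical)
    # apply the matching coercion function to each column
    df[] <- Map(function(x, cl) coerce_fun[[cl]](x), df, col_types)
    df
}

# Round trip on a hypothetical toy data frame
orig <- data.frame(tag  = c(" ", "121AL"),
                   h1   = c(NA, 2.5),
                   n    = c(1L, 2L),
                   when = as.POSIXct(c("2009-08-21 10:00:00",
                                       "2009-08-21 11:00:00"), tz = "UTC"),
                   stringsAsFactors = FALSE)

# class(x)[1] gives "POSIXct" (not "POSIXt") for the date-time column
col_types <- sapply(orig, function(x) class(x)[1])

# blank strings to NA; every column comes back as character
blanked <- as.data.frame(lapply(orig, function(x) gsub("^\\s*$", NA, x)),
                         stringsAsFactors = FALSE)
restored <- reset_col_types_map(blanked, col_types)
sapply(restored, function(x) class(x)[1])
```

Note that the date-time strings are re-parsed by as.POSIXct() in the session's time zone unless its tz argument is supplied, so the time zone of the original column is not carried through this round trip.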

col_types = lapply(final_dataf, class)
# POSIXct columns have class c("POSIXct", "POSIXt"); keep the first,
# more specific class so that it matches an entry in coerce_fun
col_types = lapply(col_types, function(x) x[1])
names(col_types)=NULL
col_types = unlist(col_types)

final_dataf = as.data.frame(lapply(final_dataf, function(x) gsub('^\\s*$', NA, x)),
                            stringsAsFactors = FALSE)
final_dataf = reset_col_types(final_dataf, col_types)
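One alternative that avoids the round trip altogether, sketched here as an assumption rather than a tested recipe: only run the blank-to-NA replacement on character and factor columns, so numeric, integer, logical and POSIXct columns are never converted in the first place. The helper name blank_to_na and the toy data are hypothetical. For factors it edits the levels instead of the values, since a level set to NA is dropped and its values become NA.

```r
# Hypothetical helper: blank strings become NA, other column types untouched
blank_to_na <- function(x) {
    if (is.character(x)) {
        x[grepl("^\\s*$", x)] <- NA
    } else if (is.factor(x)) {
        lv <- levels(x)
        lv[grepl("^\\s*$", lv)] <- NA   # dropped level; its values become NA
        levels(x) <- lv
    }
    x   # numeric, integer, logical and POSIXct columns pass through as-is
}

# Hypothetical toy data frame
df <- data.frame(tag       = c(" ", "121AL", "122AL"),
                 treatment = factor(c("I", "", "M")),
                 h1        = c(1.5, NA, 3),
                 stringsAsFactors = FALSE)

# df[] <- keeps the data.frame structure while replacing each column
df[] <- lapply(df, blank_to_na)
str(df)
```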

Thanks,
Allie


On 8/21/2009 10:54 AM, Steve Lianoglou wrote:
> Hi Allie,
>
> On Aug 21, 2009, at 11:47 AM, Alexander Shenkin wrote:
>
>> Hello all,
>>
>> I have a list which I'd like to convert to a data frame, while
>> maintaining control of the columns' data types (akin to the colClasses
>> argument in read.table).  My numeric columns, for example, are getting
>> converted to factors by as.data.frame.  Is there a way to do this, or
>> will I have to do as I am doing right now: allow as.data.frame to coerce
>> column-types as it sees fit, and then convert them back manually?
>
> This doesn't sound right ... are there characters buried in your
> numeric columns somewhere that might be causing this?
>
> I'm pretty sure this shouldn't happen, and a small test case here goes
> along with my intuition:
>
> R> a <- list(a=1:10, b=rnorm(10), c=LETTERS[1:10])
> R> df <- as.data.frame(a)
> R> sapply(df, is.factor)
>     a     b     c
> FALSE FALSE  TRUE
>
> Can you check to see if your data's wonky somehow?
>
> -steve
>
> -- 
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>   |  Memorial Sloan-Kettering Cancer Center
>   |  Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>
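Steve's test case can be made independent of the R version by passing stringsAsFactors explicitly: its default was TRUE before R 4.0.0 and FALSE from R 4.0.0 onwards, so the same as.data.frame() call can return a factor or a character column for c depending on the version. A small sketch using his list:

```r
a <- list(a = 1:10, b = rnorm(10), c = LETTERS[1:10])

# explicit stringsAsFactors makes the result the same on any R version
df_chr <- as.data.frame(a, stringsAsFactors = FALSE)
df_fac <- as.data.frame(a, stringsAsFactors = TRUE)
sapply(df_chr, class)   # c stays character
sapply(df_fac, class)   # c becomes a factor
```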
