[R] Convert list to data frame while controlling column types

David Winsemius dwinsemius at comcast.net
Fri Aug 21 22:04:00 CEST 2009


On Aug 21, 2009, at 3:41 PM, Alexander Shenkin wrote:

> Thanks everyone for their replies, both on- and off-list.  I should
> clarify, since I left out some important information.  My original
> dataframe has some numeric columns, which get changed to character by
> gsub when I replace spaces with NAs.

If you had used is.na(x) <- ... that would not happen to a true _numeric_
vector (but, of course, a numeric vector in a data.frame could not
have spaces, so you are probably not using precise terminology). You
would be well advised to include the actual code rather than applying
loose terminology subject to your and our misinterpretation.

?is.na
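To make the is.na<- suggestion concrete, here is a minimal sketch on a small made-up data frame (the column names mirror Allie's str() output, but the values and the helper name blank_to_na are hypothetical). Blank or whitespace-only entries become NA only in character and factor columns, and the result is assigned back with df[] <- so the data.frame and every column's class survive without an as.data.frame() round trip. The pattern "^[[:space:]]*$" is the POSIX-class equivalent of the '^\\s*$' used in the original code.

```r
# Toy data frame standing in for final_dataf (hypothetical values)
df <- data.frame(treatment = factor(c("I", " ", "M")),
                 tag       = c(" ", "121AL", "122AL"),
                 h1        = c(NA, 2.5, 3.1),
                 stringsAsFactors = FALSE)

# Hypothetical helper: set blank/whitespace-only entries to NA, touching
# only character and factor columns; numeric columns pass through as-is.
blank_to_na <- function(x) {
  if (is.character(x) || is.factor(x)) {
    is.na(x) <- grepl("^[[:space:]]*$", as.character(x))
    if (is.factor(x)) x <- droplevels(x)   # drop the now-unused " " level
  }
  x
}

# Assigning into df[] keeps it a data.frame and keeps each column's class,
# so nothing is re-coerced (no character -> factor surprises).
df[] <- lapply(df, blank_to_na)
str(df)
```

Because gsub() is never applied to the numeric column, there is nothing to convert back afterwards.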


I am guessing that you were using read.table() on the original data,  
in which case you should look at the colClasses parameter.
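A minimal sketch of the colClasses route, assuming the data start life as a comma-separated file (simulated here with inline text and read.csv(text = ...); the column names and values are hypothetical). colClasses fixes each column's type at read time, and na.strings = "" together with strip.white = TRUE turns blank or space-only cells into NA, so no gsub() pass is needed afterwards.

```r
# Hypothetical CSV content standing in for the original data file
txt <- "tag,h1,treatment
121AL, ,I
122AL,3.5,M
 ,4.0,N"

dat <- read.csv(text = txt,
                colClasses  = c("character", "numeric", "factor"),
                na.strings  = "",     # empty cells -> NA
                strip.white = TRUE)   # so " " is stripped to "" first
str(dat)
```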

-- 
David Winsemius

> Thus, in going back to a
> dataframe, those (now character) columns get converted to factors.  I
> recently added stringsAsFactors = FALSE to get characters, to make
> things a bit easier.  I wrote the column-type reset function below, but
> it feels kludgey, so I was wondering if there was some other way to
> specify how one might want as.data.frame to handle the columns.
>
> str(final_dataf)
> 'data.frame':   1127 obs. of  43 variables:
> $ block      : Factor w/ 1 level "2": 1 1 1 1 1 1 1 1 1 1 ...
> $ treatment  : Factor w/ 4 levels "I","M","N","T": 1 1 1 1 1 1 1 1 1 1 ...
> $ transect   : Factor w/ 1 level "4": 1 1 1 1 1 1 1 1 1 1 ...
> $ tag        : chr  NA "121AL" "122AL" "123AL" ...
> ...
> $ h1         : num  NA NA NA NA NA NA NA NA NA NA ...
> ...
>
> reset_col_types <- function (df, col_types) {
>    # Function to reset column types in dataframes.  col_types can be
>    # constructed by using lapply(df, class)
>
>    coerce_fun = list (
>        "character"   = `as.character`,
>        "factor"      = `as.factor`,
>        "numeric"     = `as.numeric`,
>        "integer"     = `as.integer`,
>        "POSIXct"     = `as.POSIXct`,
>        "logical"     = `as.logical` )
>
>    for (i in seq_along(df)) {
>        # apply the coerce function matching this column's original class
>        df[,i] = coerce_fun[[ col_types[i] ]]( df[,i] )
>    }
>    return(df)
> }
>
> col_types = lapply(final_dataf, class)
> # take the most specific class, which R lists first (e.g. "POSIXct"
> # before "POSIXt"), so that it matches a name in coerce_fun
> col_types = lapply(col_types, function(x) x[1])
> names(col_types)=NULL
> col_types = unlist(col_types)
>
> final_dataf = as.data.frame(lapply(final_dataf, function(x)
>     gsub('^\\s*$',NA,x)), stringsAsFactors = FALSE)
> final_dataf = reset_col_types(final_dataf, col_types)
>
> Thanks,
> Allie
>
>
> On 8/21/2009 10:54 AM, Steve Lianoglou wrote:
>> Hi Allie,
>>
>> On Aug 21, 2009, at 11:47 AM, Alexander Shenkin wrote:
>>
>>> Hello all,
>>>
>>> I have a list which I'd like to convert to a data frame, while
>>> maintaining control of the columns' data types (akin to the colClasses
>>> argument in read.table).  My numeric columns, for example, are getting
>>> converted to factors by as.data.frame.  Is there a way to do this, or
>>> will I have to do as I am doing right now: allow as.data.frame to coerce
>>> column-types as it sees fit, and then convert them back manually?
>>
>> This doesn't sound right ... are there characters buried in your
>> numeric columns somewhere that might be causing this?
>>
>> I'm pretty sure this shouldn't happen, and a small test case here goes
>> along with my intuition:
>>
>> R> a <- list(a=1:10, b=rnorm(10), c=LETTERS[1:10])
>> R> df <- as.data.frame(a)
>> R> sapply(df, is.factor)
>>    a     b     c
>> FALSE FALSE  TRUE
>>
>> Can you check to see if your data's wonky somehow?
>>
>> -steve
>>
>> -- 
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  |  Memorial Sloan-Kettering Cancer Center
>>  |  Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
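Steve's question about characters buried in numeric columns can be checked directly. A minimal sketch on a hypothetical list whose column b was meant to be numeric but contains a stray " ": entries that are present yet fail as.numeric() are the culprits. stringsAsFactors = FALSE is passed explicitly because the default changed to FALSE only in R 4.0.0, so being explicit gives character columns in any version (in Steve's 2009 output, column c came back as a factor).

```r
# Hypothetical list: b should be numeric but arrived as character with a " "
lst <- list(a = 1:3, b = c("1.5", " ", "2.7"), c = c("x", "y", "z"))

df <- as.data.frame(lst, stringsAsFactors = FALSE)
sapply(df, class)

# Entries that are not NA but do not parse as numbers are the culprits
b_num <- suppressWarnings(as.numeric(df$b))
bad   <- is.na(b_num) & !is.na(df$b)
df$b[bad]          # shows the offending strings

# Once the culprits are understood, convert the column in place
df$b <- b_num
str(df)
```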

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
