[R] Unexpected behaviour as.data.frame

Bert Gunter gunter.berton at gene.com
Sun May 15 22:17:31 CEST 2011


Inline below.

On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
> Thanks. I also noticed it myself minutes after sending my message to the list.
> My 'please ignore my question it was just a stupid typo' message was sent
> with the wrong account and is now awaiting moderation.
>
> However, my other question still stands: what is the
> preferred/fastest/simplest way to create a data.frame with given column types
> and dimensions?

I do not know, but why is simply

data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)

not acceptable? Note that if you had, say, 500 numeric (= double) and
100 character columns to add, you might do something like:

> z <- matrix(numeric(5000), nrow = 10)      ## 10 x 500, double
> u <- matrix(character(1000), nrow = 10)    ## 10 x 100, character
> frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns

While this might save some typing, it may not be much more efficient
than typing it all out -- maybe just some parsing time is saved. You
can experiment and see.
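
One way to run that experiment is sketched below. It times the matrix-based
construction from the example above against building the same 600 columns
from a list of vectors. The repetition count, the list-based variant, and the
column names V1, V2, ... are assumptions made for illustration; absolute
timings will depend on the machine and the R version.

```r
# Rough timing sketch; the numbers it prints are machine-dependent.
n_rows <- 10
n_reps <- 20   # arbitrary repetition count chosen for this illustration

# Approach A: build the data frame from two matrices.
time_matrix <- system.time(
  for (i in seq_len(n_reps)) {
    z <- matrix(numeric(n_rows * 500), nrow = n_rows)
    u <- matrix(character(n_rows * 100), nrow = n_rows)
    frm_a <- data.frame(z, u, stringsAsFactors = FALSE)
  }
)

# Approach B: build the data frame from a list of column vectors.
time_list <- system.time(
  for (i in seq_len(n_reps)) {
    cols <- c(replicate(500, numeric(n_rows), simplify = FALSE),
              replicate(100, character(n_rows), simplify = FALSE))
    names(cols) <- paste("V", seq_along(cols), sep = "")
    frm_b <- as.data.frame(cols, stringsAsFactors = FALSE)
  }
)

# Elapsed wall-clock seconds for each approach.
print(c(matrix = time_matrix[["elapsed"]], list = time_list[["elapsed"]]))
```

Both results should have 10 rows and 600 columns, with the last 100 columns
of type character, so only the construction cost differs.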

However, since a data.frame **is** a list with added attributes, and a
great deal of the constructor's work goes into constructing and
checking these attributes (e.g. row and column names), I see nothing
terribly inefficient in what you did. It's just a bit obscure. But
maybe someone with greater expertise will set us both straight.
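
To make the "list with added attributes" point concrete, here is a minimal
sketch that builds the columns as a list and sets the attributes by hand.
The helper name make_empty_df and the column names V1, V2, ... are
hypothetical, not part of base R. Because it bypasses the checks that
data.frame() performs, treat it as an illustration rather than a recommended
replacement.

```r
# Hypothetical helper: create an "empty" data frame whose columns have the
# requested types and number of rows.
make_empty_df <- function(types, nrows) {
  # vector() creates a zero-filled (or ""-filled) vector of the given mode.
  cols <- lapply(types, function(type) vector(mode = type, length = nrows))
  names(cols) <- paste("V", seq_along(cols), sep = "")
  # A data frame is a named list of equal-length columns carrying
  # "class" and "row.names" attributes; set them directly here.
  structure(cols, class = "data.frame", row.names = seq_len(nrows))
}

d <- make_empty_df(c("integer", "character", "double"), 10)
str(d)
```

No strings are converted to factors here, since the columns never pass
through data.frame() or as.data.frame().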

Cheers,
Bert


>
> Regards,
> Jan
>
>
> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>
>> In your post, you're missing the final "s" on the stringsAsFactors
>> argument in the d1 assignment. When I typed it correctly, it works as
>> expected.
>>
>> -- Bert
>>
>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan<rhelp at eoos.dds.nl>
>>  wrote:
>>>
>>> I use the following code to create two data.frames d1 and d2 from a list:
>>> types<- c("integer", "character", "double")
>>> nlines<- 10
>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>> stringsAsFactor=FALSE)
>>> l2<- lapply(types, do.call, list(nlines))
>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>
>>> I would expect d1 and d2 to be the same; however, in d1 the second column
>>> is a factor while in d2 it is a character (which I would expect):
>>>
>>>> str(d1)
>>>
>>> 'data.frame':   10 obs. of  3 variables:
>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>  $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>
>>>> str(d2)
>>>
>>> 'data.frame':   10 obs. of  3 variables:
>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>  $ c........................................: chr  "" "" "" "" ...
>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>
>>>
>>> A different but related question: I use the commands above to create an
>>> 'empty' data.frame with specified column types and dimensions. I need this
>>> data.frame to pass on to my C++ routines. Is there a simpler or more
>>> elegant way of creating this data.frame?
>>>
>>> Regards,
>>>
>>> Jan
>>>
>>>
>>> PS:
>>> I am running R on 64 bit Ubuntu 11.04:
>>>
>>>> sessionInfo()
>>>
>>> R version 2.12.1 (2010-12-16)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>>
>>
>>
>
>



-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics


