[R] Unexpected behaviour as.data.frame

Ivan Calandra ivan.calandra at uni-hamburg.de
Mon May 16 10:12:59 CEST 2011


I feel like I'm always asking this type of question, but would it be possible
to add a base function for creating an empty data.frame, as
matrix() does for matrices?

What I mean would be something like:
create.data.frame(number_of_columns, mode_of_columns).
I think it would make things easier than creating one or several
matrices and then combining them.

Is it possible? Does it make sense?
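Until something like this exists in base R, a small user-level helper can approximate the proposal. The sketch below is hypothetical: the name `create_data_frame`, its arguments (modelled loosely on `matrix(nrow, ncol)`) and the `V1`, `V2`, ... column names are illustrative assumptions, not an existing base R API. It relies only on `vector(mode, length)`, `rep_len()` and `as.data.frame()`.

```r
# Hypothetical helper, NOT part of base R: an analogue of matrix(nrow, ncol)
# for data frames. `mode` is recycled over the columns.
create_data_frame <- function(nrow, ncol = length(mode), mode = "double") {
  modes <- rep_len(mode, ncol)
  # vector(m, length = nrow) gives a default-filled column: 0L, 0, "", FALSE, ...
  cols <- lapply(modes, vector, length = nrow)
  names(cols) <- paste0("V", seq_len(ncol))   # illustrative column names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

str(create_data_frame(10, 3))                                           # 3 double columns
str(create_data_frame(10, mode = c("integer", "character", "double")))  # mixed types
```

A single mode gives a homogeneous frame, much as `matrix()` does; a vector of modes gives one column per mode.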

Ivan

On 5/15/2011 22:17, Bert Gunter wrote:
> Inline below.
>
> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>> Thanks. I also noticed it myself, minutes after sending my message to the list.
>> My 'please ignore my question, it was just a stupid typo' message was sent
>> with the wrong account and is now awaiting moderation.
>>
>> However, my other question still stands: what is the
>> preferred/fastest/simplest way to create a data.frame with given column types
>> and dimensions?
> I do not know, but why is simply
>
> data.frame(numeric(10), character(10), integer(10), stringsAsFactors=FALSE)
>
> not acceptable? Note that if you had, say, 500 numeric (= double) and
> 100 character columns to add, you might do something like:
>
>> z <- matrix(numeric(5000), nrow = 10)
>> u <- matrix(character(1000), nrow = 10)
>> frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns
> While this might save some typing, it may not be much more efficient
> than typing it all out -- maybe just some parsing time is saved. You
> can experiment and see.
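Following the suggestion to experiment, a sketch that times the matrix shortcut and confirms the resulting column types. Timings depend on the machine and R version, so no figures are given; the variable names follow the example above.

```r
z <- matrix(numeric(5000), nrow = 10)    # 10 x 500 matrix of doubles
u <- matrix(character(1000), nrow = 10)  # 10 x 100 matrix of ""

# Time the matrix shortcut (the assignment to frm happens in the calling frame)
system.time(frm <- data.frame(z, u, stringsAsFactors = FALSE))

dim(frm)                                 # 10 rows, 600 columns
table(vapply(frm, class, character(1)))  # character: 100, numeric: 500
```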
>
> However, since a data.frame **is** a list with added attributes and a
> great deal of the work of the constructor is in constructing and
> checking these attributes (e.g. row and column names), I see nothing
> terribly inefficient with what you did. It's just a bit obscure.  But
> maybe someone with greater expertise will set us both straight.
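To make the point about a data.frame being a list with attributes concrete, the sketch below builds one by hand from a named list of equal-length columns, setting only the `row.names` and `class` attributes and skipping every check that `data.frame()` performs. It assumes the caller guarantees equal column lengths and valid names; the column names `a`, `b`, `c` are arbitrary.

```r
n <- 10L
# Columns must all have length n; the list names become the column names
df <- list(a = integer(n), b = character(n), c = double(n))

# Add the two remaining attributes a data.frame needs. No validation is done,
# so unequal column lengths would silently produce a malformed object.
attr(df, "row.names") <- seq_len(n)   # automatic row names 1..n
class(df) <- "data.frame"

str(df)
```

This is not claimed to be faster in practice; its main use is showing what the constructor adds on top of a plain list.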
>
> Cheers,
> Bert
>
>
>> Regards,
>> Jan
>>
>>
>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>> In your post, you're missing the final "s" on the stringsAsFactors
>>> argument in the d1 assignment. When I typed it correctly, it works as
>>> expected.
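Why the typo fails silently: in `as.data.frame.list()`, `stringsAsFactors` is a formal argument placed after `...`, and R matches arguments after `...` only by their exact name, never by partial matching. A misspelled name is therefore absorbed by `...` and ignored, so the default applies. The function `f` below is a hypothetical stand-in with the same argument layout; its default of `TRUE` mirrors the default in R 2.12.

```r
# Hypothetical function: `stringsAsFactors` is declared after `...`,
# with default TRUE as in R 2.12
f <- function(x, ..., stringsAsFactors = TRUE) {
  # Report the value the formal received and any names captured by `...`
  list(stringsAsFactors = stringsAsFactors, dots = names(list(...)))
}

str(f(1, stringsAsFactor = FALSE))   # misspelled: captured by `...`, default TRUE kept
str(f(1, stringsAsFactors = FALSE))  # exact name: formal set to FALSE
```

Since R 4.0.0 the default for `stringsAsFactors` is `FALSE`, so the misspelled call would now happen to yield a character column as well, hiding the typo rather than fixing it.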
>>>
>>> -- Bert
>>>
>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan <rhelp at eoos.dds.nl> wrote:
>>>> I use the following code to create two data.frames d1 and d2 from a list:
>>>> types<- c("integer", "character", "double")
>>>> nlines<- 10
>>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>>> stringsAsFactor=FALSE)
>>>> l2<- lapply(types, do.call, list(nlines))
>>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>
>>>> I would expect d1 and d2 to be the same; however, in d1 the second column
>>>> is a factor, while in d2 it is a character vector (which is what I expected):
>>>>
>>>>> str(d1)
>>>> 'data.frame':   10 obs. of  3 variables:
>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>   $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>> str(d2)
>>>> 'data.frame':   10 obs. of  3 variables:
>>>>   $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>   $ c........................................: chr  "" "" "" "" ...
>>>>   $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>
>>>>
>>>> As a different but related question: I use the commands above to create an
>>>> 'empty' data.frame with specified column types and dimensions. I need this
>>>> data.frame to pass on to my C++ routines. Is there a simpler or more elegant
>>>> way of creating this data.frame?
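A possible simplification of the code above, offered as a sketch: giving the `types` vector names makes `lapply()` return a named list, so the columns get readable names instead of deparsed ones such as `c.0L..0L...`, and `vector(mode, length)` replaces the `do.call()` construction. The names `id`, `label` and `value` are illustrative assumptions.

```r
# Named types (names are illustrative) so the columns get readable names
types  <- c(id = "integer", label = "character", value = "double")
nlines <- 10

# lapply() keeps the names of `types`; vector(mode, length) builds each column
d <- as.data.frame(lapply(types, vector, length = nlines),
                   stringsAsFactors = FALSE)   # note the final "s"
str(d)
```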
>>>>
>>>> Regards,
>>>>
>>>> Jan
>>>>
>>>>
>>>> PS:
>>>> I am running R on 64-bit Ubuntu 11.04:
>>>>
>>>>> sessionInfo()
>>>> R version 2.12.1 (2010-12-16)
>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>
>>
>
>

-- 
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra at uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
