[R] Unexpected behaviour as.data.frame

Santosh Srinivas santosh.srinivas at gmail.com
Mon May 16 10:42:39 CEST 2011


Hi Ivan, take a look at dataFrame() in R.utils. Is that what you want?

from the help file:

Examples

  df <- dataFrame(colClasses=c(a="integer", b="double"), nrow=10)
  df[,1] <- sample(1:nrow(df))
  df[,2] <- rnorm(nrow(df))
  print(df)
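
If you would rather not depend on R.utils, something similar can be sketched in base R. This is only a rough sketch: make_empty_df() and its arguments col_classes and nrow are names made up for illustration, not an existing base or R.utils function. It assumes every entry of col_classes is a mode that vector() accepts (such as "integer", "double", "character" or "logical").

```r
# Hypothetical helper (not part of base R or R.utils): build a data.frame
# with `nrow` rows and one column per entry of `col_classes`.
make_empty_df <- function(col_classes, nrow = 0L) {
  # vector(mode, length) gives zeros, empty strings or FALSE, depending on mode
  cols <- lapply(col_classes, vector, length = nrow)
  # lapply() keeps the names of col_classes; if there are none, use V1, V2, ...
  if (is.null(names(col_classes))) {
    names(cols) <- paste0("V", seq_along(cols))
  }
  # Keep character columns as character rather than converting them to factors
  as.data.frame(cols, stringsAsFactors = FALSE)
}

df <- make_empty_df(c(a = "integer", b = "character", c = "double"), nrow = 10)
str(df)
```

The columns can then be filled in exactly as in the help file example above, e.g. df[,1] <- sample(1:nrow(df)).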

Thanks,
Santosh

On Mon, May 16, 2011 at 1:42 PM, Ivan Calandra
<ivan.calandra at uni-hamburg.de> wrote:
> I feel like I'm always asking this type of question, but is it possible to
> add a base function that allows creating an empty data.frame, as matrix()
> does?
>
> What I mean would be something like:
> create.data.frame(number_of_columns, mode_of_columns).
> I think it would make things easier than creating one or several matrices
> and then combining them
>
> Is it possible; does it make sense?
>
> Ivan
>
> On 5/15/2011 22:17, Bert Gunter wrote:
>>
>> Inline below.
>>
>> On Sun, May 15, 2011 at 11:11 AM, Jan van der Laan<rhelp at eoos.dds.nl>
>>  wrote:
>>>
>>> Thanks. I also noticed myself minutes after sending my message to the
>>> list.
>>> My 'please ignore my question it was just a stupid typo' message was sent
>>> with the wrong account and is now awaiting moderation.
>>>
>>> However, my other question still stands: what is the
>>> preferred/fastest/simplest way to create a data.frame with given column
>>> types and dimensions?
>>
>> I do not know, but  why is simply
>>
>> data.frame(numeric(10), character(10), integer(10),
>> stringsAsFactors=FALSE)
>>
>> not acceptable? Note that if you had, say, 500, numeric (= double) and
>> 100 character columns to add, you might do something like:
>>
>>> z <- matrix(numeric(5000), nrow=10)
>>> u <- matrix(character(1000), nrow=10)
>>> frm <- data.frame(z, u, stringsAsFactors = FALSE) ## 600 columns
>>
>> While this might save some typing, it may not be much more efficient
>> than typing it all out -- maybe just some parsing time is saved. You
>> can experiment and see.
>>
>> However, since a data.frame **is** a list with added attributes and a
>> great deal of the work of the constructor is in constructing and
>> checking these attributes (e.g. row and column names), I see nothing
>> terribly inefficient with what you did. It's just a bit obscure.  But
>> maybe someone with greater expertise will set us both straight.
>>
>> Cheers,
>> Bert
>>
>>
>>> Regards,
>>> Jan
>>>
>>>
>>> On 05/15/2011 04:43 PM, Bert Gunter wrote:
>>>>
>>>> In your post, you're missing the final "s" on the stringsAsFactors
>>>> argument in the d1 assignment. When I typed it correctly, it works as
>>>> expected.
>>>>
>>>> -- Bert
>>>>
>>>> On Sun, May 15, 2011 at 4:25 AM, Jan van der Laan<rhelp at eoos.dds.nl>
>>>>  wrote:
>>>>>
>>>>> I use the following code to create two data.frames d1 and d2 from a
>>>>> list:
>>>>> types<- c("integer", "character", "double")
>>>>> nlines<- 10
>>>>> d1<- as.data.frame(lapply(types, do.call, list(nlines)),
>>>>> stringsAsFactor=FALSE)
>>>>> l2<- lapply(types, do.call, list(nlines))
>>>>> d2<- as.data.frame(l2, stringsAsFactors=FALSE)
>>>>>
>>>>> I would expect d1 and d2 to be the same; however, in d1 the second
>>>>> column is a factor, while in d2 it is character (which I would expect):
>>>>>
>>>>>> str(d1)
>>>>>
>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>  $ c........................................: Factor w/ 1 level "": 1 1 1 1 1 1 1 1 1 1
>>>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>>
>>>>>> str(d2)
>>>>>
>>>>> 'data.frame':   10 obs. of  3 variables:
>>>>>  $ c.0L..0L..0L..0L..0L..0L..0L..0L..0L..0L.: int  0 0 0 0 0 0 0 0 0 0
>>>>>  $ c........................................: chr  "" "" "" "" ...
>>>>>  $ c.0..0..0..0..0..0..0..0..0..0.          : num  0 0 0 0 0 0 0 0 0 0
>>>>>
>>>>>
>>>>> As a different but related question: I use the commands above to create
>>>>> an 'empty' data.frame with specified column types and dimensions. I need
>>>>> this data.frame to pass on to my C++ routines. Is there a simpler or more
>>>>> elegant way of creating this data.frame?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Jan
>>>>>
>>>>>
>>>>> PS:
>>>>> I am running R on 64 bit Ubuntu 11.04:
>>>>>
>>>>>> sessionInfo()
>>>>>
>>>>> R version 2.12.1 (2010-12-16)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>>>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>>>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>>
>
> --
> Ivan CALANDRA
> PhD Student
> University of Hamburg
> Biozentrum Grindel und Zoologisches Museum
> Abt. Säugetiere
> Martin-Luther-King-Platz 3
> D-20146 Hamburg, GERMANY
> +49(0)40 42838 6231
> ivan.calandra at uni-hamburg.de
>
> **********
> http://www.for771.uni-bonn.de
> http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php
>
>


