[Rd] Help with "row.names = as.integer(c(NA, 5))" in file from dput

Peter Dalgaard p.dalgaard at biostat.ku.dk
Wed Feb 28 23:01:15 CET 2007


Mike Prager wrote:
> Peter--
>
> Thank you.  Am I correct in understanding, then, that,
>
> (1) The syntax I asked about is a special case, and the parser
> and/or dget() somehow recognize it as such, and
>
> (2) The syntax 1:15 (where 15 is the number of rows)  should
> work just as well as c(NA, 15)?
>
> I ask, again, because I want to ensure the widest possible
> compatibility for the way For2R is writing data in emulation of
> dput().
>
>   
Essentially yes, but

(1) it is not so much about syntax as about internal representation

(2) Yes, it gives the same result -- the 1:15 is recognized as a vector 
that can be optimized to c(NA, 15). Needing to have the code check for 
this case is of course somewhat wasteful.

To wit:

 > dd <- structure(list(x = c(1.19894055844457, -0.476584995973736,
 +   1.90525643132169, -0.726616166810353, 0.590506316214127)),
 +   .Names = "x", row.names = 1:5, class = "data.frame")
 > dput(dd, control = "all")
structure(list(x = c(1.19894055844457, -0.476584995973736, 1.90525643132169,
-0.726616166810353, 0.590506316214127)), .Names = "x",
row.names = as.integer(c(NA, 5)), class = "data.frame")
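
For anyone emulating dput(), as For2R does, a check along the following lines may be reassuring. It is only a sketch: the toy values and the names d1, d2, internal and res are made up for illustration, names = is used in place of .Names, and the sign of the second element of the stored attribute is treated as an implementation detail (hence the abs()). .row_names_info() is the base R helper for looking at what is actually stored.

```r
# The same toy data frame written two ways, as a dput() emulator might.
d1 <- structure(list(x = c(1.5, 2.5, 3.5, 4.5, 5.5)),
                names = "x", row.names = 1:5, class = "data.frame")
d2 <- structure(list(x = c(1.5, 2.5, 3.5, 4.5, 5.5)),
                names = "x", row.names = as.integer(c(NA, 5)),
                class = "data.frame")

# At the user level the two are indistinguishable.
identical(row.names(d1), row.names(d2))
nrow(d2)

# type = 2L gives the number of rows implied by the attribute;
# type = 0L gives the attribute as stored internally.
.row_names_info(d1, 2L)
internal <- .row_names_info(d1, 0L)
c(is.na(internal[1]), abs(internal[2]))

# The compact form is a storage format, not a user-level value:
# the replacement function still insists on one name per row.
res <- tryCatch({
  row.names(d1) <- as.integer(c(NA, 5))
  "accepted"
}, error = function(e) conditionMessage(e))
res
```

So a writer can emit either 1:n or the c(NA, n) form in the structure() call; what it should not do is pass c(NA, n) to row.names<-.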


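On the space-saving motivation quoted further down: a rough sketch for repeating that comparison on a current installation. The variable names are made up for illustration, and the byte counts depend on the R version and platform, so only the ordering of the sizes is worth relying on.

```r
n <- 1000
d <- data.frame(x = rnorm(n))

# Automatic row names, stored in the compact form.
size_compact <- as.numeric(object.size(d))

# Character row names: one string per row.
row.names(d) <- as.character(seq_len(n))
size_character <- as.numeric(object.size(d))

# Integer row names that are not 1:n, so they must be stored in full.
row.names(d) <- rev(seq_len(n))
size_integer <- as.numeric(object.size(d))

# NULL resets to automatic row names.
row.names(d) <- NULL
size_reset <- as.numeric(object.size(d))

c(compact = size_compact, character = size_character,
  integer = size_integer, reset = size_reset)
```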
> --Mike
>
>
> Peter Dalgaard <p.dalgaard at biostat.ku.dk> wrote:
>
>   
>> Mike Prager wrote:
>>     
>>> I am trying to understand why syntax used by dput() to write
>>> rownames is valid (say, when read by dget()).  I ask this
>>> because I desire to emulate its actions *reliably* in my For2R
>>> routines, and I won't be comfortable until I understand what R
>>> is doing.
>>>
>>> Given data set "fred":
>>>
>>> > fred
>>>     id      var1
>>> 1 1991 0.4388587
>>> 2 1992 0.8772471
>>> 3 1993 0.6230486
>>> 4 1994 0.2340929
>>> 5 1995 0.5005605
>>>
>>> we can try this--
>>>
>>> > dput(fred, control = "all")
>>> structure(list(id = c(1991, 1992, 1993, 1994, 1995), var1 =
>>> c(0.4388587, 0.8772471, 0.6230486, 0.2340929, 0.5005605)),
>>> .Names = c("id", "var1"), row.names = as.integer(c(NA, 5)),
>>> class = "data.frame")
>>>
>>> In the above result, why is the following part valid?
>>>
>>> row.names = as.integer(c(NA, 5))
>>>
>>> given that the length of the RHS expression is 2, while the
>>> needed length is 5.
>>>
>>> Moreover, the following doesn't work:
>>>
>>> > row.names(fred) <- as.integer(c(NA,5))
>>> Error in `row.names<-.data.frame`(`*tmp*`, value = c(NA, 5)) : 
>>>         invalid 'row.names' length
>>>
>>> Is there any reason why the expression
>>>
>>> c(NA,5) 
>>>
>>> is better here than the more natural
>>>
>>> 1:5 
>>>
>>> here?
>>>
>>>   
>>>       
>> It's mainly a space-saving device. Originally, row.names was a character 
>> vector, but storage of character vectors is quite inefficient, so we now 
>> allow integer names and also a very short form where 1:n is stored just 
>> using the single value n. To distinguish the latter two, we use the 
>> c(NA, n) form, because row names are not allowed to be missing.
>>
>> Consider the following and notice how the string row names take up 
>> roughly 36 bytes per record where the actual data are only 8 bytes per 
>> record.
>>
>>  > d<-data.frame(x=rnorm(1000))
>>  > object.size(d)
>> [1] 8392
>>  > row.names(d)<-as.character(1:1000)
>>  > object.size(d)
>> [1] 44384
>>  > row.names(d)<-1000:1
>>  > object.size(d)
>> [1] 12384
>>  > row.names(d)<-NULL
>>  > object.size(d)
>> [1] 8392
>>
>>
>>
>>
>>     
>>> I will appreciate help from anyone with time to reply.
>>>
>>> MHP
>>>
>>>
>>>       
>
>
