[R] Corrupt data frame construction - bug?

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Thu Apr 30 10:08:56 CEST 2009


Duncan Murdoch wrote:
> On 29/04/2009 6:41 PM, Steven McKinney wrote:
>>
>>> foo <- matrix(1:12, nrow = 3)
>>> bar <- data.frame(foo)
>>> bar$NewCol <- foo[foo[, 1] == 4, 4]
>>> bar
>>   X1 X2 X3 X4 NewCol
>> 1  1  4  7 10   <NA>
>> 2  2  5  8 11   <NA>
>> 3  3  6  9 12   <NA>
>> Warning message:
>> In format.data.frame(x, digits = digits, na.encode = FALSE) :
>>   corrupt data frame: columns will be truncated or padded with NAs
>>
>> Is this a bug in the data.frame machinery?
>> If an attempt is made to add a new column
>> to a data frame, and the new object does
>> not have length = number of rows of data frame,
>> or cannot be made to have such length via recycling,
>> shouldn't an error be thrown?
>>
>> Instead in this example I end up with a
>> "corrupt data frame" having one zero-length column.
>>
>>
>> Should this be reported as a bug, or did I misinterpret
>> the documentation?
>
> I don't think "$" uses any data.frame machinery.  You are working at a
> lower level.

well, there is the function `$<-.data.frame`.  why does

    bar$NewCol <- ...

*not* dispatch to $<-.data.frame?  $<- is used on bar, and bar is a data
frame:

    is(bar)
    # "data.frame" ...

    trace('$<-.data.frame')
    bar$foo <- 1
    # no output

    trace('$<-')
    bar$foo <- 1
    # trace: `$<-`(`*tmp*`, foo, value = 1)

(still with the ugly *tmp*-hack)
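
a minimal sketch of how `$<-` dispatch can be observed without trace(), using a made-up class.  the class name "tracked", the `calls` log and the method body are all assumptions for illustration, not anything from base R;  the point is only that x$name <- value on a classed list reaches an S3 replacement method named `$<-.<class>`:

```r
# hypothetical S3 class "tracked"; nothing here is part of base R
calls <- character(0)

`$<-.tracked` <- function(x, name, value) {
  # record that the method was reached, then assign on the bare list
  calls <<- c(calls, name)
  cl <- oldClass(x)
  x <- unclass(x)
  x[[name]] <- value
  class(x) <- cl
  x
}

x <- structure(list(a = 1), class = "tracked")
x$b <- 2

calls      # names the method has seen
class(x)   # the class attribute is preserved
```

this says nothing about why trace('$<-.data.frame') stayed silent above;  it only shows what dispatch looks like when it does happen.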

and, actually, ?'$<-.data.frame' says:

"     ## S3 replacement method for class 'data.frame':
     x$i <- value"


>
> If you had added the new column using
>
> bar <- data.frame(bar, NewCol=foo[foo[, 1] == 4, 4])
>
> you would have seen the error:
>
> Error in data.frame(bar, NewCol = foo[foo[, 1] == 4, 4]) :
>   arguments imply differing number of rows: 3, 0
>
> But since you treated it as a list, 

he has *not*:  he has used the "S3 replacement method for class
'data.frame'".  the fact that it didn't work as expected seems to be a
consequence of a bug in the dispatch mechanism.
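
for reference, a short sketch of the two facts quoted above:  the right-hand side in the original example is a zero-length vector, and the data.frame() constructor refuses it.  the names new_col and res are only for illustration:

```r
foo <- matrix(1:12, nrow = 3)
bar <- data.frame(foo)

# the first column of foo is 1:3, so no row satisfies foo[, 1] == 4
new_col <- foo[foo[, 1] == 4, 4]
length(new_col)   # 0

# the constructor rejects a 0-length column next to 3 rows;
# tryCatch() captures the error instead of stopping the script
res <- tryCatch(
  data.frame(bar, NewCol = new_col),
  error = function(e) e
)
inherits(res, "error")
```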


> it let you go ahead and create something that was labelled as a
> data.frame but wasn't.  

wasn't?  what wasn't what?  after bar$NewCol <- integer(0), bar is
labelled as a data frame, and it seems to actually *be* a data frame;
data frame operations seem to work on bar, and the warning from printing
bar talks about a corrupt data frame, not a non-data frame.

or do you mean that bar is not a data frame internally?  that would be a
semantic weirdo where a user successfully performs an operation on a
data frame and gets a zombie.  in any case, looks like a bug.
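
one way to tell a healthy data frame from such a zombie is to compare each column's row count with the row count the frame itself reports.  the helper columns_consistent() below is hypothetical, and the corrupt frame is assembled by hand with structure() so that the sketch does not depend on how a particular R version treats bar$NewCol <- integer(0):

```r
# hypothetical helper: TRUE if every column has as many rows as the
# data frame claims to have
columns_consistent <- function(df) {
  n <- nrow(df)
  all(vapply(unclass(df), NROW, integer(1)) == n)
}

good <- data.frame(X1 = 1:3, X2 = 4:6)

# a frame labelled "data.frame" whose second column has length 0
bad <- structure(
  list(X1 = 1:3, NewCol = integer(0)),
  class = "data.frame",
  row.names = c(NA, -3L)
)

columns_consistent(good)      # TRUE
columns_consistent(bad)       # FALSE
inherits(bad, "data.frame")   # TRUE: the class label alone proves little
```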

> This is one of the reasons some people prefer S4 methods:  it's easier
> to protect against people who mislabel things.

it's *R* that mislabels things here.  i can't see the user doing any
explicit labelling;  the only stuff used was data.frame() and '$<-',
which should dispatch to '$<-.data.frame'.  the resulting zombie object
is clearly R's, not the user's, fault.
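
as a sketch of the behaviour asked for in the original question (an error when the new column neither matches the number of rows nor recycles to it), here is a guarded assignment.  set_column() is a hypothetical helper, not a base R function, and it handles plain vectors only:

```r
# hypothetical helper, not part of base R; 'value' is assumed to be a
# plain vector (no matrix or list columns)
set_column <- function(df, name, value) {
  n <- nrow(df)
  len <- length(value)
  # accept an exact match, or a non-empty value that recycles evenly
  ok <- len == n || (len > 0L && n %% len == 0L)
  if (!ok) {
    stop(sprintf("replacement has length %d, data frame has %d rows",
                 len, n))
  }
  df[[name]] <- rep(value, length.out = n)
  df
}

bar <- data.frame(matrix(1:12, nrow = 3))
bar <- set_column(bar, "Flag", TRUE)   # length 1, recycled to 3 rows

# the zero-length case is rejected and bar is left untouched
res <- tryCatch(set_column(bar, "NewCol", integer(0)),
                error = function(e) e)
inherits(res, "error")
```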

vQ



